fMRI investigation of cross-cultural music comprehension

The popular view of music as a “universal” language ignores the privileged position of the cultural insider in comprehending musical information unique to their own tradition. The purpose of this study was to test the hypothesis that listeners would demonstrate different neural activity in response to culturally familiar and unfamiliar music and that those differences may be affected by the extent of subjects’ formal musical training. Just as familiar languages have been shown to use distinct brain processes, we hypothesized that an analogous difference might be found in music and that it may depend in part on subjects’ formal musical knowledge. Using fMRI we compared the activation patterns of professional musicians and untrained controls reared in the United States as they listened to music from their culture (Western) and from an unfamiliar culture (Chinese). No overall differences in activation were observed for either subject group in response to the two musical styles, although there were differences in recall performance based on style and there were activation differences based on training. Trained listeners demonstrated additional activation in the right STG for both musics and in the right and left midfrontal regions for Western music and Chinese music, respectively. Our findings indicate that listening to culturally different musics may activate similar neural resources but with dissimilar results in recall performance.


Introduction
Music is found in all human societies. An impressive array of musical abilities relating to perception, attention, memory, and behavior develops in all normal children from a very early age without conscious effort (Trehub, 2001). These musical abilities develop through exposure to the particular melodic, harmonic, rhythmic, or timbral features of the surrounding musical culture in the same way children naturally learn language. Cultural differences in music perception and performance have been explored by ethnomusicologists and to a lesser extent by cognitive psychologists (Carterette and Kendall, 1999), but little exploration has been undertaken at the neurological level. We chose to employ fMRI methodology to explore whether listeners utilized different neural resources when interacting with music that was culturally familiar and well understood compared to music that was culturally distant. We hypothesized that subjects would exhibit a distinct "musical comprehension" response when hearing music of their own versus an unfamiliar culture much in the way that listeners have been observed to respond to a familiar versus unfamiliar language (Schlosser et al., 1998). While musical comprehension, unlike linguistic comprehension, allows for multiple interpretations of meaning, there are within a culture shared structures, gestures, and expectancies that may mediate perception and recall for the encultured listener.
The evidence from neuropsychology suggests that there are specialized neural networks dedicated to music processing (Peretz, 2001). Imaging studies have supported this view for the processing of musical pitch which involves specific areas of the right auditory cortex (Zatorre, 2001). The few neurological studies involving musical culture as a variable support our hypothesis of a distinct "first music" response. Genç and associates reported a case in which epileptic seizures presented by a 48-year-old Turkish female with no musical training were precipitated by both familiar and unfamiliar Turkish arabesques but not by other musical styles or by musical stimuli presented out of context (Genç et al., 2001). Similarly, ERP data have identified an increase in P3 amplitude, interpreted as indicating attention allocation and memory updating, among Turkish listeners when hearing a familiar instrument (ney) rather than an unfamiliar instrument (cello) (Arikan et al., 1999). Culture-specific factors outside the domain of music may also influence music cognition processes. It has been speculated that strategies employed in processing pitch may be different among speakers of tonal languages (Klein et al., 2001).
Many cultures distinguish among their own membership according to the degree of specialized musical training individuals may have received. Studies comparing musically trained and untrained subjects have identified differences in both brain function (Koelsch et al., 1999; Pantev et al., 1998; Russeler et al., 2001; Tervaniemi et al., 2001) and physiology (Schlaug et al., 1995a, 1995b), suggesting the possibility that training may be a significant influence on the neurobiology of music. Specific training factors that have been proposed to affect neural development include the age at which training commenced (Ohnishi et al., 2001) as well as the instrument studied (Hund-Georgiadis and von Cramon, 1999; Pantev et al., 2001). However, even Western subjects with no formal musical training responded strongly to stimuli that violated rules of Western musical syntax (Koelsch et al., 2000, 2002a), demonstrating that expertise in music listening, at least for certain tasks, may be the product of informal as well as formal experience. Conversely, it may be that an individual with advanced training in one musical culture approaches culturally unfamiliar music as a novice, similar to a fluent English-speaker encountering a language she has never studied. By studying the responses of both trained and untrained listeners to culturally familiar and unfamiliar music, we can explore the interactions of training and culture in musical understanding.

Participants
Subjects were six Western-trained professional violinists and violists (four female, two male; mean age = 38.3 years) and six musically untrained listeners (four female, two male; mean age = 34.2 years). Two of the female professional musicians were left-handed; examination of these subjects' scans did not reveal any differences compared to those of the right-handers. Untrained listeners were defined as individuals with fewer than 2 years of participation in an instrumental or choral ensemble and less than 1 year of private performance instruction. All subjects were native English-speakers and possessed no Cantonese language skills. None of the trained musicians possessed absolute pitch. All subjects were fully informed of the nature and procedures of the study and gave written consent for their participation in accordance with the guidelines of the University Human Subjects Division.

Stimuli
We selected six musical examples, three Western and three Chinese. Western examples were selected from the second or third movements of Alessandro Scarlatti's "Sonata Terza in C minor for Treble Recorder, Strings and Basso Continuo." Baroque music was chosen in an effort to avoid a piece in the string players' active symphonic repertoire. We wanted the musicians to be familiar with the musical tradition, but not with the specific example used. No subjects reported recognizing the piece. Chinese examples were selected from a version of the traditional piece, "Liu Qin Niang." Excerpts of between 25 and 33 s in length were edited to begin and end at a logical musical point and were generally matched for tempo, texture, and instrument type. Two slow-tempo examples (mm 60-80) and one fast-tempo example (mm 96-120) were chosen for each type of music. All performances featured a small instrumental ensemble of approximately 10 players that included flute-like instruments (recorder and guan), high string (violin and erhu), low string (cello and gehu), and plucked chording instrument (harpsichord and zheng). All excerpts featured a full ensemble performance and a solo melody/accompaniment texture. No examples included percussion or vocals.
Since our hypothesis was based in part on findings from language research, we also selected six speech examples, three English and three Cantonese, so that subjects' responses in both modalities could be directly compared. In order to match speech samples to the greatest extent possible, we chose news broadcasts presented by female reporters so that the examples would have a similar narrative character and generally equivalent speaking range. English examples were selected from recent news broadcasts archived by a local National Public Radio affiliate. Cantonese examples were selected from recent news broadcasts archived by Radio Television Hong Kong. Beginning and ending points for excerpts were identified such that each relayed a complete and sensible news story. The resulting examples were between 26 and 31 s in length.
Thirty-second gaps of silence were inserted between each of the six music and six speech examples. Two presentation orders were randomly constructed for each condition and, along with spoken instructions, were burned onto an audio CD.

Scan acquisition
Structural and functional scanning were performed on a 1.5-T MR imaging system (General Electric, Waukesha, WI). Scanning included a 21-slice axial high-resolution set of anatomical images in plane with functional data. Anatomical series were followed by separate functional MRI series for music and speech using a two-dimensional gradient echo echoplanar pulse sequence (TR/TE 2500/50 ms, 21 slices; 5 mm thick with 1 mm gap; 3.75 × 3.75-mm inplane resolution; 158 volumes; 6:35 scan time). All subjects heard the music condition first followed by the speech condition. Subjects were randomly assigned to one of two orders within each condition. To maintain high sound quality, stimuli were delivered in stereo through purpose-built electrostatic headphones with 30-dB noise reduction that were integrated into the head coil.
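The reported acquisition parameters can be cross-checked with simple arithmetic. The sketch below is only an illustrative consistency check: the function names are hypothetical, and the through-plane coverage figure is derived here under the assumption that the 1-mm gap applies between each pair of adjacent slices; it is not a value stated in the text.

```python
# Acquisition parameters as reported in the text.
TR_S = 2.5        # repetition time in seconds (TR = 2500 ms)
N_VOLUMES = 158   # volumes per functional series
N_SLICES = 21     # slices per volume
SLICE_MM = 5.0    # slice thickness in mm
GAP_MM = 1.0      # gap between adjacent slices in mm


def scan_duration(n_volumes, tr_s):
    """Return (minutes, seconds) for one functional series."""
    total_s = round(n_volumes * tr_s)
    return divmod(total_s, 60)


def slab_coverage_mm(n_slices, slice_mm, gap_mm):
    """Through-plane coverage: all slices plus the gaps between them (assumed layout)."""
    return n_slices * slice_mm + (n_slices - 1) * gap_mm


minutes, seconds = scan_duration(N_VOLUMES, TR_S)
print(f"{minutes}:{seconds:02d}")                      # 6:35, matching the reported scan time
print(slab_coverage_mm(N_SLICES, SLICE_MM, GAP_MM))    # 125.0 mm under the stated assumption
```

The 158 volumes at a TR of 2.5 s give 395 s, which agrees with the 6:35 scan time reported for each functional series.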

Procedure
Following a structural scan, subjects completed a music scan followed by a separate speech scan. During the music scans, subjects heard three Western classical instrumental music excerpts alternated with three examples of Chinese traditional music. During the speech scans, subjects heard six alternating excerpts of English-language and Cantonese-language news broadcasts. For both music and speech scans stimuli were presented in a randomly determined order and were alternated with rest (magnet noise).
Prior to entering the scanner subjects completed four sample recognition items to familiarize them with the task they would complete at the conclusion of the scanning procedure. Following the scan subjects completed a poststudy recognition test in which they listened to a series of short music and speech excerpts and identified whether they were heard during the scan. Subjects were also asked to indicate the degree of confidence they had in each identification response using a 5-point scale anchored by "unsure" and "completely sure."

Analysis
fMRI scans were analyzed using MEDx 3.4.1 (Sensor Systems, Sterling, VA). The data were motion corrected and linearly detrended, and t tests were performed contrasting the conditions within each scan, with results expressed as z scores. Each subject's z map was spatially smoothed with a 4-mm Gaussian filter and converted to standard stereotaxic space (Talairach and Tournoux, 1988) using FLIRT (www.fmrib.ox.ac.uk/fsl/). Maps showing significant activation for each subject group were generated (Bosch, 2000), and significantly activated clusters, along with size and center of activation, were identified on these group maps using a threshold of z ≥ 3 (Friston et al., 1994). This approach considers the significance of activation in the voxel of interest as well as in adjacent voxels to identify a voxel as significantly activated and also corrects for multiple comparisons. In order to compare z maps across two groups, we computed standardized mean differences by calculating a z map contrasting respective values using the two-sample test statistic for comparison of means, $z = (\bar{z}_1 - \bar{z}_2) / \sqrt{1/n_1 + 1/n_2}$, where $\bar{z}_1$ is the mean z map for highly trained subjects, $\bar{z}_2$ is the mean z map for untrained subjects, $n_1$ is the size of the highly trained group (6), and $n_2$ is the size of the untrained group (6). Significant (z ≥ 3) clusters were identified on these difference maps.
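The two-sample statistic used for the group difference maps can be sketched in a few lines. This is a minimal illustration, not the MEDx implementation: the function names and the voxel values are hypothetical, maps are represented as flat lists of per-voxel group means, and the threshold step is a simple voxelwise cut at z ≥ 3 that omits the cluster-based correction described in the text.

```python
import math


def z_difference(mean_z1, mean_z2, n1, n2):
    """Two-sample statistic for one voxel: (mean z1 - mean z2) / sqrt(1/n1 + 1/n2)."""
    return (mean_z1 - mean_z2) / math.sqrt(1.0 / n1 + 1.0 / n2)


def difference_map(map1, map2, n1, n2):
    """Apply the statistic voxel by voxel to two group mean z maps of equal length."""
    return [z_difference(a, b, n1, n2) for a, b in zip(map1, map2)]


def suprathreshold(zmap, threshold=3.0):
    """Indices of voxels at or above the threshold (voxelwise only, no cluster step)."""
    return [i for i, z in enumerate(zmap) if z >= threshold]


# Hypothetical group mean z values for four voxels (illustration only, not study data).
trained_mean = [2.5, 0.4, 3.1, 1.0]
untrained_mean = [0.5, 0.3, 1.0, 1.2]

diff = difference_map(trained_mean, untrained_mean, n1=6, n2=6)
print(suprathreshold(diff))  # [0, 2]
```

With six subjects per group the denominator is sqrt(1/3), roughly 0.58, so a difference of about 1.73 between the group mean z values is needed for a voxel to reach z ≥ 3 on the difference map.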

Results
We presented musician and control subjects with audio examples drawn from two different musical traditions, one (Western classical) that was culturally familiar and one (Chinese traditional) that was culturally unfamiliar to test the hypothesis that subjects would exhibit a neurologically distinct comprehension response for familiar music. Subjects differed in their degree of musical training, allowing us to test the additional hypothesis that formal training may influence patterns of activation in response to culturally familiar and unfamiliar music. Subjects also heard portions of English and Cantonese language news broadcasts. Following the fMRI scans, subjects completed a recognition test in which they identified brief music and speech excerpts recalled from the scanning procedure.
When contrasting subjects' responses to Western classical music and Chinese traditional music, we observed no differences in activation between the conditions for either musicians or controls. Analysis of group significance maps revealed no significant clusters of activation (z ≥ 3) unique to either music condition. These observations do not support the main hypothesis of the study regarding a distinct music comprehension response for culturally familiar music.
Differences based on training emerged when the control subjects' group significance maps were compared to those of musicians for the two music versus rest comparisons. Significant activation present only among the musicians was observed in the right STG for both the Western music vs rest comparison and the Chinese music vs rest comparison (Table 1 and Fig. 1). To examine this difference further, we analyzed the group significance maps of musicians and controls separately in response to both of the music versus rest conditions. In the Western music vs rest and Chinese music vs rest comparisons, all subjects showed significant clusters of activation in the right transverse temporal gyrus and left superior temporal gyrus (Table 1). Musicians also demonstrated additional significant clusters of activation centered in the right middle frontal gyrus in the Western music vs rest comparison and the left middle frontal gyrus in the Chinese music vs rest comparison that had not appeared in either of the previous analyses (Table 1 and Fig. 2). These observations support the hypothesis that formal training influences patterns of activation in response to culturally familiar and unfamiliar music.
Recognition scores on the behavioral measure were analyzed using one-tailed single-sample t tests with a hypothesized mean of 3 of a possible score of 6 (the score one would expect by chance). Results revealed that all subjects were significantly more successful at identifying the Western music excerpts (Table 2). Analysis of confidence scores indicated that the trained performers were more assured of their responses to the Western excerpts than to the Chinese excerpts. Control subjects' mean confidence score was identical for Western and Chinese music.
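A one-tailed single-sample t test against the chance score of 3 can be sketched as follows. The scores below are hypothetical and are not the study's data; the function name is an assumption, and the critical value of 2.015 is the standard tabled one-tailed value for alpha = .05 with 5 degrees of freedom, used here only because the illustrative sample has six scores.

```python
import math
import statistics


def one_sample_t(scores, mu0):
    """Return (t, df) for a single-sample t test of the mean against mu0."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)          # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical recognition scores out of 6 for six listeners (illustration only).
scores = [5, 4, 6, 5, 4, 5]
CHANCE = 3                 # expected score when guessing, as stated in the text
T_CRIT_DF5_ONE_TAILED = 2.015  # tabled critical value, alpha = .05, df = 5

t, df = one_sample_t(scores, CHANCE)
above_chance = t > T_CRIT_DF5_ONE_TAILED   # one-tailed: only scores above chance count
print(df, above_chance)
```

Because the test is one-tailed, only a mean reliably above 3 leads to rejection of the chance hypothesis; a mean below chance would not count as evidence of recognition.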
When contrasting English with Cantonese speech for all subjects in averaged group maps, a significant cluster of activation was centered in the left insula bordering the STG (Fig. 3 and Table 1). Smaller foci of activation were also observed in the left STG and MTG with limited homotopic activation in the right STG (Fig. 3). As with the music recognition scores, posttest mean speech recognition scores demonstrated that subjects were significantly more successful recognizing English speech than Cantonese speech (Table 2). Subjects' confidence in their responses was greater for the English excerpts than for Cantonese excerpts.

Discussion
The primary question of this study was whether listeners would exhibit a difference in brain activation in response to music from their own culture when compared to music from an unfamiliar culture. Data revealed that the degree of cultural familiarity did not influence the pattern of brain activation among Western-born musicians and controls. These results do not support the hypothesis that listeners demonstrate a distinct response to music constructed according to a familiar and well understood rule system. In contrast, subjects' responses to linguistic stimuli revealed clear comprehension-based differences. Since fMRI demonstrates brain activity changes on a fairly macroscopic level, it is possible that more minute differences in processing were present for the two types of music, differences that may only be revealed through measures involving more specific judgment tasks or targeted to a single region. If such differences do exist, they are clearly less robust than those involved in responses to familiar and unfamiliar languages. Successful comprehension of language is largely a right-or-wrong task. Even in cases where the listener may only partially comprehend the spoken information, they are approaching a single correct interpretation. In the case of music, however, one may potentially "make sense" of it in a variety of ways. While this understanding may not be identical to that of a cultural insider, the listener may comprehend the music in a way that is satisfactory to them perhaps by imposition of a familiar, if inappropriate, rule system. In other words, music is not so much a universally understood language as a universally understandable language (Swain, 1997).
Higher recognition scores for Western music excerpts on the behavioral task suggest that cultural familiarity is an important element in music recognition even if not apparent as differences in brain activation during fMRI scanning. There are certainly musical cultures that differ more dramatically in surface characteristics than the two considered here. Yet, even when controlling as much as possible for such features as tempo, texture, and instrument type, it is clear from the recall measure that cultural unfamiliarity was sufficient to interfere with task performance. Similarly, linguistic research among bilinguals has found that similarity in neural activation patterns between first and second language does not necessarily equate with competence on behavioral tasks (Perani et al., 1998). It has been suggested that second-language tasks may be imperfectly, although satisfactorily, filtered through networks usually devoted to the primary language.
When comparing both Western and Chinese music to rest, all subjects demonstrated bilateral superiotemporal activity with more robust activation occurring on the right side. Prior research has identified areas in Heschl's gyrus adjacent and proximal to the right primary auditory cortex as critical to the processing of tonal information beyond basic pitch perception (Koelsch et al., 2002a; Zatorre, 2001). Professional musicians, however, differed from untrained controls in both the strength and the location of activation; these training-based differences were observed for both culturally familiar and unfamiliar music. This extends previous findings of differences in neural activity for trained and untrained subjects (Koelsch et al., 1999, 2002b; Tervaniemi et al., 2001) to non-Western musical stimuli.
The only training-based difference that emerged in direct comparisons of subjects' music vs rest responses was a stronger activation for professional musicians in the right STG (Fig. 1), supporting the importance of the right hemisphere in tonal processing (Koelsch et al., 2002a; Zatorre et al., 1994; Zatorre, 2001). Pantev et al. (1998) reported expanded neural representation among highly trained musicians for musical, as opposed to sinusoidal, tones. However, as Pantev employed exclusively right-side auditory stimulation, activation differences were reported only for the left hemisphere, differences that were not observed in the present study. For trained listeners alone, additional activity was observed in right mid-frontal areas for the Western music vs rest comparison. In a recent article, Zatorre (2001) presented a new analysis of previous data on melodic perception (Zatorre et al., 1994) in which subjects' responses when comparing the first two pitches of an unfamiliar melody were contrasted with their responses when comparing the first and last pitch of an unfamiliar melody. For both conditions, Zatorre observed an array of frontal activation with particular concentrations of activity in the right frontal regions, confirming the involvement of the right frontal lobe in the storage and retrieval of tonal information in working memory. Specific differences in locus of activation were attributed to the contrast in memory load between the two-note and first/last-note conditions. The present results might be interpreted as reflecting the trained listeners' use of tonal information to identify characteristics of the culturally familiar excerpts they heard. They might have relied on this stored information regarding features such as melodic structure to aid in the recall of excerpts during the posttest recognition task.
Table 1. Stereotaxic location (Talairach and Tournoux, 1988)
Although prior research in this area was conducted among untrained listeners, the researchers specifically directed subjects to complete pitch-related tasks (Zatorre et al., 1994). In the present study, subjects were engaged only in focused attending for later recall, leaving the strategies to be employed up to each subject. Musicians, then, perhaps as a result of their extensive training in such areas as tonal analysis and aural skills, may have more quickly turned to assessment of tonal relationships as a strategy to facilitate recall.
In contrast, the left mid-frontal activation observed for the Chinese music vs rest comparison is difficult to interpret, given the association of this region with verbal working memory, although it may reflect subjects' attempts to apply verbal descriptors to less familiar musical stimuli. Among 13- to 14-year-old students, left frontotemporal activity has been associated with verbal, rather than activity-based, music instruction (Altenmueller et al., 1997). The differences in frontal activation found in both music vs rest comparisons did not appear in the direct comparisons of trained and untrained listeners' responses to music vs rest, suggesting that any strategic difference may be one of degree rather than kind. This is consonant with recent findings suggesting that many fundamental processes of music cognition are similar for all normal humans regardless of training (Bigand and Pineau, 1997; Bigand et al., 1999; Regnault et al., 2001).
One of the challenges of examining cross-cultural musical responses is separating the response to cultural differences from responses to surface musical differences. Indeed, a listener may interpret a deeply embedded musical difference as a mere surface difference (for example, identifying a pitch as "slightly out of tune" rather than as part of an unfamiliar tonal hierarchy) and attempt to accommodate it within his or her own system of musical understanding. In the present study we chose to match examples from the two cultures for tempo, general instrument timbre, and texture for purposes of experimental control. All examples were instrumental to avoid the confounding linguistic effect of lyrics. Examination of more extreme contrasts of musical culture might yield a clearer noncomprehension response, but could be confounded by reactions to surface differences in the music. Future research should focus on examining neural activity during specific musical judgments designed to better isolate points of difference between musical cultures, and in so doing explore the limits of cross-cultural musical understanding and accommodation.