Language experience shapes fusiform activation when processing a logographic artificial language: An fMRI training study

The significant role of the left midfusiform cortex in reading, found in recent neuroimaging studies, has led to the visual word form area (VWFA) hypothesis. This hypothesis suggests that years of experience reading one's native language change the visual expertise of this region, making it especially sensitive to the visual form of that language. The present study tested this hypothesis by exploring the role of language experience in shaping fusiform activation. We designed a logographic artificial language (LAL) using the visual forms and pronunciations of Korean Hangul characters (with their correspondence shuffled) and assigned arbitrary meanings to these characters. Twelve native Mandarin Chinese speakers (6 male and 6 female, 18 to 21 years old) with no prior knowledge of Korean were trained on the visual form of these characters for 2 weeks, followed by 2 weeks each of phonological and semantic training. Behavioral data indicated that training was effective in increasing the efficiency of visual form processing and in establishing the connections among visual form, sound, and meaning.
Imaging data indicated that at the pre-training stage, subjects showed stronger activation in the fusiform regions for LAL than for Chinese in both the one-back visual matching task and the passive viewing task. Visual form training significantly decreased activation in the bilateral fusiform cortex and the left inferior occipital cortex, whereas phonological training increased activation in these regions, and the right fusiform remained more active after semantic training. Increased activation after phonological and semantic training was also evident in other regions involved in language processing. These findings do not seem to be consistent with the visual-expertise-induced-sensitivity hypothesis about the fusiform regions. Instead, our results suggest that visual familiarity, phonological processing, and semantic processing all make significant but different contributions to shaping fusiform activation.


Introduction
Thanks to the development of neuroimaging techniques, one striking advance in our understanding of how language is represented in the brain is the discovery that the left midfusiform cortex is involved in reading. Activation of this region has been consistently reported across various reading tasks and across different writing systems (for reviews, see Bolger et al., 2005; Cohen and Dehaene, 2004; Fiez and Petersen, 1998; Jobard et al., 2003; Price, 2000; Xue et al., 2005). As reading skill increases, this region becomes more critical in the recognition of printed words (Booth et al., 2001; Shaywitz et al., 2002; Turkeltaub et al., 2003). Children with reading difficulties show abnormal fusiform function compared with typically developing readers (see Habib, 2000 for a review). These findings have prompted a reevaluation of the neuropsychological data, as well as a revision of the more-than-century-old neural model of reading to incorporate the left midfusiform region into the reading network (e.g., Jobard et al., 2003; Price, 2000).
Despite the widespread consensus on the fusiform's involvement in visual language processing, there are debates about its function and about its being labeled the visual word form area (VWFA) by Cohen and his colleagues (Cohen et al., 2000, 2002). The debates are carried out on two interconnected fronts. The first concerns the functional computation implemented in the VWFA. Cohen and colleagues suggest that the VWFA is responsible for feature-invariant (e.g., location, size, font, color and case), pre-lexical visual word recognition, i.e., the extraction of the abstract visual word form. Others suggest that the VWFA might also be involved in lexical, multimodal word processing (Kronbichler et al., 2004; see Hillis et al., 2005 for the most recent neuropsychological results), or in integrating phonological and visual information during both word and picture processing (Price and Friston, 2005).
Another line of controversy concerns the functional properties of the left midfusiform cortex. Labeling this area the visual word form area implies that neurons in this region have specific functional properties that make them especially suited to visual word processing. Cohen and his colleagues provide two major lines of evidence: word-specific sensitivity and case-invariant computation (see Cohen and Dehaene, 2004 for a review). Several studies have reported word- or letter-sensitive responses in the left ventral visual system by contrasting words with false fonts (Petersen et al., 1990), words or pseudowords with consonant strings or false fonts (Cohen et al., 2002; Price et al., 1994, 1996), and letters with digits (Polk and Farah, 2002; also see Cohen and Dehaene, 2004 for a review).
However, the link between these findings and the word-specific sensitivity hypothesis of the VWFA is less clear. First, existing results do not show a consistent location for the so-called word-sensitive region, which has varied across studies from the extrastriate cortex (e.g., Petersen et al., 1990) to the midfusiform cortex (Cohen et al., 2002) and the occipitotemporal area (Allison et al., 1994). Second, some studies did not find a word-sensitive region in the left ventral visual system, using either passive viewing tasks (e.g., Indefrey et al., 1995, 1997) or one-back matching tasks (Tagamets et al., 2000). These results also suggest that task difficulty is an important factor that needs to be explored further when examining the word-sensitivity hypothesis. Third, Cohen and his colleagues proposed that a portion of the fusiform might be tuned to whole words (e.g., Cohen and Dehaene, 2004; Dehaene et al., 2005), which is inconsistent with the stronger midfusiform activation for pseudowords than for words (see Mechelli et al., 2003 for a review). Finally, because a wide network of classical language areas is activated even in simple implicit reading tasks, differences in fusiform activation between words and pseudowords might reflect modulation by semantics and phonology (Price et al., 1996). For the same reason, it is hard to attribute differences in activation between words/pseudowords and consonant strings to orthographic constraints per se, because these stimuli differ in semantics and phonology as well as in orthography.
Regarding case-invariant processing in the VWFA, Cohen and Dehaene (2004) showed that (1) VWFA responses were equally robust to words in upper case ("TABLE"), lower case ("table") or even mixed case ("tAbLe"); and (2) the VWFA showed repetition priming regardless of whether the two words were printed in the same or different case (e.g., "table" followed by "TABLE") (Dehaene et al., 2001). This functional property goes beyond the generic principle of view invariance in the ventral visual cortex (Riesenhuber and Poggio, 1999), which led Cohen and his colleagues to argue that cross-case priming likely reflects cultural constraints and the effect of language experience.
It remains inconclusive, however, what contributes to the cross-case repetition priming effect. Because the two words share the same phonological and semantic identity, the priming effect may occur at the phonological and/or semantic level rather than at the level of the pre-lexical abstract visual word form. Consistent with this view, previous research has also shown cross-language (Chee et al., 2003) and cross-script (Nakamura et al., 2005) priming effects in the fusiform region as well as in several other language areas. Thus, the exact mechanisms by which language experience modulates the priming effect in the fusiform cortex remain to be elucidated.
To summarize, existing evidence raises questions about the VWFA hypothesis's claim regarding how language experience shapes midfusiform activation in visual word processing. In particular, two major questions need to be explored further. First, although the word-sensitivity hypothesis has been tested under certain conditions (e.g., comparisons with nonwords or false fonts in passive viewing tasks with brief stimulus exposure such as 100 ms), it is not clear whether it extends to other experimental conditions (e.g., different visual matches, longer stimulus exposure, and comparison tasks). Second, regarding the idea of visual-expertise-induced sensitivity to words in the VWFA hypothesis, it is important to disentangle the roles of visual familiarity, phonology and semantics in shaping VWFA activation. Previous results from word-nonword comparisons or the priming paradigm most likely reflect a combined effect of these factors. The present study aimed to address these questions with two major methodological considerations.
The first consideration is finding an ideal visual match for the native language when testing the language-sensitivity hypothesis. Studies of alphabetic languages have generally used word/pseudoword vs. nonword comparisons (e.g., Cohen et al., 2002; Petersen et al., 1990; Price et al., 1994, 1996). For Chinese, however, this strategy seems less effective. The orthographic regularity of Chinese characters largely depends on the positional regularity of lexical radicals, and a common way to construct Chinese nonwords is to place radicals in illegitimate positions (Chen et al., 1996). This approach has several limitations. First, the radicals in the nonwords are still familiar units, which affects the pattern of visual processing (Chen et al., 1996). Second, most radicals in Chinese characters convey semantic (semantic radicals) or phonological (phonological radicals) information, which may be activated during nonword processing. Third, each Chinese character is a carefully composed figure, and violating positional regularity may disrupt the harmony and integrity of the character, which may in turn influence cognitive and neural processing. The last point may also apply to alphabetic scripts. We therefore decided to use Korean Hangul characters and compare them with Chinese characters. Although Hangul is an alphabetic script, each Hangul character is written as a square syllable block formed hierarchically from strokes and units (Fig. 1). The high similarity in spatial patterns between Hangul and Chinese characters enables a strict match in visual integrity and visual complexity (i.e., number of strokes, number of units and spatial organization).
The second consideration is how to disentangle the roles of visual familiarity, phonology and semantics, which are confounded in comparisons between native-language characters and foreign characters. Unlike natural reading acquisition, in which visual form, phonology and semantics are usually learned together, this study adopted an artificial language training paradigm: we first trained subjects on the visual form and then added phonology and semantics. We hoped this would help partly disentangle their effects on visual word processing and midfusiform activation.
Following previous studies, a passive viewing task was administered across the training stages, but the presentation duration was extended to 750 ms to allow full processing of the characters (Indefrey et al., 1995, 1997). The same paradigm with Chinese characters was included in the same scan session as a control condition, to account for possible instability of MRI measurements across times and sessions (Poldrack, 2000). At the pre-training stage, we also used a one-back visual matching paradigm to explore the effect of task difficulty. With this design, the comparison of activation between Chinese and LAL in the pre-training scans was used to reexamine the word-sensitivity hypothesis, whereas the training data were used to explore the roles of different aspects of language experience in shaping fusiform activation.

Subjects
Twelve Mandarin-speaking Chinese college students (6 male and 6 female), aged 18 to 21 years, with normal or corrected-to-normal vision, were recruited for this experiment. All were strongly right-handed according to the handedness inventory developed by Snyder and Harris (1993). None had any formal knowledge of the Korean language. They gave written consent according to the guidelines of the MRI Center at Beijing 306 Hospital. One hundred and twenty Chinese characters and 120 Korean Hangul characters were selected for this study. The Chinese stimuli were all high-frequency characters (more than 90 per million according to the Chinese word frequency dictionary) with 3 to 9 strokes and 2 to 3 units as defined by Chen et al. (1996). The Hangul characters were strictly matched to the Chinese characters in visual complexity (i.e., number of strokes and units).

Materials and cognitive tasks
To facilitate comparison between the subjects' native language (Chinese) and a foreign language, we designed a logographic artificial language (LAL) that matched Chinese characters in two important aspects that may affect the neural correlates of language processing: the visual pattern and the grapheme-phonology correspondence (GPC) rules (e.g., Paulesu et al., 2000; Siok et al., 2004). Chinese characters usually consist of several strokes packed into a square shape. They map onto meaningful morphemes rather than phonemes and thus do not follow GPC rules (Siok et al., 2004). With these considerations in mind, the LAL was created by borrowing the written forms and sounds of 120 Hangul characters, but the visual forms were not paired with their original pronunciations, to avoid the GPC rules that are obvious in Korean Hangul (Taylor and Taylor, 1995). Sixty-four of these characters were used for semantic training and were assigned arbitrary meanings. These 64 characters were also used in the passive viewing task, which was administered four times across the training period (before training and after each of the three types of training). The remaining stimuli, matched with the above 64 characters in visual complexity and word frequency (i.e., the word frequency of the Chinese characters), were used for the one-back visual matching task, which was administered only at the pre-training stage.
Three fonts were used in the visual form training: gulim, gungsuh and a font handwritten by a research assistant. The sounds of the characters were recorded from four native Korean speakers (two male, two female). All sounds were normalized to the same length (600 ms) and loudness. Sixty-four pictures depicting the meanings of the characters, together with their corresponding Chinese translations, were used for the semantic training. All characters were assigned concrete nouns, half belonging to the natural category (e.g., sun) and half to the man-made category (e.g., desk).
[Figure caption, partially recovered: ... were administered before LAL training. Chinese and Korean blocks were arranged into one scanning session, and the order of the two kinds of blocks was counterbalanced (c). Training was divided into three stages (d): visual form training, phonological training and semantic training. Only the passive viewing task was administered at the end of each training stage. "S" represents an fMRI scan.]
Training procedure

Visual form training
The visual form training program comprised 20 h of training over 2 weeks (2 h per day, 5 days per week). On each training day, subjects completed six sessions of a delayed matching task and one writing task. For the delayed matching task, the 120 Korean (LAL) characters were randomly organized into 80 pairs. In half of the 80 pairs, the two characters were identical; in the other half, they were not. Subjects decided whether the two sequentially presented characters were the same. The three fonts (i.e., gulim [A], gungsuh [B], and a handwritten font [C]) allowed six ways of presenting each pair: AA, BB, CC, AB/BA, BC/CB and AC/CA. There was one training block for each font pairing. For the writing task, subjects copied all 120 characters three times. These manipulations were intended to help subjects acquire the abstract visual form of the characters. As training progressed, the difficulty of the delayed matching task was gradually increased by shortening the presentation duration and lengthening the between-stimulus interval.
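The pairing scheme for the delayed matching task can be sketched as follows. The text does not say how the 120 characters were distributed over the 80 pairs or how the order of mixed-font pairs (AB vs. BA) was chosen, so `make_pairs`, `make_block` and `make_day` are hypothetical helpers. They assume 40 "same" pairs drawn from distinct characters, 40 "different" pairs of two distinct characters, and a coin flip for the order of the two fonts.

```python
import random

# Fonts A, B and C as labelled in the text.
FONT_NAME = {"A": "gulim", "B": "gungsuh", "C": "handwritten"}
# The six font conditions; in mixed conditions either order (e.g. AB or BA) may occur.
FONT_CONDITIONS = [("A", "A"), ("B", "B"), ("C", "C"),
                   ("A", "B"), ("B", "C"), ("A", "C")]


def make_pairs(rng, n_chars=120, n_pairs=80):
    """Hypothetical scheme: half 'same' pairs, half pairs of two distinct characters."""
    half = n_pairs // 2
    same = [(c, c) for c in rng.sample(range(n_chars), half)]
    diff = [tuple(rng.sample(range(n_chars), 2)) for _ in range(half)]
    pairs = [(a, b, a == b) for a, b in same + diff]
    rng.shuffle(pairs)
    return pairs


def make_block(rng, condition, n_chars=120, n_pairs=80):
    """One delayed-matching block in which every pair uses the given font condition."""
    f1, f2 = condition
    block = []
    for a, b, is_same in make_pairs(rng, n_chars, n_pairs):
        first, second = (f1, f2) if rng.random() < 0.5 else (f2, f1)  # AB or BA
        block.append({"first": (a, FONT_NAME[first]),
                      "second": (b, FONT_NAME[second]),
                      "same": is_same})
    return block


def make_day(seed=None):
    """Six blocks per training day, one per font condition, in shuffled order."""
    rng = random.Random(seed)
    conditions = FONT_CONDITIONS[:]
    rng.shuffle(conditions)
    return [(cond, make_block(rng, cond)) for cond in conditions]
```

Because "same" is defined by character identity rather than font, a gulim/gungsuh pair showing one character counts as a match, which is what pushes learners toward a font-invariant, abstract visual form.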

Phonological training
The phonological training was administered as the second stage, after the 2 weeks of visual form training, and also lasted 2 weeks (2 h per day, 5 days per week). Early in this stage, subjects listened carefully to each pronunciation, imitated it, and compared their own pronunciation with the standard one, which proved very helpful for acquiring the correct pronunciations of the LAL characters. Dictation (selecting the character corresponding to a heard sound) and fast-naming tasks were then introduced to increase the automaticity of the connections between sound and visual form.

Semantic training
Subjects were then trained on the semantics of 64 Hangul characters whose visual forms and sounds were by then familiar to them. Because semantic learning proved easier in a pilot study and fewer characters had to be learned, the semantic training program was shortened to 10 h over 2 weeks (1 h per day, 5 days per week). Several types of learning tasks were used, including pair association (i.e., Chinese-LAL association and LAL-picture association), forced choice (i.e., selecting which of two Chinese characters or pictures matches the meaning of an LAL character, or which of two LAL characters matches the meaning of a picture or Chinese character), free recall (i.e., translating from LAL to Chinese), and cued recall (i.e., picture naming and translating from Chinese to LAL).

Behavioral tasks administered before training and after each training stage
We adopted a simultaneous same-different judgment task (Chen et al., 1996; Eichelman, 1970) to examine the behavioral effects of training. Participants decided whether the two characters in each pair were identical or different. This task reflects the efficiency of visual analysis and recognition (Henderson, 1974).
The procedure for the behavioral task was as follows. First, a pair of stimuli appeared at the center of the screen. Subjects pressed the right "Shift" key to indicate a "yes" response (i.e., the characters match) and the left "Shift" key to indicate a "no" response (i.e., they do not match). The characters disappeared after the subject's response, or after 3 s if no response was made. In either case, the next pair appeared after a 1-s interval. Before the main experiment, there were 10 practice pairs for each task. The LAL and Chinese sessions were administered separately, and their order was counterbalanced across subjects. Subjects performed the tasks four times: once before training and once at the end of each training stage.
A naming task and a semantic judgment task were administered at the end of phonological training and semantic training, respectively, to ensure that subjects had successfully acquired the phonology and semantics of the characters. In the naming task, subjects named the characters presented on the screen as quickly as possible. If no response was made within 3 s of stimulus onset, the stimulus disappeared, and the next stimulus appeared after a 1-s interval. Reaction times and voice responses were recorded by the program. Two research assistants independently evaluated the correctness of the naming responses against the pronunciation assigned to each character; when they disagreed about a particular response, they discussed it until they reached agreement. The semantic judgment task was administered while subjects were undergoing an fMRI scan. Each character was presented at the center of the screen for 1650 ms, followed by a 750-ms blank. Subjects pressed a button with their right thumb if they judged that the character referred to a man-made artifact and a button with their left thumb if it referred to a natural kind.

fMRI tasks, paradigm and parameters
Subjects were scanned four times: once before training and once right after each of the three training stages (with a maximum delay of 2 days). A block design was used for both the passive viewing task and the one-back visual matching task. In each scanning session, four 24-s Chinese blocks and four 24-s Korean blocks were presented, and the order of the two block types was counterbalanced. Each experimental block was preceded by an 18-s control block. At the beginning of each session, a 15-s fixation period allowed the magnetization to stabilize, and these images were excluded from analysis. At the end of each session, a 9-s fixation period accommodated the delay of the haemodynamic response. Each scanning session lasted 6 min in total.
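The session timing above can be checked with a little arithmetic. This sketch assumes that the 750-ms presentation plus 750-ms blank described in the next paragraph fills each 24-s block exactly, and it takes TR = 3 s from the scanning parameters; the constant names are illustrative.

```python
TR_S = 3.0             # repetition time (scanning parameters)
LEAD_IN_S = 15         # initial fixation, discarded from analysis
TAIL_S = 9             # final fixation
CONTROL_S = 18         # fixation block preceding each experimental block
BLOCK_S = 24           # experimental block length
N_BLOCKS = 8           # four Chinese + four LAL blocks
TRIAL_MS = 750 + 750   # character on screen + blank


def session_summary():
    """Derived quantities for one scanning session under the stated assumptions."""
    total_s = LEAD_IN_S + N_BLOCKS * (CONTROL_S + BLOCK_S) + TAIL_S
    trials_per_block = BLOCK_S * 1000 // TRIAL_MS
    return {
        "total_s": total_s,
        "total_min": total_s / 60,
        "volumes": int(total_s / TR_S),
        "discarded_volumes": int(LEAD_IN_S / TR_S),
        "trials_per_block": trials_per_block,
        "trials_per_language": (N_BLOCKS // 2) * trials_per_block,
    }
```

Under these assumptions a session lasts 360 s (6 min, 120 volumes), the 15-s lead-in corresponds to the 5 images excluded in the data analysis, and each language contributes 64 trials per session, which coincides with the 64 LAL characters reserved for the passive viewing task.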
Stimuli were programmed with DMDX on an IBM-compatible laptop and projected onto a translucent screen, which subjects viewed through a mirror attached to the head coil. The stimuli were presented in black on a white background. In the experimental blocks, each character was presented at the center of the screen for 750 ms, followed by a 750-ms blank. In the passive viewing task, subjects were asked to view the characters silently, and no behavioral responses were required. In the one-back visual matching task, subjects continuously judged whether the current character was identical to the previous one, pressing a button with the right thumb for "yes" and with the left thumb for "no". In the control blocks, a fixation cross was presented and no overt responses were required.

Scanning parameters
The scans were performed on a 2.0 T GE/Elscint Prestige whole-body MRI scanner (Elscint Ltd., Haifa, Israel) with a standard head coil at the MRI Center of Beijing 306 Hospital. A single-shot T2*-weighted gradient-echo EPI sequence was used for functional imaging with the following parameters: TR/TE/flip angle = 3000 ms/60 ms/90°, FOV = 375 × 210 mm, matrix = 128 × 72, and slice thickness = 6 mm. Eighteen contiguous axial slices parallel to the AC-PC line were obtained to cover the whole cerebrum and part of the cerebellum. Anatomical images were acquired using a T1-weighted, three-dimensional gradient-echo pulse sequence with the following parameters: TR/TE/flip angle = 25 ms/6 ms/28°, FOV = 220 × 220 mm, matrix = 220 × 220, and slice thickness = 2 mm. Eighty-nine axial slices parallel to the AC-PC line were acquired to provide a high-resolution image of the anatomy of the whole brain.
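The acquisition parameters imply the native voxel dimensions and slab coverage computed below. This is a sketch that assumes the first FOV and matrix values refer to the same in-plane axis and that there is no inter-slice gap (stated as contiguous for the EPI slices, assumed for the T1 slices); the helper names are illustrative.

```python
def voxel_size(fov_mm, matrix, slice_mm):
    """In-plane voxel size = field of view / matrix size along each axis."""
    return (fov_mm[0] / matrix[0], fov_mm[1] / matrix[1], slice_mm)


def slab_coverage(n_slices, slice_mm, gap_mm=0.0):
    """Through-plane coverage of a slice stack (gap_mm = 0 for contiguous slices)."""
    return n_slices * slice_mm + (n_slices - 1) * gap_mm


epi_voxel = voxel_size((375, 210), (128, 72), 6)    # about 2.93 x 2.92 x 6 mm
epi_cover = slab_coverage(18, 6)                    # 108 mm
t1_voxel = voxel_size((220, 220), (220, 220), 2)    # 1 x 1 x 2 mm
t1_cover = slab_coverage(89, 2)                     # 178 mm
```

These are acquisition-space dimensions; after normalization to the MNI template and 8-mm smoothing, the effective resolution of the analysed maps is coarser.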

Data analysis
Image preprocessing and statistical analyses were performed with Statistical Parametric Mapping (SPM2, Wellcome Department of Cognitive Neurology, London, UK), implemented in Matlab (Mathworks Inc., Sherborn, Mass., USA). The first 5 images were excluded from analysis. Functional images were realigned, unwarped, normalized to the MNI template (Friston et al., 1995), and smoothed with an 8-mm FWHM Gaussian kernel. A general linear model was used to estimate condition effects for each participant (Friston et al., 1994), with a boxcar function convolved with the haemodynamic response function (HRF) as the reference function.
We first contrasted each visual task in the pre-training scans with fixation for each subject. A conjunction analysis following the procedure of Nichols et al. (2005) was performed at the group level to identify regions significantly activated by both tasks in both languages. The main effects of task (one-back visual matching vs. passive viewing) and language (Chinese vs. LAL) and their interaction in the pre-training scan were examined by defining the appropriate contrasts for each subject. These contrasts were entered into a random-effects model to compute group effects.
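Nichols et al. (2005) advocate testing the conjunction null with the minimum statistic: a voxel counts as jointly active only if the smallest of its t values across the contrasts exceeds the usual single-contrast threshold. A minimal sketch with toy numbers; the threshold of 3.0 and the voxel values are placeholders, not values from this study.

```python
def conjunction_min(t_maps, threshold):
    """Minimum-statistic conjunction: a voxel survives only if every map exceeds threshold."""
    n_vox = len(t_maps[0])
    if any(len(m) != n_vox for m in t_maps):
        raise ValueError("all maps must contain the same voxels")
    min_t = [min(m[v] for m in t_maps) for v in range(n_vox)]
    return min_t, [t > threshold for t in min_t]


# Toy t values for three voxels in the four task-by-language maps (not real data).
maps = [
    [5.0, 2.0, 4.0],   # Chinese, passive viewing
    [4.0, 6.0, 1.0],   # LAL, passive viewing
    [3.5, 5.0, 5.0],   # Chinese, one-back matching
    [6.0, 4.0, 4.0],   # LAL, one-back matching
]
min_t, active_in_all = conjunction_min(maps, threshold=3.0)
```

Only the first voxel survives: the others are strongly active in some maps but fall below threshold in at least one, which is exactly what the conjunction null is meant to exclude.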
To explore the training effect, we computed the following contrast for each subject, which minimizes effects of repeated measurement or time (Poldrack, 2000; Poldrack and Gabrieli, 2001): $(\mathrm{LAL}_{\mathrm{after}} - \mathrm{Chinese}_{\mathrm{after}}) - (\mathrm{LAL}_{\mathrm{before}} - \mathrm{Chinese}_{\mathrm{before}})$. Group results were computed with a random-effects model. To avoid spurious effects caused by deactivation in the subtraction, the analysis of training effects was masked with group-averaged activation maps: for training-induced increases, regions significantly activated in the after-training task served as the inclusive mask, whereas for training-induced decreases, regions significantly activated at the before-training stage were used. Results were corrected for multiple comparisons (P < 0.05) using the false discovery rate (FDR; Genovese et al., 2002). For the effect of visual form training, we used a threshold of P < 0.001 (uncorrected), because FDR correction is too conservative when the overall signal is weak (Genovese et al., 2002). All activation coordinates were converted from MNI to Talairach space (Talairach and Tournoux, 1988) with the MNI2TAL tool (http://www.mrc-cbu.cam.ac.uk/Imaging/Common/downloads/MNI2tal/).
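The per-subject training contrast, the FDR step and the inclusive masking described above can be sketched as follows. Genovese et al. (2002) apply the Benjamini-Hochberg step-up procedure to voxelwise p values; the helpers here (`interaction_contrast`, `bh_cutoff`, `fdr_masked`) are illustrative names, the p values in the usage check are toy numbers, and in practice the p values would come from the group random-effects t test on the contrast images.

```python
def interaction_contrast(lal_after, chi_after, lal_before, chi_before):
    """Per-subject (LAL_after - Chinese_after) - (LAL_before - Chinese_before)."""
    return (lal_after - chi_after) - (lal_before - chi_before)


def bh_cutoff(p_values, q=0.05):
    """Benjamini-Hochberg step-up: largest sorted p(i) with p(i) <= i * q / m, else None."""
    m = len(p_values)
    cutoff = None
    for i, p in enumerate(sorted(p_values), start=1):
        if p <= i * q / m:
            cutoff = p
    return cutoff


def fdr_masked(p_values, inclusive_mask, q=0.05):
    """Voxels that pass FDR and also lie inside the inclusive activation mask."""
    cutoff = bh_cutoff(p_values, q)
    return [cutoff is not None and p <= cutoff and inside
            for p, inside in zip(p_values, inclusive_mask)]
```

Subtracting the Chinese difference at each time point removes session-level drifts that affect both languages equally, which is why the Chinese blocks serve as the within-session control.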

Behavioral performance during the pre-training fMRI scans
There is no behavioral index for the passive viewing task. For one-back visual matching, we recorded behavioral responses while subjects performed the task in the scanner. Paired t tests revealed that subjects performed significantly better in their native language (Chinese) than in the new language (LAL) in terms of both reaction time (RT) (522 ms vs. 540 ms; t(11) = 3.59, P < 0.005) and accuracy (88% vs. 81%; t(11) = −2.51, P < 0.05).
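With 12 subjects a paired t test has 11 degrees of freedom, matching the t(11) values above. The sign of t reflects only the direction of subtraction: the reported signs are consistent with LAL minus Chinese (positive for RT, negative for accuracy). A minimal standard-library sketch; the data in the usage check are toy values.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic for the differences x - y, with n - 1 df."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1
```

Swapping the arguments flips the sign of t without changing its magnitude or the P value.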

The acquisition of names and semantics of the LAL characters after training
Performance in the naming task after phonological training and in the semantic judgment task after semantic training indicated that subjects had largely mastered the phonology and semantics of the LAL characters: accuracy for LAL was around 86% in the naming task and 80% in the semantic judgment task. As expected, subjects were less fluent in LAL than in Chinese. Paired-sample t tests indicated that in the naming task, subjects performed much better in Chinese than in LAL in both accuracy (98% vs. 86%; t(11) = 4.27, P < 0.001) and RT (751 ms vs. 1326 ms; t(11) = −20.72, P < 0.001). The same was true for the semantic judgment task: accuracy for Chinese and LAL was 92% and 80%, respectively (t(11) = 4.75, P < 0.001), and the corresponding RTs were 846 ms and 1130 ms (t(11) = −9.59, P < 0.001).

Efficiency of visual form processing across all the training stages
Training also significantly increased the efficiency of processing the visual form of the LAL characters. Fig. 2 shows the behavioral changes in the simultaneous same-different judgment task administered across the training stages. Overall, reaction time decreased quickly during the visual form training stage and remained relatively constant in the following stages. A training (before vs. after) by language (LAL vs. Chinese) ANOVA of RT for the visual form training stage showed significant main effects of training (F(1, 11) = 43.86, P < 0.001) and language (F(1, 11) = 87.29, P < 0.001). There was also a significant language-by-training interaction (F(1, 11) = 49.43, P < 0.001), suggesting that the improved performance for LAL characters is not merely a general improvement in motor responses or the establishment of a task-specific routine (Karni and Sagi, 1993) that transfers across tasks in different languages. ANOVA of accuracy revealed a similar but weaker pattern, with only the main effect of training reaching significance (language: F(1, 11) = 3.88, P = 0.07; training: F(1, 11) = 4.81, P < 0.05; interaction: F(1, 11) = 2.88, P = 0.12).
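Because every effect in the 2 × 2 training-by-language design has one degree of freedom and both factors vary within subjects, each F(1, 11) equals the square of a one-sample t computed on a per-subject contrast score. A minimal standard-library sketch of that equivalence, using toy data; `rm_anova_2x2` is an illustrative helper, not the authors' analysis code.

```python
import math
from statistics import mean, stdev


def one_df_F(scores):
    """F(1, n-1) for a 1-df within-subject effect: squared one-sample t of the contrast scores."""
    n = len(scores)
    t = mean(scores) / (stdev(scores) / math.sqrt(n))
    return t * t, (1, n - 1)


def rm_anova_2x2(subjects):
    """subjects: per-subject dicts keyed by (stage, language), stage in {'before', 'after'}."""
    training, language, interaction = [], [], []
    for s in subjects:
        b_l, b_c = s[("before", "LAL")], s[("before", "Chinese")]
        a_l, a_c = s[("after", "LAL")], s[("after", "Chinese")]
        training.append((b_l + b_c) / 2 - (a_l + a_c) / 2)     # main effect of training
        language.append((b_l + a_l) / 2 - (b_c + a_c) / 2)     # main effect of language
        interaction.append((b_l - b_c) - (a_l - a_c))          # difference of differences
    return {"training": one_df_F(training),
            "language": one_df_F(language),
            "interaction": one_df_F(interaction)}
```

The interaction score is the same difference of differences used for the imaging contrast, so a significant interaction here means the LAL-Chinese gap shrank with training rather than both languages speeding up equally.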
After phonological training, RT showed no significant change from that measured after visual form training (F(1, 11) = 1.13, P = 0.31), but the main effect of language was still salient (F(1, 11) = 53.16, P < 0.001), and there was a marginal interaction between language and training (F(1, 11) = 3.53, P = 0.09). Analysis of accuracy showed a significant training effect (F(1, 11) = 89.69, P < 0.001), but neither the main effect of language nor the interaction was significant.
After the semantic training stage, RT decreased significantly (F(1, 11) = 4.72, P < 0.05) relative to that after phonological training. The main effect of language on RT was significant (F(1, 11) = 49.43, P < 0.001). At the same time, accuracy decreased slightly (F(1, 11) = 3.74, P = 0.08), suggesting that the decrease in RT might reflect a speed-accuracy tradeoff. No other effects were significant.
To examine the combined effect of phonological and semantic training, we compared behavioral performance measured after semantic training with that measured after visual form training. Analysis of RT showed a significant language effect (F(1, 11) = 99.86, P < 0.001), but no training effect (F(1, 11) = 0.022, P = 0.885) or language-by-training interaction (F(1, 11) = 0.194, P = 0.668). Analysis of accuracy showed a significant training effect (F(1, 11) = 19.56, P < 0.001), but no language effect (F(1, 11) = 0.037, P = 0.851) or language-by-training interaction (F(1, 11) = 0). The absence of an interaction suggests that phonological and semantic training did not further change the efficiency of LAL visual word recognition relative to that of Chinese characters.
In summary, behavioral data showed that our extensive training program was effective. After training, subjects generally mastered the semantics and the phonology of the LAL characters. Meanwhile, the efficiency of recognizing the LAL characters was significantly improved, but the improvement mainly occurred during the visual form training stage and the efficiency remained relatively stable thereafter.

Overall neural activation at the pre-training stage
Conjunction analysis revealed that a wide neural network, consisting of the bilateral ventral visual stream, the left dorsal visual stream, and the left inferior frontal cortex, was involved in processing both Chinese and LAL characters (Table 1 and Fig. 3a). Of particular interest to the present study, the focus in the left midfusiform cortex (−39, −59, −12) was very close (about 3.6 mm away) to the so-called VWFA (−39, −57, −9) proposed by Cohen et al. (2002). This result suggests that the VWFA responds not only to linguistic visual forms but also to stimuli that were not yet linguistic for the subjects (the untrained LAL characters).
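The coordinates above are in Talairach space after the MNI-to-Talairach conversion described in the data analysis. The sketch below shows the piecewise-affine transform commonly attributed to Matthew Brett (the mni2tal approach: a 0.05-rad rotation about the x axis with zooms of 0.99, 0.97 and 0.92 above the AC plane, 0.84 instead of 0.92 below it), with coefficients rounded to four decimals, and the Euclidean distance between the two foci. That the cited MNI2TAL tool applies exactly these coefficients is an assumption, not something the text states.

```python
import math

# Rounded Brett-style MNI -> Talairach matrices (rows act on x, y, z).
_UPPER = ((0.9900, 0.0, 0.0), (0.0, 0.9688, 0.0460), (0.0, -0.0485, 0.9189))  # z >= 0
_LOWER = ((0.9900, 0.0, 0.0), (0.0, 0.9688, 0.0420), (0.0, -0.0485, 0.8390))  # z < 0


def mni_to_tal(x, y, z):
    """Apply the upper or lower affine depending on whether the point lies below the AC plane."""
    m = _UPPER if z >= 0 else _LOWER
    return tuple(r[0] * x + r[1] * y + r[2] * z for r in m)


def distance(a, b):
    """Euclidean distance in mm between two coordinate triplets."""
    return math.dist(a, b)


# Both foci as reported in the text (already in Talairach space).
gap_mm = distance((-39, -59, -12), (-39, -57, -9))   # sqrt(13), about 3.6 mm
```

A separation of a few millimetres is smaller than the 8-mm smoothing kernel, which supports treating the two foci as the same region.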
Examination of the main effect of language revealed even stronger activation in this region for Korean Hangul than for Chinese characters (Table 2, Fig. 3b). Other regions that showed significantly more activation for Korean Hangul than for Chinese characters included the bilateral inferior and middle occipital gyri (BA18/19), right fusiform gyrus (BA37/19), bilateral inferior parietal lobule (BA40), right superior parietal lobule (BA7), bilateral precuneus (BA7), as well as the right inferior frontal gyrus (BA46) and right thalamus. No area was found to be activated more by Chinese characters than by Korean Hangul characters.
Stronger activation was found for the one-back visual matching task than for the passive-viewing task in the bilateral fusiform cortex (Table 2, Fig. 3c). Other regions that showed stronger activation for the one-back matching task included the cingulate cortex, bilateral inferior frontal lobe (BA44/45), precentral gyrus (BA6/8), left supramarginal gyrus (BA40), right inferior parietal lobule (BA40), bilateral precuneus (BA7) and several subcortical regions. No region showed more activation in the passive-viewing task than in the one-back visual matching task.

Neural changes associated with the visual form training
Only the passive-viewing task was used to evaluate the training effect. Compared to pre-training activation, three regions showed significant decreases as a result of the visual form training: the bilateral fusiform gyri (BA37) and the left inferior occipital gyrus (BA19). No significant increase was found after the visual form training (Table 3 and Fig. 4).

Neural changes associated with the phonological training
In contrast with the decreased activation resulting from the visual form training, phonological training resulted in significant increases in activation across a wide neural network involved in visual word processing. Comparing activation after the phonological training with that after the visual form training revealed increased activation in the bilateral fusiform cortex (BA37) and the left inferior occipital cortex (BA19). Increased activation was also found in the left inferior frontal cortex (BA44), bilateral precentral cortex (BA6), the cingulate cortex (BA32/23), and the left precuneus (BA7). The insula (BA13), left caudate, and left hippocampus also showed stronger activation after phonological training. No brain region was less active after the phonological training than after the visual form training (Table 3 and Fig. 4).

Neural changes associated with semantic training
The comparison between the activation pattern right after the phonological training (i.e., before the semantic training) and that after the semantic training revealed no regions with significant increases or decreases at the threshold of P < 0.001 (uncorrected). To explore the combined effects of the phonological and semantic training on visual word processing, we compared the activation map acquired after semantic training with that acquired after visual form training. This comparison revealed additional significant increases in activation in the more anterior portion of the left inferior frontal cortex (BA47), the right inferior frontal gyrus (BA44), the left inferior parietal lobule (BA40), the left putamen, and the left globus pallidus. No region showed a significant decrease (Table 3 and Fig. 4). In terms of the fusiform areas, only the right fusiform showed significantly increased activation. The level of activation in the left fusiform after the semantic training lay between (but was not significantly different from) the level after the visual form training and that after the phonological training.
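The thresholding rule used for these maps (voxelwise P < 0.001 uncorrected, with a spatial extent of at least 10 contiguous voxels, as stated in the Fig. 3 legend) can be sketched as follows. This is a minimal illustration, not SPM2's implementation. It assumes the caller has already converted P < 0.001 into a critical t value for the model's degrees of freedom, stores the t map as a dictionary keyed by voxel indices (a hypothetical representation), and defines "contiguous" as face adjacency (6-connectivity), which may differ from the connectivity rule SPM itself uses.

```python
from collections import deque

# Face-adjacent neighbours (6-connectivity); an assumption of this sketch
OFFSETS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def threshold_clusters(tmap, t_crit, min_extent=10):
    """Return suprathreshold voxels that belong to clusters of >= min_extent voxels.

    tmap: dict mapping (x, y, z) voxel indices to t values (hypothetical format).
    t_crit: critical t corresponding to the voxelwise P threshold.
    """
    supra = {v for v, t in tmap.items() if t > t_crit}
    kept, seen = set(), set()
    for start in supra:
        if start in seen:
            continue
        # Breadth-first search collects one connected cluster
        cluster, queue = [], deque([start])
        seen.add(start)
        while queue:
            x, y, z = queue.popleft()
            cluster.append((x, y, z))
            for dx, dy, dz in OFFSETS:
                nb = (x + dx, y + dy, z + dz)
                if nb in supra and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        # Keep the cluster only if it meets the extent threshold
        if len(cluster) >= min_extent:
            kept.update(cluster)
    return kept
```

Clusters smaller than `min_extent` are discarded even if individual voxels pass the voxelwise threshold; this is how an uncorrected voxel threshold is combined with an extent criterion.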
It should be noted that, when directly contrasting the activation obtained after either the phonological or the semantic training with that obtained at the pre-training scan, we did not find significant neural changes in the fusiform cortex. This result was due to the opposite effects of the visual form training vis-à-vis the phonological/semantic training. Given these results, it is possible that, had subjects been trained simultaneously in visual form, phonology, and semantics, no training effect on fusiform activation would have been observed.
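The cancellation argument can be made explicit as a telescoping sum of stage-wise changes. The notation is introduced here for illustration only: let $A_{\text{pre}}$, $A_{\text{VF}}$, $A_{\text{phon}}$ and $A_{\text{sem}}$ denote fusiform activation in the passive-viewing task at the pre-training scan and after visual form, phonological and semantic training, respectively. The sign under each term follows the stage-wise comparisons reported above. When the first two terms have similar magnitude, the overall pre-to-post difference can be close to zero even though two of the training stages changed activation significantly.

$$
A_{\text{sem}} - A_{\text{pre}}
= \underbrace{\left(A_{\text{VF}} - A_{\text{pre}}\right)}_{<\,0}
+ \underbrace{\left(A_{\text{phon}} - A_{\text{VF}}\right)}_{>\,0}
+ \underbrace{\left(A_{\text{sem}} - A_{\text{phon}}\right)}_{\text{n.s.}}
$$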

Discussion
The present study aimed at exploring the role of language experience in shaping left midfusiform activation in visual word processing. We found that the left midfusiform showed no special sensitivity to native-language characters compared with foreign words in either the passive-viewing or the one-back visual matching task. In addition, we found that visual form training had an effect opposite to that of phonological and semantic training in shaping fusiform activation: visual form training decreased activation in the bilateral fusiform cortex, whereas phonological and semantic training increased activation in these regions as well as in a wide neural network involved in language processing. Altogether, these results do not seem to support the VWFA hypothesis, which would predict native-language sensitivity and visual form training-induced sensitivity in the fusiform cortex.
Fig. 3. Group-averaged results from the pre-training scans. Activation was overlaid onto a standard MNI template provided by SPM2. Clusters that survived an uncorrected P < 0.001 with a spatial extent of ≥10 contiguous voxels were considered statistically significant. The number on each slice indicates its position relative to the anterior-posterior intercommissural line according to MNI coordinates. The color bar indicates the t value. L: left hemisphere; R: right hemisphere.
Is the VWFA sensitive to words?
As discussed in the Introduction, the finding of word sensitivity in the left midfusiform is one of the major reasons for labeling this area the VWFA. However, the empirical evidence so far has not been consistent. In response to the mixed findings, Cohen and Dehaene (2004) argued: "This [sensitivity to words] holds quite generally, at least in passive viewing conditions or with tasks that require equal attention to words and to consonant strings; no difference, or even a reversal, can be observed if the task is more difficult to perform with consonant strings than with words" (p. 471, footnote 3).
G. Xue et al. / NeuroImage 31 (2006) 1315-1326
The present study directly examined this idea using a task-by-stimulus-type factorial design and a native-foreign language comparison paradigm instead of the word-nonword comparison paradigm. Contrary to Cohen and Dehaene's (2004) argument, the present study did not obtain a significant task-by-stimulus interaction. Instead, our results, together with those of other studies (Kronbichler et al., 2004; Mechelli et al., 2003; Tagamets et al., 2000), suggest that the so-called VWFA is involved in processing both linguistic and non-linguistic materials (insofar as Korean characters can be considered non-linguistic for Chinese speakers), and that the extent of its activation is modulated by task requirements, as shown by the stronger fusiform activation for less familiar visual forms in these studies.
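The task-by-stimulus-type design can be expressed as contrast weights over four condition estimates. The sketch below assumes a hypothetical regressor order (Chinese/passive, Chinese/one-back, LAL/passive, LAL/one-back) and made-up per-voxel estimates; the names and helpers are illustrative, not the study's SPM2 setup. Under Cohen and Dehaene's (2004) argument, the Chinese-over-LAL advantage should shrink in the one-back task, which corresponds to a positive value of the interaction contrast.

```python
# Hypothetical order of the four condition regressors
CONDITIONS = ("chinese_passive", "chinese_oneback", "lal_passive", "lal_oneback")

CONTRASTS = {
    "language": (-1, -1, 1, 1),     # LAL > Chinese
    "task": (-1, 1, -1, 1),         # one-back > passive viewing
    # (Chinese - LAL) in passive viewing minus (Chinese - LAL) in one-back
    "interaction": (1, -1, -1, 1),
}

def contrast_value(weights, betas):
    """Return the weighted sum c'b for one voxel's four condition estimates."""
    if len(weights) != len(betas):
        raise ValueError("need one weight per condition")
    if sum(weights) != 0:
        raise ValueError("contrast weights should sum to zero")
    return sum(w * b for w, b in zip(weights, betas))

def are_orthogonal(c1, c2):
    """True if two contrast vectors have a zero dot product."""
    return sum(a * b for a, b in zip(c1, c2)) == 0

# Illustrative (made-up) estimates for one voxel, ordered as CONDITIONS
betas = (0.9, 1.4, 1.2, 1.7)
for name, weights in CONTRASTS.items():
    print(name, round(contrast_value(weights, betas), 3))
```

The three contrasts are mutually orthogonal, so the main effects of language and task can be assessed independently of the interaction term.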

Visual form training and fusiform activation
During the first 2 weeks of training, subjects were trained to become visually familiar with the LAL characters. Behavioral data indicated significant improvement in subjects' performance on the same-different judgment task with simultaneously presented LAL characters. Correspondingly, fMRI data revealed a significant decrease in activation in the bilateral fusiform cortex and the left inferior occipital cortex during the passive-viewing task, possibly reflecting a sparser representation of the newly learned LAL word forms.
Neuroimaging studies on visual perceptual learning have obtained conflicting results. On the one hand, some studies found decreased activation in the visual cortex compared to the pre-training state in visual orientation discrimination (e.g., Schiltz et al., 1999). On the other hand, there is also ample evidence of expertise-related increases in fusiform activation in object recognition, including faces (Rossion et al., 2001), Greebles (Gauthier et al., 1999b), cars and birds (Gauthier et al., 2000a), and mirrored text (Poldrack and Gabrieli, 2001; Poldrack et al., 1998).
We think the different domains of visual objects used in these studies might contribute to the mixed results. Beyond this, examining the cognitive changes that accompany learning would also help to resolve the controversy. For example, object expertise increases the level of categorization (i.e., from the basic level "bird" to the subordinate level "sparrow"), which is a necessary factor for the increase in right fusiform activation with increasing expertise (Gauthier et al., 1997, 2000b; Tarr and Gauthier, 2000). This may help to explain the expertise effects on face processing (Gauthier et al., 1999a, 2000b), Greeble recognition (Gauthier et al., 1999b), and car and bird processing (Gauthier et al., 2000a). Moreover, with increasing expertise, visual objects usually become more meaningful to the subjects, as with faces (Rossion et al., 2001), cars and birds (Gauthier et al., 2000a), and mirrored text (Poldrack and Gabrieli, 2001; Poldrack et al., 1998). This increase in meaning would in turn modulate visual object recognition (Palmeri and Gauthier, 2004).

Phonological and semantic training and fusiform activation
Following the visual form training, subjects also took part in 2 weeks of phonological training and 2 weeks of semantic training on the LAL. Although behavioral measurements did not reveal obvious improvement in the efficiency of character identification, fMRI data indicated significant changes in neural activation after phonological and semantic training. This suggests that fMRI is more sensitive than traditional behavioral tests to experience-induced cognitive and neural changes in visual character processing.
Our results confirm the automatic activation of semantic and phonological information in implicit reading tasks (MacLeod, 1991; Price et al., 1996), and extend this finding to the learning of a new artificial language. Consistent with the anterior-posterior division of the left inferior frontal cortex for semantic and phonological processing (Poldrack et al., 1999), we found that phonological training, relative to visual form training, induced significantly more activation in the precentral gyrus and pars opercularis, whereas after semantic training the comparison with visual form training additionally revealed increased activation in the more anterior portion of the inferior frontal gyrus (BA47). This pattern of frontal activation is also consistent with previous results in nonfluent bilinguals (Chee et al., 2001; Xue et al., 2004a,b).
We obtained significant increases in the bilateral fusiform regions after phonological training relative to after visual form training, and the right fusiform cortex remained more active (relative to after visual form training) after semantic training. These findings indicate an important impact of linguistic attributes in shaping fusiform activation, in ways different from that of visual form training. Intuitively, this increase might reflect a top-down mechanism driven by automatic phonological and semantic processing. This explanation is consistent with the connectionist perspective (e.g., Seidenberg and McClelland, 1989) and with results from several neuroimaging studies (e.g., Nobre et al., 1998; Price et al., 1996). Alternatively, because subjects in the present study also learned new sounds in addition to new visual forms, our results are also consistent with the notion that the fusiform is involved in phonological processing, or in integrating high-order visual form with phonology (Price and Devlin, 2003). In fact, many existing observations might be compatible with both the top-down modulation hypothesis and the lexical processing hypothesis, including the word frequency effect (Kronbichler et al., 2004) and the cross-language and cross-script priming effects (Chee et al., 2003; Nakamura et al., 2005). Further studies of patients with focal lesions in this area would help to clarify this issue.
The important role of phonology and semantics in visual word processing is further supported by existing behavioral studies. In the classic word superiority paradigm (Reicher, 1969), the bottleneck in word recognition appears to be retention/memory rather than visual perception because, as shown with the partial report technique, subjects can perceive 9-10 letters presented for 4 ms (Kriegman and Biederman, 1980). The word superiority effect becomes more pronounced when a visual mask is added immediately after stimulus presentation (Johnston and McClelland, 1973). One explanation for this mask effect is that a higher-order structure (e.g., phonological and/or semantic) might help to retain words and pseudowords but is not available for nonwords. Consistent with this idea, when the two masked words were homophones, accuracy was only around chance level (Hawkins et al., 1976).
In addition to the potential connection between the fusiform and phonological processing, the VWFA hypothesis's assumption that the fusiform is specialized for visual form processing has been challenged by evidence of a lack of visual expertise in visual word processing. For example, Pelli et al. (in press) found that older (and thus more proficient) readers did not perform better than younger readers in a letter identification task, and that 3-year-old children learned the ABCs just as quickly as young readers learned foreign alphabets. Moreover, although training increases the efficiency of letter identification (Pelli et al., in press), this improvement develops quickly (within several hours and several thousand trials) and is limited to the single-letter level. In fact, even very fluent readers cannot recognize words holistically, i.e., beyond the level of individual letters, suggesting that the word superiority effect might not be attributable to visual expertise (Pelli et al., 2003).
One may argue that the increased fusiform activation after phonological and semantic training might merely reflect a training/consolidation effect of the visual form training. For several reasons, we think this is unlikely to be the major cause. First, if the fusiform changes after phonological and semantic training reflected only extended visual form exposure (e.g., during the phonological and semantic training stages), we would expect a further decrease in fusiform activation. Although nonlinear changes with training have been reported in other domains of learning, such as motor sequence training (e.g., Karni et al., 1995; Ungerleider et al., 2002), they have so far not been reported in visual perceptual training. Moreover, studies that found nonlinear changes usually reported a quick diminution of activation within 1-2 training sessions, followed by a long-lasting increase (e.g., Karni et al., 1995; Ungerleider et al., 2002), whereas the present study found decreased activation after 10 days of training. Second, the increased fusiform activation was accompanied by increased activation in a wide neural network, including the classic phonological and semantic areas as well as the attention network. It is thus hard to attribute this response increase merely to the visual form training. Nevertheless, further studies should examine the potential consolidation effect in visual perceptual learning over a wider range of time windows (from as short as one or two sessions to as long as several weeks) (Kelly and Garavan, 2005). Furthermore, a between-subjects design that counterbalances the learning sequence of the different components (visual form, phonology, and semantics), or that trains different parts of the LAL (i.e., visual form vs. visual form and phonology vs. visual form, phonology, and semantics), would help to separate the effect of each component more efficiently.

Summary
Written language combines visual form, phonology, and semantics, and reading involves the obligatory co-activation of, and complex interactions among, all of these components. Correspondingly, reading development involves the acquisition of these components and their connections. Consistent with these facts, the present artificial language training study shows, for the first time, that different aspects of language experience (e.g., visual familiarity, phonology, and semantics) all have important but different impacts on neural activation in the so-called visual word form area during language learning. These results emphasize the importance of taking an integrative perspective when elucidating how language experience shapes brain activation.