Long-term prediction of academic achievement of American, Chinese, and Japanese adolescents

Representative samples of 729 American, Chinese, and Japanese 1st graders were given achievement and cognitive tests. Mothers were interviewed. Ten years later, 475 of the students participated in a follow-up study in which they were interviewed and given achievement tests. Results revealed high stability of achievement relationships within all 3 societies. Measures of early cognitive abilities were consistently related to the families' socioeconomic status and exerted their influence on later achievement either through 1st-grade achievement scores or through evaluations made by their mothers. The percentage of variance in achievement scores accounted for by the path models was between 49% and 59% at 1st grade and between 38% and 51% at 11th grade. Despite statistical differences in mean scores on the achievement tests, the associations between early predictors and later achievement were similar in the 3 cultural groups, indicating that differences in mean scores may not be accompanied by differences in interrelationships.

from different cultures included a small sample of American and Japanese children who were followed through elementary school (Hess et al., 1986). We know of no comparative studies that examined the prediction of academic achievement over longer periods of time. Such research is necessary to clarify the roles of basic psychological processes and of environmental and social factors in influencing academic achievement.
On the basis of analyses of large sets of data from various ethnic and racial groups in the United States, Rowe, Vazsonyi, and Flannery (1994) have found much similarity in developmental processes across racial and ethnic groups. The relevance of such research to our understanding of the universality of developmental processes during childhood and adolescence is limited, however, by the fact that as Americans the various groups shared many common experiences resulting from their attendance at school, exposure to the media, and residence in similarly structured communities. An approach that should reduce these problems would involve studying persons residing in societies with markedly different characteristics.
In this study, we examine data from a 10-year longitudinal study involving American, Chinese, and Japanese students. We investigated the contributions of cognitive abilities, home environment, parental evaluations, and academic achievement that were measured at 1st grade to the prediction of achievement in mathematics, reading, and general information measured at 11th grade. We focus on these early predictors of adolescents' academic achievement both because they have been shown to be related to concurrent academic achievement in different cultural groups (e.g., Stevenson et al., 1990; Stevenson et al., 1985) and because they have been found to be effective in predicting academic achievement of American adolescents. Stevenson and Newman (1986), for example, found that scores on cognitive tests given before kindergarten and parents' and teachers' evaluations of children's abilities when the children were in preschool were related to students' academic achievement at 10th grade. Other researchers have found that adolescents' academic achievement could be predicted from measures of early home environment (Bradley, Caldwell, & Rock, 1988) and parenting style (Halpern-Felsher, 1994). A recent report of a 20-year longitudinal study (Baydar, Brooks-Gunn, & Furstenberg, 1993) provides additional evidence of the importance of both early family environment and cognitive abilities in the prediction of literacy of young adults.
This report focuses on three issues related to the prediction of high school students' academic achievement. First, we examine whether children's early cognitive abilities are equally effective in predicting later achievement in three markedly different cultures. In an earlier cross-sectional study of the relations between cognitive abilities and mathematics and reading achievement of 1st graders, Stevenson et al. (1985) found that the measures of cognitive ability predicted students' concurrent test scores equally well in all three cultures. The present study extends the earlier findings to evaluate long-term relations between scores on the cognitive tests and on tests of academic achievement of these same students when they were in 11th grade.
A second purpose of this study is to examine long-term relations between early home environment and later academic achievement. The aspects of early home environment included (a) demographic characteristics, such as parental education and family structure; (b) home environment (i.e., parental involvement in the child's academic activities and provision of a supportive and stimulating environment); and (c) parents' evaluations of their children's academic achievement. Factors such as these have been found to be related to achievement in various other studies (e.g., Bradley et al., 1988; Dornbusch & Wood, 1989; Parsons, Adler, & Kaczala, 1982; Seginer, 1983).
Third, we attempt to determine whether early predictors (i.e., children's cognitive abilities and home environment) are the same for three different types of achievement. Measures of achievement in three areas were included: mathematics, reading, and general information. In general, knowledge of mathematics and reading is acquired primarily through formal school instruction. General information, which includes knowledge about cultural-historical events and scientific phenomena, can be obtained both in and out of school. The inclusion of these three areas should provide a broad indication of the students' levels of achievement.

Participants
The study was conducted in the metropolitan areas of Minneapolis, Minnesota (United States), Taipei (Taiwan), and Sendai (Japan). In 1980, representative samples of approximately 240 1st graders were selected in each metropolitan area. We first selected representative samples of 10 schools in each location and then selected two 1st-grade classrooms at random from within each school. We then selected 6 boys and 6 girls at random from each classroom. This procedure yielded 241 American, 241 Chinese, and 247 Japanese participants. (A detailed description of the participants and analyses of 1st-grade data are available in Stevenson et al., 1985, and Stevenson et al., 1990. Analyses of some of the 11th-grade data are presented in Stevenson et al., 1993.) A follow-up study was conducted 10 years later, when the participants were in the 11th grade. Participation varied according to location: We were able to gain the cooperation of 213 American, 169 Chinese, and 93 Japanese students. (A small number of the follow-up participants were not enrolled in school: 15 Chinese and 6 Japanese.) The major reason for the low participation of the Japanese was their dedication to studying for the impending college entrance examinations. They and their parents were reluctant to take out-of-school time for participation in the study. Others did not participate because they were deceased, ill, or could not be found.
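The sampling design and the follow-up participation described above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the counts come from the text, whereas the roster identifiers and the helper `draw_classroom_sample` are hypothetical and stand in for whatever selection procedure was actually used.

```python
import random

# Design figures reported in the text.
SCHOOLS_PER_SITE = 10
CLASSROOMS_PER_SCHOOL = 2
BOYS_PER_CLASSROOM = 6
GIRLS_PER_CLASSROOM = 6

# Target number of 1st graders per metropolitan area implied by the design.
target_per_site = (
    SCHOOLS_PER_SITE
    * CLASSROOMS_PER_SCHOOL
    * (BOYS_PER_CLASSROOM + GIRLS_PER_CLASSROOM)
)

# Original and follow-up sample sizes reported in the text.
original = {"Minneapolis": 241, "Taipei": 241, "Sendai": 247}
followup = {"Minneapolis": 213, "Taipei": 169, "Sendai": 93}

# Proportion of the original sample retained at 11th grade, by site.
retention = {site: followup[site] / original[site] for site in original}


def draw_classroom_sample(boys, girls, rng):
    """Hypothetical helper: pick 6 boys and 6 girls at random from one roster."""
    return rng.sample(boys, BOYS_PER_CLASSROOM) + rng.sample(girls, GIRLS_PER_CLASSROOM)


rng = random.Random(0)
# Invented roster IDs, for illustration only.
boys = [f"b{i}" for i in range(15)]
girls = [f"g{i}" for i in range(14)]
picked = draw_classroom_sample(boys, girls, rng)

print("target per site:", target_per_site)
for site, rate in retention.items():
    print(f"{site}: {100 * rate:.1f}% retained")
```

The design target is 240 children per site, and the retention rates make the uneven attrition visible: most of the American sample was followed up, whereas well under half of the Japanese sample was.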
To evaluate whether the 11th-grade samples were representative of the original 1st-grade samples, comparisons were made between the follow-up and noncontinuing original participants in terms of their 1st-grade scores on tests of mathematics, reading, and cognitive abilities, as well as other key variables used in this study (e.g., parental education, home environment, parental evaluations; see a later section for definitions of these variables). These comparisons revealed that, with one exception, there were no statistical differences in Taipei or Sendai between the follow-up sample and those who did not participate in the 11th-grade study. The exception was that, in Sendai, the follow-up sample had a slightly higher level of parental education than the students we could not follow up, t(234) = 2.04, p < .05. Although many of the original Japanese participants refused to participate on the grounds that they were spending long hours after school preparing for the college entrance examination they would take the following year, we were able to obtain a follow-up sample that was not systematically biased. For American participants, however, the 28 students we were unable to include in the follow-up study scored statistically lower on the achievement and cognitive ability tests than did the 213 follow-up students, ts(239) = 2.19 to 3.40, ps < .05. The effect size, d, ranged from .40 to .55. Similarly, the former tended to get lower evaluations from their parents and tended to come from homes with less ideal environments, ts(221 and 232) = 2.37 and 2.42, ps < .05. The levels of parental education, however, did not differ between the two groups.
All analyses in this report were based on the sample for whom both 1st- and 11th-grade data were available. Table 1 summarizes the major demographic characteristics of the sample at both times. Slightly more American and Chinese girls than boys participated in the follow-up study, whereas in Japan the reverse was the case. American parents had the highest levels of education and Chinese parents had the lowest. Because most parents had finished their own schooling by the time their child was in first grade, there was little change in the parents' average educational level during the 10-year period. The biggest change in family demographics was the increase in the number of mothers who were employed. In terms of their marital status, the percentage of married parents was higher among the two Asian groups than among the Americans at both times of the data collection.

Test Materials
The mathematics tests given at 1st and 11th grades and the reading test given at 1st grade were based on detailed analyses of textbooks used in the three locations (see Stevenson & Bartsch, 1992). The elementary school mathematics test had 54 items arranged in order of difficulty. Some items required only computation; others required the application of mathematical principles to word problems. All 1st graders were tested individually in one-on-one sessions in which the examiner read each problem to the student in order to avoid confounding mathematics ability and reading ability. The criterion for stopping the test was four successive items answered incorrectly. The Cronbach alphas for the mathematics test were .93 (Minneapolis), .93 (Taipei), and .92 (Sendai).
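Cronbach's alpha, the reliability index reported for the tests, is computed from the item variances and the variance of the total score. The following is a minimal sketch of the standard formula, assuming a complete respondents-by-items score matrix; the function name and the toy 0/1 data are invented for illustration and are not the study's data.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(scores[0])
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Toy data: 5 respondents, 4 items scored 0 (incorrect) or 1 (correct).
toy = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha_toy = cronbach_alpha(toy)

# Items that agree perfectly give the maximum value of 1.
perfect = [[0, 0, 0], [1, 1, 1], [1, 1, 1], [0, 0, 0]]
alpha_perfect = cronbach_alpha(perfect)

print(round(alpha_toy, 2), round(alpha_perfect, 2))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why a short test, such as the 15-question 11th-grade reading test, tends to yield lower values than a 54-item test.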
The 11th-grade mathematics test contained 46 items ranging from ones that were quite easy, such as knowledge of percentages, to those that were very difficult, such as finding the intersection of two- and three-dimensional figures. Each student worked independently on the test and a 40-min time limit was imposed. All questions on the 11th-grade tests were open-ended, and were scored as 0 (incorrect) or 1 (correct). Reliabilities of the test were high in all three cultures, with Cronbach alphas ranging from .86 to .95. To explore potential differences in early predictors of different types of mathematics knowledge, we also analyzed the data according to two subtests: general mathematics (including 21 items on arithmetic and algebra) and advanced mathematics (including 25 items on geometry, trigonometry, and calculus). The internal consistency of these two subtests was as high as that of the whole test: Cronbach alphas ranging from .83 to .95. Furthermore, the two subtests were highly intercorrelated: rs = .87 (Minneapolis), .90 (Taipei), and .79 (Sendai).
First graders were given a reading test consisting of three sections: reading aloud a list of words, reading aloud meaningful text, and comprehension of text. The three portions of the test were highly intercorrelated, rs ranging from .83 to .96. The overall coefficients of concordance were .94 (Minneapolis), .91 (Taipei), and .93 (Sendai). It is apparent that the test is highly reliable. Only data from the comprehension test, however, were used in this study because the 11th-grade reading test was one of reading comprehension. Three types of comprehension items were included: (a) phrases or sentences describing one of three pictures; (b) sentences in which certain key words were omitted, but three alternatives were available; and (c) paragraphs about which questions were asked.
Eleventh graders were asked to read five paragraphs and answer three questions about each paragraph. The paragraphs ranged in difficulty from an Aesop fable appropriate for elementary school children to a complex paragraph from War and Peace. The contents of the tests were judged by colleagues in each country to be familiar to students in their culture. The internal consistencies of the 11th-grade reading tests were only moderate: Cronbach αs = .50 in Minneapolis, .54 in Taipei, and .48 in Sendai.
The general information test tapped students' knowledge about everyday life, cultural and historical events, geography, and basic physical science that was likely to be acquired through everyday interactions as well as through explicit instruction in school. Examples of the questions are "Why did the Egyptians build pyramids?" "What causes an eclipse?" "Why do blankets keep us warm?" "In which continent is Ethiopia?" These questions were chosen from a large number of potential questions submitted to colleagues in each culture for evaluation of their relevance and appropriateness for students in their culture. The final test for the elementary school level contained 26 items; the 11th-grade test, 12. The test was administered to the 1st graders in one-on-one testing and the child's responses were recorded by the tester. The 11th graders were given the test and wrote their answers on an answer sheet.
All general information tests were coded independently by two coders, both of whom were native speakers of the language in which the test was given. Responses were scored as 2 (correct), 1 (partially correct), and 0 (incorrect). When a disagreement arose between the two coders, a final score was decided at a group meeting involving coders from all three cultures. Reliability statistics for the general information test ranged from .79 to .91 for the elementary school version; at the 11th-grade level, they were high for the American (.82) and Chinese samples (.80), but somewhat lower for the Japanese sample (.67).
Nine culturally appropriate cognitive tests were constructed (for detailed information, see Stevenson et al., 1985). Some tests were adapted from well-known intelligence tests, and others were devised especially for this study. The tests included coding (applying a code in which a series of nine symbols represented the numerals from 1 to 9), spatial relations (choosing a shape from among four alternatives to fit into a target shape), perceptual speed (matching line drawings with one of four alternatives), auditory memory (reconstructing a sequence of atonal sounds of different duration), serial memory for words (recalling lists of words), serial memory for numbers (recalling lists of numbers), verbal-spatial representation (following verbal directions for drawing lines or shapes), verbal memory (answering questions about a story just read to the child), and vocabulary (defining words). Some tests were timed (usually 2-3 min); for others, testing was discontinued following the failure of a child to give correct answers to three or four successive questions. The tests were scored according to a clear and detailed scoring system. Reliabilities were high for coding, spatial relations, perceptual speed, auditory memory, verbal-spatial representation, and vocabulary; Cronbach alphas ranged from .71 to .98 (M = .83). Reliabilities for verbal memory were lower: .63 (Americans), .55 (Chinese), and .57 (Japanese). The small number of items in serial memory for words and numbers precluded computation of the reliability statistics for those tasks. These nine cognitive tests were combined into three summary scores: performance tests (i.e., coding, spatial relations, auditory memory, and perceptual speed), verbal tests (i.e., verbal memory, vocabulary, serial memory for numbers, and serial memory for words), and verbal-spatial test (i.e., verbal-spatial representations). These three summary scores were moderately correlated with one another (mean r = .44, with a range of .35 to .54, ps < .001).

Interviews
Mothers were interviewed at both times of the data collection. The following information from the interviews is used in this report:
1. Demographic information included parents' educational level, parental occupation, and family structure.
2. Home environment information included parental involvement in the child's academic activities and their provision of a supportive and stimulating environment. When the students were in first grade, mothers were asked whether either of the parents read to their child regularly, had taught their child the alphabet, words, phrases, or sentences, and read newspapers daily; whether they expected their child to attend college; and whether the mother had taken academic or nonacademic lessons during the past year. The internal consistency of this 9-item checklist was satisfactory for one group and modest for the other two: Cronbach αs = .53 (Americans), .71 (Chinese), and .46 (Japanese). This may be a reflection of the multifaceted nature of home environment. Therefore, to identify potential aspects of home environment that are especially relevant to children's learning, data based on the checklist were analyzed both according to individual items and to aggregate levels.
3. Parental evaluation information included, at the time of the first data collection, mothers' comparisons of their children with others of the same age in terms of school performance, potential for future academic achievement, achievement motivation, and intellectual abilities. This 11-item scale had high internal consistency: Cronbach αs = .89 (Americans), .90 (Chinese), and .87 (Japanese).
Table 2 shows the means and standard deviations for the achievement tests given at 1st and 11th grades. Chinese and Japanese students at both grades received significantly higher mathematics scores than the American students (Scheffé contrasts, ps < .001). Further analyses of the 11th-grade mathematics scores showed that the Chinese and Japanese students' advantage was not limited to any particular domain of mathematics (see Figure 1). They scored higher than did their American counterparts on a vast majority of the items, whether they represented general or advanced levels of mathematics. As is evident in Table 2, cross-cultural differences among the reading scores were smaller; Chinese students received higher scores than Japanese students in 1st grade, and both American and Chinese students received significantly higher scores than the Japanese students at both grades (Scheffé contrasts, ps < .05). Scores on the general information test also differed among the 1st-grade students from the three locations: American children received higher scores than did Chinese children (Scheffé contrast, p < .05). There were no significant differences among the 11th-grade students in the three locations.
[Table 2. Means and standard deviations of achievement test scores at 1st and 11th grades; F values 44.61***, 7.37**, 1.64.]
Note. Sets of 3 (culture) × 2 (gender) analyses of variance were conducted. Two significant gender differences were found: At first grade, girls scored higher on the reading test than did boys, F(1, 469) = 7.45, p < .01, and at 11th grade, boys scored higher than girls on general information, F(1, 433) = 14.46, p < .001. None of the Culture × Gender interactions were significant. *p < .05. **p < .01. ***p < .001.

Intercorrelations of Achievement Tests
Although the results from the mathematics, reading, and general information tests showed different patterns among the three locations, all were highly intercorrelated within each location at each testing period: rs ranged from .48 to .71, with a mean of .56 (all ps < .001). Furthermore, with only one exception, scores on all three tests given at first grade were significantly related to scores on the three tests given at 11th grade (see Table 3). This level of interrelations among different domains of academic achievement and its stability over a 10-year period are even more noteworthy when we consider the moderate level of reliability of some of the tests. Taken together, these results strongly suggest that a relatively stable common factor lies behind the level of performance on all three types of achievement tests in all three locations.

Cognitive Abilities and Later Achievement
As Stevenson et al. (1985) have reported, the scores of the students from the three locations differed on the individual cognitive tests, but the differences did not consistently favor one location over another. American children received the highest scores on verbal memory and vocabulary, Chinese children on serial memory for numbers and coding, and Japanese children on spatial relations and auditory memory.
Within each cultural group, the aggregated standardized scores for the cognitive tests were significantly related to first-grade scores on all three achievement tests. The correlations ranged from .30 to .63, with a mean of .46 (all ps < .001). As Table 4 shows, children's cognitive abilities at 1st grade continued to be predictive of their academic achievement at 11th grade. In fact, most of the correlations were significant at the .001 level. We further examined whether the three aspects of cognitive abilities were differentially related to the two subtests of 11th-grade mathematics. Results showed that the correlations with the subtest of general mathematics were only slightly higher than those with the subtest of advanced mathematics: The differences in correlations ranged from .00 to .06, not statistically significant based on the t test after Fisher r-to-z transformation. Such a lack of differentiation is perhaps due to the high relations found between scores on the two subtests of mathematics as well as the close relations found among the three summary scores of cognitive tests.
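To give a sense of why a difference of .06 between two correlations is far from significance at these sample sizes, the sketch below applies the Fisher r-to-z transformation and a normal-theory test. It is a simplification and not the exact test used in the study: it treats the two correlations as coming from independent samples, whereas the correlations compared above were computed on the same students, and it uses a z statistic instead of a t statistic. The correlation values (.50 and .44) are hypothetical; the sample size of 213 is the American follow-up sample.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent correlations.

    Assumption: the samples are independent, which does not hold for the
    study's within-sample comparisons; this is an illustration only.
    """
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return z, p


# Hypothetical correlations differing by .06, the largest gap reported.
z_stat, p_value = compare_independent_correlations(0.50, 213, 0.44, 213)
print(f"z = {z_stat:.2f}, p = {p_value:.2f}")
```

A test designed for dependent correlations would use the correlation between the two mathematics subtests as well, and with subtests that correlate as highly as reported, it would be the more appropriate choice for a full analysis.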

Family Factors
Demographic factors. Table 5 shows the correlations between family demographic factors measured at 1st grade and achievement scores at 11th grade. Paternal educational and occupational status and mothers' educational status were statistically related to achievement scores in each location and for each type of achievement. With one exception (birth order in Sendai), other demographic factors were unrelated to achievement scores in Minneapolis or Sendai. The relation was significant in Taipei, however, for several of these variables. Further analyses revealed that this finding was due to a close relation in Taipei between these factors and families' socioeconomic status (SES) as indexed by a summary score of parental education and father's occupation. For example, the correlation between number of siblings and family SES was -.41 (p < .001) in Taipei, but only -.06 in Minneapolis, and .00 in Sendai.
Home environment. The average score on the checklist of factors reflecting the quality of the home environment at 1st grade did not differ statistically according to location (p > .05): Ms = 4.9 (Americans), 4.5 (Chinese), and 5.1 (Japanese). There were few systematic relations between individual home environment variables measured at 1st grade and 11th-grade achievement (see Table 6). The only consistently significant variable was parental aspirations for their child's education. Although the relations between other home environment variables and later achievement were relatively unsystematic, correlations between the summary home environment score and scores on achievement tests were statistically significant for all three tests in all three locations.
Parental evaluations. American mothers rated their first graders' academic abilities and potential higher than did the two groups of Asian mothers (Scheffe contrasts, ps < .05). The mean ratings on the 9-point scale were 6.2 (Minneapolis), 5.8 (Taipei), and 5.6 (Sendai), F(2, 448) = 16.44, p < .001. Within each group, mothers' ratings when the children were in first grade were statistically related to the children's achievement 10 years later. The correlations for all three groups and all three tests ranged from .32 to .51, with a mean of .40 (all ps < .01).

Path Analyses
The analyses presented above help to identify some of the major predictors of later achievement. A more readily accessible organization of these relations is available through path analyses. These analyses were conducted to answer three questions: (a) How much of the variance in 11th graders' achievement was accounted for by the early predictors? (b) Did factors such as early home environment have any unique long-term effects in addition to their effects exerted on children's earlier academic achievement? (c) What were the possible "paths" the early predictors might take in influencing later achievement? Because of the high correlations among the three types of achievement and the similar patterns of correlations between them and the early predictors, scores on the three types of achievement were combined (averaged standardized scores) for use in the path analyses. Also, the total score on cognitive tests was used.
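The composite achievement score described above (an average of standardized scores) can be sketched as follows. This is an illustration under stated assumptions: the scores are standardized within one sample using the population standard deviation, which the text does not specify, and the three short score lists are invented.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of scores to mean 0 and SD 1 (population SD assumed)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite(*tests):
    """Average the z scores of several tests, student by student."""
    standardized = [zscores(t) for t in tests]
    return [mean(per_student) for per_student in zip(*standardized)]


# Invented raw scores for four students on three tests with different scales.
mathematics = [10, 20, 30, 40]
reading = [1, 2, 3, 4]
general_information = [5, 10, 15, 20]

achievement = composite(mathematics, reading, general_information)
print([round(a, 2) for a in achievement])
```

Standardizing before averaging keeps a test with a larger raw-score range (here, the invented mathematics scores) from dominating the composite.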
LISREL 8 was used for the path analyses. Because no prior hypotheses were made to eliminate any possible paths, a saturated model was applied to data for each location. Results are summarized in Figure 2. Between 49% and 59% of the variance in 1st graders' achievement scores and between 38% and 51% of the variance in 11th graders' achievement scores were accounted for by the models based on the 1st-grade measures. The most consistent path found among all three groups was the one in the following sequence: family SES → cognitive abilities → achievement at 1st grade → achievement at 11th grade. The effects of family SES, early home environment, and early cognitive ability on later achievement were all indirect, exerting their effects either through parental evaluations or through achievement scores at 1st grade. Other than 1st-grade achievement, the only one of the early predictors that made a unique contribution to later achievement was the evaluations made by the American and Chinese mothers.
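For readers unfamiliar with path analysis, a saturated recursive model can be approximated by a series of standardized regressions, each variable regressed on all variables assumed to precede it. The sketch below is not the LISREL 8 analysis reported here: it uses ordinary least squares on synthetic data, and both the generating coefficients and the causal ordering (SES, home environment, cognitive abilities, parental evaluations, 1st-grade achievement, 11th-grade achievement) are assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def standardize(xs):
    m, s = mean(xs), pstdev(xs)
    return [(x - m) / s for x in xs]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def path_regression(y, predictors):
    """Standardized OLS on z-scored inputs: returns (path coefficients, R squared)."""
    n, k = len(y), len(predictors)
    rxx = [[sum(a * b for a, b in zip(predictors[i], predictors[j])) / n
            for j in range(k)] for i in range(k)]
    rxy = [sum(a * b for a, b in zip(predictors[i], y)) / n for i in range(k)]
    betas = solve(rxx, rxy)
    r_squared = sum(b * r for b, r in zip(betas, rxy))
    return betas, r_squared


# Synthetic data; every coefficient below is invented for illustration.
rng = random.Random(42)
n = 400
ses = [rng.gauss(0, 1) for _ in range(n)]
home = [0.4 * s + rng.gauss(0, 1) for s in ses]
cog = [0.5 * s + rng.gauss(0, 1) for s in ses]
evals = [0.3 * h + 0.4 * c + rng.gauss(0, 1) for h, c in zip(home, cog)]
ach1 = [0.6 * c + 0.2 * h + rng.gauss(0, 1) for c, h in zip(cog, home)]
ach11 = [0.6 * a + 0.2 * e + rng.gauss(0, 1) for a, e in zip(ach1, evals)]

# Assumed causal ordering: each variable is regressed on all earlier ones.
order = ["ses", "home", "cog", "evals", "ach1", "ach11"]
data = {name: standardize(v)
        for name, v in zip(order, [ses, home, cog, evals, ach1, ach11])}

results = {}
for i in range(1, len(order)):
    target, preds = order[i], order[:i]
    betas, r2 = path_regression(data[target], [data[p] for p in preds])
    results[target] = (dict(zip(preds, betas)), r2)

for target, (paths, r2) in results.items():
    shown = ", ".join(f"{p}: {b:+.2f}" for p, b in paths.items())
    print(f"{target} (R^2 = {r2:.2f}) <- {shown}")
```

In such a model, a predictor whose direct coefficient on 11th-grade achievement is near zero but which predicts 1st-grade achievement has only an indirect effect, which is the pattern described above for SES, home environment, and cognitive abilities.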
Although there were some differences among the three cultural groups in the statistical significance of several path coefficients (e.g., paths from home environment to 1st-grade achievement were statistically significant in Minneapolis and Sendai, but not in Taipei, and paths from home environment to parental evaluations were significant in Minneapolis and Taipei, but not in Sendai), such differences were actually quite small. We used LISREL 8 to test the equality of regression slopes for the three cultural groups in terms of all paths to 1st-grade achievement and those to 11th-grade achievement. Results showed that data from the three cultural groups fit the same model for 1st-grade achievement, χ²(8, N = 395) = 5.69, p = .68. The same was true for predicting 11th-grade achievement, χ²(12, N = 395) = 8.62, p = .74.
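The p values attached to the two chi-square statistics can be reproduced approximately from the chi-square distribution. The sketch below uses a closed-form expression for the upper-tail probability that holds only when the degrees of freedom are even, which is the case for both tests (8 and 12); the function name is ours.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variate.

    Closed form valid only for positive even df:
    exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Test statistics and degrees of freedom reported in the text.
p_first_grade = chi2_sf_even_df(5.69, 8)
p_eleventh_grade = chi2_sf_even_df(8.62, 12)
print(f"{p_first_grade:.3f} {p_eleventh_grade:.3f}")
```

Both probabilities come out close to the reported values of .68 and .74. Because the null hypothesis here is that the three groups share the same slopes, large p values of this kind are what indicate that the common model fits.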

Discussion
The results of this study extend the discussion of Rowe et al. (1994) concerning the similarity of developmental processes among different racial and ethnic groups. Rather than include members of the same general culture as was the case in their research, we studied the processes in three widely different cultures: American, Japanese, and Chinese. Not only did the cultures differ, but the level of academic achievement of students in the three locations also differed markedly. Nevertheless, there were great similarities among the three societies in the prediction of later academic achievement.
First, for all cultural groups, the three different domains of achievement measured in this study (mathematics, reading, and general information) were closely interrelated, suggesting the existence of a common underlying factor. The primary candidate is the children's cognitive ability. Path analyses suggested that this factor, which is statistically related to the family's SES, was the most effective single predictive variable in all three cultures. Second, the early predictors were equally effective in predicting later achievement in the three locations. More specifically, three of the four factors measured at first grade (i.e., family SES, home environment, and cognitive abilities) did not have long-term, unique effects on later achievement in any of the three locations. Rather, they exerted their effects through children's first-grade achievement or through parental behavior related to their evaluations of their children's abilities.
Third, the effective long-term prediction of adolescents' academic achievement was evident in the finding that from 38% to 51% of the variability in 11th graders' achievement scores could be accounted for by measures obtained when the students were in 1st grade. Possible bases of such high predictability in academic achievement are hard to evaluate, for they may be a result of environmental influences or of psychological factors such as intelligence (see Sameroff, Seifer, Baldwin, & Baldwin, 1993). It should be pointed out, however, that despite the relative stability and predictability of achievement and after the explained variance and error variance due to imperfect measures have been considered, at least a substantial portion of the variance of 11th-grade achievement was accounted for by variables other than those included in this study.
These cross-cultural similarities in developmental processes involved in academic achievement may account for the persistence of cross-cultural differences in level of academic achievement among different populations. In other words, when children in different cultures display different levels of achievement at an early age (in this case, at first grade) and when the influences of social and psychological factors are similar, it is little surprise that these children differ in their achievement at a later age. The implication of these results appears to be that, to close the "learning gap" (Stevenson & Stigler, 1992), intervention must start when children are at an age when cross-cultural differences in achievement are just beginning to emerge. By revealing the cross-cultural similarities in developmental processes and the relative stability in academic achievement both across time and across cultures, this study not only extends a long tradition of research on the stability of human characteristics (see, e.g., Bloom, 1964) beyond the cultural boundaries, but also raises new research questions. For example, what are the mechanisms behind the cross-cultural similarities in the stability of achievement relationships, especially when the cultures apparently differ in the factors in the larger environment that would seem to be related to children's achievement? Do the same factors that account for stability of individual differences within a cultural group account for stability of differences across cultures?
We should also mention that we intended to examine the domain-specificity in long-term prediction of achievement in three different cultures. Given the strong pattern of generality across the three domains of achievement (mathematics, reading, and general information), as well as the moderate reliability of some specific subtests, this study could not address such a question adequately. Moreover, our attempt to evaluate the differential predictability for the two types of mathematical knowledge (general vs. advanced mathematics) was hampered by the high interrelation between these two indices of mathematical knowledge. It seems, therefore, that in subsequent studies new efforts will be necessary for examining domain specificity in the face of strong intercorrelations among measures.