English vocabulary trajectories of students whose parents speak a language other than English: steep trajectories and sharp summer setback

In this study, I used individual growth modeling methods to examine the English word-learning trajectories of adolescent students (N = 278) whose parents speak English at home (n = 210) and those whose parents speak a language other than English (n = 68). Sixth- (n = 130) and seventh-grade (n = 148) students attending an urban middle school took part in the study, with each student contributing up to four occasions of vocabulary-achievement data across three school years. I used the Group Reading Assessment and Diagnostic Evaluation (GRADE), a 40-item, group-administered assessment, to measure vocabulary achievement. Students also provided information about the amount of time they spent reading independently during the summer and during the school year. Principal predictor variables included days between assessments, student home language, student free and reduced lunch status, time spent reading independently, and a dummy variable for the number of summers experienced between testing periods. On average, middle-school students experienced a loss of vocabulary over the summer; however, students who spoke a language other than English at home had more pronounced summer setback and steeper learning trajectories, even when controlling for well-known predictors of vocabulary like independent reading and predictors of summer loss like free and reduced lunch status. These findings corroborate research showing low-income students experience summer loss, but suggest that in urban schools serving mostly low-income students, home-language status may be a stronger predictor of summer loss than socio-economic status or reading amount.


Introduction
Vocabulary knowledge becomes increasingly important as children enter secondary school (Snow, Porche, Tabors, & Harris, 2007) and is a key component of skilled adolescent reading (Kamil, 2003; Snow & Biancarosa, 2003). English vocabulary may be a particularly strong predictor of academic outcomes for second-language learners reading in English (García, 1991; Hutchinson, Whiteley, Smith, & Connors, 2003; Kieffer & Lesaux, 2008), and providing instruction to help older ELLs develop academic language has been an increasing focus of interventionists (August, Branum-Martin, Cardenas-Hagan, & Francis, 2009; Proctor et al., 2009; Snow, Lawrence, & White, 2009; Townsend & Collins, 2009; Vaughn et al., 2009). Despite these advances in understanding how to improve vocabulary instruction for students from language-minority homes, there are still significant gaps in our knowledge about the trajectories of vocabulary growth of older students across middle-school grades. There is evidence that during summer months students from low-income homes regress in their vocabulary knowledge (Heyns, 1978), but it is not clear what the predictors of growth and summer loss are in urban schools, which in America often serve a majority of students from low-income families and also large numbers of students who speak a language other than English at home. In this research, I examined how well-known predictors of vocabulary development (baseline reading achievement, free and reduced lunch status, and time spent reading independently) relate to word-learning trajectories in this sample. Additionally, I used individual growth modeling and the multilevel model for change to investigate how language-minority home (LMH) (n = 68) and English-home (EH) (n = 210) adolescents learn words during the summer and school year, controlling for other more well-researched predictors.
Background and context
Biemiller and Slonim (2001) studied differences in the number of words known by children in kindergarten through fifth grades (N = 108). They found that knowledge of lower and higher frequency words grew at each grade level, and argued that the order of words learned was roughly consistent across children. They found that there were differences in word learning rates for students from different home backgrounds. Interestingly, Biemiller and Slonim found evidence of heterogeneity of vocabulary learning rates in younger children, but little heterogeneity in learning rates of older students, although these analyses could not be conducted with great specificity in their cross-sectional study. Snow et al. (2007) conducted a longitudinal study of students in kindergarten through sixth grade (n = 68) using individual growth modeling. Children in their study tended to have steeper receptive vocabulary growth trajectories when they had strong support for literacy at home, heard rare words at home and school, and engaged in extended discourse in school. Vocabulary increasingly correlated with reading comprehension as children got older. Like Biemiller and Slonim (2001), Snow et al. found that student vocabulary learning increased across grade levels. However, unlike Biemiller and Slonim, Snow et al. found limited heterogeneity in vocabulary-learning rates; in fact, because of this limited heterogeneity, no variance component associated with the rate of vocabulary learning was used in the final fitted model. This study did not explore differences in vocabulary that may result from seasonal school attendance patterns, or examine whether home-language status was a predictor of differences in vocabulary-learning trajectories.

Summer learning
Research on summer learning has tended to focus on the relationship between socioeconomic status and achievement, and whether this relationship is the same during the summer as it is during the school year. Heyns (1978) found that the reading vocabulary subtest of the Metropolitan Achievement Tests was the most highly reliable subtest and the one most highly correlated with other subtests, including reading comprehension. She used spring and fall assessments to understand the effects of schooling, summer activities such as music lessons or participation in organized sports, and socioeconomic status on word-learning achievement during the summer and the school year. In her study of sixth and seventh graders (N = 2,978), she found that family income was a statistically significant predictor of fall vocabulary scores, controlling for spring scores, parental education, race, and IQ, but did not predict vocabulary outcomes at the end of the school year, controlling for fall baseline scores and the same covariates. In short, she found that student socio-economic status (SES) influenced student vocabulary-learning achievement during the summer more than it did during the school year, a time during which all students had access to regular academic instruction. Heyns also showed that students who spent more time reading had better fall vocabulary scores, controlling for the spring scores, family income, parental education, and household size. Although Heyns' research suggests that urban districts serving large numbers of students from low-income families need to consider the impact of summer loss, it does not provide much guidance for school leaders working in those districts to help them determine which low-income students may experience the strongest effect of summer setback. Hansen (1989) used three waves of data collection (fall, spring, fall) to examine changes in the achievement of students (N = 117) from low-income Spanish-dominant homes in second through fifth grade.
Hansen found that while measures of auditory vocabulary improved as much during the school year as they did during the summer, measures of reading comprehension improved during the school year but plateaued during the summer. In a second series of analyses, Hansen determined that while home language predicted changes in reading comprehension during the school year, it did not do so during the summer months. Studies of change in students' performance from wave to wave, such as those by Hansen (1989), Heyns (1978), and others (Mousley, 1973; Wintre, 1986), did not use information about individual trajectories even when the data were available. Multilevel models allow change over three or more waves of data to be modeled more precisely (Singer & Willett, 2003).
There are studies of seasonal effects on student learning in domains other than vocabulary, some of which use longitudinal methods. Alexander, Entwisle, and Olson (2001) analyzed a representative random sample of children (N = 790) from the Beginning School Study for whom fall and spring achievement data were available from first through fifth grades. They found that individual changes in literacy skills (measured by the California Achievement Test) were best described when a summer-setback term was included in the model. This setback term represented the difference between student fall achievement level and the level that would have been expected if they had continued to learn at the school-year rate during the summer; in other words, it represented how much the summer vacation set students back from the trajectory that might have otherwise been expected. Like Heyns (1978), these researchers found that children of lower socioeconomic status had larger summer setback each year than their wealthier peers, but more similar rates of learning during the school year. Using longitudinal methods, Alexander and colleagues (Alexander, Entwisle, & Olson, 2007; Entwisle, Alexander, & Olson, 1997) replicated some of Heyns' findings and demonstrated that summer setback accumulates across summers to explain variation in academic achievement. A meta-analysis of summer learning studies has largely supported these findings, and demonstrates that as students get older the impact of summer loss grows for literacy-related outcomes (Cooper, Nye, Charlton, Lindsay, & Greathouse, 1996). These studies suggest that summer loss should be considered in longitudinal models of literacy-related outcomes and that language status should be examined as a predictor of summer loss.
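The setback term described above can be made concrete with a small numerical sketch. The function name and every value below are hypothetical illustrations, not data from any of the cited studies; the sketch only assumes a constant school-year learning rate expressed in points per day.

```python
import math


def summer_setback(spring_score, school_year_rate, summer_days, observed_fall):
    """Observed fall score minus the score expected had learning continued
    at the school-year rate over the summer (negative values mean a setback)."""
    expected_fall = spring_score + school_year_rate * summer_days
    return observed_fall - expected_fall


# Hypothetical student: 20 points in spring, gaining 0.01 points per day
# during the school year, tested again 100 days later with a score of 19.5.
setback = summer_setback(spring_score=20.0, school_year_rate=0.01,
                         summer_days=100, observed_fall=19.5)
print(round(setback, 2))
```

Under this definition a student can lose ground relative to the expected trajectory even when the raw fall score equals the spring score, because the comparison is with the projected level rather than with the spring level.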

Independent reading
There are good reasons to think that reading is an important conduit for vocabulary learning. The simplest argument starts with the premise that high school students know between 25,000 and 50,000 words, but that "this many words could not be acquired from direct instruction nor from looking them up in a dictionary. There is only one other possible source of knowledge: inference based on context" (Nagy, Herman, & Anderson, 1985, p. 325). Many studies have confirmed students' ability to infer the meaning of newly encountered words in text, although this is a difficult task and there are important individual differences in children's ability to accomplish it successfully (Cain, Oakhill, & Elbro, 2003; Cain, Oakhill, & Lemmon, 2004; Fukkink, Blok, & de Glopper, 2001; Lawrence, 2009; McKeown, 1985; Nagy et al., 1985; Swanborn & de Glopper, 1999; van Daalen-Kapteijns, Elshout-Mohr, & de Glopper, 2001); second language learners in particular may struggle at this task (Nagy, McClure, & Montserrat, 1997; Stoller & Grabe, 1995). If reading is the central conduit for children's vocabulary learning, then we would expect to find variation in annual word-learning rates that mirrors the enormous variation found in children's reading diets. Anderson, Wilson, and Fielding (1988) conducted a time-allocation study of 155 fifth-grade students. They estimated that whereas children in the 80th percentile of independent book reading will have encountered more than 2,800,000 words a year, children in the 20th percentile will have read just over 150,000 words, and that reported reading amounts correlate with student vocabulary knowledge. Consistent with the argument that students learn words when reading, several researchers have found a relationship between reading amount and vocabulary knowledge.
Allen, Cipielewski, and Stanovich (1992) asked students (N = 63) to complete two vocabulary checklists and the Peabody Picture Vocabulary Test, and to keep activity logs during the first 15 min of class for 15 days. They found that the students who reported reading many books in a book-reading diary did better than their peers on a vocabulary checklist task (r = 0.41, p < .05). Using the title-recognition test as a proxy for reading amount, Cunningham and Stanovich (1991) estimated even stronger correlations between reading and receptive vocabulary (r = 0.46, p < .05) in a sample of fourth-, fifth-, and sixth-grade students (N = 134). Attempts have been made to understand these correlations with multivariate regression controlling for baseline measures. In their multiple regression analysis, Anderson et al. (1988) found that students who read books had better vocabulary knowledge (book reading predicted 10% of the variation in fifth-grade reading scores). However, book reading no longer explained variation in fifth-grade vocabulary scores once a control for second-grade achievement was introduced. Interestingly, none of these studies examined home-language status as a moderator of student vocabulary learning from independent reading. Kim (2004, 2006) has been a pioneer in using randomized trials to evaluate the effects of voluntary summer reading. Kim (2004) looked at home language in his study of summer setback in the reading comprehension of students attending ethnically diverse elementary schools in a suburban mid-Atlantic school district. He found that students whose primary language was not English scored lower on measures of literacy after summer vacation, controlling for spring baseline literacy achievement and a host of demographic variables. He also found that students who read infrequently during the summer had larger losses in literacy skills on average than more frequent readers.
In a subsequent randomized field study of voluntary summer reading, Kim (2006) found that the benefits of participating in the program (which included reading lessons and books being sent home) were largest for minority students, less-fluent readers, and those who had fewer books at home. Kim and White (2008) examined the performance of students in three intervention conditions relative to a control group: students given books; students given books plus oral reading scaffolding; and students given books plus oral reading and comprehension scaffolding. Students who had books plus scaffolding improved during the summer relative to students in the book-only or control conditions. Kim and Guryan (in press) examined the effect of providing books and family support to mostly Latino students from low-income homes and found that the program had no impact on student vocabulary or reading comprehension ability. Like his earlier study (Kim, 2004), this research again demonstrated that summer setback is sharp for LMH students, underscoring the importance of examining the learning trajectories of LMH students across the calendar year. Kim's studies have also repeatedly demonstrated that students do not benefit equally from independent reading and that instructional support for reading may be necessary for students to benefit from time allocated to reading, findings that are consistent with research on individual differences in word learning from incidental encounters with text (Fukkink et al., 2001; Swanborn & de Glopper, 1999).

Language status and vocabulary-learning
English vocabulary has both proximal and distal relationships with English reading comprehension for LMH students (Proctor, Carlo, August, & Snow, 2005) and is critical for skilled reading development (August, Carlo, Dressler, & Snow, 2005). We know that there are many cross-linguistic relationships in the abilities of second-language learners (Genesee, Geva, Dressler, & Kamil, 2006), that some but not all vocabulary skills transfer between languages (Nagy, García, Durgunoglu, & Hancin-Bhatt, 1993; Ordonez, Carlo, Snow, & McLaughlin, 2002; Proctor, August, Carlo, & Snow, 2006), and that transfer ability may be moderated by age (Uccelli & Paez, 2007). These studies suggest a complicated picture of vocabulary development, especially for students who do not speak English at home but receive school instruction only in English. Although the challenges faced by students learning in these submersion environments have been acknowledged (Aarts & Verhoeven, 1999; Cummins, 1991), little research has examined the effect of the seasonal variation in language exposure that these students experience across the calendar year.
Jean and Geva (2009) conducted a longitudinal vocabulary study with 213 fifth- and sixth-grade students, including English-home (EH) students (n = 63) and students for whom English was a second language (n = 149). These researchers found that at the start of the study fifth-grade EH students had better receptive vocabulary (measured on the Peabody Picture Vocabulary Test-Revised), but did not have better knowledge of root word vocabulary than language-minority students. A year later, however, there were differences favoring EH students on both vocabulary measures, with relative improvements in the root word vocabulary test having been achieved by EH students in both basic words (those that would be known by most students in second and fourth grades) and more challenging words. These results show that in the winters of fifth and sixth grade, English-only students had stronger vocabularies than students who spoke English as a second language. Furthermore, EH children "continue to improve their knowledge of Grade 2 and 4 academic words at a faster pace than ESL children" (Jean & Geva, 2009, p. 176). This study was original in the attention it paid to the English-language vocabulary development of students who spoke a language other than English at home, and the results provide evidence about the different kinds of challenges that LMH students face in learning vocabulary in their second language (L2). Since the study used only annual measures of vocabulary, it is not clear whether differences between LMH and EH students were due to summer setback or different learning trajectories during the school year.
These studies highlight important gaps in the current research literature. Although the impact of summer setback has been clearly documented for low-SES students using regression and longitudinal methods (Alexander et al., 2001, 2007; Heyns, 1978), these studies did not examine home-language status. Summer setback in vocabulary knowledge has been shown to be connected to language status (Hansen, 1989), but only for younger children and without controls for independent reading activity. Studies by Kim (2004) and Kim and Guryan (in press) have shown that language-minority students have large summer setbacks and that reading may help ameliorate summer loss, but these studies do not connect changes in achievement during the summer to ongoing learning across the following year. Therefore, the research questions for this study are:
1. What are the average vocabulary trajectories of students from English-speaking homes attending an urban middle school serving mostly students from low-income families? Are there differences in the trajectories of students who start the study with higher or lower reading abilities?
2. In an urban middle school serving mostly students from low-income families, are there differences between the vocabulary learning trajectories of students who are eligible for free and reduced lunch and those who are not? Are there differences in learning trajectories between students who report reading more or less frequently?
3. Controlling for lunch status and reading amount, are there differences in the summer setback and school-year vocabulary learning trajectories of students from language-minority homes and students from English-language homes?

Research design
District setting
This research was conducted in a large urban district in the Northeast United States that served large numbers of minority students and students from low-income families (Table 1). Despite being considered a very strong urban district in national comparisons (Lutkus, Rampey, & Donahue, 2005), the average achievement in the district on standardized test measures was well below the state average.

School setting
The study was conducted in a mid-sized urban middle school. This school was an extended-day school. Students at this school were required to attend school for roughly four hours longer each week than students attending other schools in the district. Average academic achievement at this school was somewhat better than achievement levels of other middle schools in the district (on standardized measures), suggesting that during the school year there were not only more hours of school but also at least adequate academic instruction. At the same time, a majority of the students in the school were from low-income families (83.6% of students were eligible for free or reduced-price lunch), and there were large numbers of students from language-minority homes (Table 1).
In the second year of the study, the school experienced changes in enrollment that were partly the result of the district's movement toward K-8 schools. Since many K-8 schools did not have enough staff at each grade level to accommodate substantially separated classrooms, greater numbers of students with learning difficulties and special needs were assigned to traditional middle schools, such as the research site. Partly in response to the changing demographic profiles of the students whom they served, and partly because of previous experience with the program, the school leadership adopted a somewhat scripted curriculum to deliver to its larger numbers of students in substantially separate classrooms, including students with special needs and students with limited English proficiency. However, because the new curriculum had obligatory testing and scheduling provisions, the faculty and administration decided it would be better if these classes did not participate in this study, since participation would require additional time for assessment and survey completion. Therefore, no data were collected from limited-English-proficient students or severely learning disabled students for this study, with the ancillary result that there are a smaller proportion of Hispanic students and LMH students participating in the study compared to the proportion of these students in the school at large (Table 1).
For these reasons, the LMH students who participated in this study were those who received their instruction in English without language support. Some of the students were in bilingual classes earlier in their academic careers, while others were proficient in English when they entered kindergarten. Some of these students immigrated to the United States early in their schooling, and these students likely had parents or guardians from a range of national and professional backgrounds (Suárez-Orozco, Suárez-Orozco, & Todorova, 2008). Although most of the LMH students came from Spanish-speaking homes (n = 58), the LMH designation also identified students who spoke languages other than English or Spanish at home (n = 10). For the purposes of this study, the defining feature of LMH students is that their parents identified themselves as speaking a language other than English at home and requested that the district contact them in that language.

Participants
Complete data were not provided by the district for every student who was eligible to participate in the study. In order to compare competing models, it was necessary to create a dataset that included only students who had provided data on every predictor variable of interest. Therefore, if I did not have data on a student's eligibility for free or reduced lunch (n = 8) or home language status (n = 6), I dropped that case from this analysis. As a result, from a sample of 291 students, only 278 were used. To ensure that students who were missing data were not significantly different from students who provided it, I compared students who provided data in each category with those who did not according to lunch status, standardized test scores, home-language status, time spent reading, and baseline vocabulary. There were no differences between the students who were dropped and those who were maintained on these indicators at an alpha level of 0.05 or lower. Additionally, it was not possible to collect survey data from all students at each of the two waves of data collection. Since there were far greater numbers of students who did not complete the first (n = 37) or the second (n = 53) survey, these students were maintained in the analytical sample. Models that include reported reading amount as a predictor cannot be compared directly with models that do not include these data using the deviance statistics, as will be detailed in the "Findings" section. I have completed the entire analysis reported here in the larger data set with no dropped cases and in a smaller data set with only data from the 209 students who had complete data on every predictor variable, and came to the same conclusions for each research question.
Procedure
I summarize data collection procedures in Table 2. On May 18, 2006, the first wave of vocabulary-achievement data was collected, which was also the first day of the study for analytical purposes. Upon returning to school the following September, students took a time-allocation survey that included questions asking about how much time they spent independently reading during the summer. Students completed the second vocabulary assessment on September 26, 2006 (day 133 of the study). In April, students were again asked how much time they spent doing out-of-school activities, including reading different kinds of texts, during the month of March (a month during which there were no school vacations or days off other than weekends). Students completed the third vocabulary assessment on April 27, 2007 (day 345 of the study). For roughly half the students, this was the last wave of data they contributed to the study, since the older cohort was in eighth grade and ready to graduate to high school, where further vocabulary data could not be collected. The younger cohort (who began the study when they were in sixth grade) returned to complete the last vocabulary assessment on October 25, 2007.
The amount of data contributed by students in each grade-level cohort is presented in Table 3. This table demonstrates that the older students (who started the study when they were in seventh grade) did not contribute data in the fall of 2007, because they had graduated from the research site. Note that students who contributed as few as one wave of vocabulary data were still included in the analysis (consistent with the flexible use of data that longitudinal methods facilitate; Singer & Willett, 2003).

Measures
I used the multilevel model for change in this analysis, which not only permits the accurate modeling of individual growth but also the analysis of time-varying predictors. In order to conduct this study, the data were prepared in a person-period dataset in which data for each student were arrayed in four rows. Time-varying variables were used in the level-1 model (specified below). For time-invariant predictors like language status and grade-level cohort, the value in each of the four rows is the same for each student, since language status and cohort status of the students did not change during the study; these variables were used in the level-2 model.
Outcome
Vocabulary
Up to four waves of vocabulary data were collected from each child and used to create a time-varying continuous level-1 outcome variable VOCAB. The Group Reading Assessment and Diagnostic Evaluation (GRADE) is a standardized reading assessment that includes four subtests at the middle school level (Level M), one of which is a vocabulary assessment (Williams, 2001a). The GRADE Level M test was designed and normed for use across grades six through eight. In norming this test, raw scores convert to the same scaled score regardless of grade level (see the tables on pages 19, 23, and 27; Williams, 2001b). As a result, raw scores are intrinsically meaningful. In contrast, when different forms are given for each grade level, researchers and educators must take the intermediate step of converting a raw score to a scale in order to compare meaningfully across grade levels; the design of GRADE Level M makes that step unnecessary. While it is true that raw scores do not have the advantage of scaled scores in that one point more or less does not convert to the same increment in the underlying ability measured, they do have the advantage of being easily interpreted.
Although many researchers choose to use grade equivalents as a means of adhering to a scaled score while using a more easily interpreted metric, grade equivalents have been criticized for misleading consumers of research in that they are pure extrapolations outside the norming sample. Even within the limits of a norming sample, there is substantial evidence that a month's or year's worth of learning is not the same across different grades (at least not any more so than an item is; see Bloom, Hill, Black, and Lipsey (2008) for a comparison of growth across grades K-12 on seven nationally normed achievement tests). The process of standardization across different learning periods results in equal score variance across age, but "sacrifices individual differences in growth" (Espy, Molfese, & DiLalla, 2001, p. 51). The reliability coefficients for forms A and B of the Level M assessment were acceptably high for seventh (corrected coefficient = 0.88) and eighth graders (corrected coefficient = 0.82). Each item on the vocabulary test is composed of a sentence or sentence fragment, with a target word printed in bold type. Students are asked to select a synonym for the target word from a list of five. There are 40 items on each form, with equal numbers of nouns, verbs, adjectives, and adverbs as target words in each form. Target words include some high-frequency words (which occur on the General Service List; West, 1957) such as seldom, classify, and resisted, as well as general academic words (which occur in the new academic word list; Coxhead, 2000) such as primary, passive, and compiled. There are also a number of low-frequency words such as petrified, render, and enthralled.
Mandated comprehensive assessment system
The school district provided the English Language Arts mandated comprehensive assessment system (MCAS) scores of each participating student.
This score was centered on the lowest score in the sample (214) creating the time-invariant variable MCAS with a sample minimum of zero and the maximum of 64.
Language-minority home status
The school district provided information about the language in which parents wished to be contacted by the school. If parents indicated that they wished to be contacted by the school in a language other than English, then children were assigned a value of one on the level-2 language-minority home status variable, LMH.
Free or reduced lunch
The district provided information about the LUNCH status of 247 students at the start of the study, and provided additional data about another 31 students at the completion of the study. Although it is possible that the follow-up information provided for the 31 students reflected concurrent but not earlier income status, these data were treated as reflecting family income during the course of the study. All students were coded as either one (eligible for free or reduced lunch) or zero (not eligible) on this level-2 dichotomous variable.

Time allocated to reading during the summer
Students completed surveys of their time allocation on two occasions, once in September and once in April. The survey itself was based on one designed to learn about students' out-of-school activities during the school year (Moje et al., 2005), and comparative results from a large study examining student reading and other activities across multiple sites are available (Moje, Overby, Tysvaer, & Morris, 2008). In the first survey, students were asked a series of questions that required them to think about what they had been doing during the previous month of summer vacation. For instance, one of the questions was, "How often did you read for pleasure?" Students were also asked to report on how often they read specific kinds of reading materials such as narrative texts (novels, short stories, biographies, religious books, poetry), expository texts (information books, research reports, instructions on how to do something, maps, bus and airline schedules, newspapers), teen culture texts (comic books, magazines, music lyrics), and computer-based reading (e-mail and websites). Students could select from seven answers: (1) never, (2) once a month, (3) 2-3 times a month, (4) once a week, (5) 3-4 times a week, (6) every day for less than 1 hour, and (7) every day for more than 1 hour. These data constituted the level-2 predictors SUM_READ (time spent summer reading), SUM_NAR_READ (time spent summer reading narrative text), SUM_EXP_READ (time spent summer reading expository text), SUM_TEEN_READ (time spent summer reading teen culture texts), and SUM_COMP_READ (time spent reading on the computer in the summer).
Time allocated to reading during the school year In the second survey was administered in April and asked students a series of questions about what they had read outside of school during the previous month, answering on the same Likert scale used in the summer survey. These data constituted the level-2 predictor SCH_READ (time spent independent reading during the school year), SCH_NAR_ READ (time spent independent reading narratives during the school year), SCH_EXP_READ (time spent independent reading expository text during the school year), SCH_TEEN_READ (time spent independent reading teen texts during the school year), and SCH_COMP_READ (time spent independent reading on the computer during the school year).

Covariates
Gender
The school district provided the gender of each participating student, which was coded as a dichotomous variable, FEMALE, indicating whether the student was female (female = 1, male = 0).

Grade-level cohort (covariate)
The school district provided the grade level of each participating student, which was converted into a variable GRADE7 that identified whether a student was in grade 6 (GRADE7 = 0) or grade 7 (GRADE7 = 1) at the start of the study.

Data-analytic plan
All of my research questions were addressed by fitting a multilevel model for change (Singer & Willett, 2003). Because I had only four waves of longitudinal data, I had to impose strong assumptions about model specification: (1) that growth in vocabulary knowledge was linear, (2) that the size of summer setback (if detected) was identical in both summers, and (3) that the impact of summer setback was cumulative. Additionally, the effect of the passage of time (the rate-of-change parameter) was fixed across all children, as its variability proved negligible in all fitted models (unconditional growth model $\zeta_{0i} = 5.2 \times 10^{-22}$, p = n.s.). The resultant level-1 and level-2 model specifications (Eq. 1) assume $\varepsilon_{ij} \sim N(0, \sigma^2_{\varepsilon})$.
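The specification can be written out from the parameter definitions given in the text. The block below is a reconstruction rather than the original display: the labels VOCAB, DAYS, and SUMMERS for the outcome and the two level-1 predictors are assumed names, and the random intercept $\zeta_{0i}$ is an assumption that follows from the rate of change being fixed across children.

```latex
\begin{aligned}
\text{Level 1:}\quad
\mathrm{VOCAB}_{ij} &= \pi_{0i} + \pi_{1i}\,\mathrm{DAYS}_{ij}
  + \pi_{2i}\,\mathrm{SUMMERS}_{ij} + \varepsilon_{ij} \\
\text{Level 2:}\quad
\pi_{0i} &= \gamma_{00} + \gamma_{01}\,\mathrm{GRADE7}_i + \gamma_{02}\,\mathrm{MCAS}_i
  + \gamma_{03}\,\mathrm{FEMALE}_i + \gamma_{04}\,\mathrm{LMH}_i \\
 &\quad + \gamma_{05}\,\mathrm{LUNCH}_i + \gamma_{06}\,\mathrm{SCH\_READ}_i + \zeta_{0i} \\
\pi_{1i} &= \gamma_{10} + \gamma_{11}\,\mathrm{LMH}_i + \gamma_{12}\,\mathrm{LUNCH}_i
  + \gamma_{13}\,\mathrm{SCH\_READ}_i \\
\pi_{2i} &= \gamma_{20} + \gamma_{21}\,\mathrm{LMH}_i + \gamma_{22}\,\mathrm{LUNCH}_i
  + \gamma_{23}\,\mathrm{SUM\_READ}_i
\end{aligned}
```

Under this reading, SUMMERS counts the summers a student has experienced by a given wave, which is what makes the setback cumulative, and the absence of a residual in the equations for $\pi_{1i}$ and $\pi_{2i}$ reflects the fixed slope and the identical setback across summers.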

In these equations, subscript i denotes individual students and subscript j denotes the measurement occasion, timed in days since the start of the study (i.e., 0, 133, 345, or 526 days). Because GRADE7 is centered on the younger cohort (those who started the study in sixth grade), and MCAS is centered on the lowest MCAS score in the sample, the regression parameters have the following interpretations: $\gamma_{00}$ is the population average vocabulary score for sixth-grade English-home students at the start of the study who scored at the bottom of the sample on MCAS; $\gamma_{01}\,\mathrm{GRADE7}_i$ represents the difference in baseline vocabulary scores between students in different grade-level cohorts; $\gamma_{02}\,\mathrm{MCAS}_i$ is the difference in true baseline vocabulary scores predicted by a one-point difference on the MCAS; $\gamma_{10}$ is the average rate of change for English-home children, after controlling for reading and summer setback; and $\gamma_{20}$ represents the average setback in vocabulary predicted by each summer experienced by English-home sixth graders, after controlling for reading amount. Additionally, there are a series of parameters that estimate the differences between intercept, slope, and summer setback for LMH and EH students ($\gamma_{04}\,\mathrm{LMH}_i$, $\gamma_{11}\,\mathrm{LMH}_i$, and $\gamma_{21}\,\mathrm{LMH}_i$, respectively). The differences between the intercept, slope, and summer setback for students based on lunch eligibility are similarly parameterized (by estimates of $\gamma_{05}\,\mathrm{LUNCH}_i$, $\gamma_{12}\,\mathrm{LUNCH}_i$, and $\gamma_{22}\,\mathrm{LUNCH}_i$, respectively).
Each of the research questions was answered with attention to a specific parameter in the final fitted model. The intercept ($\gamma_{00}$), growth rate ($\gamma_{10}$), grade-level covariate ($\gamma_{01}\,\mathrm{GRADE7}_i$), MCAS ($\gamma_{02}\,\mathrm{MCAS}_i$), gender ($\gamma_{03}\,\mathrm{FEMALE}_i$), and summer setback ($\gamma_{20}$) terms were examined to answer research question one. Research question two was answered by examining the estimates of the parameters associated with differences in intercept, slope, and summer setback by lunch status ($\gamma_{05}\,\mathrm{LUNCH}_i$, $\gamma_{12}\,\mathrm{LUNCH}_i$, $\gamma_{22}\,\mathrm{LUNCH}_i$) and reading amount ($\gamma_{06}\,\mathrm{SCH\_READ}_i$, $\gamma_{13}\,\mathrm{SCH\_READ}_i$, $\gamma_{23}\,\mathrm{SUM\_READ}_i$). Research question three was answered by examining the parameter estimates associated with home-language status ($\gamma_{04}\,\mathrm{LMH}_i$, $\gamma_{11}\,\mathrm{LMH}_i$, $\gamma_{21}\,\mathrm{LMH}_i$).
Table 4 presents the four waves of GRADE vocabulary data for students by grade-level cohort, lunch status, home-language status, and gender. Looking down the first column of data (Spring 2006) suggests that there were large baseline differences between students in sixth and seventh grade, and very small differences between students based on lunch or home-language status. The baseline scores for both grade levels suggest that this sample was roughly normative. The average raw scores of 16.06 and 19.40 for students in the spring of sixth and seventh grade correspond to normative grade-equivalent scores of 6.6 and 7.6, respectively (on Form A for Spring testing), suggesting that participants may have been three months behind the national norming sample in vocabulary knowledge at the start of the study (Williams, 2001b). Looking from left to right across each row of data suggests that although students in every category demonstrated strong improvement during the 2006-2007 school year, students in each category regressed in their vocabulary knowledge during the summer months on average. Table 5 presents results from the survey of student pleasure reading during the summer and the school year.
Students self-reported spending more time reading for pleasure during the school year than during the summer on average, although seasonal differences were larger in time allotted to reading narrative (M summer = 2.40, M school year = 3.27) and expository texts (M summer = 2.33, M school year = 3.23) than teen genres (M summer = 3.23, M school year = 3.43) and computer-based reading (M summer = 4.45, M school year = 4.75). Independent-samples t tests showed there were no differences between the self-reported time allocated to reading these general text genres by lunch or language status at a Bonferroni-adjusted significance level (α < 0.01), although there were differences in time allocated to reading specific text types like newspapers and biographies between students who spoke English and students who spoke another language at home (p < 0.001). I explored each of the summer and school-year reading variables as predictors of vocabulary in preliminary models. The model using the overall time allocated to pleasure reading fit better than those which explored narrative reading (-2LL = 3,966.6), expository reading (-2LL = 3,969.3), teen culture reading (-2LL = 3,965.7), or computer-based reading (-2LL = 3,967.8), controlling for home language and lunch status (compare with Model F below). Therefore, I used the survey item which asked students to indicate their overall time spent reading as the best measure of reading in the longitudinal analysis described below.

Findings
1. What are the average vocabulary trajectories of students from English-speaking homes attending an urban middle school serving mostly students from low-income families? Are there differences in the trajectories of students who start the study with higher or lower reading abilities?
Table 6 presents the series of multilevel models for change predicting GRADE vocabulary performance across four waves of data collection. The first six parameters listed in the third column of Table 6 represent core parameters used in each model. All fitted models had roughly similar estimates of the parameters associated with the intercept ($\gamma_{00}$), days since the start of the study ($\gamma_{10}$), grade-level cohort ($\gamma_{01}\,\mathrm{GRADE7}_i$), baseline ELA MCAS score ($\gamma_{02}\,\mathrm{MCAS}_i$), gender ($\gamma_{03}\,\mathrm{FEMALE}_i$), and summer setback ($\gamma_{20}$). The final fitted model (Table 6) demonstrates that the true average score on the first wave of the GRADE vocabulary test for sixth-grade students from English homes with minimal MCAS scores was only slightly above chance ($\gamma_{00} = 10.676$, p < .001).
Students who began the study in seventh grade (but were otherwise similar) started with higher average scores ($\gamma_{01}\,\mathrm{GRADE7}_i = 2.482$, p < .001), which they maintained throughout the study (i.e., there was no interaction between grade and days). There was a strong correlation between baseline vocabulary and MCAS scores (r = 0.619, p < .01). Students with higher MCAS scores began the study with better vocabulary on average ($\gamma_{02}\,\mathrm{MCAS}_i = 0.268$, p < .001), such that the difference between students at the 25th and 75th percentiles on MCAS was 3.55 points across the study; this difference was larger than the difference between grade-level cohorts. Although girls started the study with slightly higher average scores, being female predicted lower baseline vocabulary controlling for MCAS scores and grade level in the final fitted model ($\gamma_{03}\,\mathrm{FEMALE}_i = -0.836$, p < .05). Figure 1 presents the trajectories of prototypical EH students with higher and lower MCAS scores. All students demonstrated significant growth over the course of the study on average ($\gamma_{10} = 0.01559$, p < .001) and experienced a setback effect ($\gamma_{20} = -2.317$, p < .001) in the number of vocabulary items they answered correctly at the end of the summer relative to what would be expected had they continued learning at a constant rate. While the model (Eq. 1) specifies a one-time drop in vocabulary knowledge at the first data collection point after each summer, in actuality we have no information about the trajectory of student word maintenance and loss during summer months, and so the plot line for each of the prototypical students is left blank during the summer to underscore this fact (Fig. 1).
Fig. 1 Fitted trajectories of prototypical sixth- and seventh-grade EH students who scored poorly (25th percentile) or well (75th percentile) on the baseline English Language Arts administration of the State Mandated Comprehensive Assessments System
2. In an urban middle school serving mostly students from low-income families, are there differences between the vocabulary learning trajectories of students who are eligible for free and reduced lunch and those who are not? Are there differences in learning trajectories between students who report reading more or less frequently?
Model D demonstrated that there was no difference between the growth trajectories ($\gamma_{12} = -0.002$, p = n.s.) or summer setback ($\gamma_{22} = 0.875$, p = n.s.) of students who were eligible for free or reduced lunch and those who were not. Model E includes parameter estimates for the impact of reading on vocabulary growth ($\gamma_{13}\,\mathrm{SCH\_READ}_i = 0.001$, p = n.s.) and summer setback ($\gamma_{23}\,\mathrm{SUM\_READ}_i = 0.060$, p = n.s.). Although the goodness-of-fit statistic (-2LL) is greatly reduced by the inclusion of these terms, this is the result of cases being excluded from the model because of missing data, which reduced the total amount of unexplained deviance in the model. I also fit models D and C using a data set that included only the 209 students who provided complete data on every predictor and confirmed what the significance values of the lunch status and reading predictors suggest: neither lunch status nor independent reading improved models predicting vocabulary growth. Furthermore, there were no significant interactions between reported time spent reading, free and reduced lunch eligibility, gender, or baseline MCAS scores with time or the summer setback estimate predicting vocabulary across the four waves of data.
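The caution about -2LL applies to any deviance comparison: two nested models can be compared only when they are fitted to the same cases. The sketch below illustrates that check with a likelihood-ratio test written against the Python standard library. The function names are hypothetical, and the deviance values in the usage note are invented for illustration; they are not taken from Table 6.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    # Closed-form starting values: df = 1 for odd df, df = 2 for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
    else:
        a, q = 1.0, math.exp(-z)
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1).
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def deviance_test(dev_reduced, dev_full, extra_params, n_reduced, n_full):
    """Compare nested models by their -2LL values (hypothetical helper).

    Refuses to compare models fitted to different numbers of cases, because
    a drop in -2LL caused by excluded cases says nothing about model fit.
    """
    if n_reduced != n_full:
        raise ValueError("models were fitted to different samples")
    delta = dev_reduced - dev_full
    return delta, chi2_sf(delta, extra_params)
```

For example, `deviance_test(4000.0, 3994.0, 2, 209, 209)` compares two invented deviances from models fitted to the same 209 complete cases, while passing 278 and 209 as the two sample sizes raises an error instead of returning a misleading test.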
3. Controlling for lunch status and reading amount, are there differences in the summer setback and school-year vocabulary learning trajectories of students from language-minority homes and students from English-language homes?
Model G, the final fitted model, examined differences in the slope and summer setback for students from language-minority homes. This model shows that students from language-minority homes learned vocabulary more rapidly than students from English-language homes ($\gamma_{11}\,\mathrm{LMH}_i = 0.008$, p < .001) and that LMH students had more pronounced summer setback ($\gamma_{21}\,\mathrm{LMH}_i = -2.004$, p < .001), after controlling for grade level, baseline MCAS scores, and gender. In order to estimate these parameters controlling for free or reduced lunch status and reading amount, Model F (Table 6) includes all of the predictors and demonstrates that the effect of home-language status persists after controlling for these and other predictors such as grade level, MCAS, and gender. There were no significant interactions between reported time spent reading and home-language status, free and reduced lunch eligibility, gender, or baseline MCAS scores with time or the summer setback estimate in modeling vocabulary achievement across the four waves of data. Figure 2 presents fitted trajectories of prototypical sixth-grade boys from language-minority (thin lines) and English (thick lines) homes who scored poorly (dashed lines) or well (solid lines) on the MCAS in the spring of 2006. Both strong and struggling students from language-minority homes (thin solid line and thin dashed line, respectively) had steeper trajectories across the course of the study than students from English homes (thick solid and thick dashed lines). However, because of the larger summer setback experienced by LMH students each summer, LMH students ended the study with roughly the same vocabulary levels as students from English homes.
Post hoc general linear hypothesis (GLH) tests demonstrated that differences existed between the groups before the first summer (June 6) favoring LMH students ($\chi^2 = 8.93$, p < .05), after the first summer (September 6) favoring EH students ($\chi^2 = 5.55$, p < .05), and at the end of the second school year (June 6) favoring LMH students ($\chi^2 = 4.77$, p < .05). There was no difference between the two groups on the last day of data collection (November 7; $\chi^2 = 0.11$, p = n.s.).
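The alternating direction of these contrasts follows from the two LMH interaction estimates reported above. The sketch below computes the model-implied LMH minus EH difference at each wave. It rests on two assumptions that are not in the text: the baseline LMH difference is set to zero as a placeholder because its estimate is not reported here, and the four waves are taken to fall after 0, 1, 1, and 2 summers. The function name is hypothetical.

```python
# Estimates reported in the text for the final fitted model.
GAMMA_11_LMH = 0.008    # additional vocabulary items gained per day by LMH students
GAMMA_21_LMH = -2.004   # additional loss per summer experienced by LMH students

# ASSUMPTION: the baseline LMH difference is not reported in the text;
# zero is a placeholder, not an estimate from the study.
GAMMA_04_LMH = 0.0


def lmh_minus_eh(days, summers, gamma_04=GAMMA_04_LMH):
    """Model-implied LMH minus EH difference in fitted vocabulary score."""
    return gamma_04 + GAMMA_11_LMH * days + GAMMA_21_LMH * summers


# Days since the start of the study for each wave (from the text), paired with
# an ASSUMED count of summers experienced by that wave.
WAVES = [(0, 0), (133, 1), (345, 1), (526, 2)]

if __name__ == "__main__":
    for days, summers in WAVES:
        gap = lmh_minus_eh(days, summers)
        print(f"day {days:3d}, summers {summers}: LMH - EH = {gap:+.3f}")
```

Under these assumptions the difference is about -0.94 items at day 133, +0.76 at day 345, and +0.20 at day 526, which matches the direction of the reported contrasts after the first summer and at the end of the second school year, and the near-zero difference on the last day of data collection. The GLH tests themselves were evaluated on specific calendar dates, so the wave days used here are only an approximation of those contrasts.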

Discussion
This study presents a longitudinal analysis of the vocabulary development of mostly low-income adolescent students including large numbers from language-minority homes. There are five key findings from this study. The first finding was that most students in this sample experienced a summer setback from anticipated vocabulary learning rates. The second finding was that family income, as measured by eligibility for free or reduced lunch, did not predict differences in vocabulary learning rates or summer setback in this relatively income-homogeneous sample. The third finding was that students from language-minority homes experienced a deeper summer setback and had steeper school-year vocabulary learning trajectories than their EH peers. The fourth finding was that higher self-reports of reading during the summer and school year did not predict improved student vocabulary trajectories, a finding which I interpret with caution below. The fifth finding was that there was limited heterogeneity in vocabulary-learning rates.
The current study is the first, to my knowledge, to use longitudinal methods to study the effect of summer setback in vocabulary knowledge of middle-school students. I found that setback during summer was an important component of all models of student vocabulary learning. This finding needs to be considered within the context of the setting where the study was conducted. Even though there was some variation in income as measured by eligibility for free and reduced lunch, variation in SES on this measure was limited: the majority of students in the sample, school, and district were from low-income families. Previous research has demonstrated that students from low-income families are more likely to experience summer setback than students from wealthier homes (Heyns, 1978).
Given the relative homogeneity of SES in this sample, it is not surprising that eligibility for free or reduced lunch was not a predictor of vocabulary setback during the summer. Lunch status is a poor proxy of SES (Harwell & LeBeau, 2010), and in the current setting it is not clear that differences in lunch eligibility represent the large differences in income that we might expect in a more income-heterogeneous sample. Therefore, this finding should only be interpreted as applying to schools or districts that have large numbers of low-SES students and very few higher-income families. This is exactly the population served by many urban districts and schools in America (National Center for Education Statistics, 2010). For many educational leaders in these districts, low-income status will not provide much guidance in identifying students who are at risk for greater summer setback.
My results show that language-minority students have greater summer setback on average than English-home students, even after controlling for demographic factors, baseline achievement data, and self-reports of the amount of time spent reading. One explanation for this finding is that LMH students tend to converse and spend leisure time with their parents and family in a language other than English, and while these opportunities may support L1 vocabulary learning and maintenance, they may limit student opportunities to learn new English words. Another related explanation is that LMH students may live in communities where a language other than English tends to be spoken, and English vocabulary learning opportunities in the community may therefore be reduced (Alba, Stults, Logan, & Lutz, 2002;Arriagada, 2005;Stevens, 1992). Both these explanations suggest that there may be a tradeoff in opportunities to learn L1 and L2 vocabulary during the summer months, with the complication that the larger summer setback in English is also associated with steeper learning trajectories during the school year.
There are several explanations as to why students from language-minority homes would learn English vocabulary more rapidly than their EH peers during the school year, even though the students were in exactly the same instructional environments. As we have seen, survey data suggest that for the most part the reading preferences of LMH students and EH students were the same; however, there were some differences (in non-fiction reading, for instance) that might partially explain divergent vocabulary trajectories. It is important to note, however, that there were no three-way interactions between language status, reading, and time. This means that although there was evidence of some limited difference between the reading habits of EH and LMH students, there was no evidence that the relationship between reading and vocabulary differed by language status. Another possible explanation is that LMH students may utilize first language (L1) oral and literacy skills in their acquisition of L2 vocabulary (Nagy et al., 1993;Ordonez et al., 2002;Proctor et al., 2006;Uccelli & Paez, 2007), so that students with home L1 language support are better able to learn academic English vocabulary when they encounter new words during class or when reading.
I found that time spent reading independently was not a predictor of vocabulary learning across the sample. This finding might seem surprising in light of the fact that there are many studies that show strong correlations between independent reading and vocabulary (Allen et al., 1992;Anderson et al., 1988;Cain et al., 2004;Cunningham & Stanovich, 1991;Heyns, 1978;Snow et al., 2007). However, when baseline controls for reading achievement were introduced into the analysis, Anderson et al. (1988) found that the relationship between book reading amount and vocabulary was eliminated. Similarly, in the current study there was no relationship found between reported reading amount and vocabulary growth.
Even so, there are other correlational studies (Heyns, 1978) and intervention studies (Kim, 2006) that have found relationships between vocabulary and independent reading even with careful controls for baseline achievement (although also see Kim and Guryan, in press). The current study explores independent reading as a very well recognized predictor of vocabulary learning, but it was not designed primarily to explore this question. There are at least two considerations that could not be explored in this study that should be acknowledged in interpreting this null finding.
Inferring the meaning of a newly encountered word is very difficult, and there are important differences in how well different students do so independently. Some students learn new words from independent reading while others do not. The current analyses were only able to test this possibility in a limited way by exploring how well students with better or worse baseline vocabulary scores learned new words. This analysis suggests that there were no differences in how well students incidentally learned words from reading based on their baseline vocabulary knowledge. However, there are many other individual differences that might explain the ability to infer newly encountered words in text (Swanborn & de Glopper, 1999), including passage comprehension (Cain et al., 2003, 2004) and the ability to complete a cloze task (Lawrence, 2009). Unfortunately, these skills are closely related to vocabulary and it is necessary to use clearly exogenous predictors in longitudinal models. Therefore, individual differences were not explored in depth in this analysis, although other analyses of these data suggest that they are important in fully understanding how students learn new words from independent reading. The null finding for the main effect of reading in this study should not be interpreted as a contradiction of these results.
The null finding for the impact of reading also needs to be interpreted with consideration of the fact that teens in this sample reported reading a wide range of texts including magazines, comics, websites, novels, and newspapers, and that the opportunities for vocabulary learning (and developing other reading skills) are not uniform across text types. Results from my earlier study using these data suggest that, at least during the summer months, the impact of time spent reading is moderated by the type of texts that students read (as well as the reading ability of the student). Results from the current study are probably best interpreted in this light, suggesting that although some students chose to independently read texts that support vocabulary learning, and had the skills to benefit from this leisure activity, others chose texts that did not support vocabulary development. Future studies should include factor analyses of these data, and explorations of the relationships between factor components and vocabulary achievement. In the current study such analyses were impractical since the factor components at each wave of data collection were incompatible.
The last finding from this study is that there was limited heterogeneity in students' word-learning rates. Students at higher grade levels and with higher MCAS scores started with better vocabulary knowledge and maintained their advantages over their peers across the study. Variance in learning was so limited that the effect of time was fixed in the longitudinal models, although this was no doubt a function of underlying variance, the words being tested, the number of waves of data collection, and the spacing of the waves of data. Nonetheless, it is notable that studies of word learning in very young children have found vast heterogeneity in children's rates of learning (Anglin, 1994). The current findings are more in line with other longitudinal and cross-sectional studies of school-aged children which suggest that differences in vocabulary learning rates between children attenuate as children age (Biemiller & Slonim, 2001;Snow et al., 2007).
This study has implications for future research and instruction. Heyns (1978) demonstrated the importance of looking at the impact of instruction across the calendar year; this study suggests that understanding the long-term impact of instruction is especially important if the instruction is intended to help students who speak a language other than English at home. Researchers interested in developing interventions to support the vocabulary of LMH students should certainly pay attention to these results. Different rates of vocabulary learning for LMH and EH students participating in an intervention may reflect spontaneous differences found in the wider population, so care needs to be taken to examine learning in treatment and comparison schools. Similarly, follow-up analysis should be done to see how well all students learned and maintained vocabulary and if there are differences in long-term vocabulary consolidation by language status (Lawrence, Capotosto, Branum-Martin, White, & Snow, under review). For these sorts of analyses, it is essential that multiple assessments of vocabulary knowledge be administered each year. Future research should also examine item-level data to see what words LMH students learn better than EH students, and if LMH students seem to access knowledge of Spanish-English cognates, learning strategies, morphological cues or other cognitive skills better than their EH peers. This study suggests that more research needs to be done to determine which aspects of semantic knowledge are most easily maintained and which aspects are more likely to decay; assessment items that evaluate different aspects of semantic knowledge may help us reach this understanding.
This study has implications for practice and policy. First, scores on standardized assessments taken after summer vacation or other long absences from school need to be treated with caution. In this study, spring-to-spring scores correlated better than temporally closer spring-to-fall scores; LMH students coming back from summer break are likely to test poorly relative to their vocabulary potential. Second, vocabulary instruction should focus on helping students to learn and maintain word knowledge. Learning a word well enough to answer a quiz may or may not provide a semantic grounding stable enough to retain knowledge of the word and then build on and consolidate that knowledge through subsequent encounters with the word in discussion or in text. Third, this study suggests that independent reading will not, on average, help all students learn words. This finding needs to be interpreted with reference to studies that show reading text type interacts with reading amount and student ability in predicting vocabulary learning (Lawrence, 2009) and that summer reading should be scaffolded (Kim & White, 2008).
There are several important limitations of this study. Because only four waves of data were collected, a linear trajectory for students' word learning was imposed on the model and the main effect of summer setback was assumed to be the same from summer to summer. Future studies should consider collecting more waves of data, including vocabulary-achievement data during the summer and additional waves of data during the school year. Item-level data from the GRADE were not available, so it was impossible to get reliability coefficients for the vocabulary test for subgroups. Because of conditions at the research site, no students with restrictive special education plans and no students with limited English proficiency who were still getting L1 language support in school participated in the study; Kieffer (2008) has shown the importance of looking at limited-English-proficient students as a subgroup within LMH students. This research was conducted at a school with an extended-day program, so these findings may not generalize to other urban middle schools. Another limitation of this study is that the language status measure was based on parent self-report, and there may have been reasons why parents might have wished to be contacted in a language that was not usually spoken at home. Although reading measures were based on student self-report, the questions from the reading survey were based on items used in another study of adolescent students (N = 1,045) in a different urban district (see Moje et al., 2008). In general, the trends reported in each district using the instrument were similar (for fuller details including reliability coefficients of the measures see Lawrence, 2009), which suggests that this instrument provides valid measures of reading habits, at least of urban adolescents.
Despite the limitations, the current study fills an important gap in the research literature. During the summer, LMH students experience deeper summer setback. During the school year LMH students have steeper vocabulary learning trajectories than students from English-speaking homes after controlling for grade level, baseline reading achievement, gender, free and reduced lunch eligibility, and independent reading. Although the overall vocabulary achievement was similar at the start and the end of the study, annual measurement masks different trajectories that are revealed by examining student learning during the summer and school year with longitudinal methods.