Learner corpus research has expanded from a primary focus on English as a second language (L2) to include other L2s such as Spanish, reflecting the growing importance of corpus linguistics in second language acquisition (SLA) research. In this context, and because prompts are the means by which learner corpora gather their texts, it has become necessary to consider the impact of prompt characteristics on text features. This dissertation examines how prompts, genre, and narrative voice affect the lexical and syntactic features of L2 Spanish learner writing, using the COWS-L2H corpus (Davidson et al., 2020; Yamada et al., 2020). The research explores the influence of different prompts and narrative voices on the Measure of Textual Lexical Diversity (MTLD) and the rate of subject pronoun presence (SPP) errors across various proficiency levels. Mixed-effects regression models reveal that first-person texts, particularly self-descriptions, tend to have higher MTLD scores and fewer SPP errors than third-person descriptions of special or famous individuals. A similar pattern emerges in narrative texts, where first-person narratives exhibit higher MTLD scores than third-person narratives, demonstrating the impact of personal connection and emotional resonance in learner writing.
Contrary to expectations, no significant differences in MTLD are found between descriptions of a close person and a famous person, suggesting that emotional closeness alone does not drive lexical diversity in learner texts. Additionally, while descriptive texts show significant effects of narrative voice and course level on SPP errors, narrative texts do not, emphasizing the key role of genre in determining linguistic accuracy.
This dissertation contributes to the understanding of how corpus design choices affect the study of learner language and offers insights into the broader implications for SLA research, including areas such as language testing and pedagogy. By examining the relationships between prompts, genre, and narrative voice, it provides practical guidance for corpus developers and researchers, aiming to enhance the validity and reliability of learner corpora and inform future research directions in SLA.