Language models work well in English, but in most other languages they work much worse. In this dissertation, I use theories and methods from linguistics and psycholinguistics to contribute to our understanding of how language models work for different languages and how they behave in multilingual settings. Because languages differ greatly in how they encode information, many researchers have asked to what extent these crosslinguistic differences impact language model performance. I investigate the role of training data size and tokenizers in those differences, and I find that crosslinguistic differences that have been described in terms of typological features can instead be attributed to differences in effective dataset size.

In multilingual settings, language models may use some of the same representations to encode information for multiple languages. This allows for efficient use of the models' parameters while also improving the models' ability to generalize across languages. I use a psycholinguistic experimental paradigm, crosslinguistic structural priming, to probe these shared representations and to characterize how and when models learn them. These results also contribute to our understanding of how bilingual people use shared representations to store information about multiple languages.