A fundamental problem in language testing is the gap between what a test measures and the learners' real-world language use needs. No matter how precisely a test measures a construct, if the way the construct is defined and the way test tasks are specified do not correspond meaningfully to the domain of generalization, test scores can never be adequate indicators of what learners can do with English in real life. This study investigated the constructs and tasks of the General English Proficiency Test (GEPT) high-intermediate reading test to explicate the issues involved in generalizing test scores to non-test situations.
Using expert judgments and confirmatory factor analysis, the study identified and demonstrated, both quantitatively and qualitatively, how and to what extent two distinct ways of conceptualizing reading constructs, the trait/curriculum-based and the task/domain-based approaches, can lead to divergent construct specifications, difficulty levels, item/text characteristics, and underlying factor structures. A total of 242 university students and six trained raters participated in the study. All participants took the GEPT reading test and a task-based reading test developed from the can-do statements of the Common European Framework of Reference (CEFR) and its designated Target Language Use (TLU) domains.
The findings show that the more task-based and workplace-specific items are, the less similarity they share with trait/curriculum-based test items; the nature and constituents of the reading comprehension construct shift. Not only do task-based, workplace-specific items require a significantly greater amount of complex propositional content to be interpreted rather than merely recognized, they also demand a wider range of language abilities (ideational, functional, and sociolinguistic) and strategic competence in making such interpretations in relation to context. Among all the combinations of language abilities, that of manipulative function and strategic demand appears to have the strongest effect on the complexity of the reading construct. The ability to comprehend texts is thus different from the ability to comprehend texts in context: contextualization itself changes the nature and constituents of the comprehension construct.
Interpreted within Bachman and Palmer's (2010) Assessment Use Argument (AUA) framework, these findings strongly suggest that the GEPT is not as meaningful or generalizable as the Language Training and Testing Center (LTTC) claims. GEPT scores do not provide stakeholders with sufficient information about the ability to be assessed in the TLU domain, and GEPT tasks do not correspond closely enough to TLU tasks. Owing to inadequate sampling of the target constructs and their task characteristics, GEPT scores do not appear to generalize to performance in the target domain.