Multilevel Item Factor Analysis and Student Perceptions of Teacher Effectiveness
- Author(s): Kuhfeld, Megan Rebecca
- Advisor(s): Cai, Li
- et al.
Measures of teacher effectiveness have become a major research and policy issue due to the increased focus on teacher accountability during the past decade. Growing concerns about the variability in the quality of teaching and about traditional approaches to measuring teacher effectiveness led to federal and state policies calling for more rigorous measures of teacher effectiveness (Kane & Cantrell, 2010; Weisberg et al., 2009). One increasingly used measure of teacher effectiveness is the student survey of instructional practice. These surveys are now being given in grades K-12 for accountability purposes, to provide teachers with feedback to improve their teaching, and to guide professional development (Bill & Melinda Gates Foundation, 2012). Given that student surveys are widely used to assess and improve teacher effectiveness, it is important to examine the reliability and validity of these measures.
This dissertation focuses on the secondary Tripod survey, which is the most widely used off-the-shelf student survey instrument for middle and high schools (Ferguson, 2010). The Tripod survey asks students to provide feedback on teacher practices and student behavior, which are operationalized as the Tripod 7Cs framework of teacher effectiveness. The seven domains are Care, Control, Clarify, Challenge, Captivate, Confer, and Consolidate (Ferguson, 2012). According to the survey developer, over 100,000 teachers have received feedback using Tripod surveys (Tripod Project, 2016). Despite this widespread use, astonishingly little has been published regarding the psychometric properties of the instrument, the reliability of subscales, or the predictive validity of the survey (Camburn, 2012).
In this dissertation, I describe an innovative methodological approach for exploring the dimensionality of the Tripod survey and collecting validity evidence to support its use as a measure of teacher quality. This approach uses a multilevel extension of full-information item factor analysis models. Item factor analysis (IFA) models are widely used in educational measurement research (Wirth & Edwards, 2007), though these models have traditionally ignored the hierarchical, nested structure of educational systems and treated all individuals as independent. Multilevel IFA models allow the nested structure of the data to be modeled directly, so that inferences can be made at both the student and the teacher level rather than being reduced to a single level. However, multilevel IFA models have not yet been widely applied in educational contexts due to computational challenges associated with the dimensionality and complexity of these models.
The aims of this dissertation are two-fold. First, I provide an introduction to the multilevel item factor analysis (IFA) modeling framework, and demonstrate the flexibility and efficiency of this model in various educational settings. It is essential to establish that the multilevel IFA model can be estimated under realistic data conditions prior to using this modeling technique to answer important educational policy questions regarding student surveys. Second, I use multilevel IFA models to examine the dimensionality, reliability, and validity of the Tripod student survey.
More specifically, I investigate the following research questions:
1. Can I efficiently and accurately estimate multilevel IFA models in the context of educational assessment and survey data?
2. Is it possible to detect sources of model misfit in multilevel IFA models using a newly developed goodness-of-fit statistic?
3. Can I use the multilevel IFA model to produce estimates of teacher practice scores that clarify the degree to which the seven dimensions of teacher practice measured by the Tripod survey simultaneously predict student learning?
4. Using data from six urban school districts collected by the Measures of Effective Teaching (MET) Project, is there validity evidence that supports the use of the Tripod survey for summative and formative teacher evaluation purposes?
The findings from this dissertation contribute to methodological and substantive bodies of work. Methodologically, I demonstrate that the multilevel IFA model can be used to make reliable group-level inferences across a variety of educational contexts. Additionally, I propose a limited-information goodness-of-fit statistic for multilevel IFA models, addressing a current limitation of these models: there is no established consensus on how to assess model fit.
In addition, this dissertation contributes to the field of teacher evaluation by analyzing the validity of the secondary Tripod survey. This work represents the first systematic review of the psychometric and validity properties of the Tripod survey. The findings call into question whether the current practice of reporting feedback in terms of the 7Cs is warranted. In particular, the gathered evidence does not support distinguishing among six of the 7Cs teacher practices (Care, Clarify, Consolidate, Confer, Challenge, and Captivate). Therefore, I propose combining the items from these sub-domains into a single Teacher Support scale. Both Support and Control scores are found to be related to teacher observation scores, but only teachers’ level of Control is predictive of student achievement. In summary, this study provides promising evidence that the widely used Tripod survey is a useful tool for measuring two important dimensions of teacher effectiveness.