Do End-of-Rotation and End-of-Shift Assessments Inform Clinical Competency Committees’ (CCC) Decisions?

Introduction: Clinical Competency Committees (CCC) require reliable, objective data to inform decisions regarding the assignment of milestone proficiency levels, which must be reported to the Accreditation Council for Graduate Medical Education. After developing two new assessment methods, the end-of-shift (EOS) assessment and the end-of-rotation (EOR) assessment, we sought to evaluate their performance. We report data on the concordance between these assessments, as well as how each informs the final proficiency level determined in biannual CCC meetings. We hypothesized that there would be a high level of concordance between the two assessment methods, including concordance of both the EOS and EOR with the final proficiency level designated by the CCC.

Methods: The residency program is an urban academic four-year emergency medicine residency with 48 residents. After their shifts in the emergency department (ED), residents handed EOS assessment forms, asking about individual milestones from 15 subcompetencies, to supervising physicians; after each two-week ED block, they triggered electronic EOR-doctor (EORd) assessments to supervising doctors and EOR-nurse (EORn) assessments to nurses they had worked with. EORd assessments contained the full proficiency-level scale for 16 subcompetencies, while EORn assessments contained four subcompetencies. Data reports were generated after each six-month assessment period and the data were aggregated. We calculated Spearman's rank-order correlations between assessment types and between assessments and final CCC proficiency levels.

Results: Over 24 months, 5,234 assessments were completed. The strongest correlations with CCC proficiency levels were the EORd for the immediately preceding six-month assessment period (rs = 0.71-0.84) and the CCC proficiency levels from the previous six months (rs = 0.83-0.92). EOS assessments had weaker correlations (rs = 0.49-0.62), as did EORn (rs = 0.40-0.73).
Conclusion: End-of-rotation assessments completed by supervising doctors are most highly correlated with final CCC proficiency level designations, while end-of-shift assessments and end-of-rotation assessments by nurses did not correlate strongly with final CCC proficiency levels; both overestimated proficiency levels. Each proficiency level the CCC assigned appears to be highly correlated with the level designated in the immediately preceding six-month period, perhaps implying that CCC members are biased by previous level assignments.


INTRODUCTION
In the "Milestone Project" for assessing resident physicians' competencies,1 the determination of milestone proficiency is the responsibility of the Clinical Competency Committee (CCC). To meet this obligation our CCC, composed of our core emergency medicine (EM) educational faculty, meets twice a year. It seeks to rely on objective measures to select the one of five levels of ascending proficiency that best represents each resident's individual performance during the preceding six months of training.2 While suggested assessment methods are provided for each of the subcompetencies within an individual specialty's milestones,3 there are no clear current best practices regarding which assessments are most likely to provide the most useful and valid data to CCCs in determining the proper proficiency level.
Previous reports have noted that end-of-shift (EOS) assessments, if used in isolation, yield falsely elevated proficiency levels.4 Schott et al. failed to validate the results of direct observation using either a checklist tool or a milestone proficiency-level tool when used in video review of a critical patient encounter with varying levels of trainees; they cited significant issues with both rater error and instrument error.5 We developed a multi-modal milestone evaluation program aimed at obtaining objective data for CCC use. In this study we describe the performance of the two predominant assessment methods used in this new milestone evaluation program: (1) the brief EOS assessment collected in paper form at the end of a shift after direct supervision; and (2) the end-of-rotation (EOR) global assessment collected in electronic form. We report data on the concordance between EOS and EOR assessments, as well as how each informs the final proficiency level determined in biannual CCC meetings. We hypothesized that there would be a high level of concordance between the two assessment methods, including concordance of both the EOS and EOR with the final proficiency level designated by the CCC.

METHODS
The study site is an urban academic institution, home to a four-year EM residency with 48 residents and 42 full-time faculty members across two large medical centers. Institutional review board approval was obtained. In short, the EOS assessment involved residents handing individual assessment sheets, comprising 9-11 individual "milestone" questions taken from 15 subcompetencies, to supervisory doctors after a shift. These pocket notebooks each contained 10 sets of the eight-sheet assessment packet. Assessor identity was not tracked on the EOS. The EOR assessment allowed residents to electronically trigger an online assessment focused on global performance after two weeks of an emergency department (ED) rotation. The EOR presented the full five levels of ascending proficiency, covering 16 subcompetencies in the version for supervisory doctors (EORd) and four subcompetencies in the version for nurses (EORn). Reports were run for both EOR and EOS assessments after each six-month period to calculate proficiency levels for each of the applicable subcompetencies, and this information was provided to members of the CCC.
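The six-month reporting step described above can be illustrated with a minimal sketch. All data, field names, and the aggregation rule are hypothetical: the study does not specify how individual assessments were collapsed into a reported proficiency level, so a simple per-subcompetency median is assumed here.

```python
from collections import defaultdict
from statistics import median

# Hypothetical assessment records: (resident, subcompetency, proficiency_level)
assessments = [
    ("res01", "PC3", 3.0), ("res01", "PC3", 3.5), ("res01", "PC3", 2.5),
    ("res01", "PC7", 4.0), ("res01", "PC7", 3.5),
    ("res02", "PC3", 2.0), ("res02", "PC3", 2.5),
]

def aggregate(records):
    """Collapse individual assessments into one level per resident and
    subcompetency for the six-month report (median is an assumption)."""
    grouped = defaultdict(list)
    for resident, subcomp, level in records:
        grouped[(resident, subcomp)].append(level)
    return {key: median(levels) for key, levels in grouped.items()}

report = aggregate(assessments)
print(report[("res01", "PC3")])  # median of 3.0, 3.5, 2.5
```

A mean, mode, or most-recent-value rule would be equally plausible; the choice affects how sensitive the reported level is to outlying single-shift ratings.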
Similar to a grant review, each CCC member was assigned primary responsibility for up to six residents, reported a summary of the data after review, and suggested proficiency levels to the group. Final proficiency levels were determined after group discussion with guidance from the CCC leader. To determine correlations, aggregate data for the EORd, EORn, EOS, and final CCC proficiency levels were obtained for each of the four six-month time frames. We calculated Spearman's rank-order correlations between assessment types and between assessments and final CCC proficiency levels. Correlations were considered "very strong" for rs > 0.8, "strong" for rs = 0.6-0.79, "moderate" for rs = 0.40-0.59, "weak" for rs = 0.20-0.39, and "very weak" for rs < 0.2. We calculated p-values and used the Bonferroni correction to account for the many correlations, with p-values below 0.0005 considered statistically significant.
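The statistical approach above can be sketched as follows. This is not the authors' analysis code; the proficiency levels are invented, `scipy` is assumed to be available, and the comparison count used for the Bonferroni threshold is illustrative.

```python
# Illustrative sketch: Spearman rank-order correlation between two
# hypothetical sets of aggregated proficiency levels, with the study's
# descriptive strength categories and a Bonferroni-adjusted threshold.
from scipy.stats import spearmanr

def strength(rs):
    """Map |rs| to the descriptive categories used in this study."""
    rs = abs(rs)
    if rs > 0.8:
        return "very strong"
    if rs >= 0.6:
        return "strong"
    if rs >= 0.4:
        return "moderate"
    if rs >= 0.2:
        return "weak"
    return "very weak"

# Hypothetical aggregated proficiency levels for eight residents
eord_levels = [2.5, 3.0, 3.5, 2.0, 4.0, 3.0, 2.5, 3.5]
ccc_levels  = [2.5, 3.0, 3.0, 2.0, 4.0, 3.5, 2.5, 3.5]

rs, p = spearmanr(eord_levels, ccc_levels)

# Bonferroni correction: divide the nominal alpha by the number of
# comparisons; the study used an adjusted threshold of p < 0.0005.
alpha, n_comparisons = 0.05, 100
threshold = alpha / n_comparisons

print(f"rs = {rs:.2f} ({strength(rs)}), significant: {p < threshold}")
```

Spearman's method ranks each variable before correlating, so it tolerates the ordinal, tie-heavy nature of milestone proficiency levels better than a Pearson correlation on the raw values would.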

Regan et al.
Do End-of-Rotation and End-of-Shift Assessments Inform CCC Decisions?

RESULTS
A total of 5,234 assessments were completed over 24 months. The EORd accounted for 1,330 assessments, the EORn accounted for 509, and the EOS accounted for 3,395. Table 1 presents the annual completion rates for each assessment type by resident year. Spearman's rank-order correlations between the EOS and EOR assessments are reported in Table 2. Note that each is aggregated and reported twice a year (December and May), hence the designation by month initial and year. For example, EOS.M14 indicates the EOS assessment for May 2014. Furthermore, the EOR assessments were reported separately for physicians and nurses, hence the designations end-of-rotation by doctor (EORd) and end-of-rotation by nurse (EORn).
As demonstrated in Table 2, the EOS and EOR assessments did not have strong correlations, with values ranging from -0.17 to 0.65. Taken within each corresponding timeframe (December or May of the same year), the correlations tended to be better overall. EOS assessments were more strongly correlated with EOR assessments performed by physicians than with those performed by nurses. The range of correlations between EOS and EOR performed by nurses was -0.17 to 0.54, while the range of correlations between EOS and EOR performed by physicians was 0.01 to 0.65. Table 3 shows the correlations between the EOR assessments performed by nurses and those performed by physicians.
The final assigned level of proficiency for each subcompetency (designated as CCC.XXX with the same month and year designation as above) was best correlated with EOR assessments performed by physicians for that particular period. As shown in Table 5, across time each CCC level of proficiency is very strongly correlated with the assigned CCC proficiency level in the previous time period. For example, the final CCC proficiency level from May of 2015 (CCC.M15) was very highly correlated with the final CCC proficiency level from the previous December in 2014 (CCC.D14), with a value of 0.92. In particular, CCC levels are highly correlated within a given academic year, somewhat less so across academic years, with diminishing association over time.
Across post-graduate year (PGY) levels 1 through 4, correlations between the CCC proficiency levels and EORd assessments were the highest (range 0.74-0.85), compared with correlations of CCC proficiency levels with EOS and EORn assessments (Table 5). P-values are less than 0.00001 unless otherwise indicated.
Looking at correlations across various subcompetencies in Table 6, we noted that whenever multiple data sources (EORd, EOS, and EORn) were used to assess an individual subcompetency, the correlation of the CCC proficiency levels across all of these subcompetencies was highest with the EORd compared to the two other data sources. We also noted that the correlations between CCC level of proficiency and EOR assessments by nurses were moderately strong in the four applicable subcompetencies, with rs = 0.66, 0.71, 0.65, and 0.57 for multi-tasking, patient-centered communication, team management, and professional values (compassion, integrity), respectively.

DISCUSSION
The development and use of assessment tools for trainee assessment is a critical function of all residency training programs. The development of formal CCCs forced programs to re-evaluate their assessment methods and to determine whether the information being collected was both reliable and valid for use in the determination of proficiency levels for residents at each stage of training.

Predictors of Final Recommended CCC Proficiency Level by Assessment Type
While many EM residency programs, including ours, use the EOS assessments that are publicly available via the Council of Emergency Medicine Residency Directors (CORD-EM) website,6 the literature calls into question the use of this type of assessment. Warrington et al. (the original developers of the forms available on the CORD-EM site) published results noting only slight to fair inter-rater agreement in a video-based study in which educators at a national conference scored a "resident encounter" using the EOS form.7 Dehon et al. describe another study of EOS assessments, completed electronically, reporting that their EOS assessments in EM yielded inflated proficiency levels when used in isolation and when compared to the final CCC-recommended proficiency level.4 Our findings corroborate this notion: we found that EOS assessments were not strongly correlated with final CCC proficiency levels and yielded significantly inflated proficiency levels when compared to the final rankings.
In our study, what mattered most for the final recommended proficiency level by the CCC was the EOR assessment performed by doctors (EORd) for the immediate six-month period preceding the assessment, as well as for the six months before that. This correlation held across each PGY level, with EORd consistently having the strongest correlation in comparison to EOS or EOR assessments completed by nurses (EORn). Over time, the strongest correlate of the final recommended proficiency level was the immediately preceding proficiency level assigned by the CCC. In our CCC meetings, previous proficiency levels were available both during pre-review of the resident data and during the discussion of current assignments. Given this finding, it may be prudent to withhold this information in future meetings to see whether CCC members are biased by prior data.
In discussing the weak correlation between the final CCC-assigned proficiency levels and EOS assessments, Dehon et al. commented that their overestimation was likely related to a lack of "No" responses by faculty, and re-calculated proficiency levels after counting "N/A" as a "No" response,4 which allowed for slightly increased differentiation across PGY levels. At our program, we also noticed a paucity of "No" replies. This was thought to be related to faculty concern regarding the stigma associated with "No," especially since EOS assessments were suggested for use as a discussion point with residents at the end of the shift. We therefore modified our answer scale to non-dichotomous choices: "Yes" was retitled "Consistently Demonstrating," "No" was replaced with "Emerging" (chosen in an attempt to remove the stigma associated with "No"), and a "Progressing" option was placed between them. We also allowed an "N/A" option. Unlike Dehon et al., our rate of "No" or "Emerging" responses was unchanged (average rate 1.5%; range 0.6%-2.4%), with few faculty choosing this option regardless of the terminology used to describe it. We did, however, note a significant decrease in the use of both the "N/A" option and the "Yes" option, newly titled "Consistently Demonstrating," with an average usage of "Consistently Demonstrating" of 83.1% compared to 96.7% for "Yes" in the first year of the program. The "Progressing" option accounts for the entirety of this difference. Despite this change, we noted no increase in the correlation of the EOS assessments with the final CCC proficiency level.
In evaluating EOR assessments, Kuo et al. 8 described the use of a milestone-based evaluation system in a surgery residency program in which global assessments using selected subcompetencies were sent out at the end of resident rotations. The authors found that EOR assessments yielded an increased distribution of possible scores across PGY levels, with evaluators using a wider range of the scale, including the lower proficiency levels. This was compared to their traditional Likert scale assessments, in which the median composite PGY1 score was 3.63 on a 1-4 scale, in comparison to 1.88 (proficiency levels 1-4) in their new milestone-based system.
Similar to the findings of Kuo et al., our study demonstrated that our program's EOR assessments, particularly those by doctors, reflected an increased distribution of scores, which is perhaps reflected in the higher correlations between our EORd assessments and CCC proficiency levels. It is possible that the CCC found the EORd assessment to be more credible than other assessments and was biased toward weighing these results more favorably. However, given the summative nature of both a global rating form and the milestones, it is perhaps not surprising that this is where we found the highest correlation.

Assessment Tools Inter-Correlations
In addition to not correlating well with the CCC proficiency levels, the EOS assessments also did not correlate well with their counterpart EOR assessments when compared by subcompetency. As our newly implemented evaluation program progressed, and perhaps due to continued re-education of nursing staff about the non-Likert proficiency-level scale, EORn and EORd assessments came more in line with each other. However, the EORn assessments continued to yield more inflated overall scores for residents than the EORd. We found that nurses were highly resistant to assigning lower proficiency levels, even to PGY1 residents at the onset of the program. While our re-education did yield slightly lower overall scores on the whole, EORn assessments continued to rate residents considerably higher on the proficiency scale.
In general, the EORn assessment scores were felt not to be useful to CCC members in deciding on final proficiency levels; however, all members felt the descriptive comments provided by nursing staff were invaluable in identifying items for improvement and commendation. Given the Accreditation Council for Graduate Medical Education (ACGME) requirement for multiple assessors,8 it may be prudent to use nursing input for formative feedback, as opposed to the EORn assessments used in this initial version of our program.

Correlations by Subcompetency
Our study found that whenever multiple data sources (EORd, EOS, and EORn) were used to assess an individual subcompetency, the correlation for the CCC proficiency levels across all of these subcompetencies was also highest with the EORd compared to the two other data sources.
EOS assessments had the highest correlation with final CCC proficiency levels in milestones from PC3 (Diagnostic Studies) and PC7 (Disposition), while the lowest correlations were seen in those from SBP1 (Patient Safety) and PROF2 (Accountability). There were no strong correlations between EOS assessments and final CCC proficiency levels for either of the Interpersonal and Communication subcompetencies (Patient Communication or Team Management), nor for either of the Professionalism subcompetencies. We found this weak correlation surprising, given that direct observation should provide the best opportunity for accurate assessment of skills such as communication and professionalism. We suspect that the variability of a resident's clinical encounters during any given shift may contribute to these findings. Accordingly, we advocate that EOS assessments be used cautiously, as individual data points reflecting a "snapshot" of competence rather than a trainee's global performance, and that enough be collected to capture multiple encounter opportunities.

LIMITATIONS
We collected our data at a single site using two main assessment tools. While the CCC had an increased number of data points available for use, it is possible that the format used by our CCC is not generalizable to other institutions. In addition, the EOS is a paper tool, which is not ideal. However, we believe it is feasible to sustain use of the instrument as a paper tool if desired, as we have been using it now for over three years. Ideally, the tool would become an electronic assessment completed in real time. We cannot infer how this would change the utility of the tool or its correlation with CCC levels.
In some instances, individual residents may have limited assessment data. During the PGY1 year, our interns spend less than half of their time on ED rotations, and some may have had minimal exposure during an individual six-month period. Due to this variable pattern of resident schedules, as well as the small number of expected assessments over a single experience, we did not compare assessment data month to month, but rather over six-month periods. We felt this was not a significant limitation, given that the data are used for CCC discussions, which occur only every six months. Similarly, nursing assessments contributed the smallest share of our overall assessment data. However, we believe nursing assessments are an important component of trainee assessment, given the ACGME's requirement for multisource assessments by multiple evaluators, including professional staff.
Lastly, as residents are allowed to select faculty for the EORd assessments, it is possible that this self-selection skewed our data. We did, however, note that our most "critical" faculty were frequently chosen, and we believe residents selected a wide variety of assessors over time. Any faculty member is able to trigger and complete an assessment at any time in the electronic system.

CONCLUSION
In our single-center study assessing EM residents' milestone proficiency, the end-of-rotation (EORd) assessments completed by supervising physicians (attendings and senior residents) were the most highly correlated with the final CCC proficiency level designation, while end-of-shift (EOS) assessments and end-of-rotation assessments by nurses (EORn) did not correlate well with final CCC proficiency levels. Each proficiency level the CCC assigned appears to be highly correlated with the level designated in the immediately preceding six-month period, perhaps implying that CCC members are biased by previous level assignments. Based on our study, we advocate that EOS assessments be used cautiously, as individual data points reflecting a "snapshot" of competence rather than a trainee's global performance. Further studies are needed to determine the utility of the EOS for CCC use, and the effect of blinding prior CCC-assigned proficiency levels on current proficiency level designations.