Research into animal cognitive abilities is expanding rapidly and often uses methods in which variation in behavioral performance on a task is assumed to reflect variation in an underlying cognitive trait. Because these methods rely on behavioral responses as a proxy for cognitive ability, it is important to validate that the task structure does, in fact, target the cognitive trait of interest rather than non-target cognitive, personality, or motivational traits (construct validity). Although it can be difficult, or even impossible, to definitively attribute performance to a single cognitive trait, one way to check that a task structure is likely to elicit performance based on the target trait is to assess the temporal and contextual repeatability of performance. In other words, individual performance is likely to represent an inherent trait when it is consistent across time and across similar or different tasks that theoretically test the same trait. Here, we assessed the temporal and contextual repeatability of performance on tasks intended to test the cognitive trait of behavioral flexibility in great-tailed grackles (Quiscalus mexicanus). For temporal repeatability, we quantified the number of trials each individual needed to form a color preference after each of multiple color reversals in a serial reversal learning task. For contextual repeatability, we compared performance on the serial color reversal task with the latency to switch among solutions on each of two different multi-access boxes. We found that the number of trials to form a preference in reversal learning was repeatable across serial color reversals, and that the latency to switch a preference was repeatable across the color reversal learning and multi-access box contexts. These results support the idea that the reversal learning task structure elicits performance reflective of an inherent trait, and that reversal learning and solution switching on multi-access boxes similarly reflect the inherent trait of behavioral flexibility.
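To illustrate what repeatability means in practice, the sketch below estimates it as an intraclass correlation from a one-way ANOVA: the share of total variance in performance that is due to differences among individuals. This is a common, simple estimator and is not necessarily the analysis used in this study. The function name `repeatability`, the bird labels, and all trial counts are hypothetical values invented for the example.

```python
from statistics import mean


def repeatability(scores_by_individual):
    """ANOVA-based repeatability (intraclass correlation).

    scores_by_individual maps an individual's ID to its repeated
    measurements, e.g. trials to reverse a preference in each reversal.
    Returns among-individual variance / (among + within variance).
    """
    groups = [g for g in scores_by_individual.values() if len(g) > 0]
    k = len(groups)                       # number of individuals
    n_total = sum(len(g) for g in groups)  # total number of measurements
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 individuals and repeated measurements")

    grand_mean = sum(sum(g) for g in groups) / n_total

    # Sums of squares among and within individuals
    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_among = ss_among / (k - 1)
    ms_within = ss_within / (n_total - k)

    # Effective group size; equals the common n when the design is balanced
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    var_among = (ms_among - ms_within) / n0
    total = var_among + ms_within
    if total == 0:
        raise ValueError("no variance in the data")
    return var_among / total


# Hypothetical data: trials to form a preference in three serial reversals
trials = {
    "bird_a": [70, 60, 50],
    "bird_b": [100, 90, 80],
    "bird_c": [40, 30, 20],
}
print(round(repeatability(trials), 3))  # prints 0.897
```

A value near 1 means individuals differ consistently from one another across reversals, while a value near 0 means most variation occurs within individuals. Count or latency data often violate the normality assumption behind this estimator, so a generalized mixed model would usually be preferred for a real analysis.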