The subcomponents of affect scale (SAS): validating a widely used affect scale

Abstract
Objective: There is a need for a brief affect scale that also encompasses different components of affect relevant for researchers interested in physiological and health outcomes. The Subcomponents of Affect Scale (SAS) meets this need. This 18-item scale has nine positive and nine negative affect items encompassing six subscales (calm, well-being, vigour, depression, anxiety, anger). Previous research using the SAS has demonstrated its predictive validity, but no work has tested its subscale structure or longitudinal validity.
Design: Data from the Common Cold Project in which individuals (N = 610) completed the SAS over the course of seven days were used.
Results: Confirmatory factor analysis demonstrated the reliability of the subscale structure of the SAS across seven days (positive affect subscale structure: CFIs ≥ 0.98; negative affect subscale structure: CFIs ≥ 0.94 with day 6 CFI = 0.91) and tests of factorial invariance showed the scale is valid to use over time.
Conclusions: These results confirm the psychometric validity of the subscale structure of the SAS and imply that the subscales can be used longitudinally, allowing for its use in health research as well as non-health research that can benefit from its subscale structure and longitudinal capabilities.

popular method is self-report. This involves asking participants how frequently or intensely they feel specific emotions (e.g. happiness, anger), then aggregating items to create positive and negative affect subscales.
Although there are a number of valid and reliable affect scales (e.g. the Profile of Mood States [POMS]; McNair et al., 1971, the Positive and Negative Affect Schedule [PANAS]; Watson et al., 1988), most have limitations when it comes to use in health research. First, health researchers often need to rely on scales that are brief due to the time-intensive nature of their study procedures. Participants often have to complete invasive medical/physical examinations (e.g. Kubzansky et al., 2006; Shirom et al., 2010), long questionnaires and symptom checklists assessing other health-relevant variables (e.g. Glejsted Ingstrup et al., 2012; Middleboe et al., 1992), and multiple surveys over time (e.g. Czyz et al., 2019; Mustanski, 2007). For epidemiologic investigations that often contain samples with thousands of participants, the cost of adding just one item to a survey can be substantial (e.g. Montana Department of Public Health and Human Services, 2018). Further, participants often belong to sensitive populations with limited availability. A second limitation is that many instruments are not designed to capture affective subcomponents that are relevant to health outcomes of interest. There is increasing evidence of distinct emotions and affective arousal levels differentially influencing health outcomes (e.g. Kubzansky et al., 2006; Suls & Bunde, 2005). Hence, a scale representing the key emotions used in health research provides a valuable tool.
A measure that meets the needs of health researchers and has been utilised frequently is the Subcomponents of Affect Scale (SAS; Cohen et al., 2003). The SAS was originally constructed by Cohen et al. (2003) based on a factor analysis of 65 emotion adjectives (Usala & Hertzog, 1989), 38 of which were from the POMS (McNair et al., 1971) and 27 of which were from the Pregnancy Mood Checklist (Lebo & Nesselroade, 1978). Cohen et al. (2003) selected 15 items (and added three more), representing a range of affective categories central to theories and empirical work on health. The highest order categories were positive and negative affect. These were further divided into three subscales within each valence: calm, well-being, and vigour under positive affect and depression, anxiety, and anger under negative affect. Both valences included two higher arousal (high body activation; vigour and well-being; anxiety and anger) subscales and one lower arousal (low body activation; calm; depression) subscale.
The SAS has been used in a variety of health studies (e.g. Jenkins et al., 2018) that support its predictive validity in connection with a number of health outcomes. Investigations using the overall negative affect scale find that higher negative affect is associated with greater flu/cold symptoms (Cohen et al., 2003, 2006), higher disease severity in individuals with Type 2 diabetes (Sultan & Fisher, 2010), and lower sleep quality (Lillis et al., 2018). Similarly, studies using the overall positive affect scale find associations in the expected directions, with greater positive affect being associated with better sleep quality (Lillis et al., 2018) and more physical activity (Poole et al., 2011). Further, studies utilising the subscales find more nuanced effects. For example, while work has shown that higher scores on positive affect are associated with greater physical activity (Poole et al., 2011), it seems to be the positive affect subscale of calm that most drives this association (Aggio et al., 2017). Similarly, previous work found no association between scores on negative affect and physical activity (Poole et al., 2011), but when researchers divided negative affect into subscales, lower depression was associated with greater physical activity (Aggio et al., 2017). Interestingly, subscales of the SAS within a single valence often demonstrate differential effects. For example, individuals with higher vigour and/or well-being were less likely to develop an objectively measured cold, but individuals lower in calm reported more symptoms (Cohen et al., 2003, 2006). Taken together, this body of work demonstrates the predictive validity of the SAS and suggests that the SAS and its subscales uncover nuanced affect and health associations. However, no investigation has tested the subscale structure of the SAS to confirm that the subscales represent unique constructs.
Additionally, while the SAS has been used in longitudinal investigations (e.g. Cohen et al., 2003), no study has confirmed that the subscales have the same structure over time, an important requirement for testing how affect impacts health longitudinally.
Therefore, the first goal of this study is to test the subscale structure of the SAS using confirmatory factor analysis. A confirmatory factor analysis of the subscale structure would provide evidence toward the construct validity of the SAS. Confirmatory factor analysis is an ideal approach as this structural equation modelling framework allows for a priori predictions about the measurement model based on theory to be tested. Therefore, we can specify the subscale structure of the SAS to test how closely the specified model accounts for the data. This approach is preferred over other common techniques such as exploratory factor analysis in which no a priori hypotheses can be evaluated. A second goal of this paper is to determine the stability of the subscale structure of the SAS over time. One of the strengths of the SAS is that its brevity provides the opportunity for researchers to use the measure in longitudinal investigations. Does the structure hold across days? Is the strength of the association of each item with its factor similar (or equal) across days? These questions can be answered by using confirmatory factor analysis to test the factorial invariance of the SAS. Confirming the subscale structure and testing the factorial invariance of the SAS will supply needed evidence of its validity in affective-health science research, and provide researchers in several fields with a highly useful scale.

Methods
Data from the Common Cold Project (data are openly available at www.commoncoldproject.com; grant number NCCIH AT006694; Laboratory for the Study of Stress, Immunity, and Disease, 2016) were accessed to examine the subscale structure of the SAS and test its validity longitudinally. Specifically, we used affect data collected with the SAS on seven consecutive days from a total of 610 individuals: 276 who participated in the Pittsburgh Cold Study 1 (conducted from 1993 to 1996) and 334 who participated in the Pittsburgh Cold Study 2 (conducted from 1997 to 2001). The 610 participants in this data set had a mean age of 28.98 years (SD = 9.82; range 18 to 55) and 53% were female (see Table 1 for more demographic information).

Procedure
Participants for both studies were recruited through newspaper advertisements in the Pittsburgh, Pennsylvania area. Due to the intensive nature of data collection, participants were compensated with $800. The Pittsburgh Cold Studies 1 and 2 had parallel designs. They were both viral challenge investigations in which participants were quarantined for seven days (six nights) and were exposed to a cold virus (rhinovirus 39 (N = 147) or rhinovirus 21 (N = 129) in Pittsburgh Cold Study 1; rhinovirus 39 (N = 228) or rhinovirus 23 (N = 106) in Pittsburgh Cold Study 2) in the evening before the second night of quarantine (i.e. Day 2). In both studies, participants completed the SAS the evening before the first night of quarantine (referred to from now on as Day 1) and then in the evening on the next 6 days of quarantine (referred to as Days 2 through 7). Thus, this study design allows for a strict test of the SAS in a health-relevant context, providing an excellent opportunity to test the SAS over time. Data from both studies were collected by the Laboratory for the Study of Stress, Immunity, and Disease at Carnegie Mellon University under the directorship of Sheldon Cohen, PhD. The Common Cold Project (www.commoncoldproject.com) was designed to meticulously combine data from these two studies to allow for analysis of common variables aggregated across the studies (as done previously; e.g. Janicki-Deverts et al., 2016; Prather et al., 2017; Sneed et al., 2012). Data are publicly available on the Common Cold Project website. All participants completed informed consent.

SAS
The Subcomponents of Affect Scale (SAS) is not named in past publications and is referred to as the State Adjective Questionnaire (18-item version) on the Common Cold Project website. In order to be more descriptive as to the nature of the scale, we have renamed it the Subcomponents of Affect Scale. The SAS is composed of 18 items, nine positive and nine negative affect adjectives. The nine positive affect adjectives are divided across three subscales: calm (items: calm, at ease, relaxed), well-being (items: happy, cheerful, pleased), and vigour (items: full of pep, lively, energetic). The nine negative affect adjectives are similarly divided across three subscales: depression (items: sad, unhappy, depressed), anxiety (items: on edge, tense, nervous), and anger (items: hostile, angry, resentful). Participants are asked to rate on a scale from 0 (not at all accurate) to 4 (extremely accurate) the extent to which each adjective accurately describes how they felt within a certain time range (e.g. past 24 hours, past hour, current moment). Data used in the present manuscript were collected at 5:30 pm each night during quarantine using paper and pencil; participants were instructed to reflect on emotions over the past day with response options ranging from 'haven't felt that way at all since getting up/not at all accurate' (0) to 'felt that way a lot since getting up/extremely accurate' (4). For the current data, the estimated within-day reliability coefficients for the set of nine positive affect adjectives ranged from 0.92 to 0.93 for each of the seven days (ω Mean = 0.93). The within-day reliability coefficients for the set of nine negative affect adjectives ranged from 0.82 to 0.87 for each of the seven days (ω Mean = 0.84).
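The item-to-subscale assignment described above can be written down as a small scoring sketch. The mapping of adjectives to subscales and the 0 to 4 response range come from the scale description; the function name `score_sas`, the dictionary layout, and the choice to score each scale as the mean of its items (rather than a sum) are assumptions made here for illustration only.

```python
from statistics import mean

# Item-to-subscale mapping as listed in the scale description.
SAS_SUBSCALES = {
    "calm": ["calm", "at ease", "relaxed"],
    "well-being": ["happy", "cheerful", "pleased"],
    "vigour": ["full of pep", "lively", "energetic"],
    "depression": ["sad", "unhappy", "depressed"],
    "anxiety": ["on edge", "tense", "nervous"],
    "anger": ["hostile", "angry", "resentful"],
}
POSITIVE = ("calm", "well-being", "vigour")
NEGATIVE = ("depression", "anxiety", "anger")


def score_sas(responses):
    """Score one administration of the 18 items.

    `responses` maps each adjective to a rating from 0 to 4.
    Assumption: every score is the mean of its items (hypothetical choice;
    the scoring rule is not stated in the text above).
    """
    for items in SAS_SUBSCALES.values():
        for item in items:
            rating = responses[item]  # raises KeyError if an item is missing
            if not 0 <= rating <= 4:
                raise ValueError(f"rating for {item!r} must be between 0 and 4")

    # One score per subscale, keyed by subscale name.
    scores = {
        name: mean(responses[item] for item in items)
        for name, items in SAS_SUBSCALES.items()
    }
    # Overall valence scores use all nine items of that valence.
    scores["positive affect"] = mean(
        responses[item] for name in POSITIVE for item in SAS_SUBSCALES[name]
    )
    scores["negative affect"] = mean(
        responses[item] for name in NEGATIVE for item in SAS_SUBSCALES[name]
    )
    return scores
```

Keeping the mapping in one dictionary makes it straightforward to score only a subset of subscales when a study needs, say, vigour and calm but not the overall positive affect score.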
The within-day reliability coefficients across the seven days for the subscales were: vigour (ω Mean = 0.93, ω Range = 0.91, 0.94), well-being (ω Mean = 0.90, ω Range = 0.87, 0.92), calm (ω Mean = 0.85, ω Range = 0.82, 0.87), depression (ω Mean = 0.82, ω Range = 0.70, 0.90), anger (ω Mean = 0.82, ω Range = 0.77, 0.87), and anxiety (ω Mean = 0.63, ω Range = 0.51, 0.74).
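For readers less familiar with ω, the following is a minimal sketch of the common single-factor form of the coefficient, computed from standardised loadings. It assumes a one-factor model with uncorrelated residuals and unit factor variance; the within-day estimates reported above may have been obtained with a different estimator, and the function name and example loadings are hypothetical.

```python
def mcdonald_omega(std_loadings):
    """Omega for a one-factor model with standardised loadings.

    Assumes uncorrelated residuals and a factor variance of 1, so each
    item's residual variance is 1 - loading**2.
    """
    if not std_loadings:
        raise ValueError("at least one loading is required")
    if any(not -1.0 < loading < 1.0 for loading in std_loadings):
        raise ValueError("standardised loadings must lie strictly between -1 and 1")

    common = sum(std_loadings) ** 2                      # (sum of loadings)^2
    residual = sum(1.0 - loading ** 2 for loading in std_loadings)
    return common / (common + residual)


# Hypothetical loadings for a three-item subscale (not taken from the paper).
example_omega = mcdonald_omega([0.8, 0.8, 0.8])
```

Because the numerator grows with the square of the summed loadings, a three-item subscale with modest loadings will yield a noticeably lower ω than one with strong loadings, which is one reason short subscales can show weaker reliability.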

Subscale structure
Confirmatory factor analysis was used to examine the subscale structure of the SAS on each of the seven days of data collection using Stata 15 (StataCorp, 2017). First, a positive affect model (Positive Affect Model 1) was built which included one latent variable of positive affect and each of the nine positive affect items as endogenous observed variables. Second, a model with three latent variables (calm, well-being, and vigour) and their corresponding subscale items serving as endogenous observed variables was tested (Model 2). This model was iteratively improved using modification indices to help guide theoretically justified alterations (Acock, 2013; see supplemental online material for detailed description of modification indices), resulting in Positive Affect Models 2a, 2b, 2c, and sometimes 2d. Following theory, we selected changes that specified correlations of measurement residuals within subscales (e.g. correlating the measurement residuals of calm and relaxed) over loading subscale items onto other latent variables (e.g. loading the item at ease onto the latent variable vigour), even when the latter might have had a higher modification index. The models with subscales also allowed us to examine how correlated latent variables were (e.g. correlation between calm and well-being).
Next, this same process was conducted for the negative affect items (i.e. one overall negative affect latent variable [Negative Affect Model 1] was followed by a model including the three subscale latent variables [Negative Affect Model 2] with iterative improvements [Negative Affect Models 2a, 2b, …]). Lastly, the final positive affect model and negative affect model for each day were merged into one confirmatory factor analysis model (Positive and Negative Affect Model 3) and iterative adjustments were made using the same modification index strategy described above (sometimes resulting in a Positive and Negative Affect Model 3a). Of note, all models testing the subscale structure are in standardised form for ease of interpretation. Additionally, while improving upon the model fit of both the Positive Affect Model 1 (i.e. positive affect model with only one latent variable) and Negative Affect Model 1 (i.e. negative affect model with only one latent variable) was not our main goal, we did examine modification indices to determine if improvements in the models could be made given that researchers sometimes use the overall positive and/or negative affect scores.
Model fit was evaluated using chi-square, with non-significant values reflecting good fit (see supplemental material for detailed description of chi-square). Since chi-square is overly powered with large samples, we also used goodness of fit tests as recommended by Kline (2015) with the following cutoffs for guidance in model selection: CFI > 0.90 (Bentler, 1990), RMSEA < .08 (Browne & Cudeck, 1993), lower bound of 90% confidence interval of RMSEA below 0.05 and upper bound below 0.10, SRMR < .08, and coefficient of determination (CD) approaching 1. Finally, AIC (Akaike, 1974) and BIC (Schwarz, 1978) were also used to compare nested models, with smaller values reflecting better fit.
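The cutoffs listed above can be applied mechanically once the fit indices have been estimated. The sketch below encodes only the numeric thresholds stated in the text; the class and function names are hypothetical, the coefficient of determination is left out because "approaching 1" is not a numeric threshold, and the example values other than the CFI are illustrative rather than reported results.

```python
from dataclasses import dataclass


@dataclass
class FitIndices:
    cfi: float
    rmsea: float
    rmsea_ci_lower: float  # lower bound of the 90% confidence interval
    rmsea_ci_upper: float  # upper bound of the 90% confidence interval
    srmr: float


def check_fit(fit):
    """Return a pass/fail flag for each cutoff used for guidance in model selection."""
    return {
        "CFI > 0.90": fit.cfi > 0.90,
        "RMSEA < 0.08": fit.rmsea < 0.08,
        "RMSEA 90% CI lower < 0.05": fit.rmsea_ci_lower < 0.05,
        "RMSEA 90% CI upper < 0.10": fit.rmsea_ci_upper < 0.10,
        "SRMR < 0.08": fit.srmr < 0.08,
    }


# Illustrative values only: the CFI echoes a poorly fitting one-factor model,
# while the remaining indices are made up for the example.
flags = check_fit(FitIndices(cfi=0.84, rmsea=0.19, rmsea_ci_lower=0.18,
                             rmsea_ci_upper=0.21, srmr=0.07))
failed = [name for name, passed in flags.items() if not passed]
```

Returning one flag per criterion, rather than a single verdict, mirrors the way the cutoffs are used here: as guidance for model selection rather than as a strict accept/reject rule.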
Throughout our results, we report 95% confidence intervals when appropriate. Missing data were low and varied across days: Day 1 had 10% missing data, Days 2 through 5 each had 1% missing data, and Days 6 and 7 each had 2% missing data. Confirmatory factor analysis was conducted using the sem command in Stata, which uses listwise deletion. Sample sizes are reported throughout our analyses. Sample size requirements in confirmatory factor analysis increase when the number of factors tested is higher, there are fewer endogenous observed variables, the strength of factor loadings is lower, and correlations between factors are weaker. For models in which we had 6 latent variables with 3 observed variables each (for a total of 18 variables) and factor loadings of 0.5, we conservatively estimated our required sample size at 460 based on simulation analysis in previous work (Wolf et al., 2013). Sample sizes in the analysed data set varied among days, with the smallest sample size at 507. This provided us with sufficient power across all days. As our models were not predicting outcomes, effect sizes were not produced.

Factorial invariance
When testing whether the general subscale structure of the SAS holds across time (i.e. testing for factorial invariance), we elected to test a model that reflected the most common model form across days but still had strong goodness of fit test values. Using this model, we first tested for configural invariance by specifying a model in which the same group of items loaded on each of the latent variables (same configuration/form) across the days (see supplemental online material for detailed description of factorial invariance). Numerical values of the loadings can be different across days, but if all were significant, we achieved configural invariance. Next, to test for metric invariance, we constrained the factor loadings for each item across days to be equal to one another, which would allow us to conclude that the meaning of the emotion adjectives was invariant over time. Finally, to test scalar invariance, we additionally restricted the intercepts of each of the items on their latent variable to be equal across days. At each step, the model was compared to the previous model using the likelihood ratio chi-squared test. It is important to note that metric invariance is often the model most researchers are satisfied with (Acock, 2013). Scalar invariance is rarely achieved in real data; however, we test it here for completeness.

Positive affect
The positive affect items were first entered into a confirmatory factor analysis model with only the latent variable positive affect for each of the seven days. The initial fit of this model did not reach normative cutoffs for each of the days (Day 1: χ²(27) = 582.58, p < .001, CFI = 0.84; see Table 2 Positive Affect Model 1; for all other days see Appendix A Tables A1-A6). Upon examining the modification indices, covariances among the error terms of scale items could be included in the model to obtain CFIs above 0.95 for each of the seven days.
We next ran a model including the subscales. For the first day of data, the model had poor fit, χ²(27) = 884.94, p < .001, CFI = 0.76 (see Table 2 Positive Affect Model 2; all other days similarly had poor fit [see Appendix A Tables A1-A6]). However, after examining the modification indices and iteratively including covariances among the latent subscales, all CFIs were at or above 0.98, indicating strong fit, and other goodness of fit tests similarly met their recommended cutoffs (see Table 2 and Appendix A [Tables A1-A6] Positive Affect Models 2a, 2b, and 2c). The iterative changes, as indicated by the largest modification index, were identical across days and had the following order: covariance between latent subscales well-being and vigour (Positive Affect Model 2a), covariance between subscales well-being and calm (Positive Affect Model 2b), covariance between subscales vigour and calm (Positive Affect Model 2c). Finally, modification indices of Positive Affect Model 2c suggested including covariances among error terms of the scale items calm and relaxed (days 1, 3, 4, 5, 6) and cheerful and pleased (day 7; see Figure 1; see Positive Affect Model 2d in Table 2 and Appendix A Tables A1-A6). These additions improved the model fit. Also, the correlations among all latent variables were significant (see Figure 1) with a common pattern across all days such that the strongest correlation was always between well-being and vigour (e.g. Day 1 covariance = 0.89, p < .001, 95% CI [0.86, 0.91]), followed by the correlation between well-being and calm (e.g. Day 1 covariance = 0.73, p < .001, 95% CI [0.67, 0.79]), while the correlation between vigour and calm was the weakest (e.g. Day 1 covariance = 0.64, p < .001, 95% CI [0.58, 0.71]).

Negative affect
The negative affect items were first entered into a confirmatory factor analysis model with only the latent variable negative affect for each of the seven days. The initial fit of this model did not reach normative cutoffs for each of the days (Day 1: χ²(27) = 463.41, p < .001, CFI = 0.80; see Table 2 Negative Affect Model 1; for all other days see Appendix A Tables A1-A6). Upon examining the modification indices, covariances among the error terms of scale items could be included in the model to obtain CFIs at or above 0.94.
We next ran a model including the subscales. For the first day of data, the model had poor fit, χ²(27) = 483.31, p < .001, CFI = 0.79 (see Table 2 Negative Affect Model 2; all other days similarly had poor fit [see Appendix A Tables A1-A6]). However, after examining the modification indices and iteratively including covariances among the latent subscales, all CFIs were at or above 0.91, indicating strong fit, and other goodness of fit tests similarly met their recommended cutoffs (see Table 2 and Appendix A Tables A1-A6 Negative Affect Models 2a, 2b, and 2c). While the order of iterative changes of adding the covariances was not the same across days, the final Negative Affect Model 2c for all seven days always ended with an identical pattern which included covariances among the three latent subscales.
Finally, modification indices of Negative Affect Model 2c suggested including covariances among error terms of scale items (see Figure 2; see Table 2 Negative Affect Model 2d; see Appendix A Tables A1-A6 Negative Affect Model 2d and sometimes 2e and 2f). These additions improved model fit. Also, the correlations among all latent variables were significant (e.g. Day 1 covariance between latent subscales depression and anxiety = 0.68, p < .001, 95% CI [0.62, 0.75]; Day 1 covariance between latent

Positive and negative affect model
Last, we combined the separate positive and negative affect final models described above to create a full Positive and Negative Affect Model. For the first day of data, the model had a strong fit, χ²(127) = 340.43, p < .001, CFI = 0.96 (see Figure 3; see Table 2 Positive and Negative Affect Model 3). Similarly, the CFIs for all the other days for the Positive and Negative Affect Model 3 were at or above 0.96 and other   Table 2 Positive and Negative Affect Model 3a), and modification indices for Day 7 indicated that covariances of error terms of items sad and depressed could be included in the model, χ²(126) = 396.06, p < .001, CFI = 0.96. While combining the positive and negative affect models into one overall model in this section was warranted for testing the validity of the SAS in its entirety, we should note that combining them did not drastically improve the model fit. For example, the CFIs for the Positive and Negative Affect Model 3 were 0.96 or 0.97 for each of the seven days. By comparison, the CFIs for the Positive Affect Model 2c were 0.98 or 0.99 for each of the seven days. In essence, the CFIs actually dropped when the positive affect model was combined with the negative affect model. Still, even the Negative Affect Model 2c CFIs were in the 0.94 to 0.97 range with the exception of day 6, which had a CFI of 0.91. Therefore, in testing the factorial invariance of the SAS (see Measurement Model Over Time section below), we test the positive affect model and the negative affect model separately and report the combined model in the Appendix for completeness (Appendix A Table A7). Given that the Positive Affect Model 2c and Negative Affect Model 2c were common across all days and had moderate to strong goodness of fit indices, we selected these models as the final ones to use when testing the factorial invariance across days.

Positive affect
The configural model resulted in fairly good fit and all the loadings were significant and of the same form, χ²(1,617) = 3,829.09, p < .001, CFI = 0.94 (see Table 3). The metric model also resulted in a fairly good fit, χ²(1,653) = 3,879.35, p < .001, CFI = 0.94 (see Table 3). The chi-squared difference test between the configural and metric models was not significant, χ²(36) = 50.52, p = 0.058, suggesting that the metric model did not fit significantly worse than the configural model. However, the scalar model fit was significantly worse than the metric model (see Table 3; scalar vs. metric model: χ²(54) = 313.95, p < .001). Therefore, the model achieved metric invariance, allowing us to conclude that the factor loadings for the subscale structure of the positive affect subscale are equal over time. As mentioned previously, metric invariance is often what most researchers are satisfied with (Acock, 2013). Scalar invariance is often too restrictive, but we tested it here for completeness.

Negative affect
While the model specifying configural invariance did not meet the 0.90 cutoff for CFI, the RMSEA value, RMSEA lower bound, and coefficient of determination did meet recommended levels (see Table 4). Furthermore, all the item loadings were significant and of the same form, suggesting that the scale did achieve configural invariance. Again, while the model specifying metric invariance did not meet the 0.90 cutoff for CFI, the RMSEA value, RMSEA lower bound, and coefficient of determination did meet recommended levels (see Table 4). However, the chi-squared difference test suggested the metric model fit significantly worse than the configural model, χ²(36) = 192.68, p < .001. Thus, we can only conclude that the negative affect subscale has the same configuration over time. We also tested the scalar invariance of the negative affect subscale structure, but this model was a significantly worse fit than the configural invariance model (see Table 4).
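The nested-model comparisons reported above rest on the upper-tail probability of a chi-squared difference statistic. The sketch below shows one way to obtain that probability with the Python standard library only, using the closed-form expressions available for integer degrees of freedom; it is not the Stata procedure used in the analyses, and the function names and the 0.05 significance level are assumptions made for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    df = int(df)
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2*phi(sqrt(x)) * sum_{r=1}^{(df-1)/2} x^(r-1/2) / (1*3*...*(2r-1))
    term, total = math.sqrt(x), 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-half) / math.sqrt(2.0 * math.pi)
    return math.erfc(math.sqrt(half)) + 2.0 * normal_pdf * total


def fits_significantly_worse(chi2_diff, df_diff, alpha=0.05):
    """Return (p_value, flag); flag is True when the constrained model fits worse."""
    p_value = chi2_sf(chi2_diff, df_diff)
    return p_value, p_value < alpha


# Difference statistics as reported in the text above.
positive_metric = fits_significantly_worse(50.52, 36)   # metric vs. configural
positive_scalar = fits_significantly_worse(313.95, 54)  # scalar vs. metric
negative_metric = fits_significantly_worse(192.68, 36)  # metric vs. configural
```

Under these assumptions, the first comparison yields a p-value above .05 and the other two yield p-values far below .001, which matches the pattern of conclusions drawn above: metric invariance is retained for positive affect, while scalar invariance for positive affect and metric invariance for negative affect are rejected.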

Discussion
The current study aimed to confirm the subscale dimensionality of the SAS (Cohen et al., 2003) and validate its psychometric properties for use in affect-health research. Confirmatory factor analysis supported the three-factor structure of positive affect as being comprised of vigour, well-being, and calm, and the three-factor dimensionality of negative affect, as represented by the subscales of anger, anxiety, and depression. Tests of measurement invariance across a seven-day time interval supported the validity of the measure for examining changes in affect over time. Further, the SAS had acceptable within-day reliability for the overall scales and subscales. The reliability was particularly strong for positive affect.
The three-factor structure of positive and negative affect has important implications. Although a clear link between general affect and health has been established (Skaff et al., 2009), researchers have advocated for the importance of considering how discrete emotions are differentially related to health outcomes (Consedine & Moskowitz, 2007; Suls & Bunde, 2005). With regard to positive emotion, the issue of conceptualising affect arousal has received considerable attention (Pressman et al., 2019). Specifically, emotional experience can take the form of high arousal (e.g. vigour), mid-arousal (e.g. well-being), or low arousal (e.g. calm). Good evidence suggests that arousal level may differentially predict health-relevant outcomes (e.g. Pressman et al., 2017). In some investigations, vigour (high arousal positive affect) has been shown to have beneficial health effects such as increased longevity (Pressman & Cohen, 2012) and lower rates of illness (Cohen et al., 2003, 2006), but other studies suggest that high arousal positive emotionality is associated with risk of cardiovascular dysfunction (Armon et al., 2014). As such, correlations between positive affect subscales were evaluated in the current investigation to shed light on possible differences in levels of affective arousal. In line with the theory that vigour may be high arousal and calm may be low arousal, with well-being being more mid-arousal, the correlation between calm and vigour (covariance = .64; see Figure 1) was much weaker than the associations between well-being and the other subscales. This demonstrates that well-being may be more 'in the middle' and, thus, more strongly associated with the other arousal levels. However, well-being was consistently more strongly correlated with vigour (covariance = .89) than with calm (covariance = .73), suggesting that well-being may be a little more distinct from the concept of calm as compared to the concept of vigour.
Of note, we could not infer affective arousal differences in the subscales of the negative affect scale as cleanly because the correlations between measures of negative affect were similar across all subscales (ranging from .62 to .68; see Figure 2) across all days. This does seem surprising, as anger and anxiety are typically conceptualised as arousing states with depression being characterised as a lower arousal state. Nevertheless, the subscales of negative affect still represent important health-relevant domains. For example, anger is strongly linked with the development of angina (i.e. chest pain), whereas anxiety is related to the development of both fatal and nonfatal myocardial infarction (Kubzansky et al., 2006). Thus, each specific negative affect subscale is likely to continue to predict different health patterns and outcomes as seen in previous uses of the SAS (e.g. Aggio et al., 2017).
While the subscales do provide utility, some researchers may still opt to use the aggregate positive or negative affect subscales as has been done in past research (e.g. Cohen et al., 2003, 2006; Jenkins et al., 2018; Lillis et al., 2018; Poole et al., 2011; Sultan & Fisher, 2010). The findings suggest that the items from the SAS do load on to two unique factors (negative and positive affect). For completeness, the overall positive and negative affect subscales were examined and the model fit of aggregate positive and negative affect was poor prior to covarying error terms. As a result, researchers might consider employing a structural equation modelling framework so as to allow error terms to be correlated.
Temporal equality is an important assumption for longitudinal research in the study of affective health science (Meredith, 1993; Putnick & Bornstein, 2016), as health researchers are often examining change in health and health-relevant constructs over time. In other words, scales that have temporal equality indicate that the measure is being interpreted similarly by participants over time. In the case of the SAS, this would mean that the meaning of each adjective would hold constant over time. While the participants in the current data set were expected to have changes in levels of affect given that they were subjected to a health-relevant and stressful situation, findings from the factorial invariance tests supported that participants still interpreted items the same over time even as they potentially got sick, distressed, or homesick as the seven-day quarantine period went on. This analysis demonstrates that both the positive and negative affect subscales could be used over time in a diverse sample. Positive affect exhibited configural and metric invariance, signifying that both the factor structure and loadings of each positive emotion adjective were equivalent throughout the seven days of measurement. Negative affect achieved configural invariance, indicating that the factor structure of the negative affect subscales was the same longitudinally. However, the strength of the factor loadings of the negative affect scale was time variant. In other words, while we can assume the same negative affect theoretical constructs are being measured across days, the relative importance of each emotion adjective over time may not be the same.
The lack of metric equivalence for negative affect is not surprising, as several studies have shown violations of temporal measurement invariance when examining changes in symptoms of depression (Fried et al., 2016; Uher et al., 2008; Wetherell et al., 2001). As such, changes in the level of negative emotionality might influence participants' interpretation of each emotion adjective. Further, data used in the present analysis were collected while participants were quarantined in a hotel and exposed to a cold virus. This context provides a strict, if not overly conservative, assessment of the scale over time in an extreme environment that is of interest to health psychologists.
The current study has limitations. Participants were instructed to retrospectively estimate their emotions over the past day. Although retrospective self-report is the standard measurement paradigm for examining affect (Cohen et al., 2003, 2006; Lac & Donaldson, 2018; Watson et al., 1988), this type of assessment can introduce recall bias. Future investigations might employ smartphones and other electronic devices to record self-reports of emotionality in real time (e.g. Sherman et al., 2015). It is also important to note that affect was not manipulated in this study; thus, we could not assess whether the SAS subscales were sensitive to experimental manipulations of mood. The sample of this study also consisted only of adults, with the majority of participants identifying as White (74%) or African American (23%), limiting the overall generalisability of the results to populations of other races or ethnicities. That said, this study also included diversity in age, education, and employment, improving generalisability in some ways.
Finally, the attentive reader will notice the weaker reliability of the anxiety subscale. A similar result has been observed in past studies of the short form of the POMS (Curran et al., 1995, Table 1). We recommend that researchers particularly interested in anxiety consider increasing the number of anxiety adjectives (e.g. Watson & Clark, 1994), observations per day, or sample size to achieve a higher reliability. Despite this, the overall scales and all other subscales demonstrated acceptable to strong reliability in our data. Given that the intended use of the SAS is for repeated measures health psychology research, where the researcher's choice is often to use a very short measure or none at all, these reliabilities (including that of anxiety) seem sufficient.
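To make the link between the number of adjectives and reliability concrete, the sketch below computes McDonald's omega (the estimator described in Note 2) for a single factor. It is a minimal illustration that assumes a single-factor model with standardised items, so that each item's error variance is one minus its squared loading; the function name, the loading values, and the optional error-covariance argument are hypothetical and are not estimates or code from the present study.

```python
from typing import Optional, Sequence


def mcdonald_omega(
    loadings: Sequence[float],
    error_covariances: Optional[Sequence[float]] = None,
) -> float:
    """Omega for one factor, computed from standardised loadings.

    Assumes standardised items, so the error variance of each item is
    1 - loading**2. `error_covariances` optionally lists the covariance
    for each pair of items whose errors are allowed to correlate.
    """
    # Variance attributable to the common factor: (sum of loadings)^2.
    true_var = sum(loadings) ** 2
    # Summed unique (error) variance of the items.
    error_var = sum(1.0 - loading ** 2 for loading in loadings)
    # Each correlated error pair contributes twice to total variance.
    if error_covariances:
        error_var += 2.0 * sum(error_covariances)
    return true_var / (true_var + error_var)


# Hypothetical loadings for illustration only (not SAS estimates).
three_items = [0.7, 0.7, 0.7]
five_items = [0.7, 0.7, 0.7, 0.7, 0.7]

omega_three = mcdonald_omega(three_items)
omega_five = mcdonald_omega(five_items)

print(f"3 items: {omega_three:.3f}")
print(f"5 items: {omega_five:.3f}")
```

With these illustrative loadings, omega rises from roughly 0.74 for three items to roughly 0.83 for five, which is the pattern behind the recommendation to add anxiety adjectives when anxiety is of particular interest.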
The influential role of global affect, represented as aggregate emotional experience composed of specific emotional states, has been central in the study of health (Consedine & Moskowitz, 2007). The SAS (Cohen et al., 2003; Usala & Hertzog, 1989) is one measurement tool that has been argued to capture both positive and negative affect, with different subscales for each affective valence. Results of a confirmatory factor analysis supported the three-factor dimensionality of both positive and negative affect, with positive affect being composed of calm, well-being, and vigour, and negative affect being composed of anger, anxiety, and depression. Further, analyses demonstrated support for the validity of studying changes in the affect subscales over time. Findings confirm the structure of the SAS and imply that the subscales can be used as a valid longitudinal tool in the study of affective health science.

Notes
1. Forty-eight individuals from the Pittsburgh Cold Study 1 were not given the SAS on Day 1, and the rationale for this exclusion was not provided in the study records.
2. Because the results of our confirmatory factor analyses detected unequal factor loadings between items and some item error covariance (see results section), we chose to use the reliability estimator McDonald's Omega (ω) (McDonald, 1999) instead of Cronbach's alpha. McDonald's Omega is a robust estimator of reliability under these conditions (see supplemental online material for detailed description of Omega). In contrast, Cronbach's alpha may either systematically inflate or deflate reliability estimates (Zinbarg et al., 2005).
3. As a robustness check to ensure that there was minimal bias due to missing data, we conducted follow-up analyses using Full Information Maximum Likelihood (FIML) to address missingness in our SEM models (Enders, 2010). After using the method(mlmv) option to invoke FIML for estimation in Stata SEM models, we found that the results were equivalent.

Note. CFI = comparative fit index; RMSEA = root mean squared error of approximation; SRMR = standardised root mean squared residual; CD = coefficient of determination; AIC = Akaike's information criterion; BIC = Bayesian information criterion.