Genetic susceptibility to multiple sclerosis: interactions between conserved extended haplotypes of the MHC and other susceptibility regions

Background To study the accumulation of MS-risk resulting from different combinations of MS-associated conserved-extended-haplotypes (CEHs) of the MHC and three non-MHC “risk-haplotypes” nearby genes EOMES, ZFP36L1, and CLEC16A. Many haplotypes are MS-associated despite having population-frequencies exceeding the percentage of genetically-susceptible individuals. The basis of this frequency-disparity requires explanation. Methods The SNP-data from the WTCCC was phased at the MHC and three non-MHC susceptibility-regions. CEHs at the MHC were classified into five haplotype-groups: (HLA-DRB1*15:01 ~ DQB1*06:02 ~ a1)-containing (H +); extended-risk (ER); all-protective (AP); neutral (0); and the single-CEH (c1). MS-associations for different “risk-combinations” at the MHC and other non-MHC “risk-loci” and the appropriateness of additive and multiplicative risk-accumulation models were assessed. Results Different combinations of “risk-haplotypes” produce a final MS-risk closer to additive rather than multiplicative risk-models but neither model was consistent. Thus, (H +)-haplotypes had greater impact when combined with (0)-haplotypes than with (H +)-haplotypes, whereas, (H +)-haplotypes had greater impact when combined with a (c1)-haplotypes than with (0)-haplotypes. Similarly, risk-genotypes (0,H +), (c1,H +), (H + ,H +) and (0,c1) were additive with risks from non-MHC risk-loci, whereas risk-genotypes (ER,H +) and (AP,c1) were unaffected. Conclusions Genetic-susceptibility to MS is essential for MS to develop but actually developing MS depends heavily upon both an individual’s particular combination of “risk-haplotypes” and how these loci interact. Supplementary Information The online version contains supplementary material available at 10.1186/s12920-021-01018-6.


Introduction
The nature of susceptibility to multiple sclerosis (MS) is quite complex and involves both environmental and genetic factors [1][2][3][4]. Recently, considerable progress has been made in our understanding of the basis for "genetic-susceptibility" in MS. Thus, to date, over 200 common risk variants (located in diverse autosomal genomic regions) have been identified as being MSassociated by genome-wide association screens (GWAS) using large arrays of single nucleotide polymorphisms (SNPs) scattered throughout the genome [5][6][7][8][9][10][11][12][13][14]. Despite this recent explosion in the number of identified MS-associated regions, however, the association of MS susceptibility with certain alleles of the human leukocyte antigens (HLA) inside the major histocompatibility complex (MHC) has been known for decades [11,[15][16][17][18][19][20][21][22]. Also, the importance of these new observations to our understanding of "genetic-susceptibility" in MS is tempered by the fact that any single SNP is generally associated with more than one gene or with more than one allele of a single gene. Moreover, sometimes the presumptively associated (i.e., "candidate") genes are at a considerable genetic distance from the location of the SNP itself [13,14].
For example, we have recently identified an 11-SNP haplotype (a1), which spans 0.25 megabases (mb) of DNA surrounding the HLA-DRB1 gene on the short arm of chromosome 6, and which has the most significant association with MS of any SNP haplotype in the genome [23,24]. Moreover, 99% of these (a1) SNP haplotypes carry the HLA-DRB1*15:01~HLA-DQB1*06:02 haplotype and, conversely, 99% of these HLA-haplotypes carry the (a1) SNP haplotype. In the Welcome Trust Case Control Consortium (WTCCC) dataset, the odds ratio (OR) for an association the full HLA-DRB1*15:01~HLA-DQB1*06:02~a1 haplotype was 3.28 (p<<10 -300 ) and similar disease associations for portions of this haplotype have been consistently reported in many other studies from northern MS populations [11,[15][16][17][18][19][20][21][22]25]. Nevertheless, despite this extremely close association, and despite the fact that many of these 11 SNPs, individually, are highly associated both with this particular HLA-haplotype and with MS, for none of these individual SNPs is this association exclusive [26]. Thus, each of these SNPs is also found in association with other HLA-haplotypes [24,26]. Consequently, even with the large number of SNPs now identified as being MS-associated [13,14], any such association can only be viewed as simply tagging a relatively large genomic region; it cannot be used with confidence to identify any specific gene or to implicate any specific allele with respect to its role in causing, or contributing to, a "genetic-susceptibility" for MS.
Using data from the WTCCC, we recently reported that the MHC region was largely composed of a relatively small collection of highly conserved extended haplotypes (CEHs), stretching across all of the "classical" HLA genes (HLA-A, HLA-C, HLA-B, HLA-DRB1, and HLA-DQB1) -a distance spanning more than 2.7 mb of DNA [26]. As shown in File S2 (Supplemental Fig D), this same basic population structure is also found in numerous other widely separated human populations around the world [25]. These CEHs seem to be under a strong selection pressure, presumably based upon favorable biological properties of the complete haplotype [26]. Lastly, this population structure is unlikely to be the result of a linkage disequilibrium caused by the founder effects of a small population migrating out of Africa and radiating throughout Eurasia and the Americas. Rather, the marked divergence of the CEH composition both among and between these different human groups, including Africans (File S2; Tables S4a & S4b), indicates that this population structure must be due to selection.
Consequently, "genetic-susceptibility" to MS, at least in so far as it relates to the MHC, is not likely to be attributable to any specific HLA allele but, rather, seems to depend upon the nature of each CEH [26]. Nevertheless, because many CEHs seem to be selected simultaneously and because the exact composition of the selected CEHs seems to be so fluid between different populations, the actual fitness landscape for this selection must be extremely variable in space and/or time and the introduction of novel allelic combinations must occur quite frequently [26].
Indeed, in the mostly European WTCCC population, the most frequent (and, thus, the most highly selected) Class II haplotype is HLA-DRB1*15:01~HLA-DQB1*06:02~a1, which accounted for 12.4% of all Class II haplotypes present in the control population. Nevertheless, most (or all) of the CEHs, which contain this Class II HLA-haplotype (including those whose full CEH had only a single representation in the WTCCC), are associated with an increased MS-risk, although the magnitude of the association varies significantly among the different CEHs [25].
In the present manuscript, we explore these relationships and interactions between the different disease-associated CEHs in the MHC region and other "risk" haplotypes elsewhere in the genome, in order to shed light on the nature of "genetic-susceptibility" to MS. Before embarking, however, we develop two underlying theoretical considerations. First, we summarize what is currently known about the various epidemiological parameters, which are associated with "genetic-susceptibility" in MS, and, in particular, about those parameters potentially related to the role of the MHC and other loci in producing this susceptibility. Second, we consider the different risk models used in epidemiology to account for the accumulation of disease-risk, which is caused by the combination of one or more MS-risk factors in the same individual.

Epidemiological Parameters Associated with MS
Several epidemiological parameters, which are associated with MS pathogenesis, can be defined and estimated and the relationships between them established using published epidemiological data. The definitions for the model parameters are provided in Table 1. Thus, population parameters, which are directly observable in the population as a whole, can be used to estimate non-population parameters, which cannot be measured directly, but which are, nonetheless, of considerable theoretical interest (File S1). These estimates and these relationships, summarized here, are developed comprehensively in the S1 File. For example P(MS), which represents the life-time probability that an individual in the general population (Z) will develop MS can be estimated by three methods (File S1) based upon both the distribution of onset-ages for MS and the increased mortality experienced by MS patients [27][28][29][30][31][32][33], and each of these methods provides a remarkably consistent estimate for P(MS) in northern populations, which is ~0.3%.
Nevertheless, each of these methods estimates only the prevalence of "diagnosed" MS, which may underestimate the prevalence of "pathological" MS in the population (S1 File). For example, several autopsy studies [34][35][36][37] have reported that "pathological" MS in patients is found in approximately 0.1% of individuals who were "undiagnosed" (i.e., they were either minimally symptomatic or asymptomatic) during life. In addition, at present, many of these asymptomatic individuals can now be identified (in vivo) using modern imaging methods [38].
As a result, the actual life time-risk of MS in the population is likely to be somewhat higher that this 0.3% estimate, and possibly by as much as 50−100% (File S1).
Also, within the general population (Z), we can define a subset of so-called "geneticallysusceptible" individuals (G), such that every individual in this subset has a non-zero probability of developing MS (Table 1). All individuals who are not members of the (G) subset, therefore, are members of the so-called "non-susceptible" subset (G−).
In addition, we will define the term {P(MS | MZMS)} to represent the life-time probability of developing MS for an individual from a monozygotic (MZ) twin-ship, given the fact that their chance of getting MS and a small proportion -those individuals in the subset (G) -having a unique predisposition to MS (File S1).
Notable, also, is the fact that the vast majority of (H+)-carriers (<<20%) do not belong to the (G) subset. Therefore, at least with respect to the (H+) haplotype, "genetic-susceptibility" to MS must be the result of the combined effect of (H+) together with the effects of other genetic factors (File S1). By itself, the (H+) haplotype carries no MS-risk whatsoever. Finally, the impact of specific environmental events on the development of MS is also critical (File S1). Thus, in addition to being a genetic disease, MS is also an environmental disease. Both factors are necessary; neither alone is sufficient (File S1).

Relative Risk Models in MS
There are two basic epidemiological models for the accumulation of disease-risk (see File S2), which have been widely utilized -the so-called additive and multiplicative risk models [55][56][57][58][59]. Nevertheless, actual epidemiological circumstances often don't fall neatly into one model or the other. Indeed, the same basic probability model can be used to approximate either an additive or a multiplicative accumulation of risk [56]. The difference depends upon the definition of the term "no interaction" between the risk-factors [55][56][57][58][59]. In studies of the "genetic-susceptibility" to MS, multiplicative risk models have generally been utilized [60][61][62], although this choice may not be appropriate in all circumstances (see File S2).
Indeed, in practice, there are certain difficulties, which are encountered when trying to assess the appropriateness of either model. First, in a case-control studies (such as the WTCCC), because the incidence of the disease is not assessed (as it would be in a prospective cohort study), the actual RRs cannot be determined [63]. However, for a rare disease such as MS {e.g., where: P(MS)≈0.003}, the ORs and the RRs are almost identical [63] and, thus, can be used interchangeably.
Second, and more important, is the selection of an appropriate reference group for calculating the RRs (File S2). This choice will, necessarily, influence how well any observations fit into one or another of these risk models. As noted above and discussed further in the S2 File, the theoretical underpinnings for both the additive and multiplicative models arise from the same underlying probability assumptions [55][56][57][58], and are predicated on the notion that MS-risk for the different potential "risk-factors" is as great or greater than the "risk" in the reference group (File S2). This requires identifying the reference group with the lowest MS risk of any. Although the (G-) subset, by definition, has the lowest MS risk of any, this group (even if it could be identified) cannot be used as a reference because all RRs calculated with respect to it would, by definition, be either infinite or undefined. Indeed, the fact that, for MS, the (G-) subset is non-empty (see File S1), indicates that both of these "risk models" are, at a theoretical level, invalid for characterizing the accumulation of disease risk with an increasing number of diseaseassociated "risk" factors. Nevertheless, using a different reference group -i.e., one containing, at least, some members of the (G) subset − could be used to evaluate (approximately) whether either of these two models fits with the available data. In this circumstance, any subgroup consisting of only members of the (G-) subset would have an RR of zero. The subgroup with lowest risk of any that we identified in the WTCCC data was the (AP*) subset. Therefore, this group was used for the present analysis, despite the fact that a subgroup with an even smaller (non-zero) disease-risk seems likely to exist (File S2).
Each of these groups of CEHs seemed to be segregating independently and, in the control group, frequencies for each of the different combinations were, statistically, at their Hardy-Weinberg expectations.
The MHC allele HLA-A*02:01 has been previously reported to be protective [64].
Although the association between HLA-A*02:01-positive status and MS was also found, in the WTCCC, to be "protective" relative to the (0,0) MHC genotype (OR=0.69; p<10 -29 ), evidence from Tables S2 & S3 (S2 File) shows that any such association depends importantly upon the exact nature of the CEH on which this allele resides rather than upon the presence of the HLA-A*02:01 allele itself. Thus, the grouping used here (i.e., the AP group) seems more appropriate than the use of this allele in isolation.
The subset of individuals who don't carry any (H+), (ER), or (AP) CEHs at the MHC is referred to as the (0,0) MHC genotype. In Tables 2 & 3, all ORs are presented relative to this group. Each of the (H+) CEHs with 50 or more representations were significantly associated with MS-risk (File S2; Table S2), as were, collectively, (H+)-carrying CEHs with fewer than 50 representations in the WTCCC (Table 3). Moreover, also assessing, collectively, only those (H+)-carrying CEHs that a single representation in the WTCCC, the disease association is still statistically significant and of similar magnitude to other (H+)-carrying CEHs (i.e., OR=3.0; CI=2.7−3.4; p<10 −10 ). Consequently, the (H+)-haplotype, by itself, seems to contribute to the disease susceptibility in an individual although, as shown in File S2 (Table S2), the magnitude of this effect varies among different (H+)-carrying CEHs [25].
In addition, as in Table 4 (see legend), we defined different "risk" CEH combinations as: 1) "single copy risk" [1 copy of any (H+)-haplotype or any ER haplotype]; and: 2) "double copy were associated with a disease risk that was similar regardless of the underlying frequency of the different CEHs (Tables 2 & 3). However, the same was not true for CEHs carrying the HLAmotif: (A6)-haplotype: HLA-DRB1*03:01~HLA-DQB1*02:01~a6, which seemed to vary quite widely in their disease association depending upon the exact CEH composition (Table 3; S2 File; Table S3).
The impact on the phenotype of an individual in response to combining two CEHs into a single genotype is shown in Table 4. For example, as has been well described previously [11,[15][16][17][18][19][20][21][22], combining two copies of the (H+)-haplotype in to a single genotype markedly and significantly increases the disease association (Table 4; Fig. 1). Nevertheless, not all (H+)-carrying haplotypes have the same disease association [26]. For example, the OR for single copy carriers of the (c2) CEH is significantly greater (z=3.4-4.8; p=10 -3 -10 -6 ) than the OR for either single or double-copy carriers of the (c3) CEH.
Similarly, considering the AP group of CEHs (Table 4; Fig. 1), we found a significant dose-dependent response such that possessing 2 copies of an AP CEH is significantly more "protective" than possessing only a single copy and, in addition, the magnitude of these "protective" effects is similar to the disease-risk produced by (H+)-haplotypes (Table 4; Fig. 1).
Moreover, having an AP CEH, or even just the (c5) CEH, significantly and substantially mitigates (z=2.1-5.2; p=0.02-10 -7 ) the disease risk produced by single copies of (c2), (c3), or, more generally, any (H+)-haplotype (Table 4; Fig. 1). A single copy of an "extended risk" CEH adds to the risk of a single copy of (c2), (c3), or any (H+)-haplotype, although it adds significantly less (z=2.5; p=0.006) than does a 2 nd copy of an (H+)-haplotype (Table 4; Fig 1). And, finally, the (c1) CEH acts in a recessive manner with little, if any, disease risk produced by a single copy (Table 4). Nevertheless, (and by contrast) a single copy of the (c1) haplotype adds significantly (z=2.5-6.0; p=0.006-10 -9 ) to the disease risk produced by single copies of (c2), (c3), or, more generally, of any (H+)-haplotype (Table 4; Fig. 1). As can be appreciated from the figure, the impact of replacing one haplotype with another often depends considerably (and significantly) upon the exact nature of the companion haplotype, which, together with the haplotype being replaced, constitutes the MHC genotype ( Fig.   2). This reflects the multiple haplotype-haplotype interactions that exist within the MHC. Indeed, if no such interactions were present, each of the comparisons provided in the figure would have an OR of ~1.0 − i.e., ln(OR)=0 − and would be shaded in yellow (Fig 2).
The Non-MHC Loci. In the WTCCC data set, and as described previously [24], the (d1)  Table 5. The increase in disease susceptibility that results from combining susceptibility genotypes at these three non-MHC loci with MHC genotypes is quite different for the different MHC configurations (Fig 3). Thus, for example, the different combinations of these non-MHC "risk" haplotypes consistently increased the risk for (0,H+), (H+,H+), (0,c1), and (H+,c1) "risk" genotypes ( Fig.3). By contrast, for other "risk" genotypes such as (AP,H+) and (ER,H+) and for "protective" genotypes such as (AP,0) and (AP,c1), these other these non-MHC "risk" haplotypes seemed to contribute essentially nothing to the final risk (Fig. 3). . Nevertheless, neither model fits perfectly. When considering only MHC "risk" genotypes, for combinations of MHC genotypes whose disease-risk exceeds that of the (0,0) MHC genotype, the actual diseaserisk observed is, in general, greater than predicted by the additive model (Fig. 4). By contrast, considering also the other non-MHC "risk" genotypes, the observed disease-risk is generally less than predicted by the additive model ( Fig 5). This effect is increased when more "risk" loci are included in the combinations (Figs. 6,7).

Discussion
The present findings provide considerable insight to the underpinnings of "geneticsusceptibility" to MS and indicate that this susceptibility is complex. At the MHC, there are multiple different CEHs that contribute to susceptibility in different ways. When referenced to the MHC genotype (0,0), certain groups of CEHs seem to affect disease risk in a manner that either increase or decrease disease-risk when combined into a single genotype. For example, the combination of 2 "risk" CEHs (H+ or ER) results in an increased disease risk compared to a single copy of a "risk" CEH alone (Tables 2 and 3, Fig. 1). Similarly, the combination to 2 "protective" CEHs ("protective" or "all protective"), results in a decreased disease risk compared to a single copy (Table 4; Fig. 1). Finally, combining a "risk" CEH together with a "protective" CEH results in an intermediate disease risk compared with having a single copy of either CEHtype alone (Table 4).
Nevertheless, there are exceptions to this general rule. Notably, when referenced to the (0,0) MHC genotype, a single copy of the (c1) CEH − the highest frequency CEH in both the WTCCC controls and other European populations [25,26] − is associated with a negligible, nonsignificant, disease-risk (Table 4; Fig. 1). By contrast, the disease-risk is substantially (and significantly) increased in the homozygous state (Table 4; Fig. 1). Such a pattern suggests that (c1) is acting in a recessive manner. Nevertheless, a single (c1) CEH increases disease risk when combined with "risk" CEHs but not with "protective" CEHs (Table 4). Thus, (c1) with an (H+) or an (ER) CEH resulted in a significantly increased disease risk compared to each CEH alone ( Fig. 1). By contrast, the combination of (c1) and an (AP) CEH neither enhances nor to mitigates the effect of the "protective" CEH by itself (Fig.1).
Our findings have certain implications with respect to the appropriateness of the additive and multiplicative causal models for the accumulation of genetic risk. For appropriate OR or RR comparisons, calculations, need to be made using the lowest, non-zero, MS-risk as a reference group. In the WTCCC, the (AP,AP) MHC genotype had the lowest risk (OR=0.13) relative to the (0,0) MHC genotype of any that we identified (Table 4; Fig.1). Therefore, this group was used to normalize the MHC haplotype risk effects. We show that all MHC genotypes, except (c1), are intermediate between the two causal models (Fig. 4). By contrast, for (c1,c1) and (c1,ER) genotypes, the observations exceed the expectations of both models (Fig. 4).
When the other non-MHC "risk" loci are included in the analysis, observations are closer to the additive model. Thus, the estimates from a multiplicative model exceed observations by1-2 orders of magnitude (Figs. 5-7). As demonstrated previously for a different definition of the (G) subset [3], the distribution of penetrance values in the general population (Z) is incompatible with a lognormal distribution [3] − i.e., the distribution expected for a multiplicative model. In the present iteration of the model, defining the set (G) to include all genotypes, which that have a non-zero expected penetrance, the bimodality of the distribution can be established with certainty (File S1). Consequently, based upon both theory and observation, a multiplicative model for the accumulation of genetic risk in MS is inappropriate.
The additive model, in general, performed better in these circumstances (Figs. 4-7).
Nevertheless, it does not explain perfectly the accumulation of genetic risk in MS. First, (c1) CEH genotypes consistently exceed the additive expectations (Fig. 4). Second, effect of a given MHC haplotype is dependent on its companion MHC haplotype in a genotype (Fig. 2). Third, the effect of the 3 non-MHC "risk" haplotypes is not consistent across all MHC genotypes (Fig. 3).
And fourth, when more loci are included in the analysis, the observations become increasingly less than what is predicted by the additive model (Figs. 5-7). Taken together, these lines of evidence indicate that the accumulation of genetic risk from these "susceptibility loci" is inconsistent with both an additive and a multiplicative model. Rather, the magnitude of any change in disease-risk associated with the inclusion of additional "susceptibility loci" seems to depend upon the exact state at each "risk-locus" and on the interaction across all loci. Such a conclusion is also anticipated on the basis of theoretical considerations (File S1).
The MHC is known to have a remarkable diversity [63]. Although the non-MHC "risk" regions used for this analysis are likely to be less variable than the MHC, these regions span large amounts of DNA (200-680 kb) and they generally have hundreds of highly conserved SNP-haplotypes across each region. Moreover, despite the fact that authors sometimes identify specific genes as being MS-associated [13,14], the truth is that we have no basis for deciding which gene or genes within a region are responsible for the association. We cannot exclude the possibility that, within these regions, as within the MHC, there might exist "risk" or "protective" alleles interacting with each other. If so, the likelihood that any simple probability model (either additive or multiplicative) will adequately describe genetic-susceptibility to MS seems quite remote.
However, such complexity fits well with the model of genetic-susceptibility presented in the Introduction and more fully developed in the S1 File. Thus, MS-susceptibility -i.e., membership in the subset (G) -seems to be confined to a small subset (~2.2-4.5%) of the general population (File S1; Table S1) and, yet, this susceptibility is a prerequisite to getting MS, with members of the (G−) subset having no chance of getting MS, regardless of what environmental experiences they have (S1 File). Moreover, despite the fact that the Class II (H+)haplotype is, by far, the strongest, and most significant, MS-associated genetic factor (p<<10 -300 ) of any in the genome and has been known for over a half a century [11,[15][16][17][18][19][20][21][22]26], only a tiny fraction of (H+) carriers are even "susceptible" to getting MS (File S1). This observation indicates that at least with respect to the (H+)-haplotype, "genetic-susceptibility" to MS requires the combined effects of different genes (File S1). The presence of (H+), by itself, does not increase disease-risk.
Nevertheless, despite the necessity of being a member of the (G) subset in order for a person to develop MS, environmental factors are also required. Indeed, once "geneticsusceptibility" is established in an individual, these environmental factors, together with certain stochastic processes, are entirely responsible for determining who does and who does not ultimately develop MS (File S1). Some of these causal environmental factors seem to occur in utero or, possibly, in the early post-natal period; others seem to occur during adolescence; and still others seem to occur later [50,51,[65][66][67]. There is strong evidence that Epstein Barr viral infections (especially those associated with symptomatic infectious mononucleosis) are causally associated. There is also strong circumstantial evidence that Vitamin D deficiency is an important factor [50,51,[65][66][67]. Other factors (e.g., smoking, obesity, and possibly other infections) may also play a role [50,51,[65][66][67]. Regardless of the identity and role of each factor, however, it seems that, collectively, these environmental events (which currently occur as "population-wide" exposures) are major determinants of whether or not the disease will develop in a "susceptible" individual (File S1). For example, although the (H+)-haplotype, as noted, has the strongest association with MS of any, the possession of this haplotype seems only to make (G) subset membership more likely but does not seem to alter the likelihood of actually getting MS once (G) subset membership is established (Table 1; S1 File).
Similarly, the well-known gender-bias in MS prevalence seems to be largely explained by an increased responsiveness of "susceptible" women to the environmental factors involved in MS pathogenesis (File S1). Nevertheless, men are more likely to be members of the (G) subset than are women (File S1). Such a finding might seem surprising given the facts that 1) all of the ~200 MS-linked loci are on autosomal chromosomes [13,14]; 2) association studies specifically focused on the X-chromosome have not suggested the presence of any X-linked associated loci [7]; and, finally, 3) it is hard to rationalize how women could possibly be more or less likely to possess any specific autosomal genotype compared to men (S1 File). Indeed, in the WTCCC, women seemed to be equally likely to possess both the "risk" and the "protective" CEH combinations compared to men. Nevertheless, if (G) subset membership depends upon specific genetic combinations, and defining the subset (G a ) to represent the autosomal genotypes in the general population (Z), it is certainly plausible that men and women could be equally likely to possess each of its members and, yet, for any specific (k th ) member with genotype (G ak ), it could well be the case that: Such a circumstance would represent another example of "genetic-susceptibility" to MS requiring the combined effect of different genetic traits (File S1).
Moreover, from the epidemiological observations regarding changes in sex-ratio that have taken place over time [68], the response curves for "susceptible" men and women to increasing levels of environmental exposure can be derived quantitatively (Supplemental Fig. C; File S1) and it is clear from these response curves that women are, indeed, more responsive (probably physiologically) than men to the causal environmental factors involved in MS pathogenesis [3,4,68]. Moreover, this analysis strongly suggests that the relevant environmental exposures must be occurring currently at "population-wide" levels (S1 File). Such a conclusion is fully consistent with the same conclusion reached from observational studies in adopted individuals, in siblings and half-siblings raised together or apart, in conjugal couples, and in brothers and sisters of different birth order, which have generally indicated that MS-risk is unaffected by the childhood or other micro-environments [69][70][71][72][73][74][75]. subsets. Indeed, as demonstrated in File S1, despite the fact that the disease association for the (H+) subset is, by far, the strongest and most significant of any in the entire genome [11,[15][16][17][18][19][20][21][22], this association is due mostly to the fact that: P(G | H )  PG | H ) .
In the study of human genetics there has been a long-running debate between the socalled "common-disease, common variant" (CDCV) and the "common-disease, rare variant" (CDRV) hypotheses [76]. Nevertheless, with our improved genetic sophistication, it has become increasingly clear that, in different specific circumstances, either (or both, or neither) hypotheses could be operative [76]. In fact, our observations also support this notion. For example, on the one hand, all of the MHC CEH combinations, which impact MS-susceptibility, are quite rare.
None has a population frequency in controls of more than 6.2% and the large majority of them have population frequencies well below 1% (S2 File; Tables S2&S3). On the other hand, considered collectively, those CEH combinations, which include the Class II (H+)-haplotype, have a WTCCC Control population frequency of 23%. Indeed, this particular haplotype-group is the most prevalent (and, therefore, the most highly selected) of all such Class II haplotype combinations in northern population [26].
Consequently, regardless of whether one considers these observations to be supportive of an association between MS-susceptibility and "common" or "rare" variants, the fact remains that, whether considered individually or collectively, the most prevalent, and therefore the most highly selected [26], CEHs are those that are also associated with the highest MS-risk (S2 File; Tables S2&S3). Thus, it is clear that these particular CEHs must come with both adaptive and deleterious consequences for the individual. Also, although the CEH composition differs markedly among long-separated human populations (File S2; Tables 4a & 4b), specific CEHs are still being strongly selected in each of them [26]. Consequently, the benefits of the adaptive features of these CEHs must outweigh the risk of any deleterious ones. Obviously, for circumstances, either in which the risk of MS is small or in which MS has little impact on an individual's eventual number of surviving children, even a modest advantage in favor of a specific CEH might still cause it to be selected. In this regard, a recent French study estimated that women with MS had 31% fewer children than their contemporary controls [77]. If this observation is correct, it suggests that there is a strong selective disadvantage to having MS. Therefore, the explanation for the benefits of these MS-associated CEHs outweighing the risks is likely to lie in an individual's low risk of MS rather than the disease having little impact on their fertility. Based on our observations, this seems likely to be the case. Thus, because natural selection can only select against those genotypes, which actually carry risk (relative to other genotypes), the fact that so few members of the "susceptible" (G) subset ever actually develop MS makes such a favorable tradeoff between adaptive and deleterious features considerably more likely to occur.
Our results also bear on the common notion that there is a considerable amount of "missing heritability" in both MS and other complex genetic disorders [78][79][80]. First, as discussed in File S1, much of the variability in MS expression (even among "genetically-susceptible" individuals) is attributable to stochastic processes that are unrelated to either environmental or genetic factors. Second, MS expression is related to an interaction between the environmental and genetic factors involved in MS pathogenesis; neither alone are sufficient and both are necessary (File S1). Third, both theoretically (File S1) and observationally (Figs 2 & 3) specific gene-gene combinations are crucial determinants of "susceptibility" to MS − a circumstance which renders the common (additive) methods of estimating heritability unreliable [77]. And fourth, with over 200 independent MS-associated genetic regions [5][6][7][8][9][10][11][12][13][14], each potentially with more than one "susceptible state" (e.g., the MHC), there are so many possible combinations of states at these loci that, almost certainly, every person with MS possesses a unique combination. If, as indicated by our results, only a few of these combinations are members of the (G) subset, even of combinations that are similar to each other (File S1 & File S2; Table S2), then there are more than enough genetic associations identified already to account fully for membership in the (G) subset. Naturally, many more MS-associated loci may yet be identified in the future although their existence is not necessary (File S1).
Alternatively, if "missing heritability" is meant to imply only that our genetic model cannot predict accurately the occurrence of MS, then it is true that a substantial amount of the Finally, it is worth noting that the nature of "genetic susceptibility" developed in this manuscript is applicable to a wide range of other complex genetic disorders such as type-1 diabetes mellitus, Crohn's disease, ulcerative colitis, and rheumatoid arthritis. Indeed, base solely upon Proposition #1 (S1 File), any disease for which the MZ-twin concordance rate is substantially greater than the life-time risk in the general population, only a small fraction of the population can possibly be in the "genetically susceptible" subset (i.e., have any chance of developing the disease). Moreover, any disease for which the MZ-twin concordance rate is substantially less than 100% must, in addition to "genetic susceptibility", include environmental factors, stochastic factors, or both in the causal pathway leading to the disease.

Ethics Statement
This research has been approved by the University of California, San Francisco's Institutional Review Board (IRB) has been conducted according to the principles expressed in the Declaration of Helsinki.

Study Participants
Wellcome Trust Case Control Consortium (WTCCC). This multinational study cohort consists of 18,872 controls and 11,376 cases with MS and has been described in detail previously [13,14]. However, SNP haplotype data was unavailable for 380 controls and 232 cases. Of the cases, 72.9% were women, the average age-of-onset was 33.1 years, and the mean Extended Disability Status Score (EDSS) was 3.7 [14]. The patients enrolled in this study (except for a few African Americans from the United States) were of European ancestry. The large majority (89%) of the cases had a relapsing-remitting onset [13]. The diagnosis of MS was made based upon internationally recognized criteria [81][82][83]. Control subjects were composed of healthy individuals with European ancestry [13]. The Ethical Committees or Institutional Review Boards at each of the participating centers approved the protocol and informed consent was obtained from each study participant. The WTCCC granted data access for this study.

Genotyping, and Quality Control
The genotyping methods and quality control for the WTCCC have been described in detail previously [13,14,16,18,19]. for each participant by imputation using the HIBAG method [84].

Statistical Methods
Phasing. Both the phasing of alleles at each of five HLA loci (HLA-A, HLA-C, HLA-B, HLA-DRB1 and HLA-DQB1) and the phasing of the SNP-haplotypes surrounding the Class II region of the DRB1 gene were accomplished using previously published probabilistic phasing algorithms [23,24,85,86]. SNP-haplotypes from 3 of the 102 non-MHC genomic regions, which had been identified previously as being significantly MS-associated, were also included in our analysis [24]. In our previous report, the MS-associated SNP haplotypes were numbered (arbitrarily) from 1 to 932. These three particular regions (arbitrarily labeled d1, d2, and d3) were selected based on their having a "risk" SNP-haplotype with 500 or more representations in the WTCCC dataset and also having the largest ORs for disease-association of any haplotype meeting this specification. The reason for choosing only three regions was that, when more regions were added, there were an insufficient number observations to estimate the ORs for any of the possible higher order combinations. These three regions were located at chromosomal locations 3p24.2, 14q24.1, and 16p13.13 and in the vicinity, respectively, of the genes EOMES, ZFP36L1, CLEC16A [13,14]. Chromosome 3; Region 22 (d1) spanned 0.65 mb of DNA and the 11-SNPhaplotype (number 234) was used [24]. Chromosome 14; Region 78 (d2) spanned 0.68 mb of DNA and the 3-SNP-haplotype (number 734) was used [24]. Chromosome 16; Region 85 (d3) spanned 0.20 mb of DNA and the SNP-haplotypes (numbers 814, 818, and 822) were combined into a single 15-SNP haplotype [24]. This was done because each of these risk-haplotypes were adjacent to each other and because the individual risk SNP haplotypes were part of the same  Table S2; 2) other increased risk or "extended risk" (ER) CEHs (c23, c27, c34, c46, c68, c81, c85, c96, and c107) as shown in S2 File (Table S3); 3) decreased risk or "all protective" (AP) CEHs (c5, c15, c18, c24, c30, c32, c51, and c73) as shown in S2 File (Table S3); 4) all CEHs not in the (H+), (ER), or (AP) groups (0) CEHs; and 5) the (c1) CEH by itself. We also explored a "protective" group, which excluded the (c5) CEH. However, this analysis is not presented because the findings were the same as when the AP group was analyzed as a whole. In many circumstances, an individual's MHC genotype was specified by the haplotype combination that they possessed. For example, by this convention, an individual homozygous for (H+) would be characterized as having the (H+,H+) MHC genotype. By contrast, a heterozygous individual would be characterized as having the (H+,0), the (H+,ER), the (H+,c1), or the (H+,AP) MHC genotype In the principal analysis, all MS-associations were assessed compared to a reference group consisting of the (0,0) MHC genotype. Similarly, when the disease associations for those "nonrisk" CEHs carrying the HLA-DRB1*03:01~HLA-DQB1*02:01~a6 haplotype were assessed, other carriers of this haplotype were also excluded from the (0,0) reference group. For notational simplicity, when using the (AP,AP) MHC genotype as a reference, this genotype was referred to as (AP*).
Disease associations for the risk SNP-haplotypes on Chromosomes 3, 14, and 16, were assessed compared to a reference group consisting of the (0,0) MHC genotype, and excluded individuals carrying their risk-haplotypes at these chromosomal locations. We designate (collectively) all non-risk-haplotypes at each of these chromosomal locations as the (0) haplotype at each locus.
Significance of the differences between two ORs in disease association for any haplotype or haplotype combination was determined by z-scores calculated for the differences in the natural logarithm of the ORs for any two haplotypes. As discussed earlier, pair-wise comparisons of ORs are independent of the reference group chosen. The MHC genotype (0,0) had the largest sample size of any and, therefore, in order to maximize the statistical power to detect differences, the

Supporting Information Captions
File S1. This Supplemental File describes, in detail, the susceptibility Model used for this manuscript and rigorously develops its logical framework. It includes the methods used to estimate both the "population" parameters such as: (See Table 1   Proband-wise concordance calculated as: 2C/(2C+D) − adjusted [39] for double ascertainments (13/24=54%)

*
Odds ratio (OR) of disease for individuals having certain haplotype combinations compared to having the (0,0) genotype at both the MHC and the non-MHC loci.
** The p-values are expressed in scientific notation as powers of 10 (E); ns=not significant.  Positive numbers (red shades) indicate that the genotype in the column has a greater OR than the genotype in the row. Conversely, negative numbers (blue shades) indicate that the genotype in the column has a smaller OR than the genotype in the row. For example, the genotype (AP-2) has a smaller OR than the genotype (AP-1). Similarly, the genotype (H-1) has a smaller OR than the combination (H-2). ORs given as (0.0) indicate that (OR<0.05). For no genotype were there zero MS cases observed. The following designations are use to indicate the different genotype configurations: AP1 = 1 copy of "All Protective" (AP) + 1 copy of (0) AP2 = 2 copies of "All Protective" (AP) H-1 = 1 copy of (H+) + 1 copy of (0) H-2 = 2 copies of (H+) ER-1 = 1 copy of "Extended Risk" (ER) + 1 copy of (0) ER-2 = 2 copies of "Extended Risk" (ER) c1-1 = 1 copy of the c1 CEH + 1 copy of (0) c1-2 = 2 copies of the c1 CEH AP-ER = 1 copy of "AP" + 1 copy of "ER" AP-H = 1 copy of "AP" + 1 copy of (H+) AP-c1 = 1 copy of "AP" + 1 copy of the c1 CEH ER-c1 = 1 copy of "ER" + 1 copy of the c1 CEH H-c1 = 1 copy of (H+) + 1 copy of the c1 CEH H-ER = 1 copy of (H+) + 1 copy of "ER 0,0 = 2 copies of (0) Figure 2. Lower triangular plots of the natural logarithm of the odds ratios (ORs) and z-scores for the different transitions from one MHC haplotype to another in different genotypic contexts. For example, the point of intersection for and represents the ratio of: (0  H ) (c1  H ) Figure 3. Natural logarithm of the odds ratios (ORs) and z-scores for the different combinations of MHC and non-MHC genotypes. All ORs were calculated relative to a group consisting of the same MHC genotype and with the genotypes (0,0) at all non-MHC loci involved in the comparison -see text. The MHC genotypes, in order of increasing disease-risk (Table 4)  as defined in the text: R b =R AP* =1). With the exception of (c1), which seems to behave in a unusual fashion, the combination of other risk alleles produced, in general, a risk in between the two models, albeit closer to that predicted by the additive model. All combinations had, at least, 50 representations in the WTCCC and the green shading indicates the ORs actually observed.
Cells with yellow shading in the "Observed" column also represents the ORs actually observed.
However, in these yellow-highlighted cases, the ORs were used to approximate the relative risks (RRs), which, in turn, were used to assess whether the genotypes that are not yellow-highlighted conformed to the additive and multiplicative models (see Methods).