Pangenome Evolution Reconciles Robustness and Instability of Rhizobial Symbiosis

ABSTRACT Root nodulating rhizobia are nearly ubiquitous in soils and provide the critical service of nitrogen fixation to thousands of legume species, including staple crops. However, the magnitude of fixed nitrogen provided to hosts varies markedly among rhizobia strains, despite host legumes having mechanisms to selectively reward beneficial strains and to punish ones that do not fix sufficient nitrogen. Variation in the services of microbial mutualists is considered paradoxical given host mechanisms to select beneficial genotypes. Moreover, the recurrent evolution of non-fixing symbiont genotypes is predicted to destabilize symbiosis, but breakdown has rarely been observed. Here, we deconstructed hundreds of genome sequences from genotypically and phenotypically diverse Bradyrhizobium strains and revealed mechanisms that generate variation in symbiotic nitrogen fixation. We show that this trait is conferred by a modular system consisting of many extremely large integrative conjugative elements and few conjugative plasmids. Their transmissibility and propensity to reshuffle genes generate new combinations that lead to uncooperative genotypes and make individual partnerships unstable. We also demonstrate that these same properties extend beneficial associations to diverse host species and transfer symbiotic capacity among diverse strains. Hence, symbiotic nitrogen fixation is underpinned by modularity, which engenders flexibility, a feature that reconciles evolutionary robustness and instability. These results provide new insights into mechanisms driving the evolution of mobile genetic elements. Moreover, they yield a new predictive model on the evolution of rhizobial symbioses, one that informs on the health of organisms and ecosystems that are hosts to symbionts and that helps resolve the long-standing paradox.

strains that nodulate crop hosts, but with mixed results for effective symbiosis. For instance, in settings where legume crops and compatible rhizobia were introduced by growers, transfer of an entire symICE from highly effective inoculant strains to native rhizobia occurred and generated a diversity of novel nodulating strains; however, many of them were ineffective in fixing nitrogen and the bases for loss of SNF on target crops remain unknown (15)(16)(17). Moreover, processes that drive symICE variation, e.g., monopartite and polypartite and diverse integration sites, as well as the selective advantages for such variation in nitrogen fixing bacteria are also unknown (18,19).
In contrast to other rhizobia, most members of Bradyrhizobium are traditionally thought to have genes necessary for SNF clustered in a genomic island referred to as a symbiosis island (SI) (20). Bradyrhizobium is cosmopolitan and its members can fix nitrogen in facultative associations with diverse members of the legume family, Fabaceae. (21). Host species include at least 24 of the 33 legume tribes that can form nodules, spread across the three legume subfamilies, Caesalpinioideae, Mimosoideae, and Papilionoideae. Bradyrhizobium populations have been extensively characterized in native host communities, and shown to exhibit broad variation in symbiotic capacity, providing natural tests to investigate genomic drivers of this variation (22). We compiled and analyzed a data set of genome sequences of genetically and phenotypically diverse strains of Bradyrhizobium (Data set S1; Extended Data sets S1-S3 available at https://github.com/osuchanglab/ BradyrhizobiumPangenomeManuscript/tree/main/Extended_Supplementary_Materials). Critical to resolving the drivers of symbiosis variation in natural settings, we sequenced genomes of 85 strains (here metapopulation strains) isolated from across an 800 km transect of wild Acmispon strigosus populations in California and phenotyped the strains as beneficial (Nod 1 /Fix 1 ), ineffective (Nod 1 /Fix-), or non-nodulating (Nod-/Fix-) on this host species (22,23). We additionally included 167 publicly available genome sequences of strains beneficial to plants of many legume tribes as well as strains considered non-nodulating, photosynthetic, or not isolated from plants. Here, findings suggested that the SI of Bradyrhizobium represents a diverse set of symICEs. We additionally demonstrate that recombination among symICEs and with nonsymbiosis ICEs as well as nonsymbiosis plasmids generates tremendous structural and functional diversity. Modularity of genes that contribute to SNF and their presence on mobile genetic elements are key to generating variation and conferring robustness to this ecologically important trait.

RESULTS
Symbiosis genes of Bradyrhizobium cluster within a strikingly diverse set of mobile genetic elements. We first identified clusters of symbiosis genes in Bradyrhizobium genome sequences and searched for hallmark features indicative of being within mobile genetic elements. A total of 179 strains have symbiosis genes clustered within regions with features of ICEs, here called symbiosis ICEs (symICEs), and two have them clustered within megaplasmids (Fig. 1). Both classes of genetic elements are associated with a T4SS-encoding locus that mediates interbacterial conjugation. In addition, the symICEs have integrase-associated genes, while the symbiosis megaplasmids (Sym plasmids) have a repABC origin of replication. The symICEs are also associated with tRNA genes and represent five types based on these associations and multiple subtypes based on differences in sequence signatures and gene composition (Fig. S1, S2A, S3; Data set S1). Analysis of replication genes suggest that the two Sym plasmids have independent origins ( Fig. 1; Ext. Fig. S1 available at https://github.com/osuchanglab/ BradyrhizobiumPangenomeManuscript/tree/main/Extended_Supplementary_Materials) (24). Twenty additional strains have nif/fix islands that lack features of ICEs or plasmids and two of these strains also have nod genes located in different regions originally classified as islands (25,26). The remaining strains lack symbiosis genes and are incapable of SNF (Data set S1) (23).
Previous work used genomic comparisons, att site identification, and experimental transfer to characterize tripartite symICEs in Mesorhizobium, wherein sequential steps assemble the parts into a single circular element during recombination (13). Molecular evidence herein is consistent with symICEs of Bradyrhizobium having the capacity to mobilize and for some to induce large-scale rearrangements during recombination with chromosomes. The tRNA-Val (subtype A) symICE is the most common subtype. However, unlike those in Mesorhizobium, most of these adopt a bipartite configuration with a symICE element integrated near a tRNA-Val gene and a "B" element near the ybgC gene (Fig. 2). Each of the two elements has an integrase gene bordering a predicted att site and the other att site at the other boundary of the element; and recombination is predicted to occur in two steps, causing a chromosomal inversion around cognate att sites in each element ( Fig. 2A; Fig. S1 and S4; Data set S1). This proposed mechanism of recombination is supported by alterations seen in four variants of tRNA-Val (subtype A). The tRNA-Val (subtype A) symICE of strain #195 is organized in a manner consistent with being in an intermediate configuration, as its two elements are adjacent to each other at one of the predicted att sites. Those of strain USDA6 and closely related E109 are predicted to be locked in the chromosome due to them having an att site transposed proximal to its partner att site, precluding its integrases from mediating recombination ( Fig. 2B and C). Last is the symICE of strain number 186, which adopts a tripartite configuration (Fig. 2D). This tRNA-Val (subtype A) symICE acquired a third integrase and att site that resulted in nif/fix genes splitting into a third region adjacent to a tRNA-His gene. Similar to polypartite Mesorhizobium symICEs, this variant is predicted to require three steps to integrate and excise and cause two genome inversions (13). Four other types of symICEs are monopartite and likely integrate in one step at different tRNA genes.
Phylogenetic patterns of symbiosis MGEs are consistent with them having been mobilized horizontally across the genus (9,10). Diverse species-level groups of Bradyrhizobium have similar symbiosis MGEs and are associated with the same host taxon while clades of closely related Bradyrhizobium vary in symbiosis MGEs and include ineffective and nonnodulating strains ( Fig. 3; Fig. S3; Extended Data sets S1-S3). We examined conserved single copy genes in symbiosis MGEs to identify evidence for transmission events. Genes formed 14 groups based on similar evolutionary histories, reflecting substantial recombination and reshuffling ( Fig. S5; Extended Fig. S2; Extended Data set S4). Nonetheless, symICE types and subtypes tended to cluster similarly across the 14 trees, each of which has a topology that differs from that of the genus tree, suggesting that symbiosis MGEs are largely inherited vertically and horizontally, as single units. Modularity of symbiosis MGEs enables diversification while preserving SNF. Like other rhizobia, most genes implicated in SNF of Bradyrhizobium are encoded within MGEs (9,10,13,14,16). Thus, transfer of MGEs and recombination of symbiosis genes could be important to inform on mechanisms that diversify and impact SNF functions. We used principal-component analysis (PCA) to visualize the compositional variation of the symbiosis MGEs and showed that a large proportion is associated with differences in host plant species (Fig. 4A). Two additional observations of metapopulation strains provided critical insights. First, symbiosis MGEs of ineffective strains are each more similar in composition to that of a beneficial strain than to each other, suggesting that symbiosis MGEs of ineffective strains are derived from those of beneficial strains. Second, both tRNA-Val (subtype A) and tRNA-Ile (subtype A) symICEs can confer benefit to A. strigosus,   (Continued on next page) suggesting that symbiosis genes have been shuffled across symICE types. Collectively, these observations indicate that modularity impacts both loss and gain of plant species as hosts.
To identify reshuffling events that are associated with ineffectiveness, we first grouped genes of the symbiosis MGEs into three categories, those representing the minimal set necessary for symbiotic nitrogen fixation, e.g., nif/fix, and nodABC, those core to strains beneficial to A. strigosus, and those identified in a genome-wide association study defined as signatures of being associated with A. strigosus relative to Glycine max (Data set S2). The genes within these categories were then used to query for differences in ineffective strains. Based on comparisons of genes essential for SNF present in beneficial Bradyrhizobium, the symbiosis MGEs of the six ineffective strains number   (22). For example, the symbiosis MGEs of strains number 2, number 187, and number 200 lack symbiosisassociated T3SS and effector genes and several signature genes associated with being a symbiont of A. strigosus. Other differences in sequences or composition of nod genes and other genes in symbiosis MGEs were also identified ( Fig. S6A; Fig. S7A-C, Data sets S2 and S3; Extended Data set S5) (27)(28)(29).
To test predictions that ineffective strains maintain SNF functions but benefit different plant species, we measured symbiosis outcomes on five sympatric legume species (Extended Data set S6). Four ineffective strains, marginally beneficial strain number 156, and beneficial strain number 4 can all nodulate multiple host species that they overlap in their native range (Fig. 5A). On A. strigosus, benefits derived from strain number 186 were not significant, consistent with its classification as ineffective (22). However, strain number 186 exhibited a qualitative shift in host range by inducing nodules on Lupinus bicolor. This strain caused significant growth benefits that were strikingly greater than those caused by strains number 4 and number 156, which also caused significant beneficial effects to L. bicolor (Fig. 5B). These data suggest that shuffling of symbiosis genes preserve SNF functions while changing host specificity, thus providing an explanation for the maintenance of variation in effectiveness in metapopulation strains.
We also uncovered processes in the evolution of symbiosis MGEs predicted to have allowed metapopulation strains to gain A. strigosus as a host species and provide benefits. First, nodV was identified as a signature gene in the GWAS analysis. NodVW is a  two-component system that induces nod gene expression upon sensing specific plant isoflavones (30). In all sequenced strains, the nodV and nodW genes are adjacent to each other in the symICE. In addition, nodVW are close paralogs of nwsAB in strain USDA110, a symbiont of G. max and other hosts in the Phaseoleae tribe (30). The phylogeny of nodV is polyphyletic and incongruent with the phylogeny of nodW, supporting a scenario where a distantly related nodV homolog was acquired by an ancestral tRNA-Val symICE (Fig. 4C and D). We predict that it displaced the original and maintained the ability to function with the original nodW, thereby gaining a new capacity to perceive A. strigosus as a host species while retaining the ability to regulate nod gene expression.
Second, we predict that reshuffling of genes among symICEs led to strains gaining A. strigosus as a host species. Our analysis suggested a tRNA-Ile symICE of a symbiont associated with a host outside the Loteae tribe (i.e., Acmispon, Lotus, related genera) acquired a region from a tRNA-Val (subtype A) symICE, yielding the tRNA-Ile (subtype A) symICE with two nod clusters (Fig. S2B. Fig. S6A, S7). The acquired region includes most of the nod genes common to beneficial metapopulation strains, including nodVW with the A. strigosus-associated nodV allele. In beneficial strains, the region includes symbiosis-associated T3SS-and effector-encoding genes but in ineffective strain number 200, they are absent. The nonfunctional nodD1 allele of strain number 200 also resides in this region. Across tRNA-Ile (subtype A) symICEs, the acquired region lacks a complete set of nod genes necessary for host-specific nodulation and normal nodule development (30). However, it is complemented by the other nod region, and three of four strains with a tRNA-Ile (subtype A) symICE can benefit A. strigosus. Thus, modularity allows genes to reconfigure at different levels of organization and diversify Bradyrhizobium strains without compromising essential functions of SNF.
Reorganization of symICE modules promotes diversification. The process by which symICEs excise causes important changes to gene organization that inform on mechanisms that led to the diversity of SNF uncovered herein (Fig. 6). Upon circularization, Bradyrhizobium symICEs reorganize SNF-associated genes into a small "Sym region," including nif/fix, T3SS-encoding, and nod genes (13) (Fig. 6A; Fig. S6B-C). We predict that the reconfiguration, with SNF-associated genes more closely clustered and separated from a large variable region, is crucial for reshuffling across elements and generating the diversity observed in symbiosis MGEs among metapopulation strains. We suggest that the Sym region recombines with other elements, as the Bradyrhizobium genus is predicted to have an extremely large and diverse collection of nonsymbiosis associated MGEs ( Fig. 1 and 4; Extended Fig. S3). For example, the Sym region from a tRNA-Val (subtype A) symICE likely recombined with a tRNA-Pro ICE and plasmid to yield the symbiosis MGEs in strains number 2 and number 187, respectively ( Fig. 6B; Fig. S1, S2A, and S6A). Likewise, evidence strongly supports a scenario where the Sym region recombined with a tRNA-Ile symICE to yield the A subtype ( Fig. 6B; Fig. S6A). Recombination of Sym regions also occurred among tRNA-Val symICEs and ICEs, generating members with highly similar symbiosis genes and diverged variable regions ( Fig. S5 and S6B-C). Consistent with this, the variable regions of examined tRNA-Val symICEs have closely related T4SS-encoding loci flanked by large polymorphic regions, nested within closely related Sym regions ( Fig. S2A and S6B-C). However, evidence also suggests that in other symbiosis MGEs, recombination can occur across more distantly related molecules and lead to the acquisition of more divergent T4SS-encoding loci and potentially, its flanking variable regions (Fig. S2A).
The model of symICE excision predicts that joining and circularization of bipartite tRNA-Val symICEs will link integrase genes together with att sites and the small B element. This subsequent predicted reorganization is critical in facilitating losses of this region, whose absence is necessary to transition one type of symICE to another that integrates at a different site, or to a Sym plasmid that does not integrate into the chromosome. The Sym plasmid of strain number 187 has a scar, one att sequence of a tRNA-Val symICE, supporting the proposed loss. Furthermore, loss of one or both integrase genes is consistent with the emergence of tRNA-Ile symICEs from a tRNA-Val symICE ( Fig. 6B; Fig. S1B, and S1F). Critically, losses can also include genes flanking the B element and impact SNF. Loss of T3SS-encoding genes and signature genes is associated with three strains exhibiting ineffectiveness on A. strigosus ( Fig. 6; Fig. S6A). Conversely, loss of most of the nif/fix genes from the donor tRNA-Val symICE appeared to have occurred in the transition of tRNA-Ile (subtype A) symICE to benefit A. strigosus while loss of both flanking regions resulted in ineffective strain number 200. The flanking region affected depends on whether recombination is with another symICE or with a nonsymbiosis ICE, as loss of nif/fix is not selected against if recombination is with an element that already confers SNF.
However, organizational differences can also restrict recombination between symICEs. Subtypes A and B of the tRNA-Val symICEs, associated with symbiosis on Loteae and Phaseoleae hosts, respectively, are derived from a common ancestor, but there is little evidence for them recombining (Fig. 4A; Fig. S1A-B and S5). When tRNA-Val (subtype B) symICEs circularize, nod and T3SS-encoding genes will separate to opposite regions flanking the B region, as opposed to the same region predicted for subtype A. We suggest this difference reduces productive recombination events between these two subtypes (Fig. S6A). Large circles represent symbiosis MGEs (vermillion = A. strigosus-associated; blue = G. max-associated) and small symbols within them represent their type. Integrase genes are abbreviated with "int" and a letter denoting the location of its attachment site. The boxed region in the lower right represents the accessory genome with collections of symbiosis and nonsymbiosis MGEs.

DISCUSSION
Here, we showed that modularity and reshuffling of genes by mobile genetic elements generate uncooperative genotypes of rhizobia and make individual partnerships unstable, but these same properties are fundamental for robustness and extending beneficial associations to diverse host species, as well as transferring symbiotic capacity among diverse rhizobia. We sampled native bacteria from an 800 km transect of wild Acmispon strigosus populations and included strains various in symbiosis phenotypes. We applied a strategy, developed for virulence plasmids, to study symbiosis ICEs and plasmids in Bradyrhizobium, and in doing so we were able to infer relationships among many symbiosis MGEs and model interactions among MGEs that led to their diversification (31). Findings suggested that symICE circularization reorganizes genes and is predicted to promote the shuffling of blocks of SNF genes into different ICE backbones, the generation of new combinations of modules, the acquisition of genes from the chromosome onto the symICE, and transfer of the SNF trait to plasmids. In this regard, despite being integrated in chromosomes for most of their life cycle, symICEs are like plasmids in promoting more rapid evolution of their cargo genes (32). The symICEs described here are incredibly diverse, and even metapopulation strains isolated from a single plant host species and sampled in one US state can have different symICE types, subtypes, or symbiosis plasmids. No two symICEs isolated from metapopulation strains or in this data set have identical gene content or sequence. This reflects multiple scales of modularity. Plants can be host to diverse species of Bradyrhizobium, strains can host diverse and exchangeable symbiosis MGEs, which themselves exchange and acquire genes, and individual genes within functional units can vary.
The patterns that we uncovered in the metapopulation strains, including the role of MGEs in gene reshuffling and diversification, extend broadly across the Bradyrhizobium genus. For example, most analyzed photosynthetic nitrogen fixing Bradyrhizobium strains have only nif/fix islands and are restricted in host range (25,33). But two strains acquired symICEs that expanded their host ranges. Notably, strain ORS285 has an island that we suggest is a remnant of a symICE located at tRNA-Ile that includes nod, T3SS-associated genes, and an integrase gene but no nif/fix genes or conjugation machinery-encoding locus (26) (Fig. S1B and S2B). Moreover, several variants of symICEs have also recurrently gained fixNOQP and fixGHIS, genes necessary for respiration in microoxic root nodules and typically in fix gene cluster III located in the chromosome (34) (Extended Fig. S4). Acquisition of these symICE variants extended SNF to strains that lack fix gene cluster III and would not otherwise be capable of supporting SNF. Thus, acquisition of MGEs, and the traits that they encode, appear to play a major role in the adaptation of Bradyrhizobium to novel lifestyles.
We propose that diversification in the Mesorhizobium genus, which exhibits many parallel patterns, is also driven by acquisition and reshuffling of MGEs (19). Importantly, key aspects of our study differ from those of Mesorhizobium, where diversification has been primarily characterized in managed systems. In Mesorhizobium, entire symICEs originating from inoculum strains were predicted to be transferred into indigenous nonsymbiotic rhizobia, or among strains already naturalized under monoculture crops, a scenario that imposes intense selection for host crop compatibility (14,16,17,19). Conversely, our findings are based on investigation of phenotypically variable strains of Bradyrhizobium isolated from diverse native plant communities, where multiple legume species overlap and select for differential subsets of rhizobia (35)(36)(37). Our study suggested that symICE transfer has recurrently promoted novel host acquisition, and that loss of effectiveness on one host is associated with gains of other hosts, processes that likely require a diverse array of potential hosts.
Reconceiving symbiosis as a dynamical system with links that can form and dissolve among symbionts and between symbionts and hosts is essential for revealing emergent properties. Modularity and flexibility of genetic elements, coupled to their mobility, drive diversification, giving rise to variation in symbiosis, such as that revealed in Bradyrhizobium. Modularity and flexibility are central to robustness (3). A fundamental principle of robustness is that it maintains the function of a system, not a state (38). Therein lies the source of the paradox where symbiosis functions are maintained at the system level but lost from individual states, such as a symbiont, partnership, or host (2). Models that reduce symbiosis to bipartite partnerships and ignore symbiont-symbiont interactions unknowingly neglect major sources of variation and overlook robustness (2,39). Additionally, by separating symbiosis into categorical partnerships, these models fail to recognize the effects of multiple and various symbiont-host interactions within the system. Host species select for different combinations of symbiosis genes in their bacterial partners. Pangenome evolution, shaped by individual and gene-level selection, reassorts genes into new combinations that can extend symbiosis to new host species. Hence, alignment of fitness interests between host and symbiont is necessary for persistence of a partnership while interactions diverse in partners are necessary for robustness and evolutionary stability of symbiosis (3).
This alternative framework provides a predictive understanding of symbiosis functions that are encoded on MGEs. SNF is evolutionarily stable despite repeated abandonment by both symbiont and host species (22,40). Conversely, symbioses involving vertically transmitted endosymbiotic bacteria with closed pangenomes are not as robust and are at higher risks of extinction (41). In agriculture, elite rhizobia strains are often added to monocultures in attempt to establish a highly specific and optimal partnership. Success is difficult to achieve because the system is flexible, and plants can partner with different genotypes of rhizobia (9). Even if the optimal partnership is attained, the likelihood for it to persist is low because of potential trade-offs between state optimality and system robustness (42). Strategies that promote interactions between multiple lineages of beneficial nitrogen-fixing rhizobia and diverse crops will have greater success for long-term sustainability.

MATERIALS AND METHODS
Genome sequencing, assembly, and annotation. Strains were selected from a metapopulation that was previously generated and phenotyped (22,23) (Data set S1). Bacteria were grown overnight in a modified arabinose-gluconate medium at 29°C with shaking (43). The Qiagen DNeasy blood and tissue kit (Qiagen, Valencia, CA, USA) was used to extract total genomic DNA, and according to methods previously described, prepared, multiplexed, and sequenced on one channel of an Illumina HiSeq 3000 (Illumina Inc., San Diego, CA USA) by the Center for Genome Research and Biocomputing (CGRB; Oregon State University, Corvallis, OR, USA) to generate 150mer paired end sequencing reads (31).
For long read sequencing, strains were either sequenced on two flow cells of an Oxford Nanopore GridION X5 by De Novo Genomics Corporation (Kansas City, KS, USA) or on a MinION flowcell (1D chemistry, LSK109) on a Mk1b MinION sequencer controlled by a MinIT coprocessor. In both cases, samples were multiplexed and prepared using a pooled 1D Native genomic DNA library prep (SQK-LSK109) prior to sequencing (44). Previously described methods were used to process and assemble sequencing reads and annotate genome assemblies (31).
BLASTN v. 2.6.0 searches were used, with "TTCACACGGGAGAGGTCCAAGGTTCGATCCCTTGTGCGCCC ACCATTCACCT" and "TCCGTATCTTCGAAATAGACGCGGACCTGCATATGATGGTGACCGCCGCGAATTTCGCCA TCAAGAGAAGCTGTCACG" as queries and the parameters "-word_size 7 -evalue 100" to identify attachment sequences of Int V and Int Y , respectively (45). HMMER v 3.3 hmmsearch and custom hmm profiles were used to annotate putative nod-box and ttsI-box sequences (46)(47)(48). Macsyfinder 1.0.5 with the default options and the hmm profiles TXSS and CONJ were used to annotate secretion system-encoding loci (49). BLASTP and translated sequences of type III effector genes from rhizobia were used as queries to identify homologs of effector genes (48,50).
Analyses of genome sequences. Previously described methods were used to calculate percentage of conserved proteins (POCP; 50% threshold) and pairwise average nucleotide identity (ANI; threshold $ 95%) to operationally classify strains into genus-and species-level groups, respectively (31). The software package get_homologues v. 20170418 with the options "-M -t 0" was used to cluster genes into orthologous groups (51). Gene presence/absence heatmaps were generated using the R package heatmap.plus with "complete" clustering of binary distances, or with the ggtree function gheatmap (52,53). Publicly available genome sequences were downloaded from NCBI on May 7, 2018.
Panaroo v. 1.2.3 with the option "-clean-mode sensitive" was used to generate a pangenome graph for select genomes, which was visualized using Cytoscape v. 3.8.0 (56,57).
Islander v. 1.2 with the options "-translate -trna -annotate -reisland -table 11 -nocheck" was used to identify tRNA-associated ICEs in finished or hybrid assembled genome sequences (58). Islander v. 1.2 was modified by increasing maximum island size thresholds to 2 Mb to search for large symICEs.
Predicted regions, conserved and syntenic across multiple strains and spanning large clusters of tRNA loci, were identified manually and filtered out.
Plasmids were identified from finished or hybrid assembled genome sequences as separate replicons or from draft genome sequences as contigs with a repABC locus.
Analysis of symICE and symbiosis genes. The symICEs in finished or hybrid assembled genome sequences were identified based on presence of nod, T3SS-associated, and nif/fix genes. The program progressiveMauve was used to identify symICE boundaries by comparing genome sequences with those from closely related strains that lack the same type of symICE types (59). The symICEs were classified into types based on concordance with identified ICE regions, location of tRNA genes, and the presence of integrase genes next to or overlapping border sequences. Repetitive sequences at border regions, including putative att sites were also identified. Sequences corresponding to symICEs of draft genome sequences were identified by using CONTIGuator v. 2.7, with default parameters, to map contigs to complete genome sequences with most similar nif/fix genes (60). Contigs and regions mapping within the symICE region of the reference genome were extracted.
A symICE gene presence/absence matrix, representing the diversity of gene content of symICEs, was inferred from the ortholog clusters of the full genome get_homologues analysis. The ortholog group for each symICE-associated gene was identified from the larger analysis and using the "spread" function of the R package tidyr, a new presence/absence matrix containing only symICE genes and paralogs was generated (61). These new gene cluster sequences were subset in fasta format and used in phylogenetic and topology clustering analysis. The base R function "prcomp" was used to perform a principal-component analysis (PCA) of symICE gene content (62). Scoary v. 1.6.16 with the options "-s 4 -collapse -e 100" was used to perform a genome wide association study (GWAS) analysis of symICE gene content, comparing symICEs of strains isolated from A. strigosus and G. max (63). LAST v. 1066 lastal with the option "-f BlastTab1" was used to compare symICEs, and homologous regions were visualized using the BioPython package GenomeDiagram v. 1.72 (64,65). A previously described method was used to generate and visualize gene synteny networks for select symICEs (31).
Sourmash v. 2.0.0a11 with the option "compute -scaled 100" and "compare -k 21" was used to estimate a Jaccard Index between symbiosis MGEs based on their k-mer signatures (66). These values were used to build graphs where symbiosis MGEs are nodes, and edges connect them in which at least one symbiosis MGE has a Jaccard Index $ 0.1 to the other. Cytoscape v. 3.8.0 was used to visualize graphs (56).
The top 100 hits of a BLASTP search, using SctU as a query against the NCBI nr database, was retrieved on March 4, 2019. BLASTP with the default parameters was used to identify homologs of sctU, nodV, nodW, repC, and trbE in the genomes of analyzed strains. Data sets for integrases and other gene/ protein phylogenies were comprised of ortholog groups from the get_homologues analysis.
Plant inoculations. Bradyrhizobium strains number 2, number 4, number 156, number 186, number 187, and number 200 were each inoculated on to five sympatric host species, including A. strigosus AcS049, A. wrangelianus AcW10-R5, and L. bicolor, L. succulentus, L. nanus. Inbred lines were used for Acmispon whereas mixed seed sets of Lupinus were used (S & S Seeds, Carpinteria, CA). Seeds were surface sterilized in 5.0% sodium hypochlorite, rinsed in sterile water, and scarified. For A. wrangelianus, seeds were vernalized for a week at 4°C before planting (71). For others, immediately after scarification, seeds were planted in sterilized containers (Steuwe and Sons, Corvallis, OR), each filled with autoclaved Turface Pro League soil mixed 1:1 with small-and coarse-grain sand. The seedlings were germinated in a growth facility and when true leaves appeared, moved to a greenhouse. Plants were hardened to greenhouse conditions for 1 week before inoculation.
Previously published protocols were followed to inoculate plants with Bradyrhizobium strains (43). A minimum of six plant replicates was used for each host-strain combination. Negative-control plants received sterilized reverse-osmosis H 2 O only. Immediately before inoculation, plants were arranged in a randomized design blocking by plant size, determined based on the number of true leaves. Plants were fertilized weekly with 5.0 mL nitrogen-free Jenson's fertilizer. After 7 weeks postinoculation, plants were depotted, rinsed of remaining soil, and photographed. We counted and weighed nodules and measured separately dry biomasses of shoots and roots. Tissues were separated and dried at 60°C for at least 72 h prior to weighing.
The effect of strain inoculation on plants was calculated as percent relative growth benefit (RGB), the mean percent biomass of an inoculated plant relative to its corresponding control plants: RGB = 100 * (Inoculated host biomass-control host biomass)/control host biomass. Linear models were used to investigate variation in nodulation (total nodule number, dry nodule biomass) and RGB for effects of strain, host, and interaction effects. We used the same model for all tests to improve normality of the residuals and homoscedasticity. Significant differences among strains or hosts were assessed using Tukey's HSD test. To test if RGB was significantly larger than zero, we performed a series of one sample unpaired t-tests. All statistical analyses were carried out using R version 3.6.0 (62).
Data and materials availability. Short reads and assemblies have been deposited in NCBI as BioProject PRJNA671608 and accession numbers are listed in Data set S1. Network graphs in nexus or sif format, phylogenetic trees in Newick format, genome annotations, and scripts can be downloaded from https:// github.com/osuchanglab/BradyrhizobiumPangenomeManuscript. Strains sequenced in this study are available from JLS upon request. Extended supplementary figures and data sets can be downloaded from https://github.com/osuchanglab/BradyrhizobiumPangenomeManuscript/tree/main/Extended_Supplementary_ Materials.

SUPPLEMENTAL MATERIAL
Supplemental material is available online only. DATA SET S1, XLSX file, 0.03 MB.