Population Genomics of the Facultatively Mutualistic Bacteria and

Download PDF České info

The symbiosis between rhizobial bacteria and legume plants has served as a model for investigating the genetics of nitrogen fixation and the evolution of facultative mutualism. We used deep sequence coverage (>100×) to characterize genomic diversity at the nucleotide level among 12 Sinorhizobium medicae and 32 S. meliloti strains. Although these species are closely related and share host plants, based on the ratio of shared polymorphisms to fixed differences we found that horizontal gene transfer (HGT) between these species was confined almost exclusively to plasmid genes. Three multi-genic regions that show the strongest evidence of HGT harbor genes directly involved in establishing or maintaining the mutualism with host plants. In both species, nucleotide diversity is 1.5–2.5 times greater on the plasmids than chromosomes. Interestingly, nucleotide diversity in S. meliloti but not S. medicae is highly structured along the chromosome –⁠ with mean diversity (θ_π) on one half of the chromosome five times greater than mean diversity on the other half. Based on the ratio of plasmid to chromosome diversity, this appears to be due to severely reduced diversity on the chromosome half with less diversity, which is consistent with extensive hitchhiking along with a selective sweep. Frequency-spectrum based tests identified 82 genes with a signature of adaptive evolution in one species or another but none of the genes were identified in both species. Based upon available functional information, several genes identified as targets of selection are likely to alter the symbiosis with the host plant, making them attractive targets for further functional characterization.

Published in the journal: Population Genomics of the Facultatively Mutualistic Bacteria and. PLoS Genet 8(8): e32767. doi:10.1371/journal.pgen.1002868
Category: Research Article
doi: https://doi.org/10.1371/journal.pgen.1002868

Summary

Introduction

Analyses of genome sequences can provide a nearly complete description of the nature and extent of nucleotide diversity segregating within and among species. There have been multiple investigations into genomic diversity in microbial communities using library-based and megtagenomic approaches [1] and phylogenomic studies of relatedness among microbial species [2]. By contrast, there have been few genome-wide surveys of nucleotide diversity within a prokaryotic species, and those studies have often focused on variation in genome content [3]–[5] rather than nucleotide diversity. Yet it is clear that population-genomic analyses provide an opportunity to greatly expand our understanding of the evolutionary forces shaping diversity within prokaryotic lineages [6]–[9] and identify targets of strong positive selection without bias that may be introduced when focusing on a limited number of genes or phenotypes of prior interest [10].

Prokaryotic species are often studied because they are either pathogens, of environmental or industrial importance, or because they form mutualistic associations with eukaryotes. The latter group includes members of the genera Rhizobium, Sinorhizobium (now Ensifer), Bradyrhizobium, Azorhizobium, and Mesorhizobium, collectively referred to as the rhizobia, a group of gram-negative bacteria that form symbiotic associations with legume plants. When growing in symbiosis with legumes, rhizobia convert atmospheric nitrogen (N₂), which is unavailable to plants, into ammonia, which plants can use for the synthesis of amino acids. This symbiosis is estimated to contribute nearly half of all current biological nitrogen fixation [11] and is a key component of agricultural systems that are not dependent on synthetic fertilizers [12].

One of the best characterized rhizobial species is Sinorhizobium meliloti (now Ensifer meliloti). The interaction between S. meliloti and the closely related species S. medicae with the model legume M. truncatula, the genome of which was recently sequenced [13], has been the subject of extensive biochemical, molecular genetic [14]–[16], and evolutionary investigation [17]–[20]. The genomes of both S. meliloti and S. medicae consist of a single circular chromosome (∼3.65 Mb) plus two large symbiotic (sym) plasmids (∼1.3 and ∼1.6 Mb) [21], [22]. Sinorhizobium spp. also contain auxiliary plasmids, the number and identity of which varies widely among strains [23] and the functional importance of which is largely unknown. In Sinorhizobium, the genes required for forming nodules with legume hosts (including nod, exo, and nif genes) are distributed across both the chromosome and each of the two mega plasmids (hereafter referred to as plasmids) [21], [22], [24]. Bailly et al. [25] recently used low-coverage (∼0.8× average) genomic sequence data to characterize variation in gene content and nucleotide diversity on the chromosomes and two plasmids among 12 S. medicae strains. Their coverage was, however, too shallow to robustly characterize nucleotide variation along the genome or search for signatures of recent selection.

In this study we used Illumina technology to sequence the genomes of 12 S. medicae and 32 S. meliloti strains to over 100× mean depth. We aligned the Illumina data to the S. meliloti RM1021 and S. medicae WSM419 reference genomes (the chromosome and two plasmids) from each species and then used the aligned sequences to i) search for evidence of recent horizontal gene transfer between species, ii) characterize genome-wide nucleotide diversity within each species, and iii) identify genes that bear the signature of recent positive selection.

Results

We aligned an average of ∼1,287 Mbp of sequence from each of 12 S. medicae and 32 S. melliloti strains resulting in median aligned coverage of >100 reads site⁻¹ (Tables S1 and S2). For all six replicons (the chromosome and two plasmids of each species) the vast majority of sites were covered by either >50 or <2 reads (Figure S1). The regions with very low coverage are likely either present in the reference genome but not the resequenced strains, are <91% identical in the two strains, and thus too diverged to have aligned using our alignment parameters, or do not align to a single region in the reference genome. Because sequence reads were required to have a single alignment to the reference genome, reads that align to multiple locations were not included in final analyses. In S. medicae, an average of 95%, 79%, and 95% of the positions along the reference chromosome, pSMED02, and pSMED01 sequences, respectively, were covered by ≥10 uniquely aligned reads for each resequenced strain (Table S1). In S. meliloti, an average of 95%, 71%, and 93% of the positions along the reference chromosome, pSymA, and pSymB, respectively, were covered by ≥10 uniquely aligned reads for each resequenced strain (Table S2. Note, pSMED02 is orthologous to pSymA and pSMED01 is orthologous to pSymB). The high percentage of the reference sequence that can be aligned to the sequence data from the resequenced strains indicate that the vast majority of the sequence found in the reference genomes in each species is also found in all of our resequenced strains.

Species relatedness

Sinorhizobium medicae and S. meliloti are closely related, have very similar host ranges, and at least partially overlapping geographic ranges [26], [27], characteristics that would provide considerable opportunity for horizontal gene transfer (HGT). Nevertheless, these are clearly distinct species; the chromosomes and plasmids from each species were reciprocally monophyletic (Figure 1) and the number of fixed differences between species greatly exceeded the number of shared polymorphisms (Table 1, Figure S3).

Neighbor-joining trees showing relationships among 32 <i>S. meliloti</i> (blue squares) and 12 <i>S. medicae</i> (red circles). — **Fig. 1. Neighbor-joining trees showing relationships among 32 *S. meliloti* (blue squares) and 12 *S. medicae* (red circles).**

Number of fixed differences and shared polymorphisms and the ratio of shared polymorphisms to fixed differences between <i>S. meliloti</i> and <i>S. medicae</i> protein coding genes. — **Tab. 1. Number of fixed differences and shared polymorphisms and the ratio of shared polymorphisms to fixed differences between *S. meliloti* and *S. medicae* protein coding genes.**

Although we found no evidence for interspecific transfer of whole plasmids, there are 97 genes (1 located on the chromosome, 21 on pSymB/Smed01, and 75 on pSymA/Smed02) with a ratio of shared polymorphisms to fixed differences >0.2, indicative of transferred alleles segregating within the recipient lineage (Table 1, Figure S3). Among these 97 genes (Table S3) are many with clear potential to alter the efficacy of nodulation or nitrogen metabolism including 11 fix, 13 nod, 8 nif, 2 noe, 2 nol, 5 rkp and 3 syr genes. By contrast, only 12 fix, 7 nod, 7 rkp, and no nif, noe, nol, or syr genes for which the data meet the coverage criteria had a ratio of shared polymorphisms to fixed differences <0.2.

To gain insight into the origin and fate of horizontally transferred genes we clustered the putatively transferred genes into contiguous genomic regions (horizontally transferred genes separated only by genes which did not have a putative ortholog in the reference genome of the other species or by ≤2 genes with ratios of shared polymorphism <0.2) then used neighbor joining trees to examine within and between species relationships. On pSymB/pSMED01, 20 of the 21 putatively transferred genes were found within a single 38 kb region. On pSymA/pSMED02, 6 of the putatively transferred genes are located within a 10.5 kb region of the S. medicae reference genome and 62 are located within an ∼300 kb region of the reference genomes. This 300 kb region also contains 236 genes that are present in the S. medicae genome (∼102 in S. meliloti) for which there was no identifiable ortholog in the reference genome of the other species (Table S3).

Neighbor joining trees of the large transferred regions (Figure 2), as well as other putatively transferred genes (Figure S4), suggest the history of HGT is complex. For all regions harboring genes with evidence of transfer, the majority of sequences from each species are monophyletic but the branch length separating sequences from the two species is much shorter than the length of the branch separating the two species at genes that show no signal of HGT (Figure 2). There are five regions, all of them on pSymA/pSMED02, for which the putatively transferred genes are not monophyletic (Figure S4); three for which a single S. medicae-like sequence was sampled from an S. meliloti strain, one for which a S. meliloti-like sequence was sampled from an S. medicae strain, and one for which the longest branch on the tree separates four sequences (three sampled from S. meliloti and one from S. medicae) from all other sequences. Interestingly, for this latter case, these four sequences were all sampled from Syria, suggesting geographic structuring of horizontal gene transfer and symbiotic gene alleles.

Neighbor-joining trees showing relationships among sequences sampled from <i>S. meliloti</i> (blue squares) and <i>S. medicae</i> (red circles) for genes showing evidence of horizontal gene transfer. — **Fig. 2. Neighbor-joining trees showing relationships among sequences sampled from *S. meliloti* (blue squares) and *S. medicae* (red circles) for genes showing evidence of horizontal gene transfer.**

Within-species diversity

Consistent with previous multi-locus sequence [17], [20], [28] and genomic hybridization data [29] we found 1.5–2 times greater nucleotide diversity on each of the S. medicae plasmids than on the S. medicae chromosome (Table 2). For the S. medicae chromosome, Tajima's D (D_T) was unimodal and centered near zero (Figure 3), a distribution consistent with a panmictic, neutrally evolving population (i.e. the standard neutral model [SNM]). Diversity within the full 32 strain sample of S. meliloti (Table 2) was two to three times greater than diversity within S. medicae but showed the same broad pattern of higher diversity on the plasmids than the chromosome (Table 2). The distribution of chromosomal D_T values in the 32 strain S. meliloti sample was negatively centered (Figure 3) and the frequency spectrum of chromosomal polymorphisms revealed a mode of minor alleles found in four strains (Figure S5). This pattern is not consistent with a sample drawn from a single panmictic population and the chromosomal genealogy shows that the 32 strain sample was comprised of three distinct clades; one 24 strain clade and two 4-strain clades (Figure 1).

**Fig. 3. Distributions of chromosomal nucleotide diversity statistics.**

Total number of SNPs, nucleotide diversity (mean θ<sub>π</sub>) and mean <i>D<sub>T</sub></i> for all data and for gene regions separated by species and replicon. — **Tab. 2. Total number of SNPs, nucleotide diversity (mean θ_π) and mean *D_T* for all data and for gene regions separated by species and replicon.**

To remove confounding effects that population structure can have on nucleotide diversity, we recalculated diversity statistics using the sample of 24 S. meliloti strains that comprised the largest subpopulation found in our 32 strain sample (Figure 1). Unlike the 32 strain sample, for which the distribution of θ_W was unimodal, the distribution of θ_W values from the 24 strain sample was distinctly bimodal –⁠ with a considerable portion of genes having very low diversity (Figure 3). Similarly, the distribution of D_T values from the 24 strain sample was multi-modal and both far more widely dispersed and more negative than the distribution in the 32 strain sample. This strongly skewed distribution appears to be largely due to genes located on the second half of the chromosome (bp 1,735,000–3,654,135); genes on this half have both low θ_W and low D_T values (Figure 4). At both sides of this region, near the origin and terminus of replication there are sharp increases in the per-gene θ_W and D_T values. Moreover, neither the first half of the chromosome (Figure S6) nor the plasmids show the well defined 24-strain clade seen in the whole-chromosome genealogy (Figure 1). Taken together, the lack of congruence between genealogies constructed from the two chromosome halves and plasmids as well as the sharp breaks in patterns of diversity provide evidence for transfer of large parts of the plasmids and chromosome among strains of S. meliloti.

Tajima's D (<i>D<sub>T</sub></i>) values for protein coding genes along the length of the chromosome. — **Fig. 4. Tajima's D (*D_T*) values for protein coding genes along the length of the chromosome.**

Targets of selection

To identify targets of recent adaptation we used the joint D_TH statistic that provides a relatively powerful test of selection that is robust against demographic effects [30]. In S. medicae, 27 chromosomal, 9 pSMED01, and 4 pSMED02 genes were identified as putative targets of recent selection. Because the 32 strain sample of S. meliloti was strongly affected by population structure we searched for targets of selection in only the 24 strain sample. Moreover, because diversity in this sample was strongly structured along the length of the chromosome, we applied the D_TH test to the first and second halves of the chromosome separately. These analyses identified 15 and 11 genes on the first and second halves of the chromosome, respectively, 11 pSymB, and 5 pSymA genes that harbored signatures of selection (Table S4). None of the genes identified as targets of selection were identified in both species, although fts genes, which are annotated as being involved in cell division and are down-regulated in bacteroids [31] were identified as targets of selection in both species (ftsW in S. medicae, ftsZ1 and ftsZ2 in S. meliloti). Consistent with the lack of between species overlap in the genes that harbor signatures of recent selection, between-species correlations in nucleotide diversity (θ_W, D_T) were low for each of the three replicons (all R<0.26). Such low correlations are unexpected if selective constraints or among-gene variance in mutation rates are important determinants of nucleotide diversity and are similar in the two species.

The genes identified as putative targets of selection have a variety of annotated functions. Some of these functions are related to survival or reproduction either inside of nodules or in the soil environment, i.e. osmotic tolerance and stress (gst9, cysK2, guaB, hutH2, oxyR) and nutrient acquisition (phoU, thuR). Other putatively selected genes have functions that may be directly related to symbiosis, including hemA which is essential for symbiotic nitrogen fixation in many rhizobia, glgC and rkpJ which affect exopolysaccharide biosynthesis or export which is required that is essential for nodulation, as well as ftsW, ftsZ1, ftsZ2.

Discussion

Rhizobia species are important symbionts of legume plants and this symbiosis is responsible for approximately half of all current biologically fixed nitrogen [11]. Because of this importance the biochemical and genetic basis of the symbiosis has been subject to extensive investigation [14]–[16]. To gain insight into genomic diversity segregating within rhizobial species, we sequenced to high coverage the genomes of 32 strains of S. meliloti and 12 strains of S. medicae, the two primary rhizobia symbionts of the model legume Medicago truncatula. Our analyses provide insight into the genome-scale extent of horizontal gene transfer (HGT), the structuring of nucleotide diversity within rhizobial genomes, and identify genes that have been subject to recent adaptive evolution in these species.

Previous analyses of genetic diversity within Sinorhizobium and other groups of rhizobia [20], [32]–[34] have found clear evidence that genes directly involved in nodule formation can be transferred between species; indeed, in prokaryotes genes involved in symbiosis are often found on mobile elements [35]. Consistent with these previous analyses, our whole-genome analyses revealed 97 genes, most of which are clustered on just three symplasmid regions, for which S. medicae and S. meliloti genes had a high ratio of shared to fixed polymorphisms, suggestive of recent horizontal transfer between these species. Genes with potential to alter nodulation or nitrogen fixation are overrepresented among the putatively transferred genes, suggesting that HGT may be important in the evolution of symbiosis. At the same time, the importance of HGT in shaping nucleotide diversity is largely restricted to the plasmids and appears to have very little effect on nucleotide variation in genomic regions outside of nodulation-gene islands.

Neighbor-joining trees constructed from plasmid genes show striking differences between genes that show signatures of HGT and those that do not. For genes that do not show evidence of transfer the branch separating sequences from different species is considerably longer than the branches separating sequences sampled from the same species. By contrast, genes that show evidence of transfer have comparatively short branches separating sequences sampled from the two species and relatively long branches separating sequences sampled from the same species. The short branch separating the sequences sampled from the two species suggests that transfer has occurred relatively recently, followed by the transferred sequence spreading through the recipient species. Interestingly, however, sequences from even these transferred genes are largely monophyletic. The sequence similarity of these transferred regions may facilitate ongoing transfer through homologous recombination [36] –⁠ thereby preserving these as islands of HGT. The single chromosomal gene with a high ratio of shared to fixed polymorphisms indicates that HGT of chromosomal genes is possible, even if HGT doesn't have an important effect on chromosomal sequence diversity.

Nucleotide Diversity

The picture of diversity segregating in S. meliloti is highly dependent upon the composition of the sample. In our sample of 32 strains, the distributions of summary statistics (i.e. θ_W and D_T) are largely consistent with a panmictic population. However, the frequency spectrum of polymorphic sites and genealogical relationships indicate that the 32 strain sample is composed of several distinct subpopulations with 24 strains forming a single well defined clade. The reasons for this substructure are not clear; the members of the 24 strain group were sampled from a wide geographic area (including France, Jordan, Syria, Tunisia), from the full spectrum of geographic locations that the 32 strains were sampled, and from multiple species of host plant (including M. truncatula, M. rigidula, and M. sativa, Table S2 and Figure S2). Regardless of the causes of the population structure, we found a close correspondence between the three major clades identified by whole-genome sequence data and relationships inferred from a 10 locus MLST characterization of population structure within S. meliloti [37], [38]. There were 17 strains in common between the two studies, 15 of these were part of the 24 strain group we identified and were monophyletic in the MLST analysis (13 were members of a single MLST group) and the two additional strains were each members of different MLST clusters. The similarity suggests that MLST data provide robust descriptions of population structure in this species.

Interestingly, patterns of diversity segregating among the 24 strain subpopulation are fundamentally different than those found in the 32 strain sample. Most strikingly, for the 24 strain sample, the two halves of the chromosome harbor distinctly different patterns of diversity, with one half of the chromosome having very low values of D_T and θ_W relative to the other half. There are several possible causes for the two halves of the chromosome having such different patterns. One possibility is that recent HGT or balancing selection in the 24 strain sample has led to excess diversity and intermediate frequency variants on the high-diversity half of the chromosome. In fact, a neighbor-joining tree made with data from only the high-diversity half of the chromosome shows that some of the strains included in the 24 strain group cluster with strains that are not included in this group (Figure S6). The alternative possibility is that the low-diversity half of the chromosome has experienced a recent selective sweep, leading to reduced diversity and an excess of low-frequency variants. A recently introgressed region segregating at low frequency or the inclusion of a sequence from a divergent subpopulation could also explain the excess of low-frequency variants (low D_T values) found on the second half of the chromosome. Either of these possibilities would cause elevated nucleotide diversity. By contrast, nucleotide diversity in this chromosomal half is reduced relative to the rest of the genome.

To determine if the data are more consistent with excess diversity on the first half of the chromosome or low diversity on the second half of the chromosome we compared nucleotide diversity on each chromosome half to diversity found segregating on the plasmids. For S. medicae the chromosome harbors 33 or 60% of the diversity segregating on pSMED02 and pSMED01, respectively. The ratio of chromosomal to plasmid diversity is similar for the S. meliloti chromosome from the 32 strain group and the first half of the chromosome from the 24 strain group, these samples harbor >30% and >50% of the diversity segregating on pSymA and pSymB, respectively. By contrast, the low-diversity, second half of the chromosome in the 24 strain group harbors 6% and 10% of the diversity segregating on pSymA and pSymB respectively. To the extent that plasmid diversity can be used as a reference, these data suggest that the distinctly different patterns of diversity found on the two chromosome halves of the 24-strain group may be due to a recent selective sweep that was strong enough to reduce diversity along the entire 1.8 Mb half of the chromosome through genetic hitchhiking [39].

If a selective sweep is responsible for the low diversity on the second half of the chromosome, the sharp borders near the origin and terminus of replication suggests that recombination at the borders is much higher than recombination along the second half of the chromosome or that selection favored the entire region, perhaps due to epistasis between genes located near the borders of the low-diversity region. If a selective sweep is the reason for the reduced diversity than it indicates that genetic hitchhiking along with selective variants could be an extremely important force shaping diversity within natural populations of prokaryotic species and may contribute to driving the divergence between prokaryotic lineages [40].

Targets of Selection

We identified 82 genes that bear a signature of recent adaptive evolution. These species have similar geographic ranges, life history, and share a common host plant, and as such they may be expected to experience similar selective forces. Nevertheless, the targets of selection in the two species show almost no overlap –⁠ no orthologous genes were identified as targets of selection in both species although fts genes, involved in cell division, are identified as targets in both species. The lack of overlap in the targets of selection suggest that these two ecologically similar species either experience very different selective forces or that selection acting similarly at the phenotypic level acts on very different genetic targets.

It is notable that no fix, nif, nod, nol, noe, or exo genes, which mutational screens identified as necessary for nodule establishment and nitrogen fixation [14]–[16], are among the genes we identified as bearing a signature of a recent selective sweep. However, nearly all of the nif, nod, and approximately one-half of the fix genes in the Sinorhizobium genome showed evidence of HGT or had no reciprocal best sequence match in the other species. Because these genes had no reciprocal best sequence matches we excluded them from our analyses of selection and therefore, their absence from the list of selected genes does not mean they have not been the subject of recent adaptation.

Summary

Population genetic analyses of nucleotide diversity segregating within Sinorhizobium medicae and S. meliloti have provided unprecedented insight into the evolutionary history of these ecologically important facultative symbionts. While previous analyses have detected evidence for horizontal gene transfer between these species, our data reveal that gene transfer is restricted almost exclusively to plasmid genes and that the plasmid regions that show evidence of transfer have less interspecific divergence than other genomic regions. Interestingly, nucleotide variation segregating within a 24-strain subpopulation of S. meliloti is highly structured along the chromosome, with one half of the chromosome harboring approximately one-fifth as much diversity as the other. The causes of the difference between the two chromosome halves may be a selective sweep coupled with extensive hitchhiking, if this is correct it would suggest that bouts of strong selection may be important in driving the divergence of bacterial species. Finally, we've identified genes that bear a signature of having evolved in response to recent positive selection. Functional characterization of these genes will provide insight into the selective forces that drive rhizobial adaptation.

Methods

We used Illumina sequencing technology to sequence the genomes of 32 strains of S. meliloti and 12 strains of S. medicae. These strains were chosen to capture diversity found within the USDA-ARS Rhizobium Germplasm Collection [38], as representative of different multi-locus genotypes [38], or because they had been recently sampled from natural populations and used in experiments to investigate interactions between Sinorhizobium and M. truncatula [41]. From each strain, DNA was extracted from culture grown cells using the Wizard Genomic DNA Purification kit (Promega Corp. Madison, WI, USA), with further purification by phenol extraction. DNA was then used to construct Illumina paired end libraries using Illumina's phusion-based library kits following the manufacturer's protocols. Insert sizes averaged 332 nt (range = 245 nt to 443 nt). Four samples were multiplexed per lane and sequenced on Illumina GAIIx machines following the manufacturer's protocols. Samples averaged just over 1 Gb of sequence (range = 724 Mb to 1584 Mb) translating into an average and minimum coverage of 174× and 108×, respectively, of the ∼6.7 Mb genome before aligning reads.

For SNP discovery, reads were aligned to the genome sequence of either S. meliloti Rm1021 [21], pSymA megaplasmid [42], pSymB megaplasmid [43] and the accessory plasmids pSmeSM11a [44], pSMeSM11b [45] and pRm1132f [46], or S. medicae WSM419 chromosome and plasmids pSMED01, pSMED02 and pSMED03 [22], using GSNAP [47] with a 91% minimum identity using the Alpheus pipeline [48]. For this work, only SNPs discovered in the alignment to the chromosome or the megaplasmids (pSymA, pSymB, pSMED01, pSMED02) were used due to poor coverage of the accessory plasmids. Nucleotide identity at a site was called only if that site was covered by ≥10 but <500 uniquely aligned reads (i.e. reads had maximum identity to only a single genomic location) and the nucleotide was supported by ≥70% of unique reads. Positions that did not meet these criteria were treated as ambiguous (N). Sequence reads are available at SRP009881, and SNP data are available for download at http://medicagohapmap.org/.

To evaluate the accuracy of base calls we PCR amplified and Sanger sequenced 25 loci from 4–6 strains (including 3 S. meliloti and 3 S. medicae) (Table S5). For the 42,953 bp of sequence for which we had both high-quality Sanger and Illumina data that met our coverage criteria there were 157 variants identified by both Sanger and Illumina, 3 variants identified in Sanger but not Illumina, and no variants identified in Illumina but not Sanger data.

Sequences of protein-coding genes were constructed using the IMG version of the WSM419 annotation downloaded on 1 December 2010 and the Rhizobase version of the Rm1021 annotation downloaded on 19 August 2010. For gene-based analyses requiring an outgroup, Rm1021 and WSM419 genes with ≥80% amino acid similarity across ≥80% of their length that were also bidirectional best hits were identified as homologous using the MaGe phyloprofile tool [49]. Gene sequences from the resequenced strains were aligned prior to analyses using the profile alignment tool in ClustalW [50].

Statistical analyses

We calculated nucleotide diversity for 10 kb non-overlapping windows and for each gene model for which we had data for >90% of the gene length for ≥80% of the strains. The number of replacement and synonymous sites for each gene within each species were estimated using the polydNdS program of the libsequence “analysis” software package [51]. Tajima's D (D_T) [52], the average number of segregating sites (θ_W) [53] and the average pairwise number of segregating sites (θ_π) [54] were all estimated using the compute program in libsequence. Fay and Wu's H (H) [55] was estimated using a custom libsequence-based program. Because D_T is not defined for genes that have no polymorphism and the distribution of D_T is highly skewed for genes with a single segregating site, we excluded genes with <2 segregating sites from the analysis. The number of fixed differences between S. medicae and S. meliloti were calculated on biallelic sites in alignments of orthologs using the sharedPoly program (in libsequence). Summary statistics for each of the annotated genes which met coverage criteria are in Dataset S1.

We used the joint D_TH test [30] to look for genes that have experienced recent selective sweeps, considering genes in the lower 5% tail of the distribution for both D_T and H as likely targets of selection. We restricted these tests of selection to genes with unambiguous nucleotide calls for >90% of the length from ≥80% of the strains and for which there was no evidence for horizontal gene transfer. For defining the 5% tails we took the ratio of genes that met the coverage and HGT requirement to the total number of genes. Genes with <2 SNPs or without a value for H were excluded.

We identified genes likely to have experienced recent horizontal gene transfer by comparing the ratio of polymorphisms that were shared between species to fixed differences between species. Based on the whole-genome distribution of this ratio (Figure S3) we identified putatively transferred genes as those with a ratio of shared polymorphisms to fixed differences >0.2.

Genealogical relationships

To characterize the genealogical relationships among strains we constructed genealogies using the Neighbor joining algorithm [56] implemented in the dnadist and neighbor programs in Phylip [57] with the F84 model of DNA evolution [58]. Genealogies were constructed using concatenated gene sequences for the chromosome and each of the plasmids separately (2,741 chromosomal genes, 2,668,564 bp; 408 genes on pSymA/pSmed02, 416,009 bp and 1,049 pSymB/pSmed01 genes, 1,084,937 bp). Statistical support for clades in whole-replicon trees was evaluated using 200 bootstrap replicates. NJ trees for genes bearing a signature of horizontal gene transfer were constructed using similar methods, with statistical supported evaluated using 400 bootstrap replicates.

Origin and terminus of replication

Several analyses were conducted separately for the first and second halves of the S. meliloti chromosome. In these cases, we used position 1,735,000 as the dividing line: this position seemed to correspond to the location of the sharp change in D_T along the chromosome. This is also the location of a change in sign of the GC skew in the reference genome, indicating that the terminus of replication is located near this position [59] (Figure S7). GC skew was calculated using a custom R script [60] on all nucleotide positions in 10 kb sliding windows with a 5 kb step. The origin of replication for S. meliloti Rm1021 (the reference strain) has been experimentally determined to be near position 0 [61].

Supporting Information

Zdroje

1. GilbertJA, DupontCL (2011) Microbial metagenomics: beyond the genome. Annu Rev Marine Sci 3 : 347–371.

2. JaureguyF, LandraudL, PassetV, DiancourtL, FrapyE, et al. (2008) Phylogenetic and genomic diversity of human bacteremic Escherichia coli strains. BMC Genomics 9 : 560.

3. TettelinH, MasignaniV, CieslewiczMJ, DonatiC, MediniD, et al. (2005) Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial “pan-genome.”. Proc Natl Acad Sci USA 102 : 13950–13955.

4. LefébureT, StanhopeMJ (2007) Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol 8: R71.

5. TianCF, ZhouYJ, ZhangYM, LiQQ, ZhangYZ, et al. (2012) Comparative genomics of rhizobia nodulating soybean suggests extensive recruitment of lineage-specific genes in adaptations. Proc Natl Acad Sci USA 109 : 8629–8634.

6. TenaillonO, SkurnikD, PicardB, DenamurE (2010) The population genetics of commensal Escherichia coli. Nat Rev Microbiol 8 : 207–217.

7. TouchonM, HoedeC, TenaillonO, BarbeV, BaeriswylS, et al. (2009) Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5: e1000344.

8. CroucherNJ, HarrisSR, FraserC, QuailMA, BurtonJ, et al. (2011) Rapid pneumococcal evolution in response to clinical interventions. Science 331 : 430–434.

9. TakunoS, KadoT, SuginoRP, NakhlehL, InnanH (2012) Population genomics in b acteria: a case study of Staphylococcus aureus. Mol Biol and Evol 29 : 797–809.

10. FalushD (2009) Toward the use of genomics to study microevolutionary change in bacteria. PLoS Genet 5: e1000627.

11. GruberN, GallowayJN (2008) An Earth-system perspective of the global nitrogen cycle. Nature 451 : 293–296.

12. VanceCP (2001) Symbiotic nitrogen fixation and phosphorus acquisition. Plant nutrition in a world of declining renewable resources. Plant Physiol 127 : 390.

13. YoungND, DebelleF, OldroydGED, GeurtsR, CannonSB, et al. (2011) The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480 : 520–524.

14. GibsonKE, KobayashiH, WalkerGC (2008) Molecular determinants of a symbiotic chronic infection. Annu Rev Genet 42 : 413–441.

15. CooperJE (2007) Early interactions between legumes and rhizobia: disclosing complexity in a molecular dialogue. J Appl Microbiol 103 : 1355–1365.

16. JonesKM, KobayashiH, DaviesBW, TagaME, WalkerGC (2007) How rhizobial symbionts invade plants: the Sinorhizobium–Medicago model. Nat Rev Microbiol 5 : 619–633.

17. BaillyX, OlivieriI, De MitaSÉ, Cleyet-MarelJC, BenaG (2006) Recombination and selection shape the molecular diversity pattern of nitrogen-fixing Sinorhizobium sp. associated to Medicago. Mol Ecol 15 : 2719–2734.

18. FriesenML, MathiasA (2010) Mixed infections may promote diversification of mutualistic symbionts: why are there ineffective rhizobia? J Evol Biol 23 : 323–334.

19. HeathKD, TiffinP (2009) Stabilizing mechanisms in a legume–rhizobium mutualism. Evolution 63 : 652–662.

20. BaillyX, OlivieriI, BrunelB, Cleyet-MarelJ-C, BénaG (2007) Horizontal gene transfer and homologous recombination drive the evolution of the nitrogen-fixing symbionts of Medicago species. J Bacteriol 189 : 5223–5236.

21. GalibertF, FinanTM, LongSR, PühlerA, AbolaP, et al. (2001) The composite genome of the legume symbiont Sinorhizobium meliloti. Science 293 : 668–672.

22. ReeveW, ChainP, O'HaraG, ArdleyJ, NandesenaK, et al. (2010) Complete genome sequence of the Medicago microsymbiont Ensifer (Sinorhizobium) medicae strain WSM419. Standards in Genomic Sciences 2 : 77.

23. KuhnS, StiensM, PühlerA, SchlüterA (2008) Prevalence of pSmeSM11a-like plasmids in indigenous Sinorhizobium meliloti strains isolated in the course of a field release experiment with genetically modified S. meliloti strains. FEMS Microbiol Ecol 63 : 118–131.

24. MacLeanAM, FinanTM, SadowskyMJ (2007) Genomes of the symbiotic nitrogen-fixing bacteria of legumes. Plant Physiol 144 : 615–622.

25. BaillyX, GiuntiniE, SextonMC, LowerRP, HarrisonPW, et al. (2011) Population genomics of Sinorhizobium medicae based on low-coverage sequencing of sympatric isolates. ISME J 5 : 1722–1734.

26. BenaG, LyetA, HuguetT, OlivieriI (2005) Medicago –⁠ Sinorhizobium symbiotic specificity evolution and the geographic expansion of Medicago. J Evol Biol 18 : 1547–1558.

27. RomeS, FernandezMP, BrunelB, NormandP, Cleyet-MarelJ-C (1996) Sinorhizobium medicae sp. nov., isolated from annual Medicago spp. Int J Syst Bacteriol 46 : 972–980.

28. SilvaC, KanFL, Martínez-RomeroE (2007) Population genetic structure of Sinorhizobium meliloti and S. medicae isolated from nodules of Medicago spp. in Mexico. FEMS Microbiol Ecol 60 : 477–489.

29. GiuntiniE, MengoniA, De FilippoC, CavalieriD, Aubin-HorthN, et al. (2005) Large-scale genetic variation of the symbiosis-required megaplasmid pSymA revealed by comparative genomic analysis of Sinorhizobium meliloti natural strains. BMC Genomics 6 : 158.

30. ZengK, FuY-X, ShiS, WuC-I (2006) Statistical tests for detecting positive selection by utilizing high-frequency variants. Genetics 174 : 1431–1439.

31. BarnettMJ, TomanCJ, FisherRF, LongSR (2004) A dual-genome symbiosis chip for coordinate study of signal exchange and development in a prokaryote–host interaction. Proc Natl Acad Sci USA 101 : 16636–16641.

32. SullivanJT, PatrickHN, LowtherWL, ScottDB, RonsonCW (1995) Nodulating strains of Rhizobium loti arise through chromosomal symbiotic gene transfer in the environment. Proc Natl Acad Sci USA 92 : 8985.

33. SunS, GuoH, XuJ (2006) Multiple gene genealogical analyses reveal both common and distinct population genetic patterns among replicons in the nitrogen-fixing bacterium Sinorhizobium meliloti. Microbiology 152 : 3245–3259.

34. VinuesaP, SilvaaC, WernerbD, Martınez-RomeroaE (2005) Population genetics and phylogenetic inference in bacterial molecular systematics: the roles of migration and recombination in Bradyrhizobium species cohesion and delineation. Mol Phyl Evol 34 : 29–54.

35. OchmanH, MoranNA (2001) Genes lost and genes found: evolution of bacterial pathogenesis and symbiosis. Science 292 : 1096–1099.

36. MaticI, TaddeiF, RadmanM (1996) Genetic barriers among bacteria. Trends in Microbiology 4 : 69–73.

37. van BerkumP, EliaP, EardlyBD (2006) Multilocus sequence typing as an approach for population analysis of Medicago-nodulating rhizobia. J Bacteriol 188 : 5570–5577.

38. van BerkumP, BadriY, EliaP, AouaniME, EardlyBD (2007) Chromosomal and symbiotic relationships of rhizobia nodulating Medicago truncatula and M. laciniata. Appl Environ Microbiol 73 : 7587–7604.

39. SmithJM, HaighJ (1974) The hitch-hiking effect of a favourable gene. Genet Research 23 : 23–35.

40. CohanFM (2001) Bacterial species and speciation. Syst Biol 50 : 513–524.

41. HeathKD (2010) Intergenomic epsistasis and coevolutionary constraint in plants and rhizobia. Evolution 64 : 1446–1458.

42. BarnettMJ, FisherRF, JonesT, KompC, AbolaAP, et al. (2001) Nucleotide sequence and predicted functions of the entire Sinorhizobium meliloti pSymA megaplasmid. Proc Natl Acad Sci USA 98 : 9883–9888.

43. FinanTM, WeidnerS, WongK, BuhrmesterJ, ChainP, et al. (2001) The complete sequence of the 1,683-kb pSymB megaplasmid from the N2-fixing endosymbiont Sinorhizobium meliloti. Proc Nat Acad of Sci USA 98 : 9889–9894.

44. StiensM, SchneikerS, KellerM, KuhnS, PühlerA, et al. (2006) Sequence analysis of the 144-kilobase accessory plasmid pSmeSM11a, isolated from a dominant Sinorhizobium meliloti strain identified during a long-term field release experiment. Appl Environ Microbiol 72 : 3662–3672.

45. StiensM, SchneikerS, PühlerA, SchlüterA (2007) Sequence analysis of the 181-kb accessory plasmid pSmeSM11b, isolated from a dominant Sinorhizobium meliloti strain identified during a long-term field release experiment. FEMS Microbiol Lett 271 : 297–309.

46. BarranLR, RitchotN, BromfieldESP (2001) Sinorhizobium meliloti plasmid pRm1132f replicates by a rolling-circle mechanism. J Bacteriol 183 : 2704–2708.

47. WuTD, NacuS (2010) Fast and SNP-tolerant detection of complex variants and splicing in short reads. Bioinformatics 26 : 873–881.

48. MillerNA, KingsmoreSF, FarmerA, LangleyRJ, MudgeJ, et al. (2008) Management of high-throughput DNA sequencing projects: Alpheus. J Comput Sci Syst Biol 1 : 132–132.

49. VallenetD, EngelenS, MornicoD, CruveillerS, FleuryL, et al. (2009) MicroScope: a platform for microbial genome annotation and comparative genomics. Database (Oxford) 2009: bap021.

50. LarkinMA, BlackshieldsG, BrownNP, ChennaR, McGettiganPA, et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23 : 2947–2948.

51. ThorntonK (2003) libsequence: a C++ class library for evolutionary genetic analysis. Bioinformatics 19 : 2325–2327 doi:10.1093/bioinformatics/btg316.

52. TajimaF (1989) Statistical method for testing the neutral mutation hypothesis by DNA polymorphism. Genetics 123 : 585–595.

53. WattersonGA (1975) On the number of segregating sites in genetical models without recombination. Theor Popul Biol 7 : 256–276.

54. KimuraM (1968) Evolutionary rate at the molecular level. Nature 217 : 624–626.

55. FayJC, WuC-I (2000) Hitchhiking under positive Darwinian selection. Genetics 155 : 1405–1413.

56. SaitouN, NeiM (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4 : 406–425.

57. FelsensteinJ (1989) PHYLIP –⁠ Phylogeny Inference Package (Version 3.2). Cladistics 5 : 164.

58. FelsensteinJ, ChurchillGA (1996) A hidden markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13 : 93–104.

59. CapelaD, Barloy-HublerF, GouzyJ, BotheG, AmpeF, et al. (2001) Analysis of the Chromosome Sequence of the Legume Symbiont Sinorhizobium meliloti Strain 1021. Proc Nat. Acad Sci USA 98 : 9877–9882.

60. R Development Core Team (2011) R: A language and environment for statistical computing. Vienna, Austria. http://www.R-project.org/.

61. SibleyCD, MacLellanSR, FinanTM (2006) The Sinorhizobium meliloti chromosomal origin of replication. Microbiology 152 : 443–455.