
Reference-Free Population Genomics from Next-Generation Transcriptome Data and the Vertebrate–Invertebrate Gap


In animals, the population genomic literature is dominated by two taxa, namely mammals and drosophilids, in which fully sequenced, well-annotated genomes have been available for years. Data from other metazoan phyla are scarce, probably because the vast majority of living species still lack a closely related reference genome. Here we achieve de novo, reference-free population genomic analysis from wild samples in five non-model animal species, based on next-generation sequencing transcriptome data. We introduce a pipeline for cDNA assembly, read mapping, SNP/genotype calling, and data cleaning, with a specific focus on the detection of hidden paralogy. In two species for which a reference genome is available, similar results were obtained whether the reference was used or not, demonstrating the robustness of our de novo inferences. The population genomic profiles of a hare, a turtle, an oyster, a tunicate, and a termite were found to be intermediate between those of human and Drosophila, indicating that the discordant genomic diversity patterns that have been reported between these two species do not reflect a generalized vertebrate versus invertebrate gap. The average genomic diversity was generally higher in invertebrates than in vertebrates (with the notable exception of the termite), in agreement with the notion that population size tends to be larger in the former than in the latter. The non-synonymous to synonymous ratio, however, did not differ significantly between vertebrates and invertebrates, even though it was negatively correlated with genetic diversity within each of the two groups. This study opens promising perspectives for genome-wide population analyses of non-model organisms and for understanding the influence of population size on non-synonymous versus synonymous diversity.
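To make the two summary statistics named above concrete, the following minimal sketch computes per-site nucleotide diversity and a non-synonymous to synonymous diversity ratio from toy data. It is an illustration only, not the pipeline described in the article: it assumes the standard unbiased heterozygosity estimator, and the function names, the toy allele columns, and the site counts are all hypothetical.

```python
from collections import Counter
from typing import Sequence


def site_diversity(alleles: Sequence[str]) -> float:
    """Unbiased expected heterozygosity at one site: n/(n-1) * (1 - sum p_i^2)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("need at least two sampled alleles")
    freq_sq = sum((count / n) ** 2 for count in Counter(alleles).values())
    return n / (n - 1) * (1.0 - freq_sq)


def pi_per_site(columns: Sequence[Sequence[str]], n_sites: float) -> float:
    """Sum diversity over the polymorphic columns, then divide by the
    number of sites of that class (monomorphic sites contribute zero)."""
    return sum(site_diversity(col) for col in columns) / n_sites


# Toy data (hypothetical): each column lists the alleles sampled at one SNP.
syn_columns = [list("AAAG"), list("CCTT")]   # synonymous SNPs
nonsyn_columns = [list("AAAT")]              # non-synonymous SNP

# Assumed numbers of synonymous and non-synonymous sites in the toy gene set.
pi_s = pi_per_site(syn_columns, n_sites=10.0)
pi_n = pi_per_site(nonsyn_columns, n_sites=30.0)
ratio = pi_n / pi_s

print(f"piS = {pi_s:.4f}, piN = {pi_n:.4f}, piN/piS = {ratio:.4f}")
```

In a real analysis the allele columns would come from called genotypes and the site counts from the coding-sequence annotation; both are simply supplied by hand here.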


Published in: Reference-Free Population Genomics from Next-Generation Transcriptome Data and the Vertebrate–Invertebrate Gap. PLoS Genet 9(4): e1003457. doi:10.1371/journal.pgen.1003457
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1003457

Article published in the journal PLOS Genetics, 2013, Issue 4