Identification of Rare Causal Variants in Sequence-Based Studies: Methods and Applications to , a Gene Involved in Cohen Syndrome and Autism
Sequencing technologies allow identification of genetic variants down to single-base resolution across a whole human genome. The vast majority of these variants (over 90%) are rare, with population frequencies below 1%. Furthermore, in any specific study, many of the identified variants are not associated with the disease of interest, and identifying the small proportion of truly causal variants is a difficult task. For causal variants rare enough to appear only a few times in a study, observed frequencies in cases and controls are not enough to distinguish them from the vast majority of random variation; rich functional annotations can help identify them. Here we propose a set of statistical methods that combine diverse functional genomics annotations with sequencing data to identify a small set of potentially causal variants and estimate their effects. Pinpointing such a subset is crucial for understanding precise biological mechanisms and for follow-up experimental functional studies.
Published in:
Identification of Rare Causal Variants in Sequence-Based Studies: Methods and Applications to , a Gene Involved in Cohen Syndrome and Autism. PLoS Genet 10(12): e1004729. doi:10.1371/journal.pgen.1004729
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1004729
Tags
Genetics, Reproductive medicine
Article published in the journal
PLOS Genetics
2014, Issue 12