Characterising and Predicting Haploinsufficiency in the Human Genome
Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
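The step from gene-based predictions to a score for a genic deletion can be illustrated with a minimal sketch. It assumes, purely for illustration, that the per-gene probabilities are independent and that a deletion is scored by the log-odds that it removes at least one haploinsufficient gene. The abstract does not specify the exact transformation, so the combination rule, the function names, and the `eps` clipping parameter below are hypothetical rather than the published method.

```python
import math


def deletion_hi_probability(gene_probs):
    """Probability that at least one deleted gene is haploinsufficient.

    Assumption (not stated in the abstract): per-gene probabilities are
    treated as independent, so P(at least one HI) = 1 - prod(1 - p_i).
    """
    p_none = 1.0
    for p in gene_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
        p_none *= 1.0 - p
    return 1.0 - p_none


def deletion_lod_score(gene_probs, eps=1e-12):
    """Log-odds that a deletion removes at least one HI gene.

    The probability is clipped to [eps, 1 - eps] so that the logarithm
    stays finite for empty deletions or certain predictions.
    """
    p = deletion_hi_probability(gene_probs)
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


if __name__ == "__main__":
    # Hypothetical per-gene HI probabilities for two deletions.
    small_but_risky = [0.9]
    large_but_benign = [0.01, 0.02, 0.01, 0.03]
    print(deletion_lod_score(small_but_risky))
    print(deletion_lod_score(large_but_benign))
```

Under these assumptions, a deletion of a single gene with a high predicted probability can outscore a larger deletion of several low-probability genes, which is the kind of distinction that deletion size or gene count alone cannot make.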
Published in:
Characterising and Predicting Haploinsufficiency in the Human Genome. PLoS Genet 6(10): e1001154. doi:10.1371/journal.pgen.1001154
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pgen.1001154
Article published in: PLOS Genetics, 2010, Issue 10