Double triage to identify poorly annotated genes in maize: The missing link in community curation
Authors:
Marcela K. Tello-Ruiz aff001; Cristina F. Marco aff003; Fei-Man Hsu aff004; Rajdeep S. Khangura aff005; Pengfei Qiao aff006; Sirjan Sapkota aff007; Michelle C. Stitzer aff008; Rachael Wasikowski aff009; Hao Wu aff010; Junpeng Zhan aff011; Kapeel Chougule aff001; Lindsay C. Barone aff003; Cornel Ghiban aff003; Demitri Muna aff001; Andrew C. Olson aff001; Liya Wang aff001; Doreen Ware aff001; David A. Micklos aff003
Author affiliations:
aff001: Plant Biology Program, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
aff002: Department of Biological Sciences, State University of New York at Old Westbury, Old Westbury, New York, United States of America
aff003: DNA Learning Center, Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, United States of America
aff004: Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan
aff005: Department of Biochemistry, Purdue University, West Lafayette, Indiana, United States of America
aff006: Plant Biology Section, School of Integrative Plant Sciences, Cornell University, Ithaca, New York, United States of America
aff007: Department of Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
aff008: Department of Plant Sciences and Center for Population Biology, University of California Davis, Davis, California, United States of America
aff009: Department of Biological Sciences, University of Toledo, Toledo, Ohio, United States of America
aff010: Genetics, Development & Cell Biology Department, Iowa State University, Ames, Iowa, United States of America
aff011: School of Plant Sciences, University of Arizona, Tucson, Arizona, United States of America
aff012: Donald Danforth Plant Science Center, St. Louis, Missouri, United States of America
aff013: USDA, Agricultural Research Service, Washington, D.C., United States of America
Published in:
PLoS ONE 14(10)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0224086
Abstract
The sophistication of gene prediction algorithms and the abundance of RNA-based evidence for the maize genome may suggest that manual curation of gene models is no longer necessary. However, quality metrics generated by the MAKER-P gene annotation pipeline identified 17,225 of 130,330 (13%) protein-coding transcripts in the B73 Reference Genome V4 gene set with models of low concordance to available biological evidence. Working with eight graduate students, we used the Apollo annotation editor to curate 86 transcript models flagged by quality metrics and a complementary method using the Gramene gene tree visualizer. All of the triaged models had significant errors, including missing or extra exons, non-canonical splice sites, and incorrect UTRs. A correct transcript model existed for about 60% of genes (or transcripts) flagged by quality metrics; we attribute this to the convention of elevating the transcript with the longest coding sequence (CDS) to the canonical, or first, position. The remaining 40% of flagged genes resulted in novel annotations and represent a manual curation space of about 10% of the maize genome (~4,000 protein-coding genes). MAKER-P metrics have a specificity of 100% and a sensitivity of 85%; the gene tree visualizer has a specificity of 100%. Together with the Apollo graphical editor, our double triage provides an infrastructure to support the community curation of eukaryotic genomes by scientists, students, and potentially even citizen scientists.
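As a reading aid for the figures quoted in the abstract, the following minimal sketch shows how a flagged share, a sensitivity, and a specificity are conventionally computed. Only the transcript counts 17,225 and 130,330 come from the abstract. The confusion-matrix counts are hypothetical values chosen to reproduce the reported 85% and 100%; they are not the study's data, and the function names are illustrative.

```python
def share(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of truly erroneous models that the triage flags."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of truly correct models that the triage leaves unflagged."""
    return true_neg / (true_neg + false_pos)


# Counts taken from the abstract: flagged transcripts out of all
# protein-coding transcripts in the B73 V4 gene set.
flagged_share = share(17_225, 130_330)
print(f"Flagged share: {flagged_share:.1%}")

# HYPOTHETICAL counts (not from the paper), chosen only so that the
# results match the reported 85% sensitivity and 100% specificity.
print(f"Sensitivity: {sensitivity(true_pos=17, false_neg=3):.0%}")
print(f"Specificity: {specificity(true_neg=20, false_pos=0):.0%}")
```

A specificity of 100% means that no correct model was flagged in error, while a sensitivity of 85% means that some erroneous models escaped the metric-based triage. This is consistent with the abstract's description of the gene tree visualizer as a complementary second method.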
Keywords:
Plant genomics – Genome annotation – Maize – Sequence alignment – Phylogenetic analysis – Invertebrate genomics – Functional genomics