Biomarker discovery in inflammatory bowel diseases using network-based feature selection
Authors:
Mostafa Abbas aff001; John Matta aff002; Thanh Le aff003; Halima Bensmail aff001; Tayo Obafemi-Ajayi aff003; Vasant Honavar aff004; Yasser EL-Manzalawy aff004
Author affiliations:
aff001: Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar
aff002: Department of Computer Science, Southern Illinois University Edwardsville, Edwardsville, IL, United States of America
aff003: Engineering Program, Missouri State University, Springfield, MO, United States of America
aff004: College of Information Sciences and Technology, Pennsylvania State University, University Park, PA, United States of America
aff005: Geisinger Health System, Danville, PA, United States of America
Published in:
PLoS ONE 14(11)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0225382
Abstract
Reliable identification of Inflammatory Bowel Disease (IBD) biomarkers from metagenomics data is a promising direction for developing non-invasive, cost-effective, and rapid clinical tests for early diagnosis of IBD. We present an integrative approach to Network-Based Biomarker Discovery (NBBD) that combines network analysis methods for prioritizing potential biomarkers with machine learning techniques for assessing the discriminative power of the prioritized biomarkers. Using a large dataset of new-onset pediatric IBD metagenomics biopsy samples, we compare the performance of Random Forest (RF) classifiers trained on features selected by a representative set of traditional feature selection methods against the NBBD framework, configured using five different tools for inferring networks from metagenomics data and nine different methods for prioritizing biomarkers, as well as a hybrid approach combining the best traditional and NBBD-based feature selection methods. We also examine how the performance of the predictive models for IBD diagnosis varies as a function of the size of the data used for biomarker identification. Our results show that (i) NBBD is competitive with some of the state-of-the-art feature selection methods, including Random Forest Feature Importance (RFFI) scores; and (ii) NBBD is especially effective in reliably identifying IBD biomarkers when the number of data samples available for biomarker discovery is small.
Keywords:
Network analysis – Biomarkers – Metagenomics – Inflammatory bowel disease – Centrality – Network resilience – Biopsy – Microbial ecology
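The general idea of network-based feature prioritization described in the abstract can be illustrated with a minimal sketch. This is not the authors' NBBD implementation: the abstract does not name the five network inference tools or the nine prioritization methods, so the sketch assumes a thresholded Pearson correlation network and degree centrality as simple stand-ins. All function names (`pearson`, `infer_network`, `rank_by_degree`, `select_top_k`), the threshold value, and the toy data are hypothetical. The downstream Random Forest evaluation step is left out.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def infer_network(samples, threshold=0.5):
    """Assumed stand-in for network inference: features are nodes, and an
    undirected edge joins two features whose |Pearson r| exceeds `threshold`."""
    n_features = len(samples[0])
    columns = [[row[j] for row in samples] for j in range(n_features)]
    adjacency = {j: set() for j in range(n_features)}
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if abs(pearson(columns[i], columns[j])) > threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency


def rank_by_degree(adjacency):
    """Assumed stand-in for prioritization: rank nodes by degree
    (highest first, ties broken by feature index)."""
    return sorted(adjacency, key=lambda node: (-len(adjacency[node]), node))


def select_top_k(samples, ranking, k):
    """Keep only the k top-ranked feature columns of each sample."""
    selected = ranking[:k]
    return [[row[j] for j in selected] for row in samples]


# Toy data (illustrative only): 100 samples x 6 features.
# Features 0-2 share a latent factor; features 3-5 are independent noise.
rng = random.Random(0)
samples = []
for _ in range(100):
    latent = rng.gauss(0, 1)
    co_varying = [latent + rng.gauss(0, 0.3) for _ in range(3)]
    noise = [rng.gauss(0, 1) for _ in range(3)]
    samples.append(co_varying + noise)

network = infer_network(samples, threshold=0.5)
ranking = rank_by_degree(network)
reduced = select_top_k(samples, ranking, k=3)
print("feature ranking:", ranking)
print("reduced shape:", len(reduced), "x", len(reduced[0]))
```

In a fuller pipeline, the reduced feature matrix would be passed to a classifier such as a Random Forest and scored on held-out samples, which is how the abstract describes assessing the discriminative power of the prioritized features.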
Article published in the journal
PLOS One
2019, Issue 11