A novel one-class classification approach to accurately predict disease-gene association in acute myeloid leukemia cancer
Authors:
Akram Vasighizaker aff001; Alok Sharma aff002; Abdollah Dehzangi aff007
Author affiliations:
Electrical & Computer Engineering Department, Tarbiat Modares University, Tehran, Iran aff001
Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Queensland, Australia aff002
Department of Medical Science Mathematics, Medical Research Institute, Tokyo Medical and Dental University (TMDU), Tokyo, Japan aff003
Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Kanagawa, Japan aff004
School of Engineering and Physics, Faculty of Science Technology and Environment, University of the South Pacific, Suva, Fiji aff005
CREST, JST, Tokyo, Japan aff006
Department of Computer Science, Morgan State University, Baltimore, Maryland, United States of America aff007
Published in:
PLoS ONE 14(12)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0226115
Abstract
Identifying disease-causing genes is considered an important step towards drug design and drug discovery. In disease gene identification and classification, the main aim is to identify disease genes, while identifying non-disease genes is of little or no significance. Hence, this task can be framed as a one-class classification problem. Existing machine learning methods typically treat known disease genes as the positive training set and unknown genes as negative samples to build a binary classification model. Here we propose a new One-class Classification Support Vector Machines (OCSVM) method to precisely classify candidate disease genes. Our aim is to build a model that focuses on detecting known disease-causing genes in order to increase sensitivity and precision. We evaluate our proposed model on a benchmark consisting of a gene expression dataset for acute myeloid leukemia (AML). Compared with traditional methods, our experimental results show the superiority of our proposed method in terms of precision, recall, and F-measure in detecting disease-causing genes for AML. The OCSVM code and our extracted AML benchmark are publicly available at: https://github.com/imandehzangi/OCSVM.
Keywords:
Gene expression – Algorithms – Drug discovery – Machine learning – Gene prediction – Support vector machines – Acute myeloid leukemia – Kernel methods
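The one-class setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' released code: it assumes scikit-learn's `OneClassSVM` as a stand-in, uses synthetic data in place of the AML expression benchmark, and the kernel, `gamma`, and `nu` values are illustrative guesses rather than settings reported in the paper. The model is fitted on known disease genes only, then scored with precision, recall, and F-measure on held-out disease genes mixed with other genes.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for expression profiles (rows = genes, columns = samples).
# These values are made up for illustration; they are not the AML benchmark.
n_features = 20
known_disease = rng.normal(loc=2.0, scale=1.0, size=(200, n_features))
other_genes = rng.normal(loc=-2.0, scale=1.0, size=(200, n_features))

# One-class training: the model only ever sees known disease genes.
train_pos, test_pos = known_disease[:150], known_disease[150:]
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.1)  # assumed hyperparameters
model.fit(train_pos)

# Evaluate on held-out disease genes (+1) mixed with other genes (-1).
X_test = np.vstack([test_pos, other_genes])
y_true = np.concatenate(
    [np.ones(len(test_pos), dtype=int), -np.ones(len(other_genes), dtype=int)]
)
y_pred = model.predict(X_test)  # +1 = inside the learned region, -1 = outside

precision, recall, f_measure, _ = precision_recall_fscore_support(
    y_true, y_pred, pos_label=1, average="binary", zero_division=0
)
print(f"precision={precision:.3f} recall={recall:.3f} F-measure={f_measure:.3f}")
```

In this sketch `nu` is an upper bound on the fraction of training genes allowed to fall outside the learned region, so lowering it tends to trade precision for recall on the disease class.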
Article published in the journal
PLOS One
2019 Issue 12