Streaming chunk incremental learning for class-wise data stream classification with fast learning speed and low structural complexity


Authors: Prem Junsawang; Suphakant Phimoltares; Chidchanok Lursinsap
Affiliation: Department of Mathematics and Computer Science, Faculty of Science, Chulalongkorn University, Bangkok, Thailand
Published in: PLoS ONE 14(9)
Category: Research Article
DOI: https://doi.org/10.1371/journal.pone.0220624

Abstract

Because advanced equipment generates and collects data at high speed, the amount of data easily exceeds the available memory and makes high learning accuracy difficult to achieve. Several methods based on the discard-after-learn concept have been proposed; some were designed to handle a single incoming datum and others a chunk of incoming data. Although these approaches give rather impressive results, most of them learn new incoming data by continually adding neurons without any neuron-merging process, which increases both computational time and space complexity. Only the online versatile elliptic basis function (VEBF) network introduced neuron merging, to reduce the time and space complexity of learning one incoming datum at a time. This paper proposes a method that extends the discard-after-learn concept to streaming data chunks while keeping computational time and neural space complexity low. A set of recursive functions, based on a statistical confidence interval, is introduced to compute the relevant parameters of a new neuron. The proposed method, named streaming chunk incremental learning (SCIL), increases the plasticity and adaptability of the network structure according to the distribution of the incoming data and their classes. Compared with other incremental-style methods on 11 benchmark data sets presented as streaming data, containing 150 to 581,012 samples with 4 to 1,558 attributes, SCIL achieved better accuracy and shorter learning time on most data sets.
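The abstract does not spell out SCIL's recursive functions or its confidence-interval test, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that a neuron is summarized by its sample count, mean vector and scatter matrix, and uses the standard pairwise update for combining such statistics, so that each incoming chunk of one class can be folded into a neuron and then discarded. The same merge rule shows, in simplified form, how two neurons of the same class could be combined. The class NeuronStats and its methods are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


@dataclass
class NeuronStats:
    """Sufficient statistics of one class-specific neuron (hypothetical structure).

    Only the count, mean vector and scatter matrix are kept; raw samples are
    not stored, in the spirit of discard-after-learn.
    """
    dim: int
    n: int = 0
    mean: Vector = field(default_factory=list)
    scatter: Matrix = field(default_factory=list)  # sum of outer products of deviations

    def __post_init__(self) -> None:
        # An empty neuron starts with a zero mean and a zero scatter matrix.
        if not self.mean:
            self.mean = [0.0] * self.dim
        if not self.scatter:
            self.scatter = [[0.0] * self.dim for _ in range(self.dim)]

    @classmethod
    def from_chunk(cls, chunk: Sequence[Sequence[float]]) -> "NeuronStats":
        """Summarize one chunk directly: first the mean, then the scatter."""
        dim, n = len(chunk[0]), len(chunk)
        mean = [sum(x[j] for x in chunk) / n for j in range(dim)]
        scatter = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in chunk)
                    for j in range(dim)] for i in range(dim)]
        return cls(dim, n, mean, scatter)

    def merge(self, other: "NeuronStats") -> "NeuronStats":
        """Combine two sets of statistics without revisiting any raw data.

        n = n_a + n_b, delta = mean_b - mean_a
        mean = mean_a + delta * n_b / n
        scatter = S_a + S_b + (n_a * n_b / n) * delta delta^T
        """
        if other.n == 0:
            return self
        if self.n == 0:
            return other
        n = self.n + other.n
        delta = [b - a for a, b in zip(self.mean, other.mean)]
        mean = [a + d * other.n / n for a, d in zip(self.mean, delta)]
        w = self.n * other.n / n
        scatter = [[self.scatter[i][j] + other.scatter[i][j] + w * delta[i] * delta[j]
                    for j in range(self.dim)] for i in range(self.dim)]
        return NeuronStats(self.dim, n, mean, scatter)

    def covariance(self) -> Matrix:
        """Unbiased sample covariance; requires n >= 2."""
        return [[s / (self.n - 1) for s in row] for row in self.scatter]


# Usage: two chunks of the same class arrive one after the other.
chunk1 = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0]]
chunk2 = [[0.0, 1.0], [4.0, 3.0]]
neuron = NeuronStats(dim=2)
for chunk in (chunk1, chunk2):
    neuron = neuron.merge(NeuronStats.from_chunk(chunk))  # the chunk can now be discarded
print(neuron.n, neuron.mean, neuron.covariance())
```

Because the merged statistics equal a batch computation over all chunks, the memory kept per neuron is one d-dimensional mean and one d-by-d scatter matrix, however many samples have streamed past.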

Keywords:

Biology and life sciences – Cell biology – Biochemistry – Organisms – Eukaryota – Physical sciences – Research and analysis methods – Proteins – Fungi – Yeast – Neuroscience – Cognitive science – Cognitive psychology – Learning – Learning and memory – Psychology – Social sciences – Computer and information sciences – Data management – Mathematics – Simulation and modeling – Cellular types – Animal cells – Protein interactions – Applied mathematics – Algorithms – Cellular neuroscience – Neurons – Neural networks – Protein-protein interactions – Computer networks – Internet
