Rapid visual categorization is not guided by early salience-based selection
Authors:
John K. Tsotsos; Iuliia Kotseruba; Calden Wloka
Author affiliations:
Department of Electrical Engineering and Computer Science, York University, Toronto, ON, Canada
Published in:
PLoS ONE 14(10)
Category:
Research Article
DOI:
https://doi.org/10.1371/journal.pone.0224306
Abstract
The current dominant visual processing paradigm in both human and machine research is the feedforward, layered hierarchy of neural-like processing elements. Within this paradigm, visual saliency is seen by many to have a specific role, namely that of early selection. Early selection is thought to enable very fast visual performance by limiting processing to only the most salient candidate portions of an image. This strategy has led to a plethora of saliency algorithms that have indeed improved processing time efficiency in machine algorithms, which in turn have strengthened the suggestion that human vision also employs a similar early selection strategy. However, at least one set of critical tests of this idea has never been performed with respect to the role of early selection in human vision. How would the best of the current saliency models perform on the stimuli used by experimentalists who first provided evidence for this visual processing paradigm? Would the algorithms really provide correct candidate sub-images to enable fast categorization on those same images? Do humans really need this early selection for their impressive performance? Here, we report on a new series of tests of these questions whose results suggest that it is quite unlikely that such an early selection process has any role in human rapid visual categorization.
Keywords:
Algorithms – Behavior – Eyes – Vision – Computer vision – Human performance – Visual system – Eye movements
This article was published in
PLOS One
2019, Issue 10