Tel.: +49 541 969-3380
Institute of Cognitive Science,
49090 Osnabrück, Germany
Incorporating motion into PeriNet - a computational model for central and peripheral vision
The human visual system can be split into two subsystems: central and peripheral vision. Central vision is characterized by sensitivity to high spatial frequencies (fine details), color and shape, while peripheral vision is better at detecting low spatial frequencies, flicker and motion. The peripheral system serves several purposes, such as fast reaction to visual stimuli, spatial orientation and capturing the gist of a scene. The vast majority of the retinal area contributes to peripheral vision, while only a small area (the fovea centralis) contributes to central vision. Despite this, their representation in the human visual cortex is reversed: central vision occupies a disproportionately large share. In general, peripheral vision is fast and coarse, while central vision is slow and localized but accurate.\par
The balance between the two systems is crucial for human visual capabilities and day-to-day life. Most computational models, however, ignore this split reported in biological studies. Convolutional Neural Networks (CNNs) are the state-of-the-art models for image classification and object detection, and they largely operate as models of central vision. As the size of the processed images increases, however, they become very slow and expensive to train. Typical CNNs learn tens or hundreds of millions of parameters and require expensive hardware to train. To address this problem, we have developed PeriNet, a biologically inspired computational model that accounts for the split between peripheral and central vision. PeriNet is an end-to-end hard-attention classification model that is supervised on categorical labels only. It takes as input a downscaled, grayscale and blurred counterpart of the image, applies an attention model to determine the locations of interesting regions, and produces a small crop of the original high-resolution color image at these coordinates. This crop accounts for central vision and is then processed in the classical way to determine the categorical label of the image.\par
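The data flow described above can be illustrated with a minimal NumPy sketch. It is not the PeriNet implementation: the function names (`to_peripheral`, `attend`, `foveal_crop`), the downscaling factor, the crop size and the synthetic test image are all hypothetical, and the learned attention model is replaced by a trivial stand-in that picks the brightest location of the peripheral view.

```python
import numpy as np

def to_peripheral(image, factor=4):
    """Grayscale the image, then downscale it by block averaging.

    Averaging over factor x factor blocks both reduces resolution and
    acts as a crude blur (an assumed stand-in for the real preprocessing).
    """
    gray = image.mean(axis=2)
    h, w = gray.shape
    h, w = h - h % factor, w - w % factor  # drop rows/cols that do not fill a block
    blocks = gray[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

def attend(peripheral):
    """Placeholder for the learned attention model: return the (row, col)
    of the brightest location in the peripheral view."""
    return np.unravel_index(np.argmax(peripheral), peripheral.shape)

def foveal_crop(image, location, factor=4, size=32):
    """Cut a size x size color crop from the original high-resolution image,
    centered on the attended location and clamped to the image borders."""
    h, w = image.shape[:2]
    cy = int(location[0]) * factor + factor // 2  # map back to full resolution
    cx = int(location[1]) * factor + factor // 2
    top = int(np.clip(cy - size // 2, 0, h - size))
    left = int(np.clip(cx - size // 2, 0, w - size))
    return image[top:top + size, left:left + size]

# Synthetic 256 x 256 color image: dim noise with one bright square.
rng = np.random.default_rng(0)
image = rng.random((256, 256, 3)) * 0.1
image[100:120, 180:200] = 1.0

peripheral = to_peripheral(image)    # coarse grayscale view of the whole scene
location = attend(peripheral)        # where to look
crop = foveal_crop(image, location)  # small high-resolution color crop
print(peripheral.shape, crop.shape)
```

In this toy setting, the attention step sees 64 x 64 values and the classifier would see a 32 x 32 x 3 crop, instead of the full 256 x 256 x 3 image. In an end-to-end model, the crop would then be passed to a conventional classifier.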
This work continues previous research on the PeriNet model by exploiting the continuously changing nature of the visual world. By accounting for motion, the model can make several saccades and fixations over time, improving its classification accuracy and maintaining stable awareness of the presented visual stimuli. The results are important for advancing our understanding of the human visual system and for developing efficient, biologically plausible end-to-end computational models for vision.