Medical Research Council, Cambridge
Abstract: Vision, and brain computation in general, can be understood as the transformation of representational geometry from one area to the next, and across time as recurrent dynamics converge within each area. The geometry of a representation can be usefully characterised by a representational distance matrix, computed by comparing the patterns of brain activity elicited by a set of stimuli. This approach enables us to summarise population codes and compare representations between brain areas, between latencies after stimulus onset, between individuals and species, between brain and behaviour, and between brains and computational models. Results from cell recordings and fMRI suggest that the early visual representation is transformed into an object representation in inferior temporal (IT) cortex that strongly emphasises behaviourally important categorical divisions, while also distinguishing exemplars within each category. We compared 37 computational model representations to the IT representation. The more similar a model representation was to IT, the better the model performed at object categorisation. Most models did not come close to explaining our IT data, because they missed categorical distinctions prominent in primate brains. A deep neural network model that was trained by supervision with over a million category-labelled images came closest to explaining IT. This model reached the noise ceiling when its representational features were appropriately linearly remixed (using independent data to fit the weights). Deep neural networks are currently driving exciting advances in artificial intelligence, notably in computer vision. Beyond artificial intelligence, deep neural networks also provide an attractive new framework for more neurobiologically faithful modelling of high-level brain information processing.
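The representational-distance-matrix approach summarised above can be sketched in a few lines. The sketch below is illustrative only, not the study's actual analysis pipeline: it assumes correlation distance (1 minus Pearson correlation) as the dissimilarity measure and Spearman rank correlation for comparing two matrices, and the toy "area" data are random.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def rdm(patterns):
    """Representational distance matrix: correlation distance
    between the activity patterns elicited by each stimulus pair.
    patterns: (n_stimuli, n_features) array (units, voxels, ...)."""
    return squareform(pdist(patterns, metric="correlation"))

def compare_rdms(rdm_a, rdm_b):
    """Compare two representations by rank-correlating the
    upper triangles of their RDMs (Spearman's rho)."""
    iu = np.triu_indices_from(rdm_a, k=1)
    rho, _ = spearmanr(rdm_a[iu], rdm_b[iu])
    return rho

# Toy example: 8 stimuli represented in two hypothetical "areas".
rng = np.random.default_rng(0)
patterns_a = rng.standard_normal((8, 100))
# A linear remix of the first representation, as a stand-in for
# a downstream area or a reweighted model representation.
patterns_b = patterns_a @ rng.standard_normal((100, 50))
print(compare_rdms(rdm(patterns_a), rdm(patterns_b)))
```

Because an RDM abstracts away from the particular neurons, voxels, or model units, the same comparison works between brain areas, time points, individuals, species, and brain-model pairs, as described in the abstract.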