Multimodal Machines from the Perspective of Humans

Sunit Bhattacharya (ÚFAL MFF UK)
Deep Neural Networks have rapidly become the dominant approach to many complex learning problems. Although initially inspired by biological neural networks, current deep learning systems are driven far more by practical engineering needs and performance requirements. And yet, some of these networks exhibit notable similarities to the human brain. This talk explores how recent work has established similarities between representations learnt by deep learning systems and cognitive data collected from the human brain. The talk specifically discusses two questions concerning the comparison of humans with machines. One, given the same multimodal tasks, how can the performance of humans and current AI systems be compared? Two, how can we use multimodal deep learning systems to make predictions about human cognitive data? The talk introduces our Eyetracked Multi-Modal Translation (EMMT) corpus, a dataset containing monocular eye movement recordings, audio, and 4-electrode electroencephalogram (EEG) data from 43 participants, as a tool for answering the questions posed above. The dataset covers reading, sight translation, and multimodal sight translation tasks involving different text and image stimulus settings when translating from English to Czech. The talk will also feature some analyses of the data, in particular the observation of a variant of the Stroop effect in the experimental data.