Deep Learning Paper Accepted in Top Journal Neurocomputing
We are happy to announce that a paper on Visual Question Answering by LML member Mingrui Lao has been accepted by Neurocomputing, one of the top neural network journals in the world.

Multi-Stage Hybrid Embedding Fusion Network for Visual Question Answering


Multimodal fusion is a crucial component of Visual Question Answering (VQA), which involves joint understanding and semantic integration between visual and textual information. Existing VQA learning frameworks focus mainly on the Latent Embedding Fusion (LEF) method, which projects visual and textual features into a common latent space and fuses them with element-wise multiplication. In this paper, we aim to achieve multiple, fine-grained multimodal interactions to enhance fusion performance. To this end, we propose a Multi-stage Hybrid Embedding Fusion (MHEF) network that makes two improvements. First, we introduce a Dual Embedding Fusion (DEF) approach that transforms one modal input into the reciprocal embedding space before integration; DEF is then combined with LEF to form a novel Hybrid Embedding Fusion (HEF). Second, we design a Multi-stage Fusion Structure (MFS) for the HEF to form the MHEF network, so as to obtain diverse and profound fusion features for answer prediction. By jointly training the multi-stage framework, we not only improve the performance of each single stage, but also gain a further accuracy boost by integrating the prediction results from all stages. Extensive experiments verify that both the proposed HEF and MFS are beneficial to multimodal fusion. The full MHEF model outperforms the baseline LEF model with a 2% accuracy boost and achieves promising performance on the VQA-v1 and VQA-v2 datasets.
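For readers unfamiliar with the baseline, the LEF idea described in the abstract (project each modality into a common latent space, then multiply element-wise) can be sketched roughly as follows. This is only an illustration under stated assumptions and not the paper's implementation: the function names, the tiny feature sizes, the random weights, and the tanh non-linearity are all hypothetical choices made for the example, and the DEF, HEF, and multi-stage parts of MHEF are not attempted here.

```python
import math
import random


def project(features, weights):
    """Linear projection followed by tanh (the tanh is an assumption).

    `weights` holds one row per latent dimension.
    """
    return [
        math.tanh(sum(w * x for w, x in zip(row, features)))
        for row in weights
    ]


def latent_embedding_fusion(visual, textual, w_visual, w_textual):
    """Map both modalities into a shared latent space and fuse them
    with element-wise multiplication."""
    v_latent = project(visual, w_visual)
    t_latent = project(textual, w_textual)
    return [v * t for v, t in zip(v_latent, t_latent)]


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned projection weights."""
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy dimensions chosen only for illustration.
VISUAL_DIM, TEXT_DIM, LATENT_DIM = 8, 6, 4

rng = random.Random(0)
visual = [rng.gauss(0.0, 1.0) for _ in range(VISUAL_DIM)]
textual = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
w_visual = random_matrix(LATENT_DIM, VISUAL_DIM, rng)
w_textual = random_matrix(LATENT_DIM, TEXT_DIM, rng)

fused = latent_embedding_fusion(visual, textual, w_visual, w_textual)
print(len(fused))  # one fused value per latent dimension
```

In a real VQA model the two projections would be learned jointly with an answer classifier that takes the fused vector as input; plain Python lists are used here only to keep the sketch free of dependencies.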
