Proposed MQA Pipeline

Our proposed VQA system takes as input an image, its corresponding metadata, and an associated question.
Implementation Overview

The overall workflow diagram of our project.
Data Flow Diagram

The data flow diagram of our proposed MQA system.
System Design
Instance Mask Generation Module

Our instance mask generation module uses Mask2Former, a recent state-of-the-art framework for generating instance masks.
Scene Graph Generation Module

The scene graph generation module generates a scene graph for the image when one is not provided as input. A scene graph represents a visual scene as a graph whose nodes are the objects and their attributes and whose edges are the relationships between them.
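
As an illustration of the structure described in this caption, the following is a minimal sketch of a scene graph container, assuming attributes are stored on the object nodes and relationships are stored as (subject, relation, object) triples. The class name `SceneGraph` and its methods are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class SceneGraph:
    # Nodes: object id -> {"name": ..., "attributes": [...]}
    nodes: dict = field(default_factory=dict)
    # Edges: (subject id, relation label, object id) triples
    edges: list = field(default_factory=list)

    def add_object(self, obj_id, name, attributes=()):
        self.nodes[obj_id] = {"name": name, "attributes": list(attributes)}

    def add_relation(self, subj_id, relation, obj_id):
        # Only allow relations between objects already in the graph.
        if subj_id not in self.nodes or obj_id not in self.nodes:
            raise KeyError("both endpoints must be added before relating them")
        self.edges.append((subj_id, relation, obj_id))

    def relations_of(self, subj_id):
        """Return (relation, object name) pairs where subj_id is the subject."""
        return [(rel, self.nodes[o]["name"])
                for s, rel, o in self.edges if s == subj_id]


# Toy scene: a black cat sitting on a large red sofa.
graph = SceneGraph()
graph.add_object("o1", "cat", ["black"])
graph.add_object("o2", "sofa", ["red", "large"])
graph.add_relation("o1", "sitting on", "o2")
```

Here `graph.relations_of("o1")` lists the outgoing relations of the cat node, which is the kind of lookup a question such as "What is the cat sitting on?" would need.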
Sub-components of Feature Extraction Module

The feature extraction module takes the four inputs and processes each one individually into a format that the QA model can consume.
QA Model Backbone 1: Single Stream - ViLT

We use ViLT, a recently proposed parameter-efficient model that follows the single-stream architecture, as one of our baseline QA models. ViLT processes image inputs in much the same way as textual inputs.
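
To make the single-stream idea concrete, here is a minimal sketch of how an image can be cut into flattened patches and placed in the same sequence as the text tokens. It is a toy under stated assumptions: the function names are hypothetical, the image is a single-channel list of rows, and the learned linear projection and position embeddings of the real model are omitted. The `"text"` and `"image"` tags only stand in for modality-type information.

```python
def image_to_patches(image, patch_size):
    """Split an H x W image (list of rows) into flattened, non-overlapping patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Read the patch row by row and flatten it into one vector.
            patch = [image[r][c]
                     for r in range(top, top + patch_size)
                     for c in range(left, left + patch_size)]
            patches.append(patch)
    return patches


def build_single_stream_input(text_tokens, image, patch_size):
    """Concatenate text tokens and image patches into one tagged sequence."""
    sequence = [("text", tok) for tok in text_tokens]
    sequence += [("image", patch) for patch in image_to_patches(image, patch_size)]
    return sequence


# Toy 4 x 4 single-channel image with pixel values 0..15.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
sequence = build_single_stream_input(["what", "color", "?"], image, patch_size=2)
```

The resulting sequence holds three text entries followed by four patch entries, and a single transformer would attend over all seven positions jointly.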
QA Model Backbone 2: Dual Stream - LXMERT

The LXMERT QA model has two separate streams, one for the image modality and one for the text modality. Each stream focuses on a single modality and encodes the relationships and interactions among the entities present in it.
Parameter-Efficient Fine-Tuning: Adapter

The architecture of the adapter layer and how it is incorporated into the transformer block.
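
As a rough illustration of a bottleneck adapter, the sketch below applies a down-projection, a nonlinearity, an up-projection, and a residual connection to one hidden vector. It is a sketch under assumptions, not the project's implementation: ReLU is assumed as the nonlinearity, biases are omitted, and the names `adapter`, `w_down`, and `w_up` are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


def adapter(hidden, w_down, w_up):
    """Bottleneck adapter: hidden + W_up * relu(W_down * hidden)."""
    bottleneck = relu(matvec(w_down, hidden))   # project d -> r, with r < d
    update = matvec(w_up, bottleneck)           # project r -> d
    return [h + u for h, u in zip(hidden, update)]  # residual connection


hidden = [1.0, -2.0, 0.5, 3.0]            # hidden size d = 4
w_down = [[0.5, 0.0, 0.0, 0.5],           # r x d, bottleneck size r = 2
          [0.0, 1.0, 0.0, 0.0]]
w_up = [[1.0, 0.0],                       # d x r
        [0.0, 1.0],
        [0.0, 0.0],
        [1.0, 1.0]]
w_up_zero = [[0.0, 0.0] for _ in hidden]  # zero up-projection

adapted = adapter(hidden, w_down, w_up)
unchanged = adapter(hidden, w_down, w_up_zero)
```

With a zero up-projection the adapter reduces to the identity, so inserting it does not alter the frozen transformer block's output until training moves the weights. Without biases, only the 2 * d * r adapter weights would be trained.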
Parameter-Efficient Fine-Tuning: Prefix Tuning

Prefix tuning is a lightweight alternative to full fine-tuning that can achieve comparable results, especially in low-data settings. It has also been shown to be more robust and generalizable than fine-tuning.
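
The core mechanism can be sketched as follows: trainable prefix vectors are prepended to the keys and values of an attention layer while the model's own weights stay frozen. This is a minimal single-head, single-query illustration, and the function names and toy values are assumptions rather than the project's code.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def attend_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Prepend trainable prefix vectors to the frozen model's keys and values."""
    return attend(query, prefix_keys + keys, prefix_values + values)


keys = [[1.0, 0.0], [0.0, 1.0]]      # produced by the frozen model
values = [[1.0, 0.0], [0.0, 1.0]]
prefix_keys = [[0.0, 0.0]]           # the only trainable parameters
prefix_values = [[5.0, 5.0]]
query = [0.0, 0.0]

plain = attend(query, keys, values)
prefixed = attend_with_prefix(query, keys, values, prefix_keys, prefix_values)
```

Because the query attends to the prefix position as well, training only `prefix_keys` and `prefix_values` is enough to shift the layer's output, which is why the number of trained parameters stays small.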
Dataset Analysis

GQA question distribution analysis.

VQA-2.0 question distribution analysis.

TallyQA question distribution analysis.
Dataset Creation
Data Generation Workflow

The workflow for generating the dataset for our VQA task.
Question Generation Overview

The question generation overview shows the image sources used for question generation and the proposed number of generated questions for each split.
Question Generation Workflow

The question generation workflow summarizes our sampling and balancing process on GQA, VQA-2.0, and TallyQA.
Question Sampling Overview

The question sampling overview shows our sampling sources and the number of questions sampled for each split.
Question Balancing Example on VQA-2.0

Comparison of our question balancing results for color questions in the VQA-2.0 train split.
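
The exact balancing procedure is not stated here. As one possible illustration, the sketch below caps the number of questions per answer by random downsampling, which flattens a distribution dominated by a few frequent answers. The function `balance_by_answer`, the `max_per_answer` parameter, and the toy questions are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def balance_by_answer(questions, max_per_answer, seed=0):
    """Downsample so that no answer has more than max_per_answer questions."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    by_answer = defaultdict(list)
    for question in questions:
        by_answer[question["answer"]].append(question)
    balanced = []
    for group in by_answer.values():
        if len(group) > max_per_answer:
            group = rng.sample(group, max_per_answer)
        balanced.extend(group)
    return balanced


# Toy color questions: "white" dominates the raw distribution.
questions = (
    [{"question": f"q{i}", "answer": "white"} for i in range(8)]
    + [{"question": f"r{i}", "answer": "red"} for i in range(3)]
    + [{"question": "s0", "answer": "teal"}]
)
balanced = balance_by_answer(questions, max_per_answer=3)
counts = Counter(q["answer"] for q in balanced)
```

Rare answers are left untouched by this scheme, so it reduces the head of the distribution without inventing new questions for the tail.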