MUSTER: MUltimodal processing of Spatial and TEmporal expRessions
Specific programme: European Coordinated Research on Long-term Challenges in Information and Communication Sciences & Technologies ERA-NET (CHIST-ERA)
UPV/EHU Partner Status: Partner
UPV/EHU PI: Aitor Soroa
Project start: 01/01/2016
Project end: 31/12/2018
Brief description: MUSTER is a fundamental pilot research project that introduces a new multi-modal framework for the machine-readable representation of meaning. Its focus lies on exploiting visual and perceptual input, in the form of images and videos, coupled with the textual modality to build structured multi-modal semantic representations for the recognition of objects and actions and of their spatial and temporal relations. The project will investigate whether such novel multi-modal representations improve the performance of automated human language understanding (HLU). MUSTER starts from the current state-of-the-art platform for human language representation learning, known as text embeddings, and adds the visual modality to provide the contextual world knowledge that text-only models lack but that humans draw on when understanding language. MUSTER will propose a new pilot framework for joint representation learning from text and vision data, tailored to spatial and temporal language processing. The resulting framework will be evaluated on a series of HLU tasks (i.e., semantic textual similarity and disambiguation, spatial role labeling, zero-shot learning, temporal action ordering) that closely mimic the processes of human language acquisition and understanding.