Learning Interlingual Representations of Words and Concepts
Specific programme: Google Global Faculty Research Awards
UPV/EHU Partner Status: Beneficiary
UPV/EHU PI: Eneko Agirre
Project start: 01/03/2016
Project end: 28/02/2017
Brief description: Recent developments in word representation have shown that distributional semantic representations derived from text corpora effectively capture notions of word similarity and yield improvements across many applications. Despite this widespread success, there is still room for improvement: moving from words to concepts in order to distinguish between the different meanings of ambiguous words (bank as a financial institution vs. bank as a river shore); linking to Knowledge Graphs (KGs) like WordNet or DBpedia to enable further inference; and exploiting the complementarity between languages at the concept level. The goal of this project is to build interlingual concept-based representations that combine information from KGs and corpora.
In contrast to previous work, which adds a limited number of constraints from KGs to pre-existing distributional representations, we build powerful knowledge-based representations and combine them on an equal footing with distributional representations. The project will show that the technique can be used to build concept representations in the same embedding space, and that it can be easily extended to accommodate multilingual information from parallel corpora and multilingual KGs, yielding interlingual representations of words and concepts in a single embedding space.
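To illustrate the idea of placing representations from different sources in a single embedding space, the sketch below uses a standard orthogonal Procrustes alignment between two toy vector spaces. This is a minimal illustration with synthetic data, not the project's actual method; all names and dimensions are hypothetical.

```python
import numpy as np

# Hypothetical toy data: embeddings for the same bilingual word pairs in
# two separate monolingual spaces. We learn an orthogonal map that places
# the source vectors in the target space, one standard way to obtain a
# shared cross-lingual embedding space.
rng = np.random.default_rng(0)
dim, n_pairs = 50, 200

tgt = rng.normal(size=(n_pairs, dim))                 # target-language vectors
true_rot, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
src = tgt @ true_rot.T + 0.01 * rng.normal(size=(n_pairs, dim))  # source vectors

# Orthogonal Procrustes solution: minimize ||src @ m - tgt||_F over
# orthogonal m, via the SVD of src^T tgt.
u, _, vt = np.linalg.svd(src.T @ tgt)
m = u @ vt                     # orthogonal map: src @ m ~= tgt
mapped = src @ m               # source vectors expressed in the target space

err = np.linalg.norm(mapped - tgt) / np.linalg.norm(tgt)
print(f"relative alignment error: {err:.3f}")
```

With this alignment in place, nearest-neighbour search in the shared space gives a simple form of cross-lingual word retrieval; combining KG-derived and corpus-derived vectors head to head can be framed analogously.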
The new techniques will open up research on interlingual meaning representation across languages: exploring other KGs such as DBpedia or Freebase, enabling interlingual disambiguation of concepts and instances, and delivering across-the-board improvements in monolingual and cross-lingual NLP applications, including information retrieval, extraction, and organization.