Automatic Extraction of Multilingual Collocation Equivalents

This project aims to automatically extract a large number of multilingual collocation equivalents in Portuguese, Spanish, and English. These multilingual collocations are useful both for improving second language learning and for enriching machine translation systems.

The collocations are extracted from parallel, comparable, and monolingual corpora, combining dependency parsing and statistical association measures with distributional semantics techniques.
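This description does not say which association measures the project uses. The following is a minimal sketch that assumes pointwise mutual information (PMI), one common choice, applied to (head, relation, dependent) triples of the kind a dependency parser produces. The toy triples and the function name `pmi_scores` are hypothetical and are not taken from the project. The distributional-semantics step, which pairs collocations across languages, is not shown.

```python
import math
from collections import Counter

# Hypothetical toy data: (head, relation, dependent) triples such as a
# dependency parser might output. Not taken from the project's corpora.
TRIPLES = [
    ("take", "dobj", "decision"),
    ("take", "dobj", "decision"),
    ("take", "dobj", "decision"),
    ("make", "dobj", "decision"),
    ("take", "dobj", "book"),
    ("make", "dobj", "cake"),
    ("make", "dobj", "cake"),
    ("read", "dobj", "book"),
    ("read", "dobj", "book"),
]


def pmi_scores(triples):
    """Return the PMI of each distinct (head, relation, dependent) triple.

    Probabilities are relative frequencies within each relation r:
        PMI = log2( P(h, d | r) / (P(h | r) * P(d | r)) )
    which simplifies to log2( c(h, r, d) * c(r) / (c(h, r) * c(r, d)) ).
    """
    joint_counts = Counter(triples)
    head_counts = Counter((h, r) for h, r, _ in triples)
    dep_counts = Counter((r, d) for _, r, d in triples)
    rel_counts = Counter(r for _, r, _ in triples)

    scores = {}
    for (h, r, d), joint in joint_counts.items():
        ratio = joint * rel_counts[r] / (head_counts[(h, r)] * dep_counts[(r, d)])
        scores[(h, r, d)] = math.log2(ratio)
    return scores


if __name__ == "__main__":
    # Print candidate collocations ranked from strongest to weakest association.
    ranked = sorted(pmi_scores(TRIPLES).items(), key=lambda kv: kv[1], reverse=True)
    for triple, score in ranked:
        print(triple, round(score, 3))
```

PMI is known to overrate rare pairs, so real extraction pipelines usually apply a minimum frequency threshold or use alternative measures such as log-likelihood. This sketch omits both.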

The project is being developed by members of the LyS (Language and Information Society) Group at the Faculty of Philology (UdC) from September 2017 to June 2019.

This project is supported by a 2017 Leonardo Grant for Researchers and Cultural Creators from the BBVA Foundation. The Foundation takes no responsibility for the opinions, statements, and visual content of the project, which are entirely the responsibility of its authors.

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.