Resources

This page lists the resources produced by this research project.

Multilingual corpora with annotation of collocations and lexical functions: GitHub (paper).
Data sets of bilingual collocations: GitHub (paper).
Automatically aligned collocations in English, Portuguese, and Spanish (link).

Due to space limitations, the following resources are available only on request:

English, Portuguese, and Spanish word embeddings trained on large corpora (with lemma_PoS-TAG entries).
Cross-lingual models in these three languages (and also in French and Catalan).
Corpora in English, Portuguese, and Spanish (with more than one trillion tokens each), lemmatized, PoS-tagged, and parsed with Universal Dependencies.
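
For readers who plan to request the embeddings, the lemma_PoS-TAG entry format mentioned above can be handled with a few lines of code. The following is only a minimal sketch: the function name split_entry and the example entries (such as house_NOUN) are hypothetical illustrations, and the actual tag set and any escaping conventions used in the distributed files may differ.

```python
def split_entry(entry: str) -> tuple[str, str]:
    """Split a 'lemma_PoS-TAG' vocabulary entry into (lemma, tag).

    Assumption: the PoS tag is whatever follows the LAST underscore,
    so lemmas that themselves contain underscores are kept intact.
    """
    lemma, sep, tag = entry.rpartition("_")
    if not sep or not lemma or not tag:
        raise ValueError(f"not a lemma_PoS-TAG entry: {entry!r}")
    return lemma, tag


# Hypothetical entries, for illustration only.
examples = ["house_NOUN", "take_VERB", "New_York_PROPN"]
parsed = [split_entry(e) for e in examples]
```

Splitting on the last underscore rather than the first is a design choice that keeps multiword lemmas joined by underscores in one piece. It would need adjusting if the tags themselves contained underscores.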