Resources
This page lists the resources produced by this research project.
- Multilingual corpora with annotation of collocations and lexical functions: github (paper).
- Data sets of bilingual collocations: github (paper).
- Automatically aligned collocations in English, Portuguese, and Spanish (link).
 
Due to space limitations, the following resources are available only on request:
- English, Portuguese, and Spanish word embeddings trained on large corpora (with lemma_PoS-TAG entries).
- Cross-lingual models in these three languages (and also in French and Catalan).
- Corpora in English, Portuguese, and Spanish (with more than one trillion tokens each), lemmatized, PoS-tagged, and parsed with Universal Dependencies.
 

