Annotation Guidelines
This page contains the annotation guidelines referred to in the paper “Pay attention when you pay the bills. A multilingual corpus with dependency-based and semantic annotation of collocations”, presented at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).
1. Introduction
This is a proposal for a collocation annotation guide within the framework of the project Automatic Extraction of Multilingual Collocation Equivalents, which works with collocations in Portuguese (Pt), Spanish (Es), and English (En). We conceive of collocations as defined in Mel’čuk (1995): a collocation is a combination of two lexical items (A and B) in which the meaning of B is either empty (or redundant with respect to A, e.g. Pt. fazer um gol) or is expressed by B only in the context of A (e.g. Es. ojos castaños, lápiz *castaño).
In the case of light verb constructions, we follow the guidelines of the PARSEME project. In the case of noun+adjective patterns, we have annotated adjectives that perform the lexical functions (Mel’čuk, 1996) most conspicuously associated with this category. For the description of predicative nouns, we have used a combination of Polguère (2012) and the PARSEME guidelines.
2. Verb+noun collocations
2.1 LVC.full and LVC.cause collocations
2.1.1. Light verb constructions (LVCs) are generally verb plus object combinations in which the meaning of the verb is redundant with that of the noun or adds the sense of causality. Some verb+prepositional object combinations are also considered LVCs (e.g., put into contact with), but for this project we limit ourselves to direct objects. The relevant dependency in Universal Dependencies (UD) is object (obj).
Examples:
- Pt. X faz/marca um gol_obj.
- Es. X marca un gol_obj.
- En. X scores a goal_obj.
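As an illustration of how the obj dependency singles out candidate verb+noun pairs, here is a minimal Python sketch. It assumes the sentence has already been parsed into CoNLL-U format; the sample fragment is hand-written for this example (not taken from the corpus), and the function name verb_obj_pairs is hypothetical.

```python
# Hand-written CoNLL-U fragment (an assumption for illustration, not corpus data).
SAMPLE = """\
1\tMaria\tMaria\tPROPN\t_\t_\t2\tnsubj\t_\t_
2\tscores\tscore\tVERB\t_\t_\t0\troot\t_\t_
3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_
4\tgoal\tgoal\tNOUN\t_\t_\t2\tobj\t_\t_
"""


def verb_obj_pairs(conllu):
    """Return (verb lemma, noun lemma) pairs linked by the UD `obj` relation."""
    tokens = {}
    for line in conllu.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges ("1-2") and empty nodes ("1.1")
        tokens[int(cols[0])] = {
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]) if cols[6].isdigit() else 0,
            "deprel": cols[7],
        }
    pairs = []
    for tok in tokens.values():
        head = tokens.get(tok["head"])
        # Keep only direct objects: a NOUN attached to a VERB via `obj`.
        if tok["deprel"] == "obj" and tok["upos"] == "NOUN" and head and head["upos"] == "VERB":
            pairs.append((head["lemma"], tok["lemma"]))
    return pairs


print(verb_obj_pairs(SAMPLE))  # [('score', 'goal')]
```

The sketch only retrieves candidates; whether a pair is actually a collocation still depends on the semantic criteria described in the rest of this section.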
2.1.2. The noun must be predicative, i.e. it has to denote a state or an event. Furthermore, it must have semantic arguments (e.g. Maria’s_x fear of spiders_y, or Charles’s_x walk through the park_y).
2.1.3. Light verbs must not add any meaning to what is expressed by the predicative noun. This can be tested with nominalizations: e.g. Anne took a walk → Anne’s walk (cf. ??Anne’s taking of a walk). Causative verbs add a sense of causation (Lack of sleep gives [~ ‘causes’] me headaches).
The PARSEME guidelines distinguish two types of annotation: LVC.full is the label used for pure light verbs and LVC.cause is the label used for causative verbs. The criterion to distinguish the two in PARSEME is not the presence of the meaning ‘cause’, but the fact that causative verbs add a new actant encoded as the verb’s syntactic subject:
- En. X’s headache → Television_Y gives John_X headaches (LVC.cause).
- En. X’s reaction to Y → Television_Y caused John’s_X reaction (LVC.full).
According to this criterion, constructions that are normally considered causative (e.g. Es. dar miedo ‘cause fear’) must be annotated as support verb constructions (LVC.full): Pt. medo de X a Y ‘X’s fear of Y’ → Y dá medo a X ‘Y causes fear to X’.
PARSEME’s guidelines offer a set of formal tests to identify light verb and causative constructions (link).
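The subject-based criterion can be sketched as a small check. This is only an illustration of the reasoning, assuming the construction has already been judged to be an LVC and that the noun’s semantic arguments are known; the function name lvc_label and the way arguments are represented are hypothetical, not part of the PARSEME tooling.

```python
def lvc_label(noun_arguments, verb_subject):
    """Return "LVC.full" if the verb's subject is already a semantic argument
    of the predicative noun, and "LVC.cause" if it is a newly added actant."""
    return "LVC.full" if verb_subject in noun_arguments else "LVC.cause"


# headache(X): "television" is not an argument of the noun, so it is a new actant.
print(lvc_label({"John"}, "television"))                # LVC.cause

# reaction(X, Y): "television" fills the noun's Y slot.
print(lvc_label({"John", "television"}, "television"))  # LVC.full

# medo(X, Y) 'fear': in "Y dá medo a X" the subject Y is an argument of the noun.
print(lvc_label({"X", "Y"}, "Y"))                       # LVC.full
```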
2.2. Verb+noun collocations other than LVCs
2.2.1. Collocations in which the verb conveys the meaning ‘cause to end’ and the noun is predicative, like those of 2.1. E.g.: quench thirst.
2.2.2. Idiomatic collocations: A criterion to identify this kind of collocation is the lack of congruent translations in other languages. E.g.: Pt. aprovar uma lei → En. pass a law.
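The congruence criterion can be illustrated with a toy check: a translation is congruent when the collocate in the target language is a literal rendering of the collocate in the source language. The sketch below assumes a small hand-made lexicon of literal verb translations; both the lexicon entries and the function name is_congruent are hypothetical and serve only to make the criterion concrete.

```python
# Toy literal-translation lexicon (hypothetical entries, for illustration only).
LITERAL_VERBS_PT_EN = {
    "aprovar": {"approve"},
    "fazer": {"do", "make"},
}


def is_congruent(source_verb, target_verb, lexicon):
    """True if the target collocate is a literal translation of the source collocate."""
    return target_verb in lexicon.get(source_verb, set())


# Pt. aprovar -> En. pass (a law): "pass" is not a literal rendering of "aprovar",
# so the pair is non-congruent and the collocation is a candidate for annotation.
print(is_congruent("aprovar", "pass", LITERAL_VERBS_PT_EN))  # False
```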
3. Noun+adjective collocations
The criteria to identify noun+adjective collocations are essentially semantic and are based on Mel’čuk’s lexical functions:
3.1. Adjectives expressing intensification and/or attenuation (Magn, AntiMagn):
- Pt. chuva forte → En. heavy rain.
- Pt. chuva fraca → En. light rain.
NB: We do not annotate combinations in which the adjective expresses the size of a physical object, rather than intensification: big table.
3.2. Adjectives expressing a positive/negative evaluation on the part of the speaker (Bon, AntiBon):
- Es. futuro positivo → En. positive future.
- Pt. boa notícia → En. good news.
3.3. Adjectives expressing a positive/negative evaluation on the part of one of the noun’s semantic arguments (Pos, AntiPos):
- Es. buena evaluación → En. good evaluation.
- Es. crítica dura → En. harsh criticism.
3.4. Adjectives expressing the sense ‘proper, as it should be’ (Ver, AntiVer):
- Es. lección [~ castigo] merecida → En. deserved lesson.
- Pt. voto válido → En. valid vote.
3.5. Adjectives expressing a specific meaning only in the context of the noun (Non-Standard):
- Es. cabello/ojos castaño(s) (cf. rotulador marrón/*castaño ‘brown marker’) → En. brown hair/eyes.
- Es. año bisiesto → En. leap year.
3.6. Adjectives expressing the sense of ‘intensification+quantification’ or its opposite (Magn_quant, AntiMagn_quant):
- En. unanimous respect → Es. respeto unánime.
3.7. Adjectives combining the senses ‘intensification’ and ‘time’, or their opposite:
- Es. moda pasajera → En. passing fad.
4. Noun+noun collocations
4.1. Nouns that convey the sense of ‘head of’ (LF Cap):
- En. university dean → Es. decano de la universidad.
4.2. Nouns expressing the sense ‘a unit of’ (Sing) or ‘a set of’ (Mult):
- En. fit of rage → Es. ataque de cólera.
- En. sheep herd → Pt. rebanho de ovelhas.
4.3. Nouns expressing the sense of ‘centre of’ (Centr):
- En. heart of darkness → Es. corazón de las tinieblas.
4.4. Nouns conveying the sense ‘culmination of’ (Culm):
- Es. la cima de la escalinata → En. the top of the staircase.
4.5. Nouns expressing a generic concept that encompasses the sense of the second noun (Gener):
- En. feeling of guilt → Es. sentimiento de culpa.
- En. cholera disease → Es. enfermedad del cólera.
4.6. Nouns conveying the sense of ‘inception’ (Germ):
- En. bout of rabies → Es. brote de rabia.
NB: We only annotate this kind of collocation if head=collocate and dependent=base.
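The head=collocate, dependent=base restriction can be expressed as a simple filter over candidate pairs. The following is a minimal sketch, assuming each candidate records both its syntactic roles (head, dependent) and its collocational roles (base, collocate); the class NounNounCandidate, the function should_annotate, and the reversed example are hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NounNounCandidate:
    head: str       # syntactic head of the dependency
    dependent: str  # syntactic dependent
    base: str       # noun that selects the collocate
    collocate: str  # noun selected by the base
    lf: str         # lexical function label, e.g. "Mult"


def should_annotate(candidate):
    """Keep only candidates where head=collocate and dependent=base."""
    return candidate.head == candidate.collocate and candidate.dependent == candidate.base


# En. sheep herd: "herd" is both the syntactic head and the collocate (Mult).
herd = NounNounCandidate(head="herd", dependent="sheep", base="sheep", collocate="herd", lf="Mult")

# Hypothetical reversed configuration, where the base would be the syntactic head.
reversed_case = NounNounCandidate(head="sheep", dependent="herd", base="sheep", collocate="herd", lf="Mult")

print(should_annotate(herd))           # True
print(should_annotate(reversed_case))  # False
```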
References
- Mel’čuk, I. A. (1995). Phrasemes in Language and Phraseology in Linguistics. In M. Everaert, E.-J. Van der Linden, A. Schenk & R. Schreuder (eds.), Idioms: Structural and Psychological Perspectives, 167–232. Hillsdale, NJ: Lawrence Erlbaum.
- Mel’čuk, I. A. (1996). Lexical Functions: A Tool for the Description of Lexical Relations in a Lexicon. In L. Wanner (ed.), Lexical Functions in Lexicography and Natural Language Processing, 37–113. Amsterdam/Philadelphia: John Benjamins.
- PARSEME shared task 1.1 annotation guidelines (last updated on November 30, 2017), § 5.2: link.
- Polguère, A. (2012). Propriétés sémantiques et combinatoires des quasi-prédicats sémantiques. Scolia, 26, 131–152.