Annotation Guidelines

This page includes the annotation guidelines referred to in the paper “Pay attention when you pay the bills. A multilingual corpus with dependency-based and semantic annotation of collocations”, presented at the 57th Annual Meeting of the Association for Computational Linguistics (ACL 2019).

1. Introduction

This is a proposal for a collocation annotation guide within the framework of the project Automatic Extraction of Multilingual Collocation Equivalents, which works with collocations in Portuguese (Pt), Spanish (Es), and English (En). We conceive of collocations as defined in Mel’čuk (1995): a collocation is a combination of two lexical items (A and B) in which the meaning of B is either empty (or redundant with respect to A, e.g. Pt. fazer um gol) or is expressed by B only in the context of A (e.g. Es. ojos castaños, lápiz *castaño).

In the case of light verb constructions, we follow the guidelines of the PARSEME project. In the case of noun+adjective patterns, we have annotated adjectives that perform the lexical functions (Mel’čuk, 1996) most conspicuously associated with this category. For the description of predicative nouns, we have used a combination of Polguère (2012) and the PARSEME guidelines.

2. Verb+noun collocations

2.1. LVC.full and LVC.cause collocations

2.1.1. Light verb constructions are generally verb plus object combinations in which the meaning of the verb is redundant with that of the noun or adds the sense of causality (some verb+prepositional object combinations are also considered LVCs, e.g. put into contact with, but for this project we limit ourselves to direct objects). The relevant dependency in Universal Dependencies (UD) is object (obj).
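As a rough illustration of how candidates for this pattern could be retrieved from a UD-parsed corpus, the following sketch reads one sentence in CoNLL-U format and lists verb+noun pairs linked by the obj relation. The function names, the sample sentence and the restriction to NOUN dependents with VERB heads are assumptions made for this example; they are not part of the project's actual pipeline, and every candidate would still have to be checked against the criteria in this section.

```python
# Sketch only: collect verb + direct-object candidates from one CoNLL-U sentence.
# Real CoNLL-U files are tab-separated; the sample rows are written with spaces
# here for readability and converted to tabs below.
ROWS = [
    "1 Anne Anne PROPN _ _ 2 nsubj _ _",
    "2 took take VERB _ _ 0 root _ _",
    "3 a a DET _ _ 4 det _ _",
    "4 walk walk NOUN _ _ 2 obj _ _",
]
SAMPLE = "\n".join("\t".join(row.split()) for row in ROWS)


def parse_conllu_sentence(block):
    """Return a dict mapping token id -> selected CoNLL-U fields."""
    tokens = {}
    for line in block.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword ranges (e.g. 1-2) and empty nodes (e.g. 1.1)
        tokens[int(cols[0])] = {
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        }
    return tokens


def verb_obj_candidates(tokens):
    """Return (verb lemma, noun lemma) pairs linked by the obj relation."""
    pairs = []
    for tok in tokens.values():
        head = tokens.get(tok["head"])  # None when the token is the root
        if (
            tok["deprel"] == "obj"
            and tok["upos"] == "NOUN"
            and head is not None
            and head["upos"] == "VERB"
        ):
            pairs.append((head["lemma"], tok["lemma"]))
    return pairs


print(verb_obj_candidates(parse_conllu_sentence(SAMPLE)))  # [('take', 'walk')]
```

Such a filter only narrows the search space: whether a pair like take + walk is a light verb construction depends on the semantic tests described in 2.1.2 and 2.1.3.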

Examples:

2.1.2. The noun must be predicative, i.e. it has to denote a state or an event. Furthermore, it must have semantic arguments (e.g. Maria’s_x fear of spiders_y, or Charles’s_x walk through the park_y).

2.1.3. Light verbs must not add any meaning to what is expressed by the predicative noun. This can be tested with nominalizations: e.g. Anne took a walk → Anne’s walk (cf. ??Anne’s taking of a walk). Causative verbs add a sense of causation (Lack of sleep gives [~ ‘causes’] me headaches).

The PARSEME guidelines distinguish two types of annotation: LVC.full is the label used for pure light verbs and LVC.cause is the label used for causative verbs. In PARSEME, the criterion for distinguishing the two is not the presence of the meaning ‘cause’, but the fact that causative verbs add a new actant encoded as the verb’s syntactic subject:

According to this criterion, constructions that are normally considered causative (e.g. Es. dar miedo ‘cause fear’) must be annotated as support verb constructions, i.e. LVC.full, because the subject of the verb is already a semantic argument of the noun (Pt. medo de X a Y ‘X’s fear of Y’ → Y dá medo a X ‘Y causes fear to X’).
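The distinction above can be sketched as a simple decision rule. The function name and the way arguments are represented (as plain strings) are assumptions made for this illustration; the sketch also assumes that the construction has already been identified as a light verb or causative construction with a predicative noun, which in practice requires the full set of tests.

```python
# Sketch only: choose between LVC.full and LVC.cause for a construction that
# has already passed the light verb / causative tests.
def lvc_label(noun_arguments, verb_subject):
    """Label a construction from the noun's semantic arguments and the verb's subject.

    noun_arguments: set of the predicative noun's semantic arguments
    verb_subject:   the actant realised as the syntactic subject of the verb
    """
    if verb_subject in noun_arguments:
        # The subject is one of the noun's own arguments: no new actant is added.
        return "LVC.full"
    # The verb introduces a new actant as its subject.
    return "LVC.cause"


# Pt. medo de X a Y -> Y dá medo a X: the subject Y is an argument of the noun.
print(lvc_label({"X", "Y"}, "Y"))  # LVC.full

# Lack of sleep gives me headaches: the subject is not an argument of 'headache'.
print(lvc_label({"me"}, "lack of sleep"))  # LVC.cause
```

The rule deliberately ignores whether the verb is paraphrasable as ‘cause’, since that meaning alone does not decide the label.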

The PARSEME guidelines offer a set of formal tests to identify light verb and causative constructions.

2.2. Verb+noun collocations other than LVC

2.2.1. Collocations in which the verb conveys the meaning ‘cause to end’ and the noun is predicative, like those of 2.1. E.g.: quench thirst.

2.2.2. Idiomatic collocations: A criterion to identify this kind of collocation is the lack of congruent translations in other languages. E.g.: Pt. aprovar uma lei → En. pass a law.

3. Noun+adjective collocations

The criteria to identify noun+adjective collocations are essentially semantic and are based on Mel’čuk’s lexical functions:

3.1. Adjectives expressing intensification and/or attenuation (Magn, AntiMagn):

NB: We do not annotate combinations in which the adjective expresses the size of a physical object, rather than intensification: big table.

3.2. Adjectives expressing a positive/negative evaluation on the part of the speaker (Bon, AntiBon):

3.3. Adjectives expressing a positive/negative evaluation on the part of one of the noun’s semantic arguments (Pos, AntiPos):

3.4. Adjectives expressing the sense ‘proper, as it should be’ (Ver, AntiVer):

3.5. Adjectives expressing a specific meaning only in the context of the noun (Non-Standard):

3.6. Adjectives expressing the sense of ‘intensification+quantification’ or its opposite (Magn_quant, Anti.Magn_quant):

3.7. Adjectives combining the sense ‘intensification’ and ‘time’, and its opposite:

4. Noun+noun collocations

4.1. Nouns that convey the sense of ‘head of’ (LF Cap):

4.2. Nouns expressing the sense ‘a unit of’ (Sing) or ‘a set of’ (Mult):

4.3. Nouns expressing the sense of ‘centre of’ (Centr):

4.4. Nouns conveying the sense ‘culmination of’ (Culm):

4.5. Nouns expressing a generic concept that encompasses the sense of the second noun (Gener):

4.6. Nouns conveying the sense ‘inception’ (Germ):

NB: We only annotate this kind of collocation if head=collocate and dependent=base.
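The direction constraint in this note can be expressed as a small check on each candidate pair. The NounPair class, its field names and the example pair are assumptions made for this sketch; the example relies on the standard reading of Mult, in which flock is the collocate of the base sheep.

```python
# Sketch only: keep a noun+noun pair when the syntactic head is the collocate
# and the syntactic dependent is the base.
from dataclasses import dataclass


@dataclass
class NounPair:
    head: str       # syntactic head of the dependency
    dependent: str  # syntactic dependent
    base: str       # lexical item chosen freely by the speaker
    collocate: str  # lexical item selected by the base


def is_annotatable(pair):
    """Return True only if head=collocate and dependent=base."""
    return pair.head == pair.collocate and pair.dependent == pair.base


# 'flock of sheep': flock is the head and the collocate, sheep the dependent and the base.
kept = NounPair(head="flock", dependent="sheep", base="sheep", collocate="flock")

# Hypothetical parse with the opposite direction, which would not be annotated.
skipped = NounPair(head="sheep", dependent="flock", base="sheep", collocate="flock")

print(is_annotatable(kept))     # True
print(is_annotatable(skipped))  # False
```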

References