Natural Languages

Facultade de Informática da Coruña
Computer Science Engineering
2011-2012

Outline:

Faculty
Rooms and Timeline
Programme
Bibliography
Student time
Practical works
Evaluation
Links of interest

REMARK: there is an official page for the course on the web site of the Faculty.

IMPORTANT REMARK: The information provided on these pages does not replace the official information published in official media.

Faculty

Rooms and Timeline

Theory: the first half of the term, Room 2.6, Monday 16:30-18:30 and Friday 15:30-17:30
Practical works: the second half of the term, Lab. 1.3, Friday 15:30-17:30

Programme

Introduction

Levels of analysis
Ambiguity

Linguistic Resources

Tag-sets
Dictionaries
Tagged texts
Tree-banks

Lexical Analysis

Text segmentation
Inflectional and derivational morphology
Modelling large dictionaries
Numbered acyclic deterministic finite-state automata
Finite-state transducers and two-level morphology

Tagging

Hidden Markov Models
Efficient execution of Hidden Markov Models
Smoothing techniques
Dealing with unknown words
Transformation-based and error-driven tag learning

Context-free parsing

Parsing schemata
Bottom-up parsing
Earley's parser
Push-down automata and dynamic programming
Generalized LR parsers
Shared forest
Probabilistic parsing

Parsing of mildly context-sensitive languages

Tree adjoining grammars
Parsing tree adjoining grammars
Automata for parsing tree adjoining grammars
Derivation trees
Probabilistic shared representation of derivation trees

Semantic analysis

Feature structures and unification-based formalisms
Lexical relations: WordNet and EuroWordNet

Information Retrieval (IR)

Basic concepts
Retrieval models: boolean, vector and probabilistic
Indexing and retrieval
Evaluation of IR systems
Web IR. A case in point: Google
Applications of natural language processing to IR: linguistic variation

Information Extraction (IE)

Basic concepts
Architecture of an IE system
IE tasks
Evaluation of IE systems
Examples of IE systems: FASTUS and others

Question Answering (QA)

Basic concepts
QA vs. IR/IE
Architecture of a QA system
Question processing
Retrieving and selecting documents/passages
Answer extraction
Evaluation of QA systems

Machine Translation (MT)

Basic concepts and open issues
"Classic" approaches
Statistical approaches
Applications in multilingual IR

Basic Bibliography

Daniel Jurafsky and James H. Martin, Speech and Language Processing. Second Edition, Pearson Education, Upper Saddle River, New Jersey, 2009.
Christopher D. Manning and Hinrich Schütze, Foundations of Statistical Natural Language Processing, The MIT Press, Cambridge (Massachusetts) and London (England), 1999.
Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, Cambridge, 2008.

Additional Bibliography:

On shelves I28 of the library you can find many books on Natural Language Processing. We strongly recommend visiting that part of the library.

Ricardo Baeza-Yates and Berthier Ribeiro-Neto, Modern Information Retrieval, Addison Wesley and ACM Press, Harlow, England, 1999.
Marie-Francine Moens, Information Extraction: Algorithms and Prospects in a Retrieval Context, Springer, Dordrecht, 2006.
Klaas Sikkel, Parsing Schemata - A Framework for Specification and Analysis of Parsing Algorithms, Texts in Theoretical Computer Science - An EATCS Series. Springer-Verlag, Berlin/Heidelberg/New York, 1997 (a former version of this book is available at ftp://ftp.cs.utwente.nl/pub/doc/Parlevink/PhD/Sikkel/).
Robert Dale, Hermann Moisl and Harold Somers (editors), Handbook of Natural Language Processing, Marcel Dekker, Inc., New York and Basel, 2000.
James Allen, Natural Language Understanding, The Benjamin/Cummings Publishing Company, Inc., Redwood City, CA, USA, second edition, 1995.

Slides:

Lecture on linguistic resources
Lectures on lexical analysis

Lectures on tagging

HMM
Brill

Lecture on context-free parsing
Lecture on Mildly Context-Sensitive parsing
Lecture about unification-based grammars

Example

Lecture about semantic representation and analysis

Example

Lecture about shallow parsing

On-line demos:

Freeling 2.1 demo (including Spanish and Galician)
Cognitive Computation Group (CCG) demo (Univ. of Illinois at Urbana-Champaign)
Memory-Based Shallow Parsing (MBSP) demo, Computational Linguistics and Psycholinguistics (CLiPS) Research Centre, University of Antwerp

Lecture about lexical semantics
Lecture about information retrieval

Tutorial about the probabilistic model: notes, slides
Example of index generation

Lecture about information extraction

FASTUS system: web, local (protected)

Lecture about question answering

On-line QA systems:

START (general purpose)
EAGLi (genomics)

Lecture about machine translation

Lecture notes:

Lexical analysis:

preprocesamiento.pdf (joint paper by Prof. Jorge Graña, Fco. Mario Barcala and Jesús Vilares on segmentation and preprocessing)
diccionario.pdf (material prepared by Prof. Jorge Graña on the efficient implementation of large dictionaries)

Tagging:

HMM.pdf (material prepared by Prof. Jorge Graña on hidden Markov models)
brill.pdf (material prepared by Prof. Jorge Graña on transformation-based and error-driven tag learning)

Parsing of context-free grammars:

parsing_schemata.pdf (material prepared by Prof. Miguel A. Alonso on parsing schemata)
cfg_parsing.pdf (material prepared by Prof. Miguel A. Alonso on the CYK and Earley algorithms)
PDA.pdf (material prepared by Prof. Miguel A. Alonso on the dynamic programming interpretation of non-deterministic push-down automata)
PCFG.pdf (material prepared by Prof. Jorge Graña as an introduction to probabilistic parsing)

Parsing of mildly context-sensitive grammars:

TAG.pdf (material prepared by Prof. Miguel A. Alonso on tree adjoining grammars)
parsing_TAG.pdf (material prepared by Prof. Miguel A. Alonso on parsing tree adjoining grammars)
LIA.pdf (material prepared by Prof. Miguel A. Alonso on linear indexed automata)

Semantic analysis:

feature_structures.pdf (material on feature structures taken from chapter 7 of Sikkel's book)
parsing_unification.pdf (material on parsing unification grammars taken from chapter 8 of Sikkel's book)
wordnet.pdf (five papers on WordNet)

Information retrieval and extraction:

ir.pdf (introduction to information retrieval by Prof. Jesús Vilares)
ir_pobabilistico.pdf (introduction to probabilistic models of information retrieval by Prof. Jesús Vilares)
slides_IR.pdf (slides for chapter 15 of the book by Manning & Schütze)
pagerank.pdf (paper by Page, Brin, Motwani & Winograd on the PageRank algorithm used by Google)
ie.pdf (tutorial by Appelt & Israel at IJCAI'99 on information extraction)
agrep.pdf (technical report by Wu and Manber on pattern matching with errors)

Student time

See the web page of the Faculty

Practical works

Evaluation

The final mark for the course is based on the practical works. A written examination is available as an option.