PhD. Dissertation

General info
Abstract
Contents
Publications
Download
More info

[Esta página en español]

Title

Tabular Interpretation of Automata for Tree Adjoining Languages

Author

D. Miguel A. Alonso Pardo

Supervisors

Dr. Manuel Vilares Ferro (Universidade da Coruña, Spain)
Dr. Eric Villemonte de la Clergerie (INRIA, France)

Date

25 September 2000

Dissertation Committee

President: Dr. Josep Miró (Universitat de les Illes Balears, Spain)
Member: Dr. José Mira Mira (UNED, Spain)
Member: Dr. Pierre Boullier (INRIA, France)
Member: Dr. Mark-Jan Nederhof (DFKI, Germany)
Secretary: Dr. Antonio Blanco Ferro (Universidade da Coruña, Spain)

Grade

Sobresaliente Cum Laude

Abstract

Tree adjoining grammars are an extension of context-free grammars that use trees instead of productions as the primary representing structure and that are considered to be adequate to describe most of syntactic phenomena occurring in natural languages. These grammars generate the class of tree adjoining languages, which is equivalent to the class of languages generated by linear indexed grammars and other mildly context-sensitive formalisms.

In the first part of this dissertation, we introduce the problem of parsing tree adjoining grammars and linear indexed grammars, creating, for both formalisms, a continuum from simple pure bottom-up algorithms to complex predictive algorithms and showing what transformations must be applied to each one in order to obtain the next one in the continuum.

In the second part, we define several models of automata that accept the class of tree adjoining languages, proposing techniques for their efficient execution. The use of automata for parsing is interesting because they allow us to separate the problem of the definition of parsing algorithms from the problem of their execution. We have considered the following types of automata:

Top-down and bottom-up embedded push-down automata, two extensions of push-down automata working on nested stacks. A new definition is provided in which the finite-state control has been eliminated and several kinds of normalized transition have been defined, preserving the equivalence with tree adjoining languages.
Logical push-down automata restricted to the case of tree adjoining languages. Depending on the set of allowed transitions, we obtain three different types of automata.
Linear indexed automata, left-oriented and right-oriented to describe parsing strategies in which adjuntions are recognized top-down and bottom-up, respectively, and strongly-driven to define parsing strategies recognizing adjunctions top-down and/or bottom-up.
2-stack automata, an extension of push-down automata working on a pair of stacks, a master stack driving the parsing process and an auxiliary stack restricting the set of transitions that can be applied at a given moment. Strongly-driven 2-stack automata can be used to describe bottom-up, top-down or mixed parsing strategies for tree adjoining languages with respect to the recognition of the adjunctions. Bottom-up 2-stack automata are specifically designed for parsing strategies recognizing adjunctions bottom-up.

Compilation schemata for these models of automata have been defined. A compilation schema allow us to obtain the set of transitions corresponding to the implementation of a parsing strategy for a given grammar.

All the presented automata can be executed in polynomial time with respect to the length of the input string by applying tabulation techniques. A tabular technique makes possible to interpret an automaton by means of the manipulation of collapsed representation of configurations (called items) instead of actual configurations. Items are stored into a table in order to be reused, avoiding redundant computations.

Finally, we have studied the relations among the different classes of automata, the main difference being the storage structure used: embedded stacks, indices lists or coupled stacks. According to the strategies that can be implemented, we can distinguish three kinds of automata: bottom-up automata, including bottom-up embedded push-down automata, bottom-up restricted logic push-down automata, right-oriented linear indexed automata and bottom-up 2-stack automata; top-down automata, including (top-down) embedded push-down automata, top-down restricted logic push-down automata and left-oriented linear indexed automata; and general automata, including strongly-driven linear indexed automata and strongly-driven 2-stack automata.

The dissertation is structured in three parts. The first part is devoted to introduce Tree Adjoining Languages (TAL) and the parsing techniques available for them, focusing on Tree Adjoining Grammars (TAG) and Linear Indexed Grammars (LIG). The second part defines several classes of automata for tree adjoining languages and the tabular techniques that can applied in order to get polynomial execution time. The third part is formed by a set of appendices.

Chapter 1. Introduction

Part I. Tree Adjoining Languages

Chapter 2. Tree Adjoining Languages

We present this class of languages, focusing on TAG and LIG. Other grammar formalisms also generating TAL are presented briefly.

Chapter 3. Parsing Algorithms for TAG

We describe several tabular algorithms for Tree Adjoining Grammar parsing, creating a continuum from simple pure bottom-up algorithms to complex predictive algorithms and showing what transformations must be applied to each one in order to obtain the next one in the continuum.

Chapter 4. Parsing Algorithms for LIG

We develop a set of new tabular parsing algorithms for Linear Indexed Grammars, including bottom-up algorithms and Earley-like algorithms with and without the valid prefix property, creating a continuum in which one algorithm can in turn be derived from another. The output of these algorithms is a shared forest in the form of a context-free grammar that encodes all possible derivations for a given input string.

Part II. Models of Automata for Tree Adjoining Languages

Chapter 5. Push-Down Automata

We present push-down automata (PDA) as an operational mechanism for context-free grammars (CFG), showing compilation schemata from CFG into PDA and the tabulation techniques proposed by Lang (Based on PUSH transitions) and Nederhof (based on SWAP transitions) for this class of automata.

Chapter 6. Embedded Push-Down Automata.

We present the classical definition of Embedded Push-Down Automata (EPDA) to continue with a redefinition of EPDA in which the finite-state control has been eliminated and several kinds of normalized transition have been defined. We also show that the new definition preserves the equivalence with tree adjoining languages and that tabulation techniques are possible to execute automata in polynomial time.

Chapter 7. Bottom-up Embedded Push-Down Automata

The class of Bottom-up Embedded Push-Down Automata (BEPDA) is the dual of EPDA. They have been informally present in several papers but a formal and consistent definition has not been published. We provide a formal definition of BEPDA and we show that finite-state control can be eliminated, obtaining a new definition in which transitions are in a form useful to describe compilation schemata for TAG and suitable for tabulation.

Chapter 8. Restricted Logical Push-down Automata

Based on the close relationship between LIG and DCG, we try to restrict Logic Push-down Automata (LPDA) to make it adequate to describe parsing strategies TAL: we have described parsing strategies for LIG and TAG into LPDA, we have identified the types of transition involved and we have designed tabulation techniques based on those types of transitions.

Chapter 9. Linear Indexed Automata

This class of automata defined by Nederhof includes two classes of automata: Right-oriented Linear Indexed Automata (R-LIA) to describe parsing strategies for LIG in which the indices lists are recognized bottom-up and Left-oriented Linear Indexed Automata (L-LIA) to describe parsing strategies for LIG in which the indices lists are recognized bottom-up. We show the equivalence of R-LIA with a subclass of the Restricted Logical Push-down Automata presented in chapter 8 and we restrict L-LIA transitions in order to be able to design a tabulation technique less complicated than the technique proposed by Nederhof. We also propose a third class of automata, Strongly-Driven Linear Indexed Automata (SD-LIA), which can be used to describe bottom-up, top-down or mixed parsing strategies for LIG with respect to the recognition of the indices lists. SD-LIA seems to be equivalent to the recently presented Universal-Linear Indexed Automata (U-LIA).

Chapter 10. Two Stack Automata

Push-down automata working on two stacks has been identified as an operational device for parsing TAL. However, it is known that unrestricted use of two stacks give us the power of a Turing Machine. The remedy is to consider asymmetric stacks, one being the Master Stack where most of the work is done and the other being the Auxiliary Stack mainly used for restricted bookkeeping. The resulting Strongly-driven 2-Stack Automata (SD-2SA) can be used to describe bottom-up, top-down or mixed parsing strategies for TAG and LIG with respect to the recognition of the adjunctions or indices lists. A tabular technique is designed to allow SD-2SA be executed in polynomial time. We also define a version of SD-2SA specifically designed for bottom-up parsing strategies: Bottom-up 2-Stack Automata (BU-2SA).

Chapter 11. Putting all together

There exists a close relation among the different classes of automata for TAL, the main difference being the storage structure used: embedded stacks, indices lists or coupled stacks. According to the strategies that cen be implemented, we can distinguish three kinds of automata:

Bottom-up automata: BEPDA, R-LIA, Bottom-up Restricted LPDA and BU-2SA.
Top-down Automata: EPDA, L-LIA and Top-down Restricted LPDA.
General automata: SD-LIA and SD-2SA.

Chapter 11. Conclusion.

Part III. Appendices

Appendix A. Parsing Schemata

A brief introduction to Parsing Schemata, the framework used to describe the parsing algorithms in chapters 3 and 4.

Appendix B. Parsing Algorithms for Context-Free Grammars

A brief introduction to CYK and Earley's algorithm for context-free grammars. These algorithms are the base of the algorithms for TAG and LIG shown in chapters 3 and 4.

Appendix C. Generalized LR Parsing

We show how LR parsers for the analysis of arbitrary context-free grammars can be derived from classical Earley's parsing algorithm. The result is a Generalized LR parsing algorithm working at complexity O(n³) in the worst case, which is achieved by the use of tabulation to represent the non-deterministic evolution of the stack instead of graph-structured stack representations, as has often been the case in previous approaches. Then, this algorithms is extended to deal with Definite Clause Grammars and Linear Indexed Grammars.

Publications

Most of chapters in this dissertation has been the origin of articles, book chapters and conference papers, which can be used by people not speaking Spanish to understand the ideas presented in the dissertation. The following is a non-exhaustive list of such publications.

Chapter 3.

Miguel A. Alonso Pardo, David Cabrero Souto, Eric de la Clergerie, and Manuel Vilares Ferro. Tabular algorithms for TAG parsing. In Proc. of EACL'99, Ninth Conference of the European Chapter of the Association for Computational Linguistics, pages 150-157, Bergen, Norway, June 1999. ACL. [AloCabCleVil99a.ps.gz, 83 K] [AloCabCleVil99a.pdf, 83 K]

Chapter 4.

Miguel A. Alonso Pardo, Eric de la Clergerie, Jorge Graña Gil, and Manuel Vilares Ferro. New tabular algorithms for LIG parsing. In Proc. of the Sixth International Workshop on Parsing Technologies (IWPT 2000), pages 29-40, Trento, Italy, February 2000. ACL/SIGPARSE. [AloCleGraVil2000a.ps.gz, 95 K] [AloCleGraVil2000a.pdf, 95 K]

Chapter 6.

Miguel A. Alonso Pardo, Eric de la Clergerie, and Manuel Vilares Ferro. A redefinition of Embedded Push-Down Automata. In Proc. of the 5th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+5), pages 19-26, Paris, France, May 2000. [AloCleVil2000a.ps.gz, 45 K] [AloCleVil2000a.pdf, 45 K]

Chapter 7.

Miguel A. Alonso Pardo, Eric de la Clergerie, and Manuel Vilares Ferro. A formal definition of Bottom-up Embedded Push-Down Automata and their tabulation technique. In P. de Groote, G. Morril and C. Retoré (eds.), Logical Aspects of Computational Linguistics, volume 2099 of Lecture Notes in Artificial Intelligence, pages 44-61, Springer-Verlag, Berlin-Heidelberg-New York, 2001. ISBN 3-540-42273-0. [AloCleVil2001a.ps.gz, 118 K]

AloCleVil2001a.pdf

Chapter 8.

Miguel A. Alonso Pardo, Eric de la Clergerie, and David Cabrero Souto. Tabulation of automata for tree adjoining languages. In Proc. of the Sixth Meeting on Mathematics of Language (MOL 6), pages 127-141, Orlando, Florida, USA, July 1999. [AloCleCab99a.ps.gz, 107 K] [AloCleCab99a.pdf, 107 K]

Chapter 9.

The ideas of this chapter are shown in a different (but equivalent) way in:

Miguel A. Alonso Pardo, Mark-Jan Nederhof, and Eric de la Clergerie. Tabulation of automata for tree adjoining languages. Grammars, 3(2/3):89-110, 2000. ISSN 1386-7393. [AloNedCle2000a.ps.gz] [AloNedCle2000a.pdf] [Offprints available on request]

Chapter 10.

Section 10.3:

Eric de la Clergerie and Miguel A. Alonso Pardo. A tabular interpretation of a class of 2-Stack Automata. In COLING-ACL'98, 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Proceedings of the Conference, volume II, pages 1333-1339, Montreal, Quebec, Canada, August1998. ACL. [CleAlo98a.ps.gz, 111 K] [CleAlo98a.pdf, 111 K]
Miguel A. Alonso Pardo, Djamé Seddah, and Eric de la Clergerie. Practical aspects in compiling tabular TAG parsers. In Proc. of 5th International Workshop on Tree Adjoining Grammars and Related Formalisms (TAG+5), pages 27-32, Paris, France, May2000. [AloSedCle2000a.ps.gz, 56 K] [AloSedCle2000a.pdf, 56 K]

Section 10.4:

Eric de la Clergerie, Miguel A. Alonso Pardo, and David Cabrero Souto. A tabular interpretation of bottom-up automata for TAG. In Proc. of Fourth International Workshop on Tree-Adjoining Grammars and Related Frameworks (TAG+4), pages 42-45, Philadelphia, PA, USA, August 1998. [CleAloCab8a.ps.gz, 82 K] [CleAloCab8a.pdf, 82 K]

Appendix C.

Sections C.1-C.6:

Miguel A. Alonso Pardo, David Cabrero Souto, and Manuel Vilares Ferro. Construction of efficient generalized LR parsers. In Derick Wood y Sheng Yu, editors, Automata Implementation, volume 1436 of Lecture Notes in Computer Science, pages 7-24. Springer-Verlag, Berlin-Heidelberg-Nueva York, 1998. [AloCabVil97c.ps.gz, 104 K] [AloCabVil97c.pdf, 104 K]

Section C.7:

Miguel A. Alonso Pardo, David Cabrero Souto, and Manuel Vilares Ferro. Generalized LR parsing for extensions of context-free grammars. In Nicolas Nicolov y Ruslan Mitkov, editors, Recent Advances in Natural Language Processing II, volume 189 of Current Issues in Linguistic Theory. John Benjamins Publishing Company, Amsterdam & Philadelphia, 2000. [AloCabVil97b.ps.gz] [AloCabVil97b.pdf]

Section C.8:

Miguel A. Alonso Pardo}, Eric de la Clergerie, and Manuel Vilares Ferro. Automata-based parsing in dynamic programming for Linear Indexed Grammars. In A. S. Narin'yani, editor, Proc. of DIALOGUE'97 Computational Linguistics and its Applications International Workshop, pages 22-27, Moscow, Russia, June 1997. [AloCleVil97a.ps.gz, 71 K] [AloCleVil97a.pdf, 71 K]

Download

Note: the dissertation is written in Spanish. If you are not fluent in Spanish, you can read some of the publications originated from the thesis.

Full version [Alo2000a.ps.gz, 1312 K] [Alo2000a.pdf, 2518 K]
Cover, acknowledgments, abstract, contents, list of figures and list of tables [Alo2000a_00.ps.gz, 205 K] [Alo2000a_00.pdf, 127 K]
Chapter 1 [Alo2000a_01.ps.gz, 192 K] [Alo2000a_01.pdf, 103 K]
Chapter 2 [Alo2000a_02.ps.gz, 261 K] [Alo2000a_02.pdf, 283 K]
Chapter 3 [Alo2000a_03.ps.gz, 337 K] [Alo2000a_03.pdf, 459 K]
Chapter 4 [Alo2000a_04.ps.gz, 236 K] [Alo2000a_04.pdf, 247 K]
Chapter 5 [Alo2000a_05.ps.gz, 192 K] [Alo2000a_05.pdf, 97 K]
Chapter 6 [Alo2000a_06.ps.gz, 280 K] [Alo2000a_06.pdf, 313 K]
Chapter 7 [Alo2000a_07.ps.gz, 259 K] [Alo2000a_07.pdf, 283 K]
Chapter 8 [Alo2000a_08.ps.gz, 324 K] [Alo2000a_08.pdf, 392 K]
Chapter 9 [Alo2000a_09.ps.gz, 295 K] [Alo2000a_09.pdf, 322 K]
Chapter 10 [Alo2000a_10.ps.gz, 306 K] [Alo2000a_10.pdf, 337 K]
Chapter 11 [Alo2000a_11.ps.gz, 194 K] [Alo2000a_11.pdf, 92 K]
Chapter 12 [Alo2000a_12.ps.gz, 182 K] [Alo2000a_12.pdf, 39 K]
Appendix A [Alo2000a_A.ps.gz, 188 K] [Alo2000a_A.pdf, 100 K]
Appendix B [Alo2000a_B.ps.gz, 181 K] [Alo2000a_B.pdf, 65 K]
Appendix C [Alo2000a_C.ps.gz, 242 K] [Alo2000a_C.pdf, 311 K]
Bibliography and index [Alo2000a_ZZ.ps.gz, 220 K] [Alo2000a_ZZ.pdf, 164 K]

More info

Comments and suggestions about this dissertation are welcome. You can contact the author in the address

            Miguel A. Alonso Pardo
            Departamento de Computación
            Facultad de Informática
            Campus de Elviña s/n
            15071 La Coruña (SPAIN)

or by an e-mail addressed to alonso@dc.fi.udc.es.

The web pages of the author contain additional information about the dissertation and related work. The URL is http://www.dc.fi.udc.es/~alonso/

Miguel Angel Alonso Pardo / alonso@dc.fi.udc.es

Last modified: Tue Nov 28 12:32:58 CET 2006