PhD. Dissertation

General info
Abstract
Contents
Publications
Download
More info

[Esta página en español]


Title

Tabular Interpretation of Automata for Tree Adjoining Languages

Author

D. Miguel A. Alonso Pardo

Supervisors

Date

25 September 2000

Dissertation Committee

Grade

Sobresaliente Cum Laude


Abstract

Tree adjoining grammars are an extension of context-free grammars that use trees instead of productions as the primary representing structure and that are considered to be adequate to describe most of syntactic phenomena occurring in natural languages. These grammars generate the class of tree adjoining languages, which is equivalent to the class of languages generated by linear indexed grammars and other mildly context-sensitive formalisms.

In the first part of this dissertation, we introduce the problem of parsing tree adjoining grammars and linear indexed grammars, creating, for both formalisms, a continuum from simple pure bottom-up algorithms to complex predictive algorithms and showing what transformations must be applied to each one in order to obtain the next one in the continuum.

In the second part, we define several models of automata that accept the class of tree adjoining languages, proposing techniques for their efficient execution. The use of automata for parsing is interesting because they allow us to separate the problem of the definition of parsing algorithms from the problem of their execution. We have considered the following types of automata:

Compilation schemata for these models of automata have been defined. A compilation schema allow us to obtain the set of transitions corresponding to the implementation of a parsing strategy for a given grammar.

All the presented automata can be executed in polynomial time with respect to the length of the input string by applying tabulation techniques. A tabular technique makes possible to interpret an automaton by means of the manipulation of collapsed representation of configurations (called items) instead of actual configurations. Items are stored into a table in order to be reused, avoiding redundant computations.

Finally, we have studied the relations among the different classes of automata, the main difference being the storage structure used: embedded stacks, indices lists or coupled stacks. According to the strategies that can be implemented, we can distinguish three kinds of automata: bottom-up automata, including bottom-up embedded push-down automata, bottom-up restricted logic push-down automata, right-oriented linear indexed automata and bottom-up 2-stack automata; top-down automata, including (top-down) embedded push-down automata, top-down restricted logic push-down automata and left-oriented linear indexed automata; and general automata, including strongly-driven linear indexed automata and strongly-driven 2-stack automata.


Contents

The dissertation is structured in three parts. The first part is devoted to introduce Tree Adjoining Languages (TAL) and the parsing techniques available for them, focusing on Tree Adjoining Grammars (TAG) and Linear Indexed Grammars (LIG). The second part defines several classes of automata for tree adjoining languages and the tabular techniques that can applied in order to get polynomial execution time. The third part is formed by a set of appendices.

Chapter 1. Introduction

Part I. Tree Adjoining Languages

Chapter 2. Tree Adjoining Languages

We present this class of languages, focusing on TAG and LIG. Other grammar formalisms also generating TAL are presented briefly.

Chapter 3. Parsing Algorithms for TAG

We describe several tabular algorithms for Tree Adjoining Grammar parsing, creating a continuum from simple pure bottom-up algorithms to complex predictive algorithms and showing what transformations must be applied to each one in order to obtain the next one in the continuum.

Chapter 4. Parsing Algorithms for LIG

We develop a set of new tabular parsing algorithms for Linear Indexed Grammars, including bottom-up algorithms and Earley-like algorithms with and without the valid prefix property, creating a continuum in which one algorithm can in turn be derived from another. The output of these algorithms is a shared forest in the form of a context-free grammar that encodes all possible derivations for a given input string.

Part II. Models of Automata for Tree Adjoining Languages

Chapter 5. Push-Down Automata

We present push-down automata (PDA) as an operational mechanism for context-free grammars (CFG), showing compilation schemata from CFG into PDA and the tabulation techniques proposed by Lang (Based on PUSH transitions) and Nederhof (based on SWAP transitions) for this class of automata.

Chapter 6. Embedded Push-Down Automata.

We present the classical definition of Embedded Push-Down Automata (EPDA) to continue with a redefinition of EPDA in which the finite-state control has been eliminated and several kinds of normalized transition have been defined. We also show that the new definition preserves the equivalence with tree adjoining languages and that tabulation techniques are possible to execute automata in polynomial time.

Chapter 7. Bottom-up Embedded Push-Down Automata

The class of Bottom-up Embedded Push-Down Automata (BEPDA) is the dual of EPDA. They have been informally present in several papers but a formal and consistent definition has not been published. We provide a formal definition of BEPDA and we show that finite-state control can be eliminated, obtaining a new definition in which transitions are in a form useful to describe compilation schemata for TAG and suitable for tabulation.

Chapter 8. Restricted Logical Push-down Automata

Based on the close relationship between LIG and DCG, we try to restrict Logic Push-down Automata (LPDA) to make it adequate to describe parsing strategies TAL: we have described parsing strategies for LIG and TAG into LPDA, we have identified the types of transition involved and we have designed tabulation techniques based on those types of transitions.

Chapter 9. Linear Indexed Automata

This class of automata defined by Nederhof includes two classes of automata: Right-oriented Linear Indexed Automata (R-LIA) to describe parsing strategies for LIG in which the indices lists are recognized bottom-up and Left-oriented Linear Indexed Automata (L-LIA) to describe parsing strategies for LIG in which the indices lists are recognized bottom-up. We show the equivalence of R-LIA with a subclass of the Restricted Logical Push-down Automata presented in chapter 8 and we restrict L-LIA transitions in order to be able to design a tabulation technique less complicated than the technique proposed by Nederhof. We also propose a third class of automata, Strongly-Driven Linear Indexed Automata (SD-LIA), which can be used to describe bottom-up, top-down or mixed parsing strategies for LIG with respect to the recognition of the indices lists. SD-LIA seems to be equivalent to the recently presented Universal-Linear Indexed Automata (U-LIA).

Chapter 10. Two Stack Automata

Push-down automata working on two stacks has been identified as an operational device for parsing TAL. However, it is known that unrestricted use of two stacks give us the power of a Turing Machine. The remedy is to consider asymmetric stacks, one being the Master Stack where most of the work is done and the other being the Auxiliary Stack mainly used for restricted bookkeeping. The resulting Strongly-driven 2-Stack Automata (SD-2SA) can be used to describe bottom-up, top-down or mixed parsing strategies for TAG and LIG with respect to the recognition of the adjunctions or indices lists. A tabular technique is designed to allow SD-2SA be executed in polynomial time. We also define a version of SD-2SA specifically designed for bottom-up parsing strategies: Bottom-up 2-Stack Automata (BU-2SA).

Chapter 11. Putting all together

There exists a close relation among the different classes of automata for TAL, the main difference being the storage structure used: embedded stacks, indices lists or coupled stacks. According to the strategies that cen be implemented, we can distinguish three kinds of automata:

Chapter 11. Conclusion.

Part III. Appendices

Appendix A. Parsing Schemata

A brief introduction to Parsing Schemata, the framework used to describe the parsing algorithms in chapters 3 and 4.

Appendix B. Parsing Algorithms for Context-Free Grammars

A brief introduction to CYK and Earley's algorithm for context-free grammars. These algorithms are the base of the algorithms for TAG and LIG shown in chapters 3 and 4.

Appendix C. Generalized LR Parsing

We show how LR parsers for the analysis of arbitrary context-free grammars can be derived from classical Earley's parsing algorithm. The result is a Generalized LR parsing algorithm working at complexity O(n3) in the worst case, which is achieved by the use of tabulation to represent the non-deterministic evolution of the stack instead of graph-structured stack representations, as has often been the case in previous approaches. Then, this algorithms is extended to deal with Definite Clause Grammars and Linear Indexed Grammars.


Publications

Most of chapters in this dissertation has been the origin of articles, book chapters and conference papers, which can be used by people not speaking Spanish to understand the ideas presented in the dissertation. The following is a non-exhaustive list of such publications.

Chapter 3.

Chapter 4.

Chapter 6.

Chapter 7.

Chapter 8.

Chapter 9.

The ideas of this chapter are shown in a different (but equivalent) way in:

Chapter 10.

Appendix C.


Download

Note: the dissertation is written in Spanish. If you are not fluent in Spanish, you can read some of the publications originated from the thesis.


More info

Comments and suggestions about this dissertation are welcome. You can contact the author in the address

            Miguel A. Alonso Pardo
            Departamento de Computación
            Facultad de Informática
            Campus de Elviña s/n
            15071 La Coruña (SPAIN)
      

or by an e-mail addressed to alonso@dc.fi.udc.es.

The web pages of the author contain additional information about the dissertation and related work. The URL is http://www.dc.fi.udc.es/~alonso/


Miguel Angel Alonso Pardo / alonso@dc.fi.udc.es
Last modified: Tue Nov 28 12:32:58 CET 2006