AspCcgTk: Towards Syntactic Parsing with Semantic Disambiguation by Means of Declarative Programming

By

Yuliya Lierler  University of Nebraska at Omaha
Peter Schüller Marmara University

Natural language expressions are often ambiguous, allowing multiple interpretations. For example, the sentence – I eat spaghetti with chopsticks – is syntactically ambiguous. In one syntactic interpretation, the prepositional phrase “with chopsticks” modifies the verbal phrase “eat spaghetti” (which suggests that chopsticks serve as a tool for eating). In another interpretation, “with chopsticks” modifies the noun phrase “spaghetti” (which suggests that chopsticks are part of what is being eaten). The semantic constraints that arise from meanings of individual words in this sentence allow us to disambiguate the sentence and consider its former syntactic interpretation as the correct one. In this note we describe an approach that integrates syntactic analysis with semantic constraints in a system called AspCcgTk. This system is based on Answer Set Programming — a popular declarative constraint programming paradigm.

Typical natural language processing systems consist of several components including syntactic and semantic analyzers that are organized in a pipeline.

Y0

A common assumption in such a design is that these components are separate and independent: input is processed in a certain order that enriches a representation of the given text in an additive (monotonic) way. Recall the sentence

I eat spaghetti with chopsticks  (1)

Its verb phrase allows for two syntactic structures:

Y1

The sentence

I eat spaghetti with meatballs   (3)

is syntactically ambiguous in a similar manner. In order to assign the proper syntactic structure to each of these sentences one has to take into account selectional restrictions, i.e., the semantic restrictions that a word imposes on the environment in which it occurs. For instance, in (1) the fact that a chopstick is an instrument suggests that “with chopsticks” modifies “eat spaghetti” as a tool for eating. Thus, an approach that integrates syntactic and semantic processing is essential for proper analysis of such sentences. Common statistical methods take into account selectional restrictions implicitly by assigning most probable syntactic structure based on observed co-occurrences of words and structures in corpora. Yet, this is often not sufficient: for example advanced parsers, including those from Stanford [12] and Berkeley [10], favor the same structure for both sentences (1) and (3).

The AspCcgTk system is an English language parser based on answer set programming [3] that provides a framework for an integration of syntax and semantic analysis. The AspCcgTk system consists of four main components:

  • a logic program aspccg.lp that captures the parsing problem so that extending aspccg.lp by a given sentence encoded as a logic program results in a program whose “answer sets” (solutions) represent syntactic parses of the sentence,
  • grounder gringo [4] – a popular front-end of answer set solvers,
  • state-of-the-art answer set solver clasp [5], and
  • IDPDraw [13] – a system that is used to produce a two-dimensional image for each found parse tree.

Given an input of n words, AspCcgTk instantiates the Cocke-Younger-Kasami (CYK) chart [6] using the gringo system. The CYK chart is a n × n triangular grid. Answer set solver clasp generates parse trees using the content of this grid. Although this answer set programming realization of CYK parsing does not outperform dedicated CYK implementations, it has the advantage of declarativity and reaches sufficient performance for being useful as a natural language processing tool [11]. For example, the AspCcgTk system is used as part of the NL2KR natural language processing system [2].

The declarativity of the AspCcgTk approach paves the way to flexible and intuitive extensions of the parser with other solution-generating and solution-constraining logic programming modules. [8] take advantage of this observation. In particular, they illustrate how semantic information collected in framenet [1] helps to disambiguate syntactic structures for sentences (1) and (3) within AspCcgTk. For instance, the frame food of framenet corresponds to the word “spaghetti”. This frame contains information that food only takes other food as constituents. Thus modifying “spaghetti” with “chopsticks” in a parse tree for (1) yields a forbidden situation. Answer set programming provides convenient, flexible means for formulating such “forbidden situations”. Given framenet-based information about the verb “eat”, AspCcgTk is able to find proper syntactic structures for sentences (1) and (3). This case study suggests that AspCcgTk may serve as a framework for integral syntactic and semantic processing as in the following figure.

Y2

Although the AspCcgTk system gives a promise to advance parsing accuracy, its effectiveness relies on availability of structured knowledge about natural language words and concepts. It remains to be seen whether an automatic interface between framenet and AspCcgTk can be established and to what extent such an interface improves wide coverage parsing accuracy. Moreover, besides framenet there are other lexical databases such as verbnet [7] and probank [9] which may serve as further resources for semantic disambiguation of natural language structures.

References

[1]    Colin F. Baker, Charles J. Fillmore, and John B. Lowe. The Berkeley FrameNet Project. In International Conference on Computational Linguistics and Annual Meeting of the Association for Computational Linguistics (ACL/COLING), pages 86–90, San Mateo, CA, 1998.

[2]    Chitta Baral, Juraj Dzifcak, Kanchan Kumbhare, and Nguyen H Vo. The NL2KR system. In Workshop on Natural Language Processing and Automated Reasoning (NLPAR), pages 37–47, 2013.

[3]    Gerhard Brewka, Ilkka Niemelä, and Mirosław Truszczyński. Answer set programming at a glance. Communications of the ACM, 54(12):92–103, 2011.

[4]    Martin Gebser, Roland Kaminski, Benjamin Kaufmann, Max Ostrowski, Torsten Schaub, and Sven Thiele. A user’s guide to gringo, clasp, clingo, and iclingo. http://sourceforge.net/projects/potassco/files/potassco_guide/2010-10-04/guide.pdf, 2010.

[5]    Martin Gebser, Benjamin Kaufmann, and Torsten Schaub. Conflict-driven answer set solving: From theory to practice. Artificial Intelligence, 187:52–89, 2012.

[6]    T. Kasami. An efficient recognition and syntax analysis algorithm for context-free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Laboratory, Bedford, Massachusetts, 1965.

[7]    Karin Kipper-Schuler. VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. PhD thesis, University of Pennsylvania, 2005.

[8]    Yuliya Lierler and Peter Schüller. Towards a tight integration of syntactic parsing with semantic disambiguation by means of declarative programming. In International Conference on Computational Semantics (IWCS), 2013.

[9]    Martha Palmer, Daniel Gildea, and Paul Kingsbury. The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–106, March 2005.

[10]    Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In International Conference on Computational Linguistics and Annual Meeting of the Association for Computational Linguistics (ACL/COLING), pages 433–440, July 2006.

[11]    Peter Schüller. Flexible combinatory categorial grammar parsing using the CYK algorithm and answer set programming. In International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR), pages 499–511, 2013.

[12]    Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. Parsing with compositional vector grammars. In Annual Meeting of the Association for Computational Linguistics (ACL), 2013.

[13]    Johan Wittocx. IDPDraw, 2009. Katholieke Universiteit Leuven, http://dtai.cs.kuleuven.be/krr/software/download.