Building Natural Language Generation Systems - Ehud Reiter and Robert Dale
I want to write an agent that I can talk to and that can talk back to me. The talking-back part consists of turning some semantic representation (in predicate logic?) into a sentence that a human can read. Somewhere in the middle should be a syntax tree, a hierarchical representation of the sentence. That's all I knew. So there was a large gap between the stack of propositions and the surface-level representation, and I had no clue where to start.
So I bought this book because I got the impression from Amazon reviews that it is the best book on NLG (natural language generation) available, even though it is ten years old. And I am glad I did, because the book takes you through the jungle that is called Natural Language Processing and even tells you how to build your house in it. I am calling it a jungle because there is far too much information available in this field, and the information does not seem to form a coherent body: there are many conceptual views, and each covers only part of the field. And I have no intention, nor the time, to understand everything that has been produced.
The authors of the book (BNLGS) understand this and they really do a great job of making this as simple as possible. The book is about document generation, not discourse planning (having a conversation with someone), and the main flow of document generation is as follows:
Document Planning -> Microplanning -> Surface Realization
Document Planning takes as input a communicative goal and delivers as output a document plan. A communicative goal is a simple statement of what the document is trying to achieve. A document plan is a tree whose branches are rhetorical relations (or discourse relations) and whose leaves are messages. The document planner creates the structure of the text as a whole. There is a predefined set of messages that can be produced. What these messages look like is completely up to the application; there are no standards.
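Since the book leaves the shape of messages entirely to the application, here is a minimal sketch of such a document plan in Python. All class names, field names, and the weather example are my own invention, not the book's notation:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass
class Message:
    """An application-specific bundle of semantic content (a leaf)."""
    kind: str      # e.g. "MonthlyRainfallMsg"; the set of kinds is app-defined
    content: dict  # predicate-argument content, format entirely up to the app

@dataclass
class DocPlanNode:
    """An internal node: a rhetorical relation over its children."""
    relation: str  # e.g. "Elaboration", "Contrast"
    children: List[Union["DocPlanNode", Message]] = field(default_factory=list)

# A tiny document plan for a weather summary (invented example):
plan = DocPlanNode("Elaboration", [
    Message("MonthlyRainfallMsg", {"month": "August", "rain_mm": 120}),
    Message("RainEventMsg", {"day": 5, "rain_mm": 45}),
])
```

The tree shape is the point: the planner decides what to say and how the pieces relate, while the leaves stay purely semantic.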
Microplanning takes this document plan as input and produces proto-phrase specifications. It does this by applying templates to the messages and then performing lexicalization, aggregation, and referring expression generation. A proto-phrase specification is not just a syntactic structure; it contains semantic information as well.
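As a rough illustration of that mixed syntactic-and-semantic character, here is a toy microplanning template in Python. The feature names and the message format are invented for this sketch, not taken from the book:

```python
def microplan(message):
    """Map an application message to a proto-phrase specification.

    The result mixes syntactic choices (head word, tense) with semantic
    residue (the 'concept' modifier) that later stages still have to realize.
    """
    if message["kind"] == "MonthlyRainfallMsg":
        return {
            "head": "rain",          # lexeme picked during lexicalization
            "category": "verb",
            "tense": "past",
            "subject": {"head": "it", "category": "pronoun"},
            # Still semantic, not yet words; realization decides the wording:
            "modifier": {"concept": "amount", "value_mm": message["rain_mm"]},
        }
    raise ValueError(f"no template for message kind {message['kind']!r}")

spec = microplan({"kind": "MonthlyRainfallMsg", "rain_mm": 120})
```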
Surface Realization takes these proto-phrase specifications as input and produces a sentence as output. Existing surface realizers take several types of proto-phrase specifications as input, but mainly these: lexicalized case frames and abstract syntactic structures.
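To show the shape of that last mapping, here is a toy realizer for a tiny fragment of abstract syntactic structure. Real realizers such as KPML, SURGE, and RealPro are vastly richer; every feature name here is an invented simplification:

```python
def realize(spec):
    """Turn a toy abstract syntactic structure into a surface string."""
    if spec["category"] == "clause":
        subject = realize(spec["subject"])
        # Naive morphology: regular verbs only.
        verb = spec["verb"] + ("ed" if spec.get("tense") == "past" else "s")
        obj = realize(spec["object"])
        sentence = f"{subject} {verb} {obj}."
        return sentence[0].upper() + sentence[1:]
    if spec["category"] == "np":
        det = spec.get("determiner", "")
        return (det + " " + spec["head"]).strip()
    raise ValueError(f"unknown category {spec['category']!r}")

sentence = realize({
    "category": "clause", "tense": "past", "verb": "water",
    "subject": {"category": "np", "head": "John"},
    "object": {"category": "np", "determiner": "the", "head": "plants"},
})
# → "John watered the plants."
```

Even this toy hints at why the book treats realization as a component to reuse rather than rebuild: morphology, agreement, and ordering decisions multiply quickly.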
The only drawback of the book is that it completely skips the implementation of the surface realizer. The reason for this is that the authors claim it is not smart to build one from scratch, because several advanced ones exist, notably KPML, SURGE, and RealPro. I have to disagree with them, because those realizers have a "non-trivial learning curve" of their own and require you to buy into their conceptual frameworks.
Natural Language Generation applications often use very specific types of grammars. The important ones are Systemic Functional Grammar (KPML), Functional Unification Grammar (SURGE), and Meaning-Text Theory (RealPro). These are different from the formalisms often used for Natural Language Understanding, because NLG is about choice management while NLU is about hypothesis management. There are many ways to express the same meaning, and these grammars deal with the choices to be made better than, say, HPSG.
So what exactly was missing in my idea of language generation before I read this book? Well, mainly that you can't go for a system that tries to generate just any sentence. Choose your domain and create some domain-specific rules and structures. This is what keeps it manageable. And keep semantics involved as long as possible. After all, you are trying to get your meaning across.
In conclusion: I love this book. It gave me exactly what I needed, some structure in this complicated field, and it deals with both the theory and its practical application. And it is very accessible too.