Speech and Language Processing

Thursday 25 January 2007 19:12

An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition - Daniel Jurafsky & James H. Martin, Prentice Hall Press, 2000

I bought this book because I was looking for the following:

A good, extended introduction to the field of computational linguistics that specifically covers sentence parsing and semantic analysis.
A means to derive semantics from a syntax tree; I had no idea how to go about doing this.
A way to place this semantic content into something like "working memory".
A good sentence parser that creates, or selects, the best alternative from a number of ambiguous syntax trees.

This book has really helped me with all of these. I now have the feeling that all the concepts I need are in place, and for some of my questions actual algorithms are available.

The book covers Language Understanding, from both written and spoken text. It only hints at the process of Language Production, but I was not yet looking for that information.

The algorithms given for sentence parsing are very close to those given in "Artificial Intelligence: A Modern Approach" (AIMA, 1995), which is not surprising, since that book's authors are also the editors of the series this book appears in. However, this book calls the algorithm the Earley algorithm, a name that is hardly mentioned in AIMA. This puzzles me.
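
To give an idea of what such a parser looks like, here is a minimal sketch of an Earley-style recognizer. It is not the book's code: the toy grammar, the lexicon and all names (State, GRAMMAR, LEXICON, earley_recognize) are invented for this example, the scanner advances the dotted rule directly instead of adding a separate part-of-speech state, and rules with an empty right-hand side are assumed not to occur.

```python
from collections import namedtuple

# A dotted rule "lhs -> rhs" with the dot at index `dot`,
# whose recognition started at input position `origin`.
State = namedtuple("State", "lhs rhs dot origin")

# Toy grammar and lexicon, invented for this example.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "Noun"), ("Noun",)],
    "VP": [("Verb", "NP"), ("Verb",)],
}
LEXICON = {
    "the": {"Det"},
    "dog": {"Noun"},
    "cat": {"Noun"},
    "sees": {"Verb"},
}


def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    """Return True if `words` can be derived from `start`."""
    chart = [[] for _ in range(len(words) + 1)]

    def add(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    add(State("GAMMA", (start,), 0, 0), 0)  # dummy start state
    for i in range(len(words) + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while it is processed
            state = chart[i][j]
            j += 1
            if state.dot < len(state.rhs):
                symbol = state.rhs[state.dot]
                if symbol in grammar:
                    # Predictor: expand the non-terminal after the dot.
                    for rhs in grammar[symbol]:
                        add(State(symbol, rhs, 0, i), i)
                elif i < len(words) and symbol in lexicon.get(words[i], ()):
                    # Scanner: the next word has the expected part of speech.
                    add(state._replace(dot=state.dot + 1), i + 1)
            else:
                # Completer: advance every state that was waiting for this lhs.
                for waiting in list(chart[state.origin]):
                    if (waiting.dot < len(waiting.rhs)
                            and waiting.rhs[waiting.dot] == state.lhs):
                        add(waiting._replace(dot=waiting.dot + 1), i)
    return State("GAMMA", (start,), 1, 0) in chart[len(words)]
```

With this toy grammar, earley_recognize("the dog sees the cat".split()) returns True and earley_recognize("dog the sees".split()) returns False. A real parser would also keep back-pointers in each state so that the syntax trees can be read off the chart afterwards.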

Disambiguation is the main theme in Natural Language Understanding, and the book provides techniques for disambiguation on all levels of perception, including the syntax level. At that level, the technique is to use part-of-speech probabilities.
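
As a rough illustration of the idea, the sketch below scores each candidate reading of a sentence by multiplying the probability of every word's tag and keeps the highest-scoring one. The probabilities are made up, the names (TAG_PROB, score, best_reading) are invented, and treating the words as independent is a strong simplification of the probabilistic models the book describes.

```python
import math

# Hypothetical P(tag | word) values, invented for illustration only.
TAG_PROB = {
    "time": {"Noun": 0.9, "Verb": 0.1},
    "flies": {"Verb": 0.7, "Noun": 0.3},
    "like": {"Prep": 0.6, "Verb": 0.4},
    "an": {"Det": 1.0},
    "arrow": {"Noun": 1.0},
}


def score(words, tags):
    """Log-probability of a tag assignment, treating words as independent."""
    return sum(math.log(TAG_PROB[word][tag]) for word, tag in zip(words, tags))


def best_reading(words, readings):
    """Pick the tag sequence with the highest score among the candidates."""
    return max(readings, key=lambda tags: score(words, tags))


words = "time flies like an arrow".split()
readings = [
    ("Noun", "Verb", "Prep", "Det", "Noun"),  # time passes quickly
    ("Noun", "Noun", "Verb", "Det", "Noun"),  # "time flies" are fond of arrows
    ("Verb", "Noun", "Prep", "Det", "Noun"),  # measure the speed of flies
]
print(best_reading(words, readings))
```

With these invented numbers the first reading wins (0.9 x 0.7 x 0.6 = 0.378 against 0.108 and 0.018). In a parser, each candidate reading would correspond to one of the ambiguous syntax trees.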

Semantics are derived from a sentence using the same tree the syntax parser produces. The principle of semantic analysis is laid out quite clearly. However, an actual implementation will still require a lot of work from the implementer (unless semantic grammars are publicly available).
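
The principle can be sketched in a few lines: every grammar rule gets a small recipe that combines the meanings of its children, and the meaning of the sentence is computed bottom-up over the syntax tree. This is only an illustration under my own assumptions: the tuple-based tree format, the tiny lexicon and the function name meaning are invented, and the result is built as a plain string.

```python
# Each word carries a meaning; verbs are functions that wait for their arguments.
LEXICON = {
    "John": "John",
    "Mary": "Mary",
    "sees": lambda obj: lambda subj: f"Sees({subj}, {obj})",
    "sleeps": lambda subj: f"Sleeps({subj})",
}


def meaning(tree):
    """Compute the meaning of a (label, child, ...) tuple tree bottom-up."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return LEXICON[children[0]]        # leaf: look up the word
    parts = [meaning(child) for child in children]
    if label == "S":                       # S -> NP VP: apply the VP to the subject
        subject, predicate = parts
        return predicate(subject)
    if label == "VP" and len(parts) == 2:  # VP -> Verb NP: apply the verb to the object
        verb, obj = parts
        return verb(obj)
    return parts[0]                        # unary rule: pass the meaning up


tree = ("S", ("NP", "John"), ("VP", ("Verb", "sees"), ("NP", "Mary")))
print(meaning(tree))  # prints Sees(John, Mary)
```

The "lot of work" sits in the recipes: a realistic grammar needs one for every rule, plus a meaning for every word in the lexicon.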

The book uses First Order Predicate Calculus as its meaning representation. I noticed, however, that all predicates used have one or two arguments, which means that a semantic net implementation will probably work just as well and will not require reification techniques.
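
To make that observation concrete, here is a small sketch of how one- and two-argument predicates map onto the labeled edges of a semantic net. The function name to_triples, the "isa" edge label and the example facts are my own choices for illustration.

```python
def to_triples(facts):
    """Turn (predicate, arg1[, arg2]) facts into (node, edge, node) triples."""
    triples = set()
    for predicate, *args in facts:
        if len(args) == 1:
            # Unary predicate: Restaurant(Maharani) becomes an "isa" edge.
            triples.add((args[0], "isa", predicate))
        elif len(args) == 2:
            # Binary predicate: the predicate itself becomes the edge label.
            triples.add((args[0], predicate, args[1]))
        else:
            # Three or more arguments do not fit on a single edge;
            # this is the case where reification would be needed.
            raise ValueError(f"{predicate} has {len(args)} arguments")
    return triples


facts = [
    ("Restaurant", "Maharani"),
    ("Serves", "Maharani", "VegetarianFood"),
]
net = to_triples(facts)
```

Here net contains ("Maharani", "isa", "Restaurant") and ("Maharani", "Serves", "VegetarianFood"). A predicate with three or more arguments raises an error, which marks exactly the point where an extra node for the event itself would have to be introduced.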

The book is neatly organized in 4 sections: Words, Syntax, Semantics, Pragmatics. The pragmatics section goes to great lengths to explain 'what to do with this acquired semantic knowledge'. It also lays down some principles of a 'conversational agent' that keeps up a dialog. There are no algorithms here, but the range of possible uses alone would make this impossible, I guess. Anyway, you really get a feel for what's possible, and what others have done.

90% of the book was perfectly readable to me. Occasionally the book dives deeply into some obscure technique I don't intend to use, and I had to skip such parts. But its main achievements are that it has given me a good grasp of all levels of the field of NLP (my knowledge was too fragmented before) and that it answered the burning questions I mentioned above. The book is written in the same spirit as the AIMA book; it can even be seen as an elaboration on a few chapters of that book.

I knew I liked the book as soon as I opened it. It starts with a quote from HAL 9000 in 2001: A Space Odyssey :)

PS: check out this link for the upcoming second edition (July 2007?)