

Friday 10 August 2012, 10:15

Historically, natural language processing programs have been written in LISP and Prolog: LISP because it was the AI language of a whole generation of AI students, and Prolog because it has some built-in techniques (such as feature matching) that are handy for NLP. These programs often produce an output structure in the same language.

Now, LISP and Prolog are not mainstream languages in business environments. This list-oriented output therefore has to be mapped to another language before it can be used.

Another thing: again historically, it has been normal for an NLP program to work with predicate logic (PL). The output of such a program contains universal and existential quantifiers to express statements about objects, and the output expressions need custom extensions to predicate logic, since PL is rather limited in this respect.

When you combine these things, a sentence like "Two boys carried a dog" is represented like this:

    quant( set( R^I^[geq, I, 2]),
        [boy, B],
        quant(exists, E, [event, E], [carry, E, B, dog]) )

(from: The Core Language Engine)
Most developers I know would not be able to read this. It requires a programmer to learn a new notation just to handle these types of structures. Many developers are not well versed in predicate logic: they may have seen it before, but it is not second nature, not something they use every day.

I think NLP programs should work with objects. The result of a parse should be an object structure and you should be able to build sentences from object structures.

    $Determiner = new Determiner();
    $Boys = new Entity();
    $Dog = new Entity();
    $Relation = new Relation();
    $Sentence = new Sentence();

This is a structure that developers work with all the time. They can use autocomplete in their IDE to find out the methods that a certain class provides and this flattens the learning curve.
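
To make this a little more concrete, here is a minimal sketch of how "Two boys carried a dog" could be held in such an object structure. The class names come from the snippet above, but every constructor argument, every property and the toArray() method are assumptions made up for illustration; they do not belong to any existing library. The quantity 2 is also only a rough stand-in for the "at least two" quantifier in the logical form.

```php
<?php
// Hypothetical classes: names follow the post, all members are assumptions.

class Determiner
{
    public $quantity;   // how many entities are meant, e.g. 2 for "two"
    public $definite;   // true for "the", false for "a" / "two"

    public function __construct(int $quantity, bool $definite = false)
    {
        $this->quantity = $quantity;
        $this->definite = $definite;
    }
}

class Entity
{
    public $noun;        // lemma of the noun, e.g. "boy"
    public $determiner;

    public function __construct(string $noun, Determiner $determiner)
    {
        $this->noun = $noun;
        $this->determiner = $determiner;
    }
}

class Relation
{
    public $predicate;   // lemma of the verb, e.g. "carry"
    public $agent;       // who performs the action
    public $patient;     // what the action is performed on
    public $tense;

    public function __construct(string $predicate, Entity $agent, Entity $patient, string $tense = 'past')
    {
        $this->predicate = $predicate;
        $this->agent = $agent;
        $this->patient = $patient;
        $this->tense = $tense;
    }
}

class Sentence
{
    public $relation;

    public function __construct(Relation $relation)
    {
        $this->relation = $relation;
    }

    // Flatten the object structure into a plain array, e.g. for logging.
    public function toArray(): array
    {
        $r = $this->relation;
        return [
            'predicate' => $r->predicate,
            'tense'     => $r->tense,
            'agent'     => ['noun' => $r->agent->noun, 'quantity' => $r->agent->determiner->quantity],
            'patient'   => ['noun' => $r->patient->noun, 'quantity' => $r->patient->determiner->quantity],
        ];
    }
}

// "Two boys carried a dog"
$Boys = new Entity('boy', new Determiner(2));
$Dog = new Entity('dog', new Determiner(1));
$Relation = new Relation('carry', $Boys, $Dog, 'past');
$Sentence = new Sentence($Relation);

echo json_encode($Sentence->toArray()), PHP_EOL;
```

With a structure like this, a question such as "who carried something?" becomes an ordinary property access ($Sentence->relation->agent->noun) that an IDE can autocomplete, rather than a pattern match over nested lists.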


Albert Gatt and Ehud Reiter (one of the authors of Building Natural Language Generation Systems) have written a Java-based generation system that practises this approach to NLP: SimpleNLG. It is easy to use and appears to be robust. A source of inspiration!

