Skip links
Main content

Implementation of an Earley parser with unification

maandag 05 december 2011 20:50

I just completed a PHP implementation of the Earley parser, extended with unification, that is described in the book "Speech and Language Processing" by Jurafsky and Martin. I thought it might be a good idea to share my code since it took me quite some time and effort to create it and, what's more important, I could not find any other implementation of it on the internet.

The Earley algorithm is a top-down chart parser that parses natural language and does this very efficiently. Given a grammar that contains syntax rules and the words of a language, it parses a sentence into all possible syntax trees. The unification extension adds further restrictions to the syntax rules. An example of such a restriction is that the rule S => NP VP only applies if the number and person of the NP agrees with that of the VP. When NP is "The children", then the VP may be "are playing", but cannot be "is playing" (a restriction on number) or "am playing" (a restriction on person). The grammar should provide extra feature structures for the syntax rules. The unification extension of the Earley parser applies these structures and ensures that a sentence with a feature error will not be parsed.

I just provide the code to give you some idea and hints on to implement it. The code has not been tested very much since I just finished it. I place it under MIT license.

Earley parser source code

The algorithm may be found in the book "Speech and language Processing" (first edition), chapter 11, page 431. In the second edition of the book the algorithm may be found in chapter 15. The rest of this article presumes you have read it, since I cannot quickly repeat all the information given in chapter 11.

The book first describes the basic Earley algorithm, which can be found on Wikipedia. Some implementations are available there as well. It is described in chapter 10 of the book but can be hard to really understand it, because of the undescriptive use of variable names like A and B, and that it uses two functions that are not specified, NEXT-CAT and PARTS-OF-SPEECH.

Syntax rules and feature structures

For the basic algorithm I use syntax rules like the following S => NP VP

    array('cat' => 'S'),
    array('cat' => 'NP'),
    array('cat' => 'VP'),

The same rule, extended with feature structures looks like this:

    array('cat' => 'S', 'features' => array('head-1' => null)),
    array('cat' => 'NP', 'features' => array('head' => array('agreement-2' => null))),
    array('cat' => 'VP', 'features' => array('head-1' => array('agreement-2' => null))),

This implements the feature structure that is described on page 428 of the book:

Features of the implementation

The implementation of the algorithm handles unification for syntax rules like this,


as described on the bottom of page 406. The rule as two identical consequents. If the grammatical categories of the rule are implemented as keys in a hashtable (as they are in my code) then they need to be discerned. I do that by adding the position of the category in the rule.

The book says: VP => NP(1) NP(2)

in the implementation it looks like this: VP@0 => NP@1 NP@2 (but you don't need to think about this when you write the syntax rules)

It also deals with rules like this NP => NP PP where one consequent is identical to the antecedent.

Remarks on the algorithm

The meaning of the function "FOLLOW-PATH" is not described in the book, but I presume that it means that it returns a subset of its argument. It starts out with the label "cat" and follows it to the end.

The function "UNIFY-STATES" of the original algorithm performed a "FOLLOW-PATH" of "dag2". I think this is an error in the algorithm, because it would corrupt the feature set of the charted state.

My version of unifyStates is also somewhat different since I take into account that a rule may have the same category at different positions (as in the VP => NP NP example).

Labeled DAGs

Unifying DAGs is a very elegant to implement unification. I could not follow the explanation of the unification process as described on page 423, unfortunately. That's why I decided to create a different implementation of this function. I introduced a class to separate the logic of the datastructure: LabeledDAG, a labeled directed acyclic graph (or DAG). Which reminds me to thank my employer at Procurios, Bert Slagter, for inadvertently providing me with a perfect datastructure for this task.


I also added the possibility to stop parsing when the first complete sentence is found. For some applications you are only interested in a single interpretation; it might as well be the first one, especially if you succeed to order your syntax rules in such a way that the best interpretation is found first.


« Terug

Reacties op 'Implementation of an Earley parser with unification'

1 2 Laatste pagina
Geplaatst op: 17-10-2013 01:33 Quote
lv cyber Monday
Geplaatst op: 17-10-2013 17:42 Quote
333 Put the price of a $100 watch as many as $150 and shoppers will convert absent. If you are it correct the basic gladness of the
lv cyber Monday
Geplaatst op: 21-10-2013 05:40 Quote
Geplaatst op: 22-10-2013 11:46 Quote
louis vuitton handbags outlet
Geplaatst op: 24-10-2013 13:23 Quote
999 "I hope my daughter won't read this!" says the mom of Lily.. The Damier Canvas Speedy 30 is a really really popuar bag in Big apple .
louis vuitton handbags outlet
Geplaatst op: 26-10-2013 06:21 Quote
michael kors outlet handbags
Geplaatst op: 28-10-2013 16:11 Quote
3333 Marc has famously battled drug dependancy around the previous and has common therapy classes. Which is certainly why irrespective of the shaky financial local climate, Zara's strategy for globe domination just isn't faltering.
michael kors outlet handbags
Geplaatst op: 30-10-2013 11:48 Quote
Geplaatst op: 03-11-2013 22:25 Quote
Geplaatst op: 07-11-2013 01:58 Quote
Geplaatst op: 11-11-2013 07:04 Quote
Geplaatst op: 15-11-2013 11:56 Quote
cheap uggs online sale
Geplaatst op: 19-11-2013 07:54 Quote
999 The creation of this new antitheft not simply stunning but will also easy. She would not discuss carats or worth tags although demonstrating me a hoop comparable to your one particular Ponder obtained that was designed by Precision Set.
cheap uggs online sale
bobby pins ornament
Geplaatst op: 19-11-2013 08:09 Quote
999 The invention of this new antitheft not simply outstanding but also convenient. She would not speak carats or value tags even while demonstrating me a hoop very much the same towards the a particular Ponder bought which was intended by Precision Established.
bobby pins ornament
Geplaatst op: 21-11-2013 02:57 Quote
Geplaatst op: 25-11-2013 04:11 Quote
Geplaatst op: 29-11-2013 02:33 Quote
Geplaatst op: 01-12-2013 05:30 Quote
Geplaatst op: 03-12-2013 15:21 Quote
Geplaatst op: 05-12-2013 21:21 Quote
Geplaatst op: 06-12-2013 22:52 Quote
Geplaatst op: 11-12-2013 03:48 Quote
oakley sunglasses coupons
Geplaatst op: 25-02-2014 12:11 Quote
In closing, while most people will never own a Hermes Birkin bag, those that do fall in love with them. The allure of these bags is guaranteed to continue for the many years as people scramble to own a piece of fashion history. The Pleasure Of Having Authentic Hermes Birkin Bag By Hermes Birkin
<a href="" >oakley sunglasses coupons</a>
oakley sunglasses coupons
ray ban australia
Geplaatst op: 25-02-2014 12:11 Quote
c.Full assistance: while dealing with Eurohandbag you will get full assistance possible from the staff. Right from telling the needs to finalization of the product, the fully trained staff will be there to help you out.
<a href="" >ray ban australia</a>
ray ban australia
1 2 Laatste pagina
Nieuw bericht