Skip links
Main content

Mapping to a relational database

zondag 06 januari 2013 21:17

Up until now I had always looked at the mapping of a semantic sentence representation to the tables of a database as something that required some hacking. But now I found a proper way to deal with it.

Given a logical representation of a sentence, say

In what year did Mary Shelley write Frankenstein?

write(e, a, b)
name(a, "Mary Shelley")
name(b, "Frankenstein")
time(e, t)
question(t)

you want to use a database to find out the answer (t).

In this article I want to share a key concept and work it out with some examples.

The key concept is:

Create an explicit, declarative mapping of logical relations to database relations.

The predications I named before serve the purpose of relations, as they describe true relations of given objects. Relational databases also contain relations (yes, really :) ). But these relations are not the same. So, in order to access the database, there needs to be some form of transformation of the predications to the query language of the database. Many ways are thinkable, but I know of only one elegant method. It is described in [TEAM-1987]. In this paper Grosz et al describe a proof-of-concept system that does exactly that what we need. I am only going to name the parts of the 80-page article that are needed to make my point. The system itself is much more complex, of course.

The query is done, essentially, in three phases:

  1. Rewrite the logical relations to database relations
  2. Transform the database relations to a database query
  3. Execute the query

To me, the most important step is the first. It requires the system developer to create transformation rules that map the logical relations and their arguments to corresponding database relations and arguments. This step needs to be performed once for each new database for which a natural language interface is needed.

Step 2 transforms and accumulates the database relations into a database query (for example SQL, or SPARQL; the paper itself uses SODA). This transformation needs to be programmed out for each database query language.

Examples

Given our example predications, let's look at some sample database structure and the way the relations map. This is also a good excercise for myself, since I have had no experience with this at all ;)

Database 1

Publications
Author Book Year
5 18 1818
6 22 1866

Authors
 ID Name
 5 Mary Shelley
 6 Fyodor Mikhailovich Dostoyevsky

Books
 ID Name
18 Frankenstein
22 Crime and Punishment

This is the mapping:

  1. write(e, a, b) => publications(author = a, book = b)
  2. write(e, a, b), name(a, c) => authors(id = a, name = c)
  3. write(e, a, b), name(b, d) => books(id = b, name = d)
  4. time(e, t), write(e, a, b) => publications(author = a, book = b, year = t)

Let's examine this mapping:

  • The first mapping is 1-to-1. A single relation is mapped to a single relation.
  • The second and third mappings show that this is not always the case. These are n-to-1 mappings.
  • And, as to prove that 1-to-n mappings are also needed, mapping 4 provides one.
  • While possible, it is probably wise not to use n-to-m mappings. They look complicated and are less flexible than the 1-to-n / n-to-1 mappings
  • Notice too, that while the logical form uses events, these have no explicit representation in the database. I thought this would be a major issue, but apparently it can be overcome.

Now, let's try a completely different database:

Database 2

AuthorPublications
Author Publication
5 101
6 104

Publication
 ID Book Year
101 Frankenstein 1818
104 Crime and Punishm 1866

Authors
 ID Name
 5 Mary Shelley
 6 Fyodor Mikhailovich Dostoyevsky

This is the mapping:

  1. write(e, a, b) => authorpublications(author = a, publication = e)
  2. write(e, a, b), name(a, c) => authors(id = a, name = c)
  3. write(e, a, b), name(b, d) => publication(id = e, book = d)
  4. time(e, t), write(e, a, b) => publication(id = e, year = t)

Remarks:

  • Note that the event variable e is used in this example. That is because a publication is treated as an event here.
  • Make sure that all relations and all arguments of the logical form are represented in the relational form. Otherwise some constraints will be missing and you will get too many results from your query.

Building the query

It is probably a good idea to give at least an example of what a query would look like that is generated from a set of database relations. So I will have a go.

Using the relations for the second database I came up with this SQL query. It is just to give you an idea of how such a thing may be done:

SELECT p2.year AS t
FROM authorpublications ap
INNER JOIN authors a
INNER JOIN publication p1
INNER JOIN publication p2
WHERE
    a.id = ap.author AND a.name = "Mary Shelley"
    p1.id = ap.publication AND p1.book = "Frankenstein"
    p2.id = ap.publication

Notes:

When the logical query is executed, variables are used for the arguments. These variables determine the join constraints. For example, when the same variable is used for authorpublications.publication as it is for publication.id, the constraint authorpublications.publication = publication.id is added.

References

[TEAM-1987] TEAM: An Experiment in the Design of Transportable Natural-Language Interfaces - Barbara J. Grosz, Douglas E. Appelt, Paul Martin, Fernando Pereira

Labels
nlp

« Terug