Question space

Sunday 10 May 2009 22:36

Much machine-interpretable information has already been made available. Think of OpenCYC, DBPedia, and FreeBase, to name just a few. See this graph to get an idea. Now is the time to create a good, useful, Google-like query interface to all this data, one that normal people can use without prior education or training.

A "natural language" interface would seem suitable for this kind of query. Just type in your question and receive an answer derived from all these data sources.

However, even if a natural language parser succeeds in creating a semantic representation of a sentence, which is no minor task, it is not clear how this representation would map to the information present in the data sources.

A possibly even more important point is that the available knowledge, though huge, is extremely limited. Even very basic knowledge is missing. A beginning user will try some "simple questions" and soon come to the conclusion that the system doesn't know much.

So, in my view, two things need to happen. First, the user needs to get an idea of what questions he or she can ask the system. The user has an idea of what he or she wants to ask, and the system shows which types of questions are possible. This should be done in a subtle way, one that does not require much effort on the part of the user. The questions that the user can pose should be made available by the data sources themselves, or the system should create a wrapper around each data source to expose the types of questions that can be asked of it. The user may pose only certain questions, not all questions. Though this is a limitation, it will feel comfortable, because the system is helping the user to ask the question. I call this space of allowable questions "question space".
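To make the idea of a wrapper a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `QuestionTemplate`, `SourceWrapper` and `suggest`, the example templates, and the simple prefix matching are all hypothetical, not part of any existing system. Each wrapper advertises the question types its source can answer, and the portal offers matching questions while the user types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionTemplate:
    """One kind of question a source can answer (hypothetical structure)."""
    text: str       # wording shown to the user, with one <slot>
    slot_type: str  # kind of entity that fills the slot


class SourceWrapper:
    """Hypothetical wrapper that advertises the questions a data source supports."""

    def __init__(self, name, templates):
        self.name = name
        self.templates = list(templates)


def suggest(wrappers, typed):
    """Return (source name, question text) pairs whose wording starts with the typed text."""
    prefix = typed.strip().lower()
    return [
        (w.name, t.text)
        for w in wrappers
        for t in w.templates
        if t.text.lower().startswith(prefix)
    ]


# Made-up example sources and templates.
movies = SourceWrapper("movies", [
    QuestionTemplate("Who directed <movie>?", "movie"),
    QuestionTemplate("When was <movie> released?", "movie"),
])
places = SourceWrapper("places", [
    QuestionTemplate("Where is <city>?", "city"),
    QuestionTemplate("Who is the mayor of <city>?", "city"),
])

for source, question in suggest([movies, places], "who "):
    print(source, "-", question)
```

Typing "who" here brings up one question from each source. The union of all templates across all wrappers is, in this sketch, the question space.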

The second thing is more technical in nature. The query system, which after all is only a portal to the external knowledge bases, needs to store a compact representation of all allowable questions, per knowledge source. Copying all external sources in their entirety just to determine the allowable questions is not an option. So if an external source has information about movies, and the query is about "Titanic", the query portal should know that information about Titanic, the movie, is available from the external source on movies, without holding a copy of that source's entire content.
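One possible way to get such a compact representation, and I stress that it is only one option, is a Bloom filter per source: a small bit array that can say "this source may know the term" or "this source certainly does not", without storing the terms themselves. The sketch below is an assumption on my part. The class names, the filter size, and the example labels are made up, and how the portal would obtain the list of labels from a real source is left open.

```python
import hashlib


class BloomFilter:
    """Small probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from salted SHA-256 digests of the key.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class Portal:
    """Hypothetical query portal keeping one filter per external source."""

    def __init__(self):
        self.filters = {}

    def register(self, source_name, entity_labels):
        bf = BloomFilter()
        for label in entity_labels:
            bf.add(label.lower())
        self.filters[source_name] = bf

    def candidate_sources(self, term):
        """Sources that may hold information about the term."""
        key = term.lower()
        return [name for name, bf in self.filters.items() if key in bf]


portal = Portal()
portal.register("movies", ["Titanic", "Casablanca"])  # made-up labels
portal.register("places", ["Amsterdam", "Paris"])     # made-up labels

print(portal.candidate_sources("Titanic"))
```

With the sizes above the portal stores 128 bytes per source, however many labels the source contains, though a real source would need a larger filter to keep false positives rare. A positive answer is only a hint, so the portal would still have to ask the source itself to confirm.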

This is a challenge!

Labels
my agent
