21 May, 2005 at 10:04

Google’s search for meaning

  • 29 January 2005
  • From New Scientist Print Edition.
  • Duncan Graham-Rowe

COMPUTERS can learn the meaning of words simply by plugging into Google. The finding could bring forward the day that true artificial intelligence is developed.

Trying to get a computer to work out what words mean – distinguish between “rider” and “horse” say, and work out how they relate to each other – is a long-standing problem in artificial intelligence research.

One of the difficulties has been working out how to represent knowledge in ways that allow computers to use it. But suddenly that is not a problem any more, thanks to the massive body of text that is available, ready indexed, on search engines like Google (which has more than 8 billion pages indexed).

The meaning of a word can usually be gleaned from the words used around it. Take the word “rider”. Its meaning can be deduced from the fact that it is often found close to words like “horse” and “saddle”. Rival attempts to deduce meaning by relating hundreds of thousands of words to each other require the creation of vast, elaborate databases that are taking an enormous amount of work to construct.

But Paul Vitanyi and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science in Amsterdam, the Netherlands, realised that a Google search can be used to measure how closely two words relate to each other. For instance, imagine a computer needs to understand what a hat is.

To do this, it needs to build a word tree – a database of how words relate to each other. It might start off with any two words to see how they relate to each other. For example, if it googles “hat” and “head” together it gets nearly 9 million hits, compared to, say, fewer than half a million hits for “hat” and “banana”. Clearly “hat” and “head” are more closely related than “hat” and “banana”.

To gauge just how closely, Vitanyi and Cilibrasi have developed a statistical indicator based on these hit counts that gives a measure of a logical distance separating a pair of words. They call this the normalised Google distance, or NGD. The lower the NGD, the more closely the words are related.
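The NGD formula from the researchers' preprint can be sketched in a few lines of Python. The hit counts below are illustrative placeholders, not real Google results; only the 9 million, half-million and 8 billion figures come from the article itself:

```python
import math

def ngd(f_x, f_y, f_xy, n):
    """Normalised Google distance from raw hit counts.

    f_x, f_y : hits for each term searched alone
    f_xy     : hits for both terms searched together
    n        : total number of pages in the index
    """
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

N = 8_000_000_000  # pages indexed, per the article

# Hypothetical single-term counts; pair counts echo the article's figures.
d_head = ngd(f_x=50_000_000, f_y=150_000_000, f_xy=9_000_000, n=N)
d_banana = ngd(f_x=50_000_000, f_y=20_000_000, f_xy=450_000, n=N)

print(f"hat/head:   {d_head:.3f}")
print(f"hat/banana: {d_banana:.3f}")
```

With these counts, "hat"/"head" comes out with a smaller NGD than "hat"/"banana", matching the intuition in the article: a lower score means the words co-occur more often than their individual frequencies alone would suggest.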

By repeating this process for lots of pairs of words, it is possible to build a map of their distances, indicating how closely related the meanings of the words are. From this a computer can infer meaning, says Vitanyi. “This is automatic meaning extraction. It could well be the way to make a computer understand things and act semi-intelligently,” he says.
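Repeating the measurement over many word pairs yields a distance table like the one sketched below. All hit counts here are hypothetical stand-ins for real query results, chosen only to illustrate the procedure:

```python
import math
from itertools import combinations

def ngd(f_x, f_y, f_xy, n):
    """Normalised Google distance from raw hit counts."""
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

N = 8_000_000_000  # index size, per the article

# Hypothetical hit counts for single terms and for pairs.
hits = {"hat": 50_000_000, "head": 150_000_000, "banana": 20_000_000}
pair_hits = {("hat", "head"): 9_000_000,
             ("hat", "banana"): 450_000,
             ("head", "banana"): 600_000}

# Build the pairwise distance map, then list pairs from closest to farthest.
distances = {(a, b): ngd(hits[a], hits[b], pair_hits[(a, b)], N)
             for a, b in combinations(hits, 2)}
for pair, d in sorted(distances.items(), key=lambda kv: kv[1]):
    print(pair, round(d, 3))
```

A table like this is the raw material for the "word tree": standard clustering over the distance matrix groups the closest words together, which is how the researchers separated colours from numbers and painters from religions.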

The technique has managed to distinguish between colours, numbers, different religions and Dutch painters based on the number of hits they return, the researchers report in an online preprint (www.arxiv.org/abs/cs.CL/0412098).

The pair’s results do not surprise Michael Witbrock of the Cyc project in Austin, Texas, a 20-year effort to create an encyclopaedic knowledge base for use by a future artificial intelligence. Cyc represents a vast quantity of fundamental human knowledge, including word meanings, facts and rules of thumb. Witbrock believes the web will ultimately make it possible for computers to acquire a very detailed knowledge base. Indeed, Cyc has already started to draw upon the web for its knowledge. “The web might make all the difference in whether we make an artificial intelligence or not,” says Witbrock.

