Google teaches computers the meaning of words

Computers can learn the meaning of words with the help of the Google search engine. CWI researchers Rudi Cilibrasi and Paul Vitányi found a way to use the world wide web as a massive textbook for computers.

Publication date
2 Feb 2005

The meaning of a word can often be derived from the words that appear near it. Two related words are therefore likely to yield more hits when they are searched for together on Google than two unrelated words. Cilibrasi and Vitányi developed a statistical measure of the 'distance' in meaning between words, based on the number of Google page hits. The lower this so-called normalized Google distance, the more closely the words are related.
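The article does not spell out the formula. As a rough illustration, the sketch below assumes the form in which the normalized Google distance is usually stated: with f(x) and f(y) the hit counts for each word alone, f(x, y) the hit count for pages containing both, and N the total number of indexed pages, the distance is (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))). The function name and the hit counts are illustrative values chosen for this example, not the results of live queries.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Distance between two terms computed from page-hit counts.

    fx, fy -- hits for each term alone (assumed inputs)
    fxy    -- hits for pages containing both terms
    n      -- total number of indexed pages (an assumed normalizing constant)
    """
    if fx <= 0 or fy <= 0:
        raise ValueError("each term needs at least one hit")
    if fxy == 0:
        # The terms never co-occur, so they are treated as infinitely far apart.
        return math.inf
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))


# Illustrative counts for two related terms such as "horse" and "rider".
related = normalized_google_distance(46_700_000, 12_200_000, 2_630_000, 8_058_044_651)
print(f"{related:.3f}")  # prints 0.443
```

Under this form, two terms that always occur together get a distance of 0, and the distance grows as co-occurrence becomes rarer relative to how common each term is on its own.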

In this way, maps of words can be generated, which a computer could use to learn their meaning. In several tests, Cilibrasi and Vitányi demonstrated that the method can distinguish between colors and numbers, between prime numbers and composite numbers, and between 17th-century Dutch painters. It also showed the ability to understand electrical terms, religious terms, and emergency-incident terms. Furthermore, the researchers conducted a massive experiment in understanding randomly selected WordNet categories and obtained a mean agreement of 87.5 percent with expert-entered semantic knowledge. The method was also able to perform simple automatic English-Spanish translation.

More information can be found on INS4's website, in a preprint of a paper describing the technique, on Vitányi's homepage, or in New Scientist.