At this point, text-based search seems second nature to most of us.  Average internet users are becoming more adept at using search applications, and as a result their queries are becoming longer and more specific.  Depending on who you listen to, the average search engine query is now probably somewhere between 3 and 4 words. 

Likewise, text matching algorithms at search engines continue to evolve. In the war to gain and sustain usership, the quality and precision of search results are believed to drive satisfaction and therefore influence repeat traffic.

Certainly, the most mature (and also most popular) form of search is text to text.  A user enters a search phrase, and that phrase is matched to web pages of similar or related themes, ranked by “importance”. 

So, what about searches for other media types? Text to images? Text to audio? Text to video? These variations have proven significantly more difficult to develop, deploy, and gain usage around. One reason is the inherent inaccuracy of metadata. Matching these files to search queries using descriptive words and phrases written by webmasters is troublesome without some means of further validation. And while image, audio, and even video recognition software has been available for several years, the difficulty of scaling its functionality to a global level has kept it out of any mainstream application.

How can the search engines learn more than meta tags teach them?  It looks like a game may be the answer.

Flying low under the radar, Google has been collecting data to improve image search since August 31st, 2006, using a game it licensed from Luis von Ahn. The game is called Google Image Labeler (Google has a special way with branding). In it, two users are paired up and shown random images from Google’s index, and each enters words and phrases that best describe the picture. When the words entered by both users match, points are awarded. The more specific the phrase, the more points are given (‘yellow ford mustang’ gets more points than ‘car’). Simple, but the game is strangely addictive. There are no prizes, and there is no communication between users. A small amount of egoboo may come from getting to the top of the results, but certainly less than in other games on the web.
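To make the mechanics concrete, here is a minimal sketch of how one round of a label-matching game like this could be scored. It is not Google's code: the function names and the 50-points-per-word rule are illustrative assumptions, chosen only so that a more specific phrase earns more than a generic one.

```python
def normalize(label: str) -> str:
    """Lowercase a label and collapse repeated whitespace."""
    return " ".join(label.lower().split())


def score_round(labels_a, labels_b, points_per_word=50):
    """Return {matched_label: points} for labels both players entered.

    Assumption: specificity is approximated by word count, so a
    three-word match is worth three times a one-word match.
    """
    a = {normalize(label) for label in labels_a if normalize(label)}
    b = {normalize(label) for label in labels_b if normalize(label)}
    matches = a & b  # only labels that both players typed count
    return {m: points_per_word * len(m.split()) for m in matches}


# Two players describe the same picture without seeing each other's input.
player_a = ["Car", "yellow ford  mustang"]
player_b = ["yellow Ford Mustang", "truck"]
result = score_round(player_a, player_b)
```

Here only ‘yellow ford mustang’ is credited, because it is the one phrase both players entered; ‘car’ and ‘truck’ earn nothing since nobody agreed on them.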

Nonetheless, the game generates a ton of useful data for Google. It gives them a mechanism to generate an independent second layer of metadata to use in validation, without needing to really understand what they are matching. If webmaster-generated meta keywords are consistent with the labels that two independent players agree on in the Google Image Labeler game, those keywords are very likely accurate to the content of the image. Serving accurate results drives increased usership, and increased usership means more ad serving opportunities, all of which means more revenue for Google. Smart. Very smart.
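The validation idea can be sketched in a few lines. Again, this is an assumption about how such a check might work, not a description of Google's pipeline; the function name and the sample keywords are hypothetical. A webmaster keyword is kept only when both players independently entered the same label.

```python
def normalize(label: str) -> str:
    """Lowercase a label and collapse repeated whitespace."""
    return " ".join(label.lower().split())


def validated_keywords(webmaster_keywords, player_a, player_b):
    """Return the webmaster keywords confirmed by both players, sorted."""
    # Labels the two players agree on form the independent second layer.
    agreed = {normalize(x) for x in player_a} & {normalize(x) for x in player_b}
    claimed = {normalize(k) for k in webmaster_keywords}
    return sorted(claimed & agreed)


confirmed = validated_keywords(
    ["Mustang", "cheap insurance"],   # what the page says about the image
    ["mustang", "car"],               # player one
    ["car", "Mustang", "yellow"],     # player two
)
```

In this example ‘mustang’ survives because the page and both players agree on it, while ‘cheap insurance’ is dropped as unconfirmed.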

I can’t imagine audio and video are very far behind.  I’ll be watching, and playing along.
