Paperclip Marketing

ppc, seo and the rest of internet marketing discussed

Archive for the ‘Image Search’ Category

Idée Inc, a Toronto-based company, has announced a new search engine, TinEye, that employs true image recognition. TinEye is currently in private beta. Idée claims:

“TinEye is the first image search engine on the web to use image identification technology. Given an image to search for, TinEye tells you where and how that image appears all over the web, even if it has been modified.

“Just as you are familiar with entering text in a regular search engine such as Google to find web pages that contain that text, TinEye lets you submit an image to find web pages that contain that image.”

The company released a widget that demonstrates some of the algorithm’s findings.

While I am excited to see progress being made toward moving search algorithms beyond text, I question the utility of such an application. TinEye takes an image, rather than text, as its search query. When a user uploads a picture, the program creates a digital “fingerprint” for it. Then it compares this fingerprint against its index (said to be rapidly growing). The results, in theory, are exact or near matches of the searched image.
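To make the fingerprint idea concrete, here is a minimal sketch in Python. It is not TinEye's algorithm, which Idée has not published. It uses a simple "average hash" as a stand-in, assumes the image has already been shrunk to a small grayscale grid, and uses made-up names (`fingerprint`, `hamming_distance`, `search`) purely for illustration.

```python
from typing import Sequence

Grid = Sequence[Sequence[int]]  # grayscale pixel values, 0-255 (assumed pre-shrunk)


def fingerprint(pixels: Grid) -> int:
    """Average-hash sketch: one bit per pixel, set when the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def search(query: Grid, index: dict[str, int], max_distance: int = 2) -> list[str]:
    """Return URLs whose stored fingerprint is within max_distance bits of the query, closest first."""
    q = fingerprint(query)
    hits = [(hamming_distance(q, fp), url) for url, fp in index.items()]
    return [url for dist, url in sorted(hits) if dist <= max_distance]
```

Because each bit only records whether a pixel is brighter than the image's own average, a uniformly brightened copy produces the same fingerprint, which is roughly what "near match" means in this toy version. A real system would need something far more robust to cropping, rotation, and overlays.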

Image-to-image search isn’t new. In 2000-2001 I worked for a company doing the same thing (it was never publicly released as a search engine), and not just for images but also for video and audio. The problem this company will have is the same one we had then: the application, while cool and novel, has no real practical purpose beyond copyright protection. According to the company, uses for TinEye include:

  • Find out where and how an image appears online
  • Research products using a product photo
  • Find modified versions of unmodified images or vice versa
  • Research the usage of editorial or stock images
  • Get international, multilingual websites in your search results
  • Research corporate imagery or brand usage online
  • Use a webcam to digitize any image and search for it on the web
  • Search for your images to see where they are being used

Yes, copyright protection is important and valuable, even if less so for images than for other media (movies, music). So maybe they stand a chance serving a relatively niche market. But other than being interesting to try once, this doesn’t solve the bigger issue of providing a useful new feature for mass audiences to adopt.

How can a company create a truly useful tool out of recognition-based image search, you ask?

Build a method for generating, validating, and maintaining textual data that accurately describes each “fingerprint”. Only then can the search query move beyond uploading an image and start using words instead. Only then will a mass audience find it appealing, and adoption of the new technology can begin. Google has started on this with its Image Labeler game. But with millions upon millions of images on the web, this method of collecting metadata isn’t scalable or effective. Perhaps the best bet is to use metadata collected from image collection management software and services, such as Picasa or Flickr. But without validation (something Image Labeler is successful in achieving), such a process would certainly be plagued with bad data and prone to manipulation.

Either way, I wish Idée Inc the best of luck and really hope they succeed. We are definitely overdue to start thinking about how search can move past text-to-text and on to other useful applications like text-to-image or text-to-video.

Google makes a game out of image search

At this point, text-based search seems second nature to most of us. Average internet users are becoming more adept at using search applications, and as a result their queries are becoming longer and more specific. Depending on who you listen to, the average search engine query is now probably somewhere between 3 and 4 words.

Likewise, text matching algorithms at search engines continue to evolve. In the war to gain and sustain usership, the quality and precision of search results are believed to drive satisfaction and therefore influence repeat traffic.

Certainly, the most mature (and also most popular) form of search is text to text. A user enters a search phrase, and that phrase is matched to web pages of similar or related themes, ranked by “importance”.

So, what about searches for other media types? Text to images? Text to audio? Text to video? These variations have proven significantly more difficult to develop, deploy, and get people to use. One reason is the inherent inaccuracy of metadata. Using webmaster-supplied descriptive words and phrases to match these files to search queries is troublesome without a means of further validation. And while image, audio, and even video recognition software has been available for several years, the difficulty of scaling its functionality to a global level has kept it out of any mainstream application.

How can the search engines learn more than meta tags teach them? It looks like a game may be the answer.

Flying low under the radar, Google has been collecting data to improve image search since August 31st, 2006, using a game it licensed from Luis von Ahn. The game is called Google Image Labeler (Google has a special way with branding). In it, two users are paired up and shown random images from Google’s index, and they enter words and phrases that best describe each picture. When the words entered by both users match, points are awarded. The more specific the phrase, the more points are awarded (‘yellow ford mustang’ gets more points than ‘car’). Simple, but the game is strangely addictive. There are no prizes, and there is no communication between users. A small amount of egoboo may come from getting to the top of the results, but certainly less than other games on the web.
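The matching rule is simple enough to sketch in a few lines of Python. This is only my reading of how a round might be scored, not Google's actual code: the function names are hypothetical, and the point values (a flat 50 points per word, so longer phrases score higher) are an assumption standing in for whatever specificity measure the real game uses.

```python
def normalize(label: str) -> str:
    """Lowercase a label and collapse repeated whitespace so trivial differences still match."""
    return " ".join(label.lower().split())


def score_round(labels_a: list[str], labels_b: list[str], points_per_word: int = 50) -> dict[str, int]:
    """Award points only for labels that both players entered independently.

    Assumption: specificity is approximated by word count, so
    'yellow ford mustang' outscores 'car'.
    """
    matches = {normalize(label) for label in labels_a} & {normalize(label) for label in labels_b}
    matches.discard("")  # ignore empty entries
    return {label: points_per_word * len(label.split()) for label in matches}
```

For example, if one player enters "Car" and "yellow ford mustang" while the other enters "yellow Ford mustang", "car", and "road", only the two shared labels score, and the three-word phrase earns three times what "car" does.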

Nonetheless, the game generates a ton of useful data for Google. It gives them a mechanism to generate an independent second layer of metadata to use in validation, without needing to really understand what they are matching. If webmaster-generated meta keywords are consistent with the labels agreed on by two independent players of the Google Image Labeler game, they are very likely accurate to the content of the image. Serving accurate results drives increased usership, and increased usership means more ad-serving opportunities, all of which means more revenue for Google. Smart. Very smart.
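Here is a rough Python sketch of that validation step, under the assumption that a webmaster keyword counts as confirmed when every word in it also appears in some label the players agreed on. How Google actually combines the two sources is not public, and `validate_keywords` is a hypothetical name used only for illustration.

```python
def validate_keywords(webmaster_keywords: list[str], agreed_labels: list[str]) -> tuple[list[str], list[str]]:
    """Split webmaster keywords into (confirmed, unconfirmed) using player-agreed labels."""
    # Pool every word that appeared in any label both players agreed on.
    agreed_words = {word for label in agreed_labels for word in label.lower().split()}
    confirmed, unconfirmed = [], []
    for keyword in webmaster_keywords:
        words = keyword.lower().split()
        if words and all(word in agreed_words for word in words):
            confirmed.append(keyword)
        else:
            unconfirmed.append(keyword)
    return confirmed, unconfirmed
```

With agreed labels of "yellow ford mustang" and "car", the keywords "mustang" and "yellow ford" would be confirmed, while something like "cheap insurance" would be flagged as unconfirmed and could be down-weighted.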

I can’t imagine audio and video are very far behind. I’ll be watching, and playing along.
