Tineye Image Search, Part 1

July 27, 2008

Tineye is a new image search engine from Idée Inc. in Toronto which promises to revolutionize the process of finding images on the web. This is the first in a series of blog posts which will review Tineye from the perspective of photographers, particularly stock photographers.

Image Search

The core of Tineye is a sophisticated image “fingerprinting” and pattern matching technology which can identify copies or variations of any given image, even if the image has been cropped, flopped, stretched or recolored. It does so by looking only at patterns in the images themselves; no keywords, text or metadata are used in the process.
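
To give a feel for what an image fingerprint can look like, here is a minimal sketch of one very simple, well-known scheme, the "average hash". This is not Tineye's algorithm, which is not described here; the function names and the tiny 4x4 sample images are made up for illustration. It shows a fingerprint that survives a color change (brightening) but clearly separates a different picture. A scheme this simple would not survive cropping or flopping.

```python
def average_hash(pixels):
    """Fingerprint a grayscale image given as rows of 0-255 integers.

    Each pixel contributes one bit: 1 if it is brighter than the image's
    mean brightness, otherwise 0. (Illustrative only, not Tineye's method.)
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return "".join("1" if value > mean else "0" for value in flat)


def hamming_distance(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 grayscale "images".
original = [
    [200, 190, 30, 20],
    [210, 180, 40, 10],
    [50, 60, 220, 230],
    [40, 70, 210, 240],
]
# The same picture with every pixel brightened by 15.
brightened = [[min(255, v + 15) for v in row] for row in original]
# A different picture (horizontal stripes).
other = [
    [10, 20, 30, 40],
    [200, 210, 220, 230],
    [15, 25, 35, 45],
    [205, 215, 225, 235],
]

fp_original = average_hash(original)
fp_brightened = average_hash(brightened)
fp_other = average_hash(other)

print(hamming_distance(fp_original, fp_brightened))  # 0: same fingerprint
print(hamming_distance(fp_original, fp_other))       # 8: half the bits differ
```

The smaller the distance between two fingerprints, the more alike the images are, and no keywords or metadata are involved at any point.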

Idée has for several years been providing a custom visual search service for private clients, examining print publications and selected sites for copyright violations or to help identify images used for billing.

What they have added in the new public (beta) offering of Tineye is a massive database of images obtained from crawling the web. Already at 701 million images, the search engine has but a small fraction of the web’s images indexed. But wait. They are moving full speed ahead to ramp up their coverage, adding almost 200 million images a month.

The way it works is simple. After you request an invite and register, you go to the main page and either a) upload an image or b) enter an image URL on the web. Tineye computes the visual fingerprint of the search image and then searches its database and returns all matches, with the best matches first.
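
The search step can be sketched in a few lines. This is only an assumption about the general shape of such a search, not Tineye's implementation: the fingerprints are made-up bit strings, the URLs are placeholders, and `search` and `max_distance` are hypothetical names. It compares the query fingerprint against every indexed fingerprint and returns the matches, closest first.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query_fp, index, max_distance):
    """Return (url, distance) pairs within max_distance, best match first."""
    hits = []
    for url, fp in index.items():
        distance = hamming_distance(query_fp, fp)
        if distance <= max_distance:
            hits.append((url, distance))
    # Sort by distance, then by URL so ties come out in a stable order.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical index built by a crawler: URL -> fingerprint.
index = {
    "http://example.com/copy.jpg": "1100110000110011",
    "http://example.com/edited.jpg": "1100110000110111",
    "http://example.com/unrelated.jpg": "0000111100001111",
}

results = search("1100110000110011", index, max_distance=3)
for url, distance in results:
    print(distance, url)
```

A real index of hundreds of millions of images could not be scanned one entry at a time like this; the sketch only shows the order of the steps.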

Benefits for Photographers

There are three primary ways photographers will benefit from image search technology like Tineye. First, they can use image search to find unauthorized uses of their images, and possibly recover revenue. Second, they can find legitimate usages they were unaware of and get tearsheets and publication credits to help build their portfolio. Third, photographers can be found by a photo buyer who sees their image online and uses Tineye to find the copyright owner to license the image. This last benefit is perhaps the most powerful since it promises to ameliorate much of the Orphan Works problem.

How Well Does it Work?

For an initial review I ran the Tineye search engine against 7660 images currently on my website. Many of the images are represented by stock agencies, and I have maintained detailed sales history on every image.

Of these, 813 are also on the Getty Images website, 5890 are on Alamy and 4000 are on Digital Railroad Marketplace. The images cover a wide range of travel subjects, from portraits and landscapes to cities, festivals and food. Their time online varies from a few months to 5 years.

Results:

Tineye found a total of 996 matches for 319 images. The highest number of matches for a single image was 328. Of the 319 images, 246 had only one match.

Accuracy:

An ideal image matching program should have minimal false positive matches and at the same time not miss any true matches to the source image. Tineye has built-in thresholds; if a test image differs from the sample by less than some amount x, it is counted as a match; if it differs by more, it is not. Different users will have different needs. A creative designer looking for ideas based on a sample image might want to cast a wide net, whereas a photographer looking for uses of his exact image might want a fairly tight net.
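
The trade-off between a tight net and a wide net can be shown with a small sketch. The numbers are invented for illustration and the `evaluate` function is hypothetical; Tineye's actual thresholds and distance measure are not known to me. Each candidate has a difference score relative to the search image and a flag saying whether it really is a copy.

```python
# (difference from the search image, is it really a copy?) for made-up candidates.
candidates = [
    (0, True),
    (2, True),
    (5, True),    # a heavily edited copy
    (6, False),   # a similar-looking but different image
    (9, False),
    (12, False),
]


def evaluate(candidates, threshold):
    """Count false positives and missed copies for a given threshold.

    A candidate counts as a match when its difference is below the threshold.
    """
    false_positives = sum(
        1 for diff, is_copy in candidates if diff < threshold and not is_copy
    )
    missed = sum(
        1 for diff, is_copy in candidates if diff >= threshold and is_copy
    )
    return false_positives, missed


tight = evaluate(candidates, threshold=3)   # tight net: few false positives
wide = evaluate(candidates, threshold=10)   # wide net: few missed copies

print("tight net (false positives, missed):", tight)  # (0, 1)
print("wide net (false positives, missed):", wide)    # (2, 0)
```

With the tight threshold the heavily edited copy is missed; with the wide one nothing is missed but two unrelated images come along for the ride.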

So. How did Tineye do? Quite well, though not all of the matches found were real matches to one of my images. A total of 659 of the matches, for 54 unique images, were in fact mismatches. However, more than half of these were taken up by variations of a single image of Michelangelo’s The Creation of Adam in the Sistine Chapel. Most of the others were images of familiar travel subjects or historical artwork.

Sample mismatches were:

Taj Mahal

0-0-68  stock photo of India, Agra, View through marble lattice screen, Taj Mahal  

Cabo San Lucas, Mexico

0-51-46  stock photo of Mexico, Cabo San Lucas, El Arcos, Lands End  

a door in Sidi Bou Said, Tunisia

3-1100-7  stock photo of Tunisia, Sidi Bou Said, Painted doorway  

There were also some surprises. Tineye seems particularly keen to match written text:

a poster in Poland

4-960-1398  stock photo of Poland, Jelenia Gora, Poster  

And this whopper.

Manchester United stadium

    

Several items of artwork rounded out these false positives. While they are not infringements of the photographer’s photograph, they could actually be very helpful for a photo buyer seeking editorial coverage.

Athens

9-252-75  stock photo of Greece, Athens, Frieze of Poseidon, Apollo and Artemis, Acropolis Museum  

There were some cases where it was very difficult to tell if the matching image was indeed a copy of my own. Sometimes the shadows and cropping identified the image, or the arrangement of water droplets.

Missed matches

The other measure of Tineye’s accuracy is how many matches it missed. At this point that is impossible to tell unless I know for certain that Tineye had an image in its index and failed to match it in a search. Without the insertion of carefully crafted test cases it is very difficult to know what is in the index. Certainly it missed several of my book covers on Amazon, but there’s no way to tell if they were on indexed pages.

One type of image that Tineye is not optimized for, though, is the simple graphic image. For example, the following two images do not match:

    

Lastly, there are some images which do not have enough detail for Tineye’s algorithm, and so it cannot perform any matches. In my test there were two such images:

4-182-31  stock photo of China, Beijing, Sunset from the Great Wall, Mutianyu

S4-575-1213  stock photo of California, Streetlights

Conclusion (Accuracy)

All in all, if Tineye can be said to err, from the perspective of the photographer, it would be slightly on the side of matching too many non-identical images. However, this might be necessary in order to catch cases where the person using an image has added material or significantly changed the image. The additional work required to examine a few more matches is a small price to pay for a lower chance of missing something.

In this review, 54 of 319 matched images (17%) turned out to be in fact different images. This figure is likely much higher for a travel collection than for others, because of the number of familiar subjects shot from a similar vantage point.

After subtracting out mismatches there remained a total of 337 matches for 265 images. This represents valid matches for about 3.5% of my images tested.
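
For anyone who wants to check the arithmetic, the figures quoted in this post fit together as follows. The variable names are mine; the numbers are the ones reported above.

```python
# Figures reported in this review.
images_tested = 7660
total_matches = 996
images_with_matches = 319
mismatched_matches = 659
images_with_mismatches = 54

# Matches and images left once the mismatches are removed.
valid_matches = total_matches - mismatched_matches
images_with_valid_matches = images_with_matches - images_with_mismatches

# Share of matched images that were mismatches, and share of all
# tested images that ended up with a valid match (both in percent).
mismatch_rate = round(100 * images_with_mismatches / images_with_matches, 1)
valid_rate = round(100 * images_with_valid_matches / images_tested, 1)

print(valid_matches)              # 337
print(images_with_valid_matches)  # 265
print(mismatch_rate)              # 16.9, i.e. about 17%
print(valid_rate)                 # 3.5
```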

In the next post we will look at the images which did match, and at which of those matches might be infringements and which are valid publications.
