

Image Tracer: Download + Manual

[Beta version]

This Image Tracer tracks the image URLs found through Google Image Search and documents changes over time. The script checks whether images are still available, whether they have changed in rank (i.e. whether they show up higher or lower in the Google image result list) and whether new ones have been added. Unlike Google Image Search, the Image Tracer meticulously archives the history of any image search you decide to track, and might help you find patterns and significant changes that would otherwise remain invisible.

The script was written by Tsila Hassine and is free software licensed under the GNU GPL. We encourage you to copy, distribute and modify it. You can find more about this script here. This is the Beta version; version 1.0 will be available here from February 2006. A web-based version with a graphical interface is in preparation.

Requirements

The script runs from the command line, is platform independent, and needs the programming language Python installed (available on most OSX and Linux machines).

How to use the Image Tracer

Download this file: tracer1.tgz 

Create a new folder anywhere on your computer, and name it "tracer" (or choose any other name). Extract the file tracer1.tgz and place it into the folder "tracer".

Open a terminal window (OSX: Applications > Utilities > Terminal) and change the directory to the folder you just created:

username@yourcomputer:~$ cd yourdirectory/tracer

Now choose the search term(s) you would like to track. Let's imagine you want to look for "Bush" and "Katrina".


Start the tracer with:

username@yourcomputer:~/yourdirectory/tracer$ python tracer.py bush katrina

Of course you can replace "bush" and "katrina" with any other search term(s).

In this case, Google image search found 627 results in its cache. The script checks each of these results to see whether the image actually exists at the given address. This can take a while (15 minutes for 500-700 results is not unusual).
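For readers curious what such an availability check can look like, here is a minimal sketch in present-day Python 3. It is not the Tracer's own code: the function name head_status, the 10-second timeout and the connection_cls parameter are assumptions made for illustration. It sends a HEAD request, so the image itself is not downloaded, and it gives up on servers that do not answer in time. It handles plain http addresses only.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10, connection_cls=http.client.HTTPConnection):
    """Return the HTTP status code for `url`, or None if the server
    cannot be reached (or does not answer) within `timeout` seconds.

    Hypothetical helper for illustration; redirects are not followed,
    so codes such as 301 or 302 are reported as they are.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query

    conn = connection_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts and
        # malformed responses.
        return None
    finally:
        conn.close()
```

Calling head_status("http://example.com/picture.jpg") returns a number such as 200 or 404, or None when the server does not respond.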

An XML file named "bush_katrina.txt" will be created automatically in the folder "tracer".
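Judging from this one example, the file name appears to be the search terms joined by underscores, followed by ".txt". A small sketch of that naming rule in Python; result_filename is a hypothetical helper, and the rule is inferred from the example, not taken from tracer.py:

```python
def result_filename(terms):
    """Guess the results file name for a list of search terms.

    Assumption: terms are joined with underscores and given a .txt
    extension, as in the "bush katrina" example.
    """
    return "_".join(terms) + ".txt"


print(result_filename(["bush", "katrina"]))
```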


The following information is recorded in "bush_katrina.txt":

<url> image URL Google found
<date> date and time search was done
<url_rank> rank of the result
<exists> whether the image was found.

Possible values:

yes = The image exists at the given address
MOVED_PERMANENTLY = The image has been assigned a new permanent URI
NOT_FOUND = The server has not found anything matching the image-URI. No indication is given of whether the condition is temporary or permanent.
FORBIDDEN = The image URI is correct and the server understood the request, but is refusing to fulfill it.
FOUND = The image resides temporarily under a different URI.
BAD_REQUEST = The request could not be understood by the server because it was written incorrectly.
SERVER_NOT_FOUND =
TEMPORARY_REDIRECT = The image resides temporarily under a different URI.

A full list of status codes and their explanations can be found here.
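Apart from "yes", most of the values listed above are the standard names of HTTP status codes (SERVER_NOT_FOUND is the exception, as it is not an HTTP status). Python 3's standard library knows these names, so a numeric status can be turned into such a label as sketched below. status_label is a hypothetical helper, and treating 200 as "yes" is an assumption based on the list above; tracer.py may well do this differently.

```python
from http import HTTPStatus


def status_label(code):
    """Map a numeric HTTP status to a label like those in the results file.

    Assumption: 200 (OK) is recorded as "yes"; every other known code
    is recorded under its standard name, e.g. 404 -> "NOT_FOUND".
    """
    if code == HTTPStatus.OK:
        return "yes"
    try:
        return HTTPStatus(code).name
    except ValueError:
        # Not a status code the standard library knows about.
        return "UNKNOWN_%d" % code


for code in (200, 301, 302, 307, 400, 403, 404):
    print(code, status_label(code))
```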

When you re-run the script with an existing query (a search term or combination thereof), the Image Tracer will look for the matching document in the "tracer" folder, and add the newly found information to the file. There is no limit to the number of searches per query.

For each new query the script will simply create a new file.
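Once a query has been run several times, the recorded fields can be read back to see how an image's rank changes. The sketch below uses present-day Python 3 and is only an illustration: this manual does not show how the records are nested inside the file, so the <results> and <image> wrapper elements are assumptions, and the sample URLs and dates are made up. Only the field names <url>, <date>, <url_rank> and <exists> come from the description above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up sample in an ASSUMED layout: one <image> element per result.
SAMPLE = """<results>
  <image><url>http://example.com/a.jpg</url><date>2006-01-08 10:00</date>
         <url_rank>3</url_rank><exists>yes</exists></image>
  <image><url>http://example.com/b.jpg</url><date>2006-01-08 10:00</date>
         <url_rank>1</url_rank><exists>yes</exists></image>
  <image><url>http://example.com/a.jpg</url><date>2006-01-09 10:00</date>
         <url_rank>1</url_rank><exists>yes</exists></image>
  <image><url>http://example.com/b.jpg</url><date>2006-01-09 10:00</date>
         <url_rank>2</url_rank><exists>NOT_FOUND</exists></image>
</results>"""


def rank_history(xml_text):
    """Group (date, rank, exists) entries by image URL, in file order."""
    history = defaultdict(list)
    root = ET.fromstring(xml_text)
    for record in root.iter("image"):
        url = record.findtext("url")
        rank = int(record.findtext("url_rank"))
        history[url].append(
            (record.findtext("date"), rank, record.findtext("exists"))
        )
    return dict(history)


for url, entries in rank_history(SAMPLE).items():
    print(url)
    for date, rank, exists in entries:
        print("   ", date, "rank", rank, exists)
```

If your own file is laid out differently, the element name passed to root.iter() is the part to adjust.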

Known issue

If the script comes across an image URL that points to a server which is not working properly, the script might get stuck. If you have to stop the Tracer manually, the .txt file will be corrupted and needs to be re-created, which means you will lose earlier results. We are working on getting this bug fixed; in the meantime, please back up your file before running the Tracer again.
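The backup can simply be a copy made by hand. If you prefer to script it, here is a minimal sketch in present-day Python 3; backup_results and the timestamped ".bak" naming are assumptions for illustration and not part of the Tracer.

```python
import shutil
import time
from pathlib import Path


def backup_results(path):
    """Copy a results file next to itself under a timestamped name.

    Example: bush_katrina.txt -> bush_katrina.txt.20060108-120000.bak
    Returns the path of the copy.
    """
    src = Path(path)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = src.with_name(f"{src.name}.{stamp}.bak")
    shutil.copy2(src, dest)  # copy2 also keeps the modification time
    return dest
```

Run backup_results("bush_katrina.txt") from inside the "tracer" folder before starting the Tracer; if a run has to be interrupted, the copy can be renamed back to bush_katrina.txt.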

Checking for Python

If you want to make sure you have Python installed, open a terminal window (OSX: Applications > Utilities > Terminal) and type:

username@yourcomputer:~$ python

On OSX it should return a message such as:

Python 2.3 (#1, Sep 13 2003, 00:49:11)
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin

On Ubuntu:

Python 2.4.2 (#2, Sep 30 2005, 21:19:01)
[GCC 4.0.2 20050808 (prerelease) (Ubuntu 4.0.1-4ubuntu8)] on linux2

If Python is not installed, please go to http://www.python.org/ to download and install Python.

January 8, 2006
