Nov. 21, 1996
Vol. 16, No. 6

    Picture this: Thousands of images at your fingertips

    Researchers create new Web search engine

    Have you ever wondered whether there's a picture of the Mona Lisa on the World Wide Web? Or wanted to find a good picture of an eagle? Or the aurora borealis, Michael Jordan or the Space Shuttle?

    Now there's a new image-based search engine that lets you find images directly, rather than surfing blindly or using text-based searches. Dubbed "WebSeer," it was designed and built at the University's Intelligent Information Laboratory, founded by Kristian Hammond, Associate Professor in Computer Science.

    Michael Swain, Assistant Professor in Computer Science and the main architect of WebSeer, said, "This is a really cool way to search the Web, and it's the first useful way that people can search for images: photographs as well as drawings."

    WebSeer (http://webseer.cs.uchicago.edu) stores contextual information about images, such as the caption, the title of the Web page and the name of the image file. It can also give the user information about the image, such as whether it's a photograph or a drawing, color or black and white, whether or not there are people's faces in the picture, and if so, how many there are and how close they are in the frame.

    "If you just look at the pixels of an image, it's beyond the current computer vision technology to identify particular objects such as elephants or Saabs or Zeppelins," said third-year graduate student Charles Frankel, who developed the architecture and the indexing system for WebSeer. "Because it's so difficult, we have to use a combination of techniques that include the context that the image is in and the image-understanding algorithms that we do have. The architecture allows us to incorporate new image-understanding techniques as they are developed, either here or in collaboration with other universities."

    Thousands of new images

    WebSeer has a sophisticated automatic indexing system that can add tens of thousands of images a day to its data base. To date the data base holds 500,000 images, and it is growing at a rate of one image every two seconds.

    Searching for "Michael Jordan, photograph" retrieves images of Jordan kissing the championship trophy, swinging a baseball bat and driving for the basket. Searching for the aurora borealis, or northern lights, for example, yields several photographs -- as well as a diagram of how the aurorae are generated -- and links to pages such as the Space Plasma Physics Center, tourism in Scandinavia and Alaska, and NASA studies of the solar wind and auroral physics. Clicking on the thumbnail of the image takes you directly to the image itself on the Web; to see it in context, click on the "page" icon next to the thumbnail.

    First it crawls

    WebSeer uses a sophisticated combination of contextual and visual cues invisible to the user to analyze and store information about images on the World Wide Web. WebSeer adds images to its data base by using a Web crawler, which starts at one Web page and then moves through all the links on that page and then through all the subsequent links on those pages, and so on.
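
    The link-following behavior described above can be sketched in a few lines of Python. This is only an illustration, not WebSeer's code: the article does not say which traversal order the crawler used, so a breadth-first walk is assumed here, and the link graph is a made-up in-memory table (TOY_LINKS) rather than real Web pages.

```python
from collections import deque


def crawl(start, links):
    """Breadth-first walk over a link graph; returns pages in visit order."""
    seen = {start}           # pages already queued, so none is visited twice
    queue = deque([start])   # pages waiting to be visited
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


# Hypothetical link graph standing in for real Web pages.
TOY_LINKS = {
    "home": ["art", "sports"],
    "art": ["home", "mona_lisa"],
    "sports": ["jordan"],
}

print(crawl("home", TOY_LINKS))
```

    The "seen" set is what keeps the walk from looping forever when pages link back to each other, as "art" links back to "home" in the toy table.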

    When the crawler finds an image on a Web page, it first analyzes text, such as the caption, the name of the image file and the title of the Web page, to gather clues about what is actually in the image. It weights the information according to how relevant it is likely to be (caption vs. page title, for example) and stores it.
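
    One way to picture the weighting step is as a table of weights, one per text source. The sketch below is an assumption-laden toy: the article says only that sources such as the caption and the page title are weighted differently, so the numbers in CUE_WEIGHTS, the choice to rank the caption highest, and the score_image function are all invented for illustration.

```python
# Hypothetical weights: the article gives no actual values, and the
# ordering (caption above file name above page title) is assumed.
CUE_WEIGHTS = {"caption": 3, "file_name": 2, "page_title": 1}


def score_image(cues, query):
    """Sum the weights of every text cue that contains a query word."""
    words = query.lower().split()
    score = 0
    for source, text in cues.items():
        text_lower = text.lower()
        if any(word in text_lower for word in words):
            score += CUE_WEIGHTS.get(source, 0)
    return score


cues = {
    "caption": "Michael Jordan drives to the basket",
    "file_name": "jordan.gif",
    "page_title": "Chicago sports",
}
print(score_image(cues, "jordan"))
```

    In this toy, a match in the caption and the file name outranks a match in the page title alone, which is the kind of distinction the weighting is meant to capture.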

    To determine whether an image is a photograph or drawing, WebSeer analyzes the content of the image using information -- such as the number of sharp edges, color and distinct boundaries -- in a "decision tree" developed by second-year graduate student Vassilis Athitsos and trained on 30,000 images.
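
    A decision tree is, in effect, a series of yes-or-no questions about measured features. The toy below is hand-written, not trained, and is not the tree Athitsos built: the two features (sharp_edge_fraction and distinct_colors) and every threshold are hypothetical, chosen only to show the shape of the idea.

```python
def classify_image(sharp_edge_fraction, distinct_colors):
    """Toy hand-built decision tree; returns "drawing" or "photograph".

    Both features and both thresholds are hypothetical. A real tree
    would learn its questions and cut-offs from labeled training images.
    """
    if distinct_colors < 64:
        # Very few colors suggests the flat fills typical of drawings.
        return "drawing"
    if sharp_edge_fraction > 0.5:
        # Mostly hard edges suggests line art rather than a photo.
        return "drawing"
    return "photograph"


print(classify_image(sharp_edge_fraction=0.1, distinct_colors=5000))
```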

    Then WebSeer searches for skin hues to determine whether or not there are people in an image. If so, it uses a neural network, developed at Carnegie Mellon University and trained on thousands of images, to locate each face in the image.
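
    The skin-hue check can be thought of as a cheap filter that runs before the more expensive face finder. The sketch below covers only that filter, not the neural network, and it is a guess at the general approach rather than WebSeer's method: the hue, saturation and brightness limits in is_skin_hue are hypothetical values picked for the example.

```python
import colorsys


def is_skin_hue(r, g, b):
    """Rough test of whether one RGB pixel (0-255 per channel) is skin-toned.

    The limits below are hypothetical, not taken from WebSeer.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h <= 50 / 360 and 0.2 <= s <= 0.7 and v >= 0.35


def skin_fraction(pixels):
    """Fraction of pixels that pass the skin-hue test (0.0 if no pixels)."""
    if not pixels:
        return 0.0
    return sum(is_skin_hue(*p) for p in pixels) / len(pixels)


# One tan pixel and one pure blue pixel.
print(skin_fraction([(224, 172, 105), (0, 0, 255)]))
```

    An image whose skin fraction is near zero could be skipped, so the face finder runs only where people are plausible.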

    WebSeer stores in its data base a thumbnail version of the image, its URL and the URL of the page it's on, along with information about the picture, such as whether it's a photograph or a drawing, how many faces are included and the file type of the image.
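
    The stored entry described above maps naturally onto a small record type. The sketch below is illustrative only: the field names, the ImageRecord class and the find_photos_with_faces helper are invented here, and the article does not describe the actual layout of WebSeer's data base.

```python
from dataclasses import dataclass


@dataclass
class ImageRecord:
    """One indexed image; field names are hypothetical."""
    thumbnail: bytes      # reduced-size copy of the image
    image_url: str        # where the full image lives
    page_url: str         # the page the image appears on
    is_photograph: bool   # False means drawing
    face_count: int       # number of faces found
    file_type: str        # for example "gif" or "jpeg"


def find_photos_with_faces(records, min_faces=1):
    """Return the photographs that contain at least min_faces faces."""
    return [r for r in records
            if r.is_photograph and r.face_count >= min_faces]


# Made-up records using placeholder example.com addresses.
records = [
    ImageRecord(b"", "http://example.com/a.jpeg", "http://example.com/a.html",
                True, 2, "jpeg"),
    ImageRecord(b"", "http://example.com/b.gif", "http://example.com/b.html",
                False, 0, "gif"),
]
print([r.image_url for r in find_photos_with_faces(records)])
```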

    Swain plans to add many new features to WebSeer as new vision algorithms are developed. Already, researchers from Berkeley, MIT and Caltech, in addition to Chicago and Carnegie Mellon, are lining up to plug their applications into Swain and Frankel's system.

    Collaborators at Berkeley are working on detecting animals and people in pictures; MIT vision researchers are developing a trainable object-recognition system; and Caltech is developing a face-finding algorithm that will compete with Carnegie Mellon's. At Chicago, Yali Amit, Associate Professor in Statistics, is working on reading text in images using optical character recognition techniques.

    "This is an incredibly valuable research tool for us and researchers at other universities," Swain said, "and I think WebSeer is going to be the most widely used application of computer vision ever."

    -- Diana Steele