A few weeks ago, I was in a shopping mall when I noticed a woman carrying a great handbag with a rope-like strap. Since I’m in the market for a new tote, I contemplated asking her where she got it. But before I could make my move, she disappeared around a corner. When I got home, I tried Googling the bag. But I’m no fashionista, and I found I didn’t have the vocabulary to describe what I’d seen. “Leather handbag with drawstring strap” wasn’t right. Neither was “purse with rope handle” or “bag with cord strap.” Eventually, I gave up.
Now, a new technology aims to help people search for things they can’t necessarily describe in words.
James Hays, a computer scientist at the Georgia Institute of Technology, has created a computer program capable of matching hand-drawn images to photographs. This could eventually lead to a program that can comb internet image search services, such as Google Images, and find photographs that accurately match users' drawings.
“The goal is to be able to relate or match photos and sketches in either direction, just like a human can,” Hays says. “A human can see a badly drawn sketch and figure out what photo it seems to match to. We want to have the same capability computationally.”
To create the program, Hays hired nearly 700 workers from Amazon Mechanical Turk, a crowdsourcing marketplace that matches workers with people who need tasks done. His team showed the workers photos of ordinary objects and animals, such as squirrels, teapots and bananas, allowing them to look at each image for two seconds. Each worker then drew the object from memory. The team eventually gathered more than 75,000 sketches of 12,500 objects. They called this the “Sketchy database.”
The program then analyzed the sketches and matched each one with the photograph it most closely resembled. The technology identified the correct photo 37 percent of the time. Humans, in comparison, were correct about 54 percent of the time. While 37 percent may not seem impressive, it’s actually quite a leap for computers.
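For readers curious how a figure like "37 percent" is tallied, here is a minimal Python sketch of one plausible way to score a matching test: count how often the top-scoring photo is the one the sketch was drawn from. It is an illustration only, not the team's evaluation code. The function name `top1_accuracy` and every number in the toy table are invented for the example.

```python
def top1_accuracy(similarity, correct_index):
    """Return the fraction of sketches whose best-scoring photo is the right one.

    similarity[i][j] is a hypothetical score between sketch i and photo j.
    correct_index[i] is the photo that sketch i was actually drawn from.
    """
    hits = 0
    for scores, truth in zip(similarity, correct_index):
        # Pick the photo with the highest score for this sketch.
        best = max(range(len(scores)), key=lambda j: scores[j])
        if best == truth:
            hits += 1
    return hits / len(correct_index)


# Toy scores for three sketches against three photos (invented numbers).
toy_similarity = [
    [0.9, 0.2, 0.1],  # sketch 0: photo 0 scores highest, which is correct
    [0.3, 0.4, 0.8],  # sketch 1: photo 2 scores highest, but photo 1 was correct
    [0.1, 0.2, 0.7],  # sketch 2: photo 2 scores highest, which is correct
]
toy_truth = [0, 1, 2]

accuracy = top1_accuracy(toy_similarity, toy_truth)
print(f"Top-1 accuracy: {accuracy:.0%}")
```

In this toy case two of the three sketches land on the right photo. The same tally could be run on human guesses to produce a comparison like the one above.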
“Humans are so startlingly good at vision already, we recognize images effortlessly,” Hays says. “It’s actually surprisingly difficult computationally.”
One of the main challenges in improving the program is that most people are pretty lousy artists. As Hays and his team wrote in a paper on the subject, “Shapes and scales are distorted. Object parts are caricatured (big ears on an elephant), anthropomorphized (smiling mouth on a spider), or simplified (stick-figure limbs).”
Historically, research on getting computers to recognize sketches has focused on things like the distribution of lines in a drawing, the direction the lines go in or where the boundaries of the drawing are. But since humans only draw what’s salient to humans (eyes, for example, are always included in sketches, even though they’re relatively small), it’s important for a computer to “learn” how sketches tend to be similar and how they tend to be different from photographs. For this, the program uses two separate networks: one that evaluates sketches and one that evaluates photographs. By constant analysis of a large dataset, the program can continuously “learn.”
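The two-network idea can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the team's program: each "network" here is a single layer with hand-set weights rather than anything learned, and the feature numbers, the weight tables and the helper names (`embed_sketch`, `embed_photo`, `best_match`) are all hypothetical. What it does show is the general shape of the approach: one branch turns a sketch into a point, a separate branch turns a photo into a point in the same space, and the closest photo wins.

```python
import math


def linear(weights, features):
    """Apply one linear layer, a stand-in for a full neural network branch."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


# Hypothetical hand-set weights; a real system would learn these from data.
SKETCH_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]           # 2 sketch features -> 2-D point
PHOTO_WEIGHTS = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]  # 3 photo features -> 2-D point


def embed_sketch(features):
    """Sketch branch: map sketch features into the shared space."""
    return linear(SKETCH_WEIGHTS, features)


def embed_photo(features):
    """Photo branch: map photo features into the same shared space."""
    return linear(PHOTO_WEIGHTS, features)


def best_match(sketch_features, photos):
    """Return the name of the photo whose point lies closest to the sketch's."""
    query = embed_sketch(sketch_features)
    return min(photos, key=lambda name: math.dist(query, embed_photo(photos[name])))


# Invented feature vectors for two photos.
toy_photos = {
    "teapot": [2.0, 0.2, 0.9],
    "banana": [0.2, 2.0, 0.1],
}

match = best_match([0.9, 0.2], toy_photos)
print(match)
```

Keeping the branches separate lets each one handle its own kind of input, since a sketch and a photo of the same teapot look very different as raw images. In a learned system, training would adjust both sets of weights so that matching pairs end up close together.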
Hays and his team plan to continue improving the program by adding data. Advances in machine learning should also help improve the match rates. As of now, the program has a fairly high match rate when comparing sketches to internet photo databases, including Flickr, though it's difficult to quantify, Hays says.
In addition to the handbag image search I so sorely need, the program has a number of less frivolous potential uses. Police could scan suspect sketches and compare them to a database of criminal photographs. The program could be used by people who speak and write in any language, or who can’t write at all.
“One goal of understanding sketches is that they’re a somewhat universal language,” Hays says. “It’s not tied to a particular written language and it’s not even tied to literacy at all. [A program like this could bring] access to information without written language.”
The program could also be used artistically, to create photorealistic scenes out of sketches. Always imagined living in a castle on the moon? Draw it, and the program could one day create a photo image for you by stitching together pieces of other images.
The information gathered by Hays and his team could also help address some neuroscience and psychology questions, Hays says.
“These sketch-photo pairs are saying something about human perception, about what we think is salient, what parts of images capture our attention,” Hays says. “In some ways, this database encodes this pretty well. There could be something to be teased out of that, if you want to say something about humans themselves.”