If a bulky electrical box has to be placed at the edge of a public park, what’s the best way to conceal it so that it won’t detract from its surroundings? How about an air-conditioning condenser beside a historical building, or a portable toilet along a scenic trail?
At the Conference on Computer Vision and Pattern Recognition in June, researchers from MIT and several other institutions take a first stab at answering these types of questions, with a new algorithm that can analyze photos of a scene, taken from multiple perspectives, and produce a camouflage covering for an object placed within it.
The researchers developed a range of candidate algorithms and tested them using Amazon’s Mechanical Turk crowdsourcing application, scoring them according to the amount of time volunteers took to locate camouflaged objects in synthetic images. Objects hidden by their best-performing algorithm took, on average, more than three seconds to find — significantly longer than the casual glance the camouflage is intended to thwart.
According to Andrew Owens, an MIT graduate student in electrical engineering and computer science and lead author on the new paper, the problem of disguising objects in a scene is, to some degree, the inverse of the problem of object detection, a major area of research in computer vision.
“Usually these algorithms exploit certain cues — maybe they’re looking for the contours of the object, or the silhouette of the object, the boundaries,” Owens says. “With camouflage, you want to avoid these cues. Conceptually, a cue that would be good for detecting an object is something that you want to remove.”
Greed works
In their new paper, Owens and his collaborators — William Freeman, an MIT professor of computer science and engineering and one of his thesis advisors; Connelly Barnes of the University of Virginia; Flyby Media’s Alex Flint; and Hanumant Singh of the Woods Hole Oceanographic Institution — assume that the object to be concealed is box-shaped, since that’s generally true of the real-world scenarios in which they envision their technology being applied.
All of the candidate algorithms begin at the same place: For each camera angle, they generate a set of coverings for the object’s visible faces that would make it blend into the background perfectly. Of course, a pattern that works well from one camera angle could make the object stick out like a sore thumb from another, so the algorithms then begin making trade-offs.
The simplest algorithm, which the researchers used as a baseline, averages the color values of the patterns produced for each face from each camera angle. The next algorithm simply picks one angle at random for each face. Neither of these fared particularly well on Mechanical Turk.
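To make the two baselines concrete, here is a minimal Python sketch. The data layout (a dictionary of per-angle color lists for a single face) and the function names are assumptions made for illustration, not the researchers’ code.

```python
import random

# Hypothetical data: for one face, each camera angle proposes an RGB color
# per texel (three texels per pattern keeps the example readable).
patterns_by_angle = {
    "north": [(200, 180, 160), (90, 90, 90), (30, 60, 30)],
    "east":  [(100, 120, 140), (110, 110, 110), (50, 80, 50)],
}

def average_pattern(patterns):
    """Baseline: average each texel's color across all camera angles."""
    per_texel = zip(*patterns.values())  # group the proposals texel by texel
    return [
        tuple(sum(channel) / len(channel) for channel in zip(*proposals))
        for proposals in per_texel
    ]

def random_pattern(patterns, rng):
    """Second baseline: keep the pattern from one randomly chosen angle."""
    angle = rng.choice(sorted(patterns))
    return angle, patterns[angle]

mean_face = average_pattern(patterns_by_angle)
angle, random_face = random_pattern(patterns_by_angle, random.Random(0))
```

Averaging blurs every view together, while a random pick matches one view exactly and ignores the rest, which is consistent with neither approach doing well in the tests.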
A slightly more sophisticated algorithm, which the researchers dubbed “greedy,” identifies those camera angles that require the least distortion of the patterns applied to each face. To create the illusion of a circle seen head on, for instance, a face at an oblique angle to the camera would have to have an ellipse superimposed on it. The sharper the angle, the more elongated the ellipse would need to be.
The greedy algorithm tries to minimize the difference between the shape perceived by the viewer and the shape patterned on the object. Once it’s identified a group of angles that meet that criterion, it selects the one that generates coverings for the largest number of faces.
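One possible reading of the greedy procedure is sketched below in Python. The distortion measure is an assumption: under a simple orthographic approximation, a circle viewed at an angle θ from head on has to be painted as an ellipse stretched by 1 / cos θ. The camera names, angles and the repeat-until-covered loop are likewise illustrative guesses rather than details from the paper.

```python
import math

# Hypothetical setup: angle (in degrees) between each camera's line of sight
# and each visible face's normal. 0 means the camera sees the face head on.
# A face missing from a camera's entry is not visible from that camera.
view_angles = {
    "cam_a": {"front": 10, "top": 30},
    "cam_b": {"front": 60, "side": 20},
    "cam_c": {"side": 35, "top": 40},
}

def stretch(angle_deg):
    """Elongation needed so a pattern looks undistorted from this angle.

    Assumes an orthographic view: a circle must be painted as an ellipse
    whose long axis is 1 / cos(angle) times its short axis.
    """
    return 1.0 / math.cos(math.radians(angle_deg))

def greedy_assign(view_angles):
    """Assign one camera per face, favoring cameras that cover many faces."""
    faces = {f for seen in view_angles.values() for f in seen}
    assignment = {}
    while len(assignment) < len(faces):
        remaining = faces - set(assignment)
        # For each uncovered face, find the camera needing the least stretch.
        best_for_face = {
            f: min((c for c in view_angles if f in view_angles[c]),
                   key=lambda c: stretch(view_angles[c][f]))
            for f in remaining
        }
        # Pick the camera that is the least-distorted choice for most faces.
        votes = list(best_for_face.values())
        winner = max(sorted(set(votes)), key=votes.count)
        # Texture every still-uncovered face that the winner can see.
        for f in remaining:
            if f in view_angles[winner]:
                assignment[f] = winner
    return assignment

assignment = greedy_assign(view_angles)
```

With these made-up angles, `cam_a` is the most head-on view for both the front and the top, so it textures those two faces, and `cam_b` then covers the side.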
Less is more
The researchers’ first attempt at a more sophisticated algorithm divided each face of the object into a grid, each square of which took its color values from a single camera angle. For each square, the algorithm then selected the angle that offered the best approximation of views from as many angles as possible. But it also ensured that the transitions between regions corresponding to different angles were smooth.
This can lead to some odd visual artifacts. As an experiment, the researchers used their algorithm to produce a camouflage covering for a box placed on a bookshelf in their lab. When the viewer faces the box from a particular angle, a book on the shelf appears to fork. A projection of the book’s spine on the top face of the box approaches a corner and, in order not to introduce jarring discontinuities, sends tendrils down both adjacent faces. But as peculiar as that bifurcation looks upon close inspection, the smoothness of the boundaries means that it doesn’t call attention to itself.
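The trade-off in the grid algorithm, between matching the views well in each square and keeping neighboring squares consistent, can be sketched as a small labeling problem. The Python below is illustrative only: the cost numbers and the smoothness weight are invented, and the optimizer is a simple local-update scheme (iterated conditional modes) swapped in for readability, since the article does not say how the researchers solved the problem.

```python
# Hypothetical mismatch costs for one face split into a 3x3 grid, with two
# camera angles (0 and 1): cost[a][r][c] is how poorly angle a's texture in
# square (r, c) matches the views from all angles. Lower is better.
cost = [
    [[1, 1, 4], [1, 3, 4], [1, 1, 4]],   # angle 0
    [[3, 3, 1], [3, 2, 1], [3, 3, 1]],   # angle 1
]
SMOOTHNESS = 1.5  # assumed penalty per neighboring pair with different angles

def neighbors(r, c, rows, cols):
    """Yield the up/down/left/right neighbors that lie inside the grid."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        if 0 <= r + dr < rows and 0 <= c + dc < cols:
            yield r + dr, c + dc

def label_grid(cost, smoothness):
    n_angles, rows, cols = len(cost), len(cost[0]), len(cost[0][0])
    # Start from the cheapest angle per square, ignoring smoothness.
    labels = [[min(range(n_angles), key=lambda a: cost[a][r][c])
               for c in range(cols)] for r in range(rows)]
    changed = True
    while changed:
        changed = False
        for r in range(rows):
            for c in range(cols):
                def local(a):
                    # Mismatch cost plus a penalty for each disagreeing neighbor.
                    clashes = sum(labels[nr][nc] != a
                                  for nr, nc in neighbors(r, c, rows, cols))
                    return cost[a][r][c] + smoothness * clashes
                best = min(range(n_angles), key=local)
                # Only switch when it strictly lowers the cost, so the loop ends.
                if local(best) < local(labels[r][c]):
                    labels[r][c] = best
                    changed = True
    return labels

labels = label_grid(cost, SMOOTHNESS)
```

In this toy grid, the center square slightly prefers angle 1 on its own, but the smoothness penalty pulls it to angle 0 to agree with most of its neighbors, leaving two clean regions instead of an isolated patch.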
In some preliminary tests on Mechanical Turk, however, the greedy algorithm fared surprisingly well. The researchers inferred that this was because the topology of the environments they were considering naturally limited the number of angles from which photos could be taken. The least distorted pattern from one such privileged angle probably works reasonably well for neighboring angles.
So they created yet another algorithm, which, like the greedy algorithm, selected a pattern drawn from a single perspective for each face. But like the grid algorithm, it performed that selection in a more principled way, choosing for each face the perspective that worked best from as many angles as possible while allowing for smooth transitions between faces. Indeed, Owens says, this algorithm — the best-performing algorithm on the Mechanical Turk tests — could be thought of as a special case of the grid algorithm, in which the grid on each face consists of a single large square.
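With only one choice per face, the search space is small enough to check exhaustively. The Python sketch below illustrates the idea under assumed inputs: the per-face costs, the list of adjacent faces and the seam penalty are all hypothetical, and brute-force search stands in for whatever optimizer the researchers actually used.

```python
from itertools import product

# Hypothetical costs: unary[face][angle] is how poorly that angle's pattern
# on that face matches the views from all angles (lower is better).
unary = {
    "front": {"cam_a": 1.0, "cam_b": 3.0},
    "top":   {"cam_a": 2.0, "cam_b": 1.5},
    "side":  {"cam_a": 4.0, "cam_b": 1.0},
}
# Faces that share an edge; using different angles on either side of an
# edge is assumed to cost a fixed penalty (a visible seam).
adjacent = [("front", "top"), ("top", "side"), ("front", "side")]

def total_cost(choice, seam_penalty):
    """Sum of per-face mismatch plus a penalty for each seam between faces."""
    data = sum(unary[face][angle] for face, angle in choice.items())
    seams = sum(seam_penalty for f, g in adjacent if choice[f] != choice[g])
    return data + seams

def best_assignment(seam_penalty):
    """Try every combination of one angle per face and keep the cheapest."""
    faces = sorted(unary)
    angles = sorted(unary[faces[0]])
    candidates = (dict(zip(faces, combo))
                  for combo in product(angles, repeat=len(faces)))
    return min(candidates, key=lambda ch: total_cost(ch, seam_penalty))

mild = best_assignment(seam_penalty=0.5)
strict = best_assignment(seam_penalty=2.0)
```

With a mild seam penalty, the front face keeps its own best angle while the top and side share another. With a stricter penalty, all three faces fall back to a single angle, which shows how the smoothness term trades per-face accuracy for consistency.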