Seeing With Electric Eyes. Video Recognition goes Disney

Woo! Disney techies have developed software that helps computers evaluate real-life objects by comparing them to what the computers see in videos.

Because that really worked for teenage boys viewing Baywatch, after all.

A research group at Disney Research Pittsburgh has developed a computer vision system that, much like humans, can continuously improve its ability to recognize objects by picking up hints while watching videos.

Like most other object recognition systems, the Disney system builds a conceptual model of an object, be it an airplane or a soap dispenser, by using a learning algorithm to analyze a number of example images of the object.

What’s different about the Disney system is that it then uses that model to identify objects, when it can, in videos. As it does, it sometimes is able to glean new information about such objects, enabling it to make its own model of the object more complex. And that in turn enables the system to more readily recognize such objects in a wider variety of conditions.

“This process continues, potentially indefinitely, over the lifetime of the recognition system,” said Leonid Sigal, a senior research scientist at Disney Research Pittsburgh. “This is a learning system that is continuously evolving through unsupervised experience to build a more complete and complex model of the world.”

Recognizing objects in images, though often easy for humans, remains a challenge for automated systems. Systems that learn to recognize objects using one set of images may have difficulty recognizing those same objects in the real world, or under different sets of conditions, or domains.

Rather than try to get a system to more accurately recognize objects using its original model for that object in new domains, the Disney group took a different approach – expanding the object domain incrementally. That means that the system’s model for each object will be continuously fine-tuned as the system encounters new information.
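The loop described above (learn from a few labeled images, watch unlabeled video, absorb only the confident detections, refit) can be sketched in a few lines of Python. This is a toy illustration, not the Disney researchers' method: it swaps in a simple nearest-centroid classifier, and the class name `IncrementalRecognizer`, the `margin` confidence rule, and the two-number "features" are all assumptions made up for the example.

```python
from math import dist


def centroid(points):
    """Mean of a list of equal-length feature tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


class IncrementalRecognizer:
    """Hypothetical toy model: nearest-centroid classifier that grows its own training set."""

    def __init__(self, labeled_examples, margin=2.0):
        # labeled_examples: dict mapping label -> list of feature tuples
        # (stands in for the initial set of example images).
        self.examples = {label: list(pts) for label, pts in labeled_examples.items()}
        # Assumed confidence rule: the runner-up class must be at least
        # `margin` times farther away than the best class.
        self.margin = margin
        self._refit()

    def _refit(self):
        self.centroids = {label: centroid(pts) for label, pts in self.examples.items()}

    def predict(self, x):
        """Return (label, confident) for one feature vector."""
        ranked = sorted(self.centroids, key=lambda label: dist(x, self.centroids[label]))
        d_best = dist(x, self.centroids[ranked[0]])
        d_next = dist(x, self.centroids[ranked[1]])
        return ranked[0], d_next >= self.margin * d_best

    def watch(self, frames):
        """Absorb confident detections from unlabeled frames; return how many were added."""
        added = 0
        for x in frames:
            label, confident = self.predict(x)
            if confident:  # no human checks this label
                self.examples[label].append(x)
                self._refit()
                added += 1
        return added


# Usage: two seed "images" per object, then three unlabeled "video frames".
demo = IncrementalRecognizer({
    "mug": [(0.0, 0.0), (1.0, 0.0)],
    "stove": [(10.0, 0.0), (11.0, 0.0)],
})
print(demo.predict((4.5, 0.0)))   # ('mug', False): too far from the seed mugs
demo.watch([(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)])
print(demo.predict((4.5, 0.0)))   # ('mug', True): the model has widened
```

The same mechanism that widens the model is what could let it wander: every absorbed frame shifts the centroid, and nothing in the loop checks whether the label was right.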

One potential problem is that the system, which does this fine-tuning without human supervision, may start ascribing irrelevant attributes to an object, leading to errors in detection. Thus far, however, the Disney researchers have not observed this “domain drift.”

They tested their incremental learning method against several other leading object recognition methods, using two standard video datasets that included a variety of objects found in the home. In most instances, it outperformed the other methods in detecting items such as microwave ovens, mugs and stoves. The tests also showed that, with experience, the system got better not only at detecting these objects in the videos, but also at detecting objects from its original training images.

Source: Disney Research
Image: Leozeng