Google’s Objectron uses AI to track 3D objects in 2D video

Coinciding with the kickoff of the 2020 TensorFlow Developer Summit, Google today published a pipeline — Objectron — that spots objects in 2D images and estimates their poses and sizes through an AI model. The company says it has implications for robotics, self-driving vehicles, image retrieval, and augmented reality — for instance, it could help a factory floor robot avoid obstacles in real time.

Tracking 3D objects is a tricky prospect, particularly when dealing with limited compute resources (like a smartphone system-on-chip). It becomes tougher still when the only available imagery (usually video) is 2D, owing to the scarcity of labeled 3D training data and the diversity of objects’ appearances and shapes.

The Google team behind Objectron, then, developed a toolset that let annotators label 3D bounding boxes (i.e., cuboid borders) for objects using a split-screen view: 2D video frames with the 3D bounding boxes overlaid on one side, and a 3D view showing point clouds, camera positions, and detected planes on the other. Annotators drew 3D bounding boxes in the 3D view and verified their locations by reviewing the projections in the 2D video frames. For static objects, they only had to annotate the target object in a single frame; the tool propagated the object’s location to all frames using ground-truth camera pose information from the AR session data.
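That propagation step amounts to reprojecting a single world-space annotation through each frame’s camera. Here is a minimal sketch of the idea, assuming a standard pinhole camera model; the function names and array layouts are illustrative, not Google’s actual tooling:

```python
import numpy as np

def project_box_to_frame(box_corners_world, camera_to_world, intrinsics):
    """Project an annotated 3D bounding box into one video frame.

    box_corners_world: (8, 3) box corners in world coordinates.
    camera_to_world:   (4, 4) camera pose for this frame, from the AR session.
    intrinsics:        (3, 3) pinhole camera matrix.
    Returns (8, 2) pixel coordinates of the projected corners.
    """
    world_to_camera = np.linalg.inv(camera_to_world)
    corners_h = np.hstack([box_corners_world, np.ones((8, 1))])  # homogeneous
    corners_cam = (world_to_camera @ corners_h.T)[:3]            # (3, 8)
    pixels = intrinsics @ corners_cam                            # perspective
    return (pixels[:2] / pixels[2]).T                            # divide by depth

def propagate_annotation(box_corners_world, camera_poses, intrinsics):
    """Annotate once, then reuse the box in every frame via its camera pose."""
    return [project_box_to_frame(box_corners_world, pose, intrinsics)
            for pose in camera_poses]
```

Because the camera poses come from the AR session’s tracking rather than from the annotator, a static object needs only one human-drawn box per sequence.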

To supplement the real-world data and boost the accuracy of the AI model’s predictions, the team developed an engine that places virtual objects into scenes containing AR session data. Drawing on camera poses, detected planar surfaces, and estimated lighting, it generates physically plausible placements with lighting that matches the scene, yielding high-quality synthetic data in which rendered objects respect the scene geometry and fit seamlessly into real backgrounds. In validation tests, the synthetic data improved accuracy by about 10%.
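The placement logic can be sketched roughly as follows. This is a simplified illustration under assumed conventions (plane normal as the object’s up axis, a square placement region), not Google’s actual engine, and the rendering and lighting estimation are omitted:

```python
import numpy as np

def sample_object_pose(plane_center, plane_normal, half_extent, rng=None):
    """Sample a plausible resting pose for a virtual object on a detected plane.

    plane_center: (3,) plane center in world coordinates (from the AR session).
    plane_normal: (3,) plane normal; becomes the object's up direction.
    half_extent:  scalar half-width of the placement region, in meters.
    Returns a (4, 4) object-to-world transform.
    """
    rng = rng or np.random.default_rng()
    up = plane_normal / np.linalg.norm(plane_normal)
    # Build two in-plane axes orthogonal to the normal.
    helper = (np.array([1.0, 0.0, 0.0]) if abs(up[0]) < 0.9
              else np.array([0.0, 1.0, 0.0]))
    right = np.cross(up, helper)
    right /= np.linalg.norm(right)
    forward = np.cross(right, up)
    # Random in-plane position and random yaw about the normal: the object
    # sits upright on the surface but faces an arbitrary direction.
    u, v = rng.uniform(-half_extent, half_extent, size=2)
    yaw = rng.uniform(0.0, 2.0 * np.pi)
    heading = np.cos(yaw) * right + np.sin(yaw) * forward
    pose = np.eye(4)
    pose[:3, 0] = heading
    pose[:3, 1] = up
    pose[:3, 2] = np.cross(heading, up)
    pose[:3, 3] = plane_center + u * right + v * forward
    return pose
```

Rendering the object at such a pose, under the session’s estimated lighting, is what lets the synthetic examples blend into real backgrounds rather than looking pasted on.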

Better still, the team says the current version of the Objectron model is lightweight enough to run in real time on flagship mobile devices. With the Adreno 650 mobile graphics chip found in phones like the LG V60 ThinQ, Samsung Galaxy S20+, and Sony Xperia 1 II, it’s able to process around 26 frames per second.

Objectron is available in MediaPipe, a framework for building cross-platform AI pipelines that combines fast inference with media processing (like video decoding). Models trained to recognize shoes and chairs are available, along with an end-to-end demo app.
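For a sense of what the pipeline looks like in practice, here is a minimal sketch using MediaPipe’s Python solution API for Objectron (a wrapper that shipped after this announcement), along the lines of the official examples; the file paths and parameter values are placeholders:

```python
import cv2
import mediapipe as mp

mp_objectron = mp.solutions.objectron
mp_drawing = mp.solutions.drawing_utils

# Load a still image; Objectron also handles video via static_image_mode=False.
image = cv2.imread('shoe.jpg')  # illustrative path

with mp_objectron.Objectron(static_image_mode=True,
                            max_num_objects=2,
                            min_detection_confidence=0.5,
                            model_name='Shoe') as objectron:
    # MediaPipe expects RGB input; OpenCV loads images as BGR.
    results = objectron.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.detected_objects:
    for detected in results.detected_objects:
        # Draw the projected 3D bounding box and the object's pose axes.
        mp_drawing.draw_landmarks(image, detected.landmarks_2d,
                                  mp_objectron.BOX_CONNECTIONS)
        mp_drawing.draw_axis(image, detected.rotation, detected.translation)
    cv2.imwrite('shoe_annotated.jpg', image)
```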

The team says that in the future, it plans to share additional solutions with the research and development community to stimulate new use cases, applications, and research efforts. Additionally, it intends to scale the Objectron model to more categories of objects and further improve its on-device performance.

BBC and Preloaded bring AI-driven ants and spiders to Magic Leap with Micro Kingdoms: Senses

Ask the average homeowner whether they would like to see spiders and ants walking across their table, and the answer is likely to be an emphatic “no.” But what if the arachnids and insects were merely digital, part of an interactive augmented reality experience developed by the BBC and narrated by beloved English actor Stephen Fry?

That’s the pitch behind Micro Kingdoms: Senses, an educational AR app released today by BBC Earth and developer Preloaded for the Magic Leap platform. Using AI to simulate the critters’ real-world behaviors, the BBC presents a colony of leafcutter ants and a large wandering spider that promise to respond both to the environment they’re placed in and to the viewer’s presence. In other words, once you’re wearing Magic Leap’s AR glasses, you can expect them to walk across a surface you’ve defined as safe and to move defensively if you come too close. Users can also drop a leaf or rock and watch as the creatures interact with it.
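The app’s actual behavior simulation is presumably far richer, but the core idea of proximity-driven reactions can be illustrated with a toy state machine; every name and threshold here is hypothetical:

```python
import math
from dataclasses import dataclass

# Hypothetical thresholds; the app's real tuning isn't public.
DEFENSIVE_RADIUS = 0.5   # meters: a viewer this close triggers a defensive pose
INTEREST_RADIUS = 1.0    # meters: dropped props this close attract attention

@dataclass
class Creature:
    position: tuple          # (x, y) on the surface the user marked as safe
    state: str = "wandering"

    def update(self, viewer_position, props):
        """Choose a behavior from viewer distance and nearby dropped props."""
        if math.dist(self.position, viewer_position) < DEFENSIVE_RADIUS:
            self.state = "defensive"      # e.g., the spider rears up
        elif any(math.dist(self.position, p) < INTEREST_RADIUS for p in props):
            self.state = "investigating"  # approach the dropped leaf or rock
        else:
            self.state = "wandering"
        return self.state

# Example: a spider reacting as the viewer steps closer.
spider = Creature(position=(0.0, 0.0))
print(spider.update(viewer_position=(2.0, 0.0), props=[]))          # wandering
print(spider.update(viewer_position=(0.3, 0.0), props=[(0.8, 0)]))  # defensive
```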

Ants and spiders may be an ideal match for the capabilities of early Magic Leap hardware. Though the company initially suggested its technology would be able to bring realistic digital whales into school gymnasium-scale settings, the first-generation headset’s limited field of view augments only a small square of space within the wearer’s vision. But a reasonably powerful Nvidia Tegra X2 chipset supports fairly realistic 3D models inside that augmented area, which the developers have used to create believably furry spiders and precisely animated ants, as well as leaves and anthills to interact with.

Stephen Fry jovially narrates the experience, which lets users see the ants in Central America’s tropical rainforest conditions and the spider in a Brazilian forest. Beyond the immersive appeal of being able to walk up to the creatures and inspect them, the app introduces some drama, including an up-close look at how the spider hunts down prey and the walking ants suddenly taking flight.

Micro Kingdoms: Senses is available today to download from Magic Leap World, though the cost of the hardware will likely keep the app from reaching average consumers. Magic Leap 1 hardware continues to be available starting at $2,295, with developer suite and enterprise suite packages at $2,495 and $2,995, respectively.