Robotics developer Figure made waves on Wednesday when it shared a video demonstration of its first humanoid robot engaged in a real-time conversation, thanks to generative AI from OpenAI.
“With OpenAI, Figure 01 can now have full conversations with people,” Figure said on Twitter, highlighting its ability to understand and react to human interactions instantly.
The company explained that its recent alliance with OpenAI brings high-level visual and language intelligence to its robots, allowing for “fast, low-level, dexterous robot actions.”
In the video, Figure 01 interacts with Corey Lynch, a senior AI engineer at Figure, who puts the robot through several tasks in a makeshift kitchen, including identifying an apple, dishes, and cups.
Figure 01 identified the apple as food when Lynch asked the robot to give him something to eat. Lynch then had Figure 01 collect trash into a basket and asked it questions simultaneously, showing off the robot’s multitasking capabilities.
On Twitter, Lynch explained the Figure 01 project in more detail.
“Our robot can describe its visual experience, plan future actions, reflect on its memory, and explain its reasoning verbally,” he wrote in an extensive thread.
According to Lynch, images from the robot's cameras, along with text transcribed from speech captured by its onboard microphones, are fed to a large multimodal model trained by OpenAI.
Multimodal AI refers to artificial intelligence that can understand and generate different data types, such as text and images.
Lynch emphasized that Figure 01’s behavior was learned, run at normal speed, and not controlled remotely.
“The model processes the entire history of the conversation, including past images, to come up with language responses, which are spoken back to the human via text-to-speech,” Lynch said. “The same model is responsible for deciding which learned, closed-loop behavior to run on the robot to fulfill a given command, loading particular neural network weights onto the GPU and executing a policy.”
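Lynch's description amounts to a single loop in which one model both generates the spoken reply and chooses which learned behavior to execute. A minimal illustrative sketch of that loop follows; every class, function, and behavior name here is hypothetical, standing in for Figure's actual (unpublished) system:

```python
from dataclasses import dataclass, field

@dataclass
class RobotLoop:
    """Toy sketch of the loop Lynch describes: one multimodal model sees the
    full conversation history (speech + images), produces a language reply,
    and selects a learned closed-loop behavior to run on the robot."""
    history: list = field(default_factory=list)  # (role, content) pairs

    def step(self, image, transcribed_speech):
        # Append the new observation to the running conversation history.
        self.history.append(("user", (image, transcribed_speech)))
        reply, behavior = self.multimodal_model(self.history)
        self.history.append(("robot", reply))
        self.speak(reply)          # spoken back via text-to-speech
        self.run_policy(behavior)  # load that policy's weights, execute it
        return reply, behavior

    def multimodal_model(self, history):
        # Stand-in for the OpenAI-trained model: map the latest request
        # to a canned reply and a named behavior.
        _, (_, speech) = history[-1]
        if "eat" in speech:
            return "Here is an apple.", "hand_over_apple"
        return "I see dishes and cups on the table.", "idle"

    def speak(self, text):
        pass  # placeholder for text-to-speech output

    def run_policy(self, name):
        pass  # placeholder: load neural network weights onto the GPU, run policy
```

The point of the sketch is the division of labor Lynch describes: language generation and behavior selection come from the same model call, while the selected behavior itself is a separate, pre-learned closed-loop policy.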
Lynch explained that Figure 01 is designed to describe its surroundings concisely, and can apply “common sense” for decisions, like inferring dishes will be placed in a rack. It can also parse vague statements, such as hunger, into actions, like offering an apple, all the while explaining its actions.
The debut sparked a passionate response on Twitter, with many people impressed by Figure 01's capabilities, and more than a few adding it to the list of mileposts on the way to the singularity.
“Please tell me your team has watched every Terminator movie,” one replied.
“We gotta find John Connor as soon as possible,” another added.
For AI developers and researchers, Lynch provided a number of technical details.
“All behaviors are driven by neural network visuomotor transformer policies, mapping pixels directly to actions,” Lynch said. “These networks take in onboard images at 10hz and generate 24-DOF actions (wrist poses and finger joint angles) at 200hz.”
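The rates Lynch quotes imply the policy re-conditions on a fresh camera frame only once every 20 control ticks (200 Hz ÷ 10 Hz), emitting a 24-DOF action on every tick in between. A rough sketch of that rate decoupling, using only the numbers from the quote (the frame and action contents are placeholders, not real policy outputs):

```python
def control_loop(n_ticks=100, image_hz=10, control_hz=200, dof=24):
    """Toy illustration of the quoted rates: a new onboard image arrives
    every control_hz // image_hz control ticks, while the policy emits a
    24-DOF action (wrist poses + finger joint angles) on every tick,
    conditioned on the most recent image."""
    ticks_per_image = control_hz // image_hz  # 20 control steps per frame
    latest_image = None
    actions = []
    for t in range(n_ticks):
        if t % ticks_per_image == 0:
            # Placeholder for grabbing a new camera frame at 10 Hz.
            latest_image = f"frame_{t // ticks_per_image}"
        # Placeholder 24-DOF action, always paired with the latest frame.
        actions.append((latest_image, [0.0] * dof))
    return actions
```

Running `control_loop()` shows ticks 0 through 19 all conditioned on `frame_0` before `frame_1` arrives at tick 20, which is the vision-to-control gap the 10 Hz / 200 Hz split creates.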
Figure 01’s impactful debut comes as policymakers and global leaders attempt to grapple with the proliferation of AI tools into the mainstream. While most of the discussion has been around large language models like OpenAI’s ChatGPT, Google’s Gemini, and Anthropic’s Claude AI, developers are also looking for ways to give AI physical humanoid robotic bodies.
Figure AI and OpenAI did not immediately respond to Decrypt’s request for comment.
“One is a sort of utilitarian objective, which is what Elon Musk and others are striving for,” UC Berkeley Industrial Engineering Professor Ken Goldberg previously told Decrypt. “A lot of the work that’s going on right now—why people are investing in these companies like Figure—is that the hope is that these things can do work and be compatible,” he said, particularly in the realm of space exploration.
Along with Figure, other companies working to merge AI with robotics include Hanson Robotics, which debuted its Desdemona AI robot in 2016.
“Even just a few years ago, I would have thought having a full conversation with a humanoid robot while it plans and carries out its own fully learned behaviors would be something we would have to wait decades to see,” Lynch said on Twitter. “Obviously, a lot has changed.”