Google DeepMind Unveils Next-Gen AI Agent SIMA 2

Photo of author

By Global Team

Google DeepMind has unveiled the next-generation AI agent, SIMA 2, accelerating the competition in general AI technology.

SIMA 2 integrates Gemini, expanding its understanding and reasoning abilities in various environments, and significantly enhances its capability to perform complex tasks. The research team explained that SIMA 2 has a structure that allows it to improve performance by learning from its experiences.

Google DeepMind first introduced SIMA 1 in March last year. At the time, SIMA 1 was trained on diverse 3D game data to follow instructions in various virtual environments. However, the completion rate of complex tasks was only 31%, highlighting its limitations. In response, DeepMind embarked on the development of SIMA 2 to address these issues.

Joe Marino, a senior researcher at DeepMind, explained that the performance of SIMA 2 is double that of SIMA 1. He noted that SIMA 2 completes complex tasks even in previously unseen environments.

SIMA 2 has improved performance compared to SIMA 1
SIMA 2 has improved performance compared to SIMA 1

Marino emphasized that the most significant change in SIMA 2 is its ability to perform complex tasks even in unfamiliar environments. He explained that SIMA 2’s self-improvement structure, which enhances capabilities based on experience, can be seen as a step forward in general AI research.

DeepMind highlighted the concept of an Embodied Agent. The term “embodied” means “implemented in a state with a body.” An Embodied Agent performs tasks while directly observing and interacting with the environment, like a robot. In contrast, a non-embodied agent operates within a screen, managing tasks like scheduling or memo management. According to the research team, the ability of an Embodied Agent to work in various environments is a key concept in general AI research.

Jane Wang, a senior researcher at DeepMind, stated that SIMA 2’s improvements extend beyond mere gaming capabilities. She explained that SIMA 2 is designed to understand phenomena and handle user requests in a commonsense manner.

Source: Google DeepMind
Source: Google DeepMind

During a demonstration, DeepMind showcased SIMA 2’s behavioral approach. In ‘No Man’s Sky,’ SIMA 2 described the surrounding environment and recognized a distress beacon, determining its path accordingly. The research team noted, “The agent observed the environment and decided the next action on its own.”

SIMA 2 combines language with color and object information to draw inferences. The research team demonstrated a command to “move to the house with the ripe tomato color.” SIMA 2 linked the color of tomatoes to ‘red’ and selected the target accordingly. The design revealed the internal inference process clearly.

SIMA 2 also executed emoji-based commands. Marino explained, “When the emojis for an axe () and a tree () were input, SIMA 2 performed the task of chopping the tree.”

DeepMind further reported that in the photo-like virtual world created by the generative model Jeanie, SIMA 2 identified and interacted with objects such as benches, trees, and butterflies.

Shifting from a dependence on human data, SIMA 2 emphasizes self-learning. Unlike SIMA 1, which relied on human data, SIMA 2 is structured for self-learning. Initially trained with human play data, SIMA 2 then generates new tasks in new environments based on Gemini. A reward model evaluates its actions, and the agent learns from this feedback to improve its performance.

The process of task generation by SIMA 2
The process of task generation by SIMA 2

DeepMind explained that this approach allows the agent to expand its abilities through trial and error, moving away from a setup where humans need to provide all the data. The autonomy in learning is a meaningful shift.

Frederick Bes, a senior research engineer at DeepMind, stated that SIMA 2 is closer to the high-level decision-making and task comprehension required for actual robotics work. He explained that, for a robot to move to a specific place, it first needs to understand concepts of targets and spatial information, which SIMA 2 addresses.

DeepMind has not yet set a date for applying SIMA 2 in actual robotic systems. The company is developing a separate robot-based model. DeepMind announced that the preview release of SIMA 2 aims to assess potential usability and collaborative feasibility.

Leave a Comment