SIMA: Google DeepMind's AI that can perform tasks in 3D Games

Google DeepMind has released ground-breaking research that shows an AI agent that can execute a variety of tasks in 3D games that it has never seen before.

Kapish Khajuria

Google DeepMind has released ground-breaking research showing an AI agent that can carry out a variety of tasks in 3D games it has never seen before. DeepMind has previously built artificial intelligence (AI) models for games such as chess and Go, as well as systems that learn games without being told the rules. According to the lab, however, this is the first time an AI agent has shown that it can understand a range of gaming environments and carry out tasks from natural-language instructions.


What is Google DeepMind?

Google DeepMind is a London-based artificial intelligence research lab acquired by Google in 2014. It is famous for its work in developing deep learning algorithms and applying them to various domains, including healthcare, gaming, robotics, and more. DeepMind gained widespread attention with its AlphaGo program, which defeated a world champion Go player in 2016, marking a significant milestone in AI research.

Since then, DeepMind has continued to push the boundaries of artificial intelligence, exploring areas such as reinforcement learning, neural network architectures, and multi-agent systems. Its research often focuses on creating algorithms that can learn to perform complex tasks by themselves, with minimal human intervention. DeepMind's ultimate goal is to develop artificial general intelligence (AGI), which would be capable of solving a wide range of intellectual tasks at a human level or beyond.


What exactly does Google DeepMind's SIMA AI do?

By collaborating with game studios like Hello Games (No Man's Sky), Tuxedo Labs (Teardown), and Coffee Stain (Valheim and Goat Simulator 3), DeepMind trained the Scalable Instructable Multiworld Agent (SIMA) on nine distinct games. Furthermore, they utilized four research environments, including one constructed in Unity where agents are tasked with creating sculptures using building blocks.

This approach gave SIMA, which DeepMind describes as "a generalist AI agent for 3D virtual environments," a diverse set of worlds, visual styles, and perspectives, ranging from first- to third-person. Each game in SIMA's portfolio is a distinct interactive world with its own skills to master, such as navigation, resource mining, spaceship piloting, and item crafting.


How does it function?

DeepMind researchers emphasized that learning to follow instructions for such tasks across different video game environments could lead to more adaptable AI agents that operate effectively in any setting. To train SIMA, researchers recorded human gameplay together with the keyboard and mouse inputs used to perform each action. The agent learns from this data. According to DeepMind, it combines a model for precise image-language mapping with a video model that predicts what will happen next on screen.
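Learning from recorded human inputs is a form of imitation learning. The toy sketch below is only an illustration of that idea, not DeepMind's method: it ignores the screen entirely and picks, for each instruction, the key that human players pressed most often. The demonstration data and the function name `fit_policy` are made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical demonstration steps: (instruction being followed, key the human pressed).
demos = [
    ("turn left", "a"),
    ("turn left", "a"),
    ("turn left", "d"),   # an occasional noisy human input
    ("go forward", "w"),
]


def fit_policy(steps):
    """Return a lookup table mapping each instruction to its most frequent key."""
    counts = defaultdict(Counter)
    for instruction, key in steps:
        counts[instruction][key] += 1
    return {instr: c.most_common(1)[0][0] for instr, c in counts.items()}


policy = fit_policy(demos)
```

A real agent would replace the lookup table with a neural network that also takes the screen image as input, but the training signal is the same: reproduce what the human did.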

As a result, SIMA can understand diverse environments and carry out tasks to achieve specific objectives. Importantly, SIMA does not require access to a game's source code or API. It runs on commercial versions of games and needs only two inputs: the on-screen visuals and the user's instructions. Because it uses the same keyboard and mouse controls as a human player, DeepMind asserts that SIMA could operate in almost any virtual environment. SIMA's performance is evaluated on hundreds of basic skills across various categories, each of which can be completed within a short time.
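The interface described above, screen and instruction in, keyboard and mouse out, can be sketched in a few lines. Everything here is an assumption made for illustration: the class names, the field layout, and the `KeywordAgent`, which is a hand-written stand-in for a learned model and has nothing to do with SIMA's actual implementation.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Observation:
    """What the agent receives: one screen frame plus the user's instruction."""
    frame: bytes       # raw screen pixels (the encoding is an assumption)
    instruction: str   # natural-language instruction, e.g. "turn left"


@dataclass(frozen=True)
class Action:
    """What the agent emits: the same inputs a human player would use."""
    keys: frozenset = frozenset()   # keys held down during this step
    mouse_dx: int = 0               # horizontal mouse movement
    mouse_dy: int = 0               # vertical mouse movement


class Agent(Protocol):
    def act(self, obs: Observation) -> Action: ...


class KeywordAgent:
    """Toy stand-in for a learned policy: maps a few words to fixed key presses."""
    _RULES = {"forward": "w", "left": "a", "right": "d", "back": "s"}

    def act(self, obs: Observation) -> Action:
        text = obs.instruction.lower()
        keys = frozenset(k for word, k in self._RULES.items() if word in text)
        return Action(keys=keys)


def run_episode(agent: Agent, frames: list, instruction: str) -> list:
    """Feed each frame, together with the instruction, to the agent in turn."""
    return [agent.act(Observation(frame, instruction)) for frame in frames]


actions = run_episode(KeywordAgent(), [b"", b""], "Turn left")
```

Because nothing in this loop touches game internals, the same agent code could in principle sit in front of any game that draws to a screen and accepts keyboard and mouse input.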


DeepMind's future projects in AI

DeepMind's longer-term aim for SIMA is to enable agents to perform more complex, multi-stage tasks based on natural-language prompts, such as "find resources and build a camp." In DeepMind's evaluations so far, SIMA has shown promising results. For instance, an agent trained on all nine games significantly outperformed an agent trained on just one game.

Additionally, a SIMA agent trained on eight games performed nearly as well on the held-out ninth game as an agent trained solely on that game, indicating that SIMA can generalize to games outside its training set. However, SIMA depends on language input to perform well.
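The eight-versus-one comparison is a hold-one-out evaluation: train on every game except one, then test on the game that was left out. A minimal sketch of how such splits could be generated, using placeholder game names rather than the real titles:

```python
def leave_one_out_splits(games):
    """Yield (training games, held-out game) pairs, one for each game."""
    for held_out in games:
        train = [g for g in games if g != held_out]
        yield train, held_out


# Placeholder names; the actual set of nine games is DeepMind's.
games = [f"game_{i}" for i in range(1, 10)]
splits = list(leave_one_out_splits(games))
```

Each of the nine splits trains on eight games, and the agent's score on the held-out game can then be compared with that of an agent trained only on that game.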

DeepMind notes that although the initial findings are encouraging, more research is needed to improve SIMA's capabilities and its ability to generalize. In future iterations, the team plans to improve the agent's understanding and its capacity to carry out increasingly complex tasks. Ultimately, they hope to create AI systems that are safe and capable of carrying out a wide range of tasks to support people both online and offline.