Text chat with tools built on large language models (LLMs), such as ChatGPT, Google Bard, or MLC LLM, has become a familiar way to hold AI-driven conversations. A natural next step is to bring LLMs into non-player characters (NPCs) in video games, so that players can have dynamic, open-ended interactions with them instead of choosing from pre-scripted dialogue.
During his Computex 2023 keynote, Nvidia CEO Jensen Huang unveiled Nvidia Avatar Cloud Engine (ACE) for Games, an AI model foundry service that brings game characters to life through natural-language conversation, audio-to-facial-expression animation, and speech-to-text and text-to-speech capabilities. In a game demo, an NPC named Jin, the owner of a ramen shop, talked with a human player and gave realistic answers consistent with Jin's backstory.
In the demonstration, the player, named Kai, entered Jin's ramen shop and struck up a voice conversation. Kai asked how Jin was doing, and the two discussed the area's high crime rate. When Kai offered to help, Jin mentioned rumors that a powerful crime lord named Kumon Aoki might be behind the violence. Kai asked where Aoki could be found, and Jin shared his whereabouts, setting Kai on their quest.
Huang said that AI would not only help render and synthesize game environments but would also play a significant role in animating characters, and he stressed how central AI will be to the future of video games.
Nvidia ACE for Games provides access to three existing components. The first is Nvidia NeMo, an AI framework for training and deploying LLMs. It includes NeMo Guardrails, which is designed to keep AI conversations from becoming inappropriate or "unsafe," so that NPCs don't respond to offensive or off-topic prompts. Guardrails also help protect NPCs against malicious tampering.
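To make the guardrail idea concrete, here is a minimal, hypothetical sketch of a filter that sits in front of an NPC's language model. It does not use NeMo Guardrails, which has its own configuration language and is far more capable; the keyword lists, the refusal line, and the `guarded_reply` function are invented for illustration. The point is the shape of the check: a prompt must be free of blocked phrases and on-topic for the character before it ever reaches the model.

```python
from typing import Callable

# Hypothetical keyword lists; a real guardrail system would use far richer rules.
BLOCKED_TERMS = {"ignore previous instructions", "reveal your system prompt"}
ON_TOPIC_TERMS = {"ramen", "crime", "kumon aoki", "shop", "neighborhood"}

REFUSAL = "Sorry, I only talk about what's going on around the shop."


def guarded_reply(player_text: str, generate: Callable[[str], str]) -> str:
    """Call `generate` (any text-in, text-out model) only if the prompt passes both checks."""
    text = player_text.lower()
    # Reject prompts that look like attempts to tamper with the NPC.
    if any(term in text for term in BLOCKED_TERMS):
        return REFUSAL
    # Reject prompts unrelated to the character's world.
    if not any(term in text for term in ON_TOPIC_TERMS):
        return REFUSAL
    return generate(player_text)


# Usage with a stand-in model that just echoes a canned line.
fake_model = lambda prompt: "Crime has been bad lately."
print(guarded_reply("Tell me about the crime here", fake_model))
print(guarded_reply("What's the stock price of Nvidia?", fake_model))
```

A keyword filter like this is easy to bypass, which is why production systems use dedicated tooling; the sketch only shows where such a check sits in the flow.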
The second component, Nvidia Riva, handles speech-to-text and text-to-speech. In the ACE for Games workflow, Riva converts the player's spoken question to text and feeds it to the LLM. The LLM generates a text response, which Riva converts back into speech for the player; the text response is also displayed in the game. Riva's speech-to-text and text-to-speech capabilities can be tried on Nvidia's website.
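The round trip described above can be sketched as a single conversational turn. This is a hypothetical illustration of the data flow only: the `run_turn` function and the three stand-in callables are invented here and are not Riva or NeMo APIs. Any real speech-to-text, language-model, and text-to-speech services could be plugged into the same slots.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class NpcTurn:
    transcript: str     # the player's speech, converted to text
    reply_text: str     # the model's answer, also shown on screen as a subtitle
    reply_audio: bytes  # the answer synthesized back into speech


def run_turn(
    player_audio: bytes,
    speech_to_text: Callable[[bytes], str],
    generate_reply: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> NpcTurn:
    """One hypothetical voice exchange: speech -> text -> LLM -> text -> speech."""
    transcript = speech_to_text(player_audio)
    reply_text = generate_reply(transcript)
    reply_audio = text_to_speech(reply_text)
    return NpcTurn(transcript, reply_text, reply_audio)


# Stand-ins so the sketch runs without any real services:
# "audio" is just UTF-8 bytes, and the model returns a canned line.
fake_stt = lambda audio: audio.decode("utf-8")
fake_llm = lambda text: "Not great. Crime in this neighborhood keeps getting worse."
fake_tts = lambda text: text.encode("utf-8")

turn = run_turn(b"How are you, Jin?", fake_stt, fake_llm, fake_tts)
print(turn.transcript)
print(turn.reply_text)
```

Passing the three stages in as callables keeps the pipeline independent of any particular vendor, which also makes each stage easy to swap or test in isolation.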
The final component in the ACE for Games workflow is Nvidia Omniverse Audio2Face, which generates facial animation from audio so that characters' expressions match their dialogue. It is currently in beta and can be tried on Nvidia's website.
The demo, named Kairos, was created by Convai, an AI-in-gaming startup that is part of Nvidia's Inception program, which connects emerging companies with venture capital. Convai offers a toolset on its website that game developers can use to create lifelike NPCs with complex backstories.
An explainer video from Convai shows these tools in action: players talk with NPCs and instruct them to interact with in-game objects and other characters. For instance, a player can ask an NPC to hand over a gun from a table, and the NPC will comply.
This kind of in-game contextual awareness is crucial for NPCs. By contrast, in a test of a Minecraft AI plugin that let players converse with NPCs, the NPCs lacked situational awareness: a player could keep chatting with a sheep after killing it, because the NPC did not know it was dead.
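One way to picture situational awareness is to have the game check an NPC's state before any dialogue happens and include that state in what the model sees. The sketch below is hypothetical: the `Npc` class, the `talk` function, and the prompt format are invented for illustration and do not reflect how Convai or the Minecraft plugin actually work.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Npc:
    name: str
    alive: bool = True
    location: str = "field"
    history: List[str] = field(default_factory=list)


def talk(npc: Npc, player_text: str, generate: Callable[[str], str]) -> Optional[str]:
    """Return the NPC's reply, or None if its game state rules out a conversation."""
    # Game state is checked first: a dead NPC cannot answer at all.
    if not npc.alive:
        return None
    # The model also sees current world facts, not just the player's words.
    prompt = (
        f"You are {npc.name}, currently in the {npc.location}. "
        f"Player says: {player_text}"
    )
    reply = generate(prompt)
    npc.history.append(f"Player: {player_text} | {npc.name}: {reply}")
    return reply


sheep = Npc(name="Sheep")
echo_model = lambda prompt: "Baa."
print(talk(sheep, "Hello there", echo_model))  # the sheep replies while alive
sheep.alive = False                             # the player kills the sheep
print(talk(sheep, "Hello again", echo_model))  # no reply once it is dead
```

The key design choice is that the game engine, not the language model, is the source of truth for facts like whether a character is alive.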
This article is written by Gaurav Advit.