Microsoft has unveiled a new artificial intelligence (AI) model that can create hyper-realistic videos of talking human faces. The AI image-to-video model, named VASA-1, can generate videos from a single photo and a speech audio clip. The company claims that the generated videos include lip movements synced to the audio, along with facial expressions and head movements that make them look real. Notably, the tech giant does not intend to release a product or API based on VASA-1, saying instead that the model will be used to build realistic virtual avatars.
In a post on its research announcement page, Microsoft described how the under-development AI model works and highlighted its capabilities. According to the company, VASA-1 can generate videos at a resolution of 512 x 512 pixels at up to 40 frames per second. The technology is also said to support online video generation with low starting latency. Kaio Ken, a user on X (formerly known as Twitter), posted a video of the AI model.
While the most notable feat of VASA-1 is its ability to generate videos up to one minute long (as demonstrated) in high quality from a single static image, the company also emphasised its ability to produce lip movements that match the audio and facial expressions to accompany them. The model also gives the user granular control over several aspects of the video, including main eye gaze direction, head distance, emotion offsets, and more. These controls over disentangled appearance, 3D head pose, and facial dynamics can help tailor the output to the user's specifications.
Furthermore, the AI model generated videos from artistic photos, singing audio, and non-English speech. Microsoft researchers point out that these kinds of input were not present in its training data, which suggests that the model can generalise beyond what it was trained on.
The AI model's ability to generate hyper-realistic videos of real individuals from any audio is impressive, but it raises concerns about unethical use, particularly for deepfakes. The company stated that it does not plan to make the AI model available to the public and instead intends to use it to develop virtual, interactive characters.
Microsoft also stated that this technology could be used to improve forgery detection. "While admitting the likelihood of misuse, we must also recognise our technique's significant good potential. The benefits, from promoting educational fairness to increasing accessibility for those with communication difficulties to providing companionship or therapeutic assistance to those in need, highlight the significance of our research and other relevant investigations. We are committed to creating AI ethically to improve human well-being," the company stated.