Microsoft's new AI model sees Mona Lisa rap

Microsoft has unveiled a new AI model that can generate a realistic video of a talking human face from a single image and an audio clip.

Martin Crowley
April 22, 2024

Microsoft researchers have revealed a new, experimental AI model called VASA-1 (the name refers to “visual affective skills”) that can mimic people’s facial movements and expressions. To demonstrate the model’s capabilities, they released a clip, generated by the model, of the Mona Lisa rapping Lady Gaga’s song “Paparazzi”.

What can VASA-1 do?

VASA-1 can take a still image or drawing of a person, pair it with a speech audio clip, and (almost) instantaneously generate video footage of that person’s face. The footage realistically reproduces facial expressions and head motions and synchronizes the lips with the audio, so it looks and sounds like the person is talking or singing.

To create these realistic facial nuances, Microsoft used text-to-image models, like OpenAI’s DALL-E 3, head movement generation models, and numerous video samples. As a result, VASA-1 can make faces move naturally, look in different directions, and show various emotions. It can also handle types of input it was not specifically trained on, such as artistic images, singing audio, and non-English speech.

What will VASA-1 be used for?

Microsoft researchers believe that VASA-1 will pave the way for “real-time engagements with life-like avatars that emulate human conversational behaviors” and could be used for things like teaching, or for offering companionship or therapeutic support to those who need it by providing a “person” to talk to.

When will VASA-1 be available to the public?

Microsoft has said it currently has “no plans to release an online demo, API, product, or any related offerings” because it is concerned the model could be used irresponsibly to create misleading deepfakes (especially with global elections looming).

“We are opposed to any behavior to create misleading or harmful contents of real persons, and are interested in applying our technique for advancing forgery detection.”

The researchers have said that VASA-1 will not be available for public use “until we are certain that the technology will be used responsibly and in accordance with proper regulations.”

This cautious stance on AI safety comes after Microsoft took its WizardLM-2 AI model offline earlier this week, within a day of its launch, because the developers had not completed toxicity testing before its release.