
Google DeepMind Introduces V2A For Soundtrack and Dialogue Creation

The V2A (Video to Audio) AI model will be able to create music, sound effects, and dialogue to match a video, based on text descriptions made by users

Google’s artificial intelligence research lab, DeepMind, is reportedly developing a new AI model that can craft soundtracks and dialogue for videos.

Users can describe the specific sounds they want to match with a video; V2A then combines the video's pixels with these natural language text prompts to produce the desired soundtrack for footage that lacks such sounds.

V2A works by encoding video input into a compressed representation. It then employs a diffusion model to progressively refine audio from random noise, guided by visual cues and the natural language prompts. This process yields realistic audio that closely matches the desired output. Finally, the audio is decoded and merged with the video data.
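DeepMind has not published V2A's implementation, but the overall shape of the pipeline it describes (encode the video, iteratively denoise random audio under visual and text conditioning, then decode) can be sketched in toy form. Everything below is a stand-in for illustration: the function names, the pooling "encoder", and the blend-based "denoising" step are hypothetical simplifications, not DeepMind's actual method.

```python
import numpy as np

def encode_video(frames):
    # Hypothetical encoder: pool each frame into a compact latent vector.
    return frames.mean(axis=(1, 2))

def denoise_step(audio, visual_latent, text_embedding, t, total_steps):
    # Toy stand-in for one diffusion denoising step: nudge the noisy
    # audio toward a target derived from the conditioning signals.
    target = np.full(audio.shape,
                     visual_latent.mean() + text_embedding.mean())
    alpha = (t + 1) / total_steps
    return (1 - alpha) * audio + alpha * target

def generate_audio(frames, text_embedding, steps=50, audio_len=16000, seed=0):
    rng = np.random.default_rng(seed)
    visual_latent = encode_video(frames)
    audio = rng.standard_normal(audio_len)  # start from pure random noise
    for t in range(steps):
        audio = denoise_step(audio, visual_latent, text_embedding, t, steps)
    return audio  # a real system would decode this latent into a waveform
```

The key structural point the sketch preserves is that generation starts from noise and every refinement step is conditioned jointly on the video representation and the text prompt.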


According to DeepMind, V2A will be “pairable with video generation models like Veo to create shots with a dramatic score, realistic sound effects or dialogue that matches the characters and tone of a video.”


The V2A model was trained on a large volume of videos paired with audio, along with AI-generated annotations containing detailed descriptions of sounds and dialogue transcripts. This was done to enable V2A to associate specific sounds with visual scenes.

“By training on video, audio, and the additional annotations, our technology learns to associate specific audio events with various visual scenes, while responding to the information provided in the annotations or transcripts,” DeepMind said. 


V2A can also produce a virtually limitless number of soundtracks for any given video, letting users explore a wide range of audio possibilities.

DeepMind added that although the model is not yet widely available, once released its audio output will carry a SynthID watermark identifying it as AI-generated.
