Microsoft has announced the public preview of the Azure AI Speech text-to-speech avatar, a technology that allows users to create talking avatar videos from text input and build real-time interactive bots using human images.
The text-to-speech avatar is a new feature with vision capabilities that empowers customers to generate synthetic videos of a 2D photorealistic avatar speaking, Azure AI Speech said in a Wednesday (Nov. 15) blog post.
The avatar models are trained using deep neural networks based on human video recording samples, while the voice is provided by a text-to-speech voice model, according to the post.
The text-to-speech avatar can be used for training videos, product introductions, advertisements, virtual sales agents, AI teachers, virtual human resources (HR) assistants and other use cases, the post said.
One of the primary reasons for building avatars is to streamline video content creation, per the post. Traditional methods require significant time and budget for shooting and editing. With the text-to-speech avatar, users can simply input text to create videos for their needs.
Additionally, the release of Azure OpenAI Service and neural text-to-speech has made interactive conversations more natural, according to the post. The text-to-speech avatar enables users to create engaging digital interactions, making it ideal for building conversational agents, virtual assistants, chatbots and more.
Azure AI Speech offers two distinct text-to-speech avatar features, the post said. The prebuilt text-to-speech avatar provides out-of-the-box products on Azure, allowing customers to choose from a variety of options for video content or interactive applications. The custom text-to-speech avatar feature enables customers to create personalized avatars for their products or brands by uploading their own video recordings.
Because Microsoft is committed to responsible AI, custom avatar access is limited and available by registration only for certain use cases, per the post. This ensures the protection of individual and societal rights and prevents harmful deepfakes and misleading content, Microsoft said.
In another recent development in this space, Meta unveiled an artificial intelligence (AI) model that performs speech and text translations for nearly 100 languages. The model supports speech recognition, speech-to-text translation, speech-to-speech translation, text-to-text translation and text-to-speech translation, Meta said when announcing the product in August.