Compact Neural TTS Voices for Accessibility

Contemporary text-to-speech (TTS) solutions for accessibility applications typically fall into two categories: (i) on-device statistical parametric speech synthesis (SPSS) or unit selection (USEL), and (ii) cloud-based neural TTS. SPSS and USEL offer low latency and a small disk footprint at the expense of naturalness and audio quality. Cloud-based neural TTS systems provide significantly better audio quality and naturalness but regress in latency and responsiveness, which makes them impractical for real-world applications. More recently, neural TTS models have been made deployable on handheld devices. Nevertheless, their latency remains higher than that of SPSS and USEL, and their disk footprint prohibits pre-installing multiple voices at once. In this work, we describe a high-quality, compact neural TTS system that achieves latency on the order of 15 ms with a low disk footprint. The proposed solution is capable of running on low-power devices.

