NetEase Youdao has announced the official release of “Yi Mo Sheng” (EmotiVoice), an open-source text-to-speech (TTS) engine that is available on GitHub. Its web and scripting interfaces support batch generation, which makes it well suited to applications that need emotional speech synthesis across many timbres.
Developed by Youdao, the engine currently offers more than 2,000 timbres and supports both Chinese and English. It also includes an emotion synthesis feature that can produce speech conveying joy, excitement, sadness, or anger, along with a range of other expressive vocal styles.
Among open-source text-to-speech engines, EmotiVoice stands out for its breadth. It provides over 2,000 distinct voices and synthesizes speech in English and Chinese. Its most notable feature is emotional synthesis, which lets you generate speech across a wide spectrum of emotions, including happiness, excitement, sadness, and anger.
A user-friendly web interface is available, and results can be generated in bulk through a scripting interface. Docker images make it simple to try EmotiVoice; running them requires a computer with an NVIDIA GPU. If you haven’t already, install the NVIDIA Container Toolkit on Linux or on Windows with WSL2.
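Batch generation through a scripting interface typically starts from a plain-text job list. The following is a minimal Python sketch of that idea, not EmotiVoice's actual input format: the `speaker|emotion|text` line layout, the `Job` class, the `build_manifest` helper, and the speaker names are all hypothetical placeholders. The project's README on GitHub is the authority on the real format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """One utterance to synthesize (hypothetical structure)."""
    speaker: str   # placeholder voice identifier, not a real EmotiVoice ID
    emotion: str   # emotion prompt, e.g. "happy" or "sad"
    text: str      # the sentence to speak


def build_manifest(jobs):
    """Render jobs as one pipe-delimited line each (assumed layout)."""
    lines = []
    for job in jobs:
        fields = (job.speaker, job.emotion, job.text)
        for field in fields:
            # The delimiter and newlines would corrupt the line structure.
            if "|" in field or "\n" in field:
                raise ValueError(f"field contains a reserved character: {field!r}")
        lines.append("|".join(fields))
    return "\n".join(lines) + "\n"


jobs = [
    Job("speaker_a", "happy", "Welcome back!"),
    Job("speaker_b", "sad", "The store is closed today."),
]
manifest = build_manifest(jobs)
```

Writing `manifest` to a file would give a batch script a single input to iterate over, so that many speaker and emotion combinations can be rendered in one run.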
In the current implementation, prompts control the emotion and style of the synthesized speech. Gender is not used as a style factor; the system relies instead on tone, tempo, intensity, and emotion. Extending it with a style/timbre controller, as in the original closed-source design, would be relatively straightforward.
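To make the idea of prompt-driven style control concrete, here is a small Python sketch under stated assumptions. The `StylePrompt` class, its default values, and the rendered prompt wording are hypothetical illustrations and are not taken from the EmotiVoice code base. The sketch only mirrors the four style factors described in this article and deliberately has no gender field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StylePrompt:
    """Hypothetical container for the four style factors."""
    emotion: str = "neutral"
    tone: str = "medium"
    tempo: str = "medium"
    intensity: str = "medium"

    def to_text(self) -> str:
        """Render the factors as a single free-text prompt (assumed wording)."""
        return (
            f"{self.emotion} emotion, {self.tone} tone, "
            f"{self.tempo} tempo, {self.intensity} intensity"
        )


prompt = StylePrompt(emotion="happy", tempo="fast")
prompt_text = prompt.to_text()
# prompt_text == "happy emotion, medium tone, fast tempo, medium intensity"
```

Keeping the factors in a structured object rather than a raw string makes it easy to add a further field, such as a timbre selector, without changing the calling code.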
Dhanshree Shenwai is a computer science engineer with experience at FinTech companies spanning the financial, cards and payments, and banking domains, and she has a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier.