Fish Audio has released Fish Speech 1.4, the latest version of its text-to-speech (TTS) model. With this release, Fish Audio aims to make high-quality voice technology more accessible to developers, researchers, and businesses worldwide. Compared with its predecessor, version 1.4 is trained on far more data, supports more languages, and offers a simpler, more flexible user experience. The model is fully open source, in line with the company’s stated mission of providing open access to high-performance voice technology.
Expanded Training Data and Language Support
One of the most notable advancements in Fish Speech 1.4 is its substantial increase in training data. The model has been trained on 700,000 hours of multilingual audio data, a significant leap from the 200,000 hours used in previous versions. This expanded dataset strengthens the model’s ability to handle various voices, accents, and languages more accurately and naturally.
Fish Speech 1.4 also supports eight languages, which broadens its usefulness for global applications: English, Chinese, German, Japanese, French, Spanish, Korean, and Arabic. The training data is heavily weighted toward the first two, with about 300,000 hours each for English and Chinese and about 20,000 hours each for the remaining six languages. This breadth of data allows the model to deliver high-quality text-to-speech across all eight languages, serving users in many different regions.
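As a quick consistency check on these figures, the per-language hours can be summed and compared with the headline total. This is a minimal sketch that assumes the 20,000-hour figure applies to each of the six smaller languages; under that reading the breakdown adds up to 720,000 hours, which matches the rounded 700,000-hour figure, and English and Chinese together account for about 83% of the data.

```python
# Per-language training hours for Fish Speech 1.4, as reported in the article.
# Assumption: the 20,000-hour figure applies to EACH of the six smaller languages.
HOURS_BY_LANGUAGE = {
    "English": 300_000,
    "Chinese": 300_000,
    "German": 20_000,
    "Japanese": 20_000,
    "French": 20_000,
    "Spanish": 20_000,
    "Korean": 20_000,
    "Arabic": 20_000,
}

PREVIOUS_TOTAL_HOURS = 200_000  # figure quoted for earlier versions
REPORTED_TOTAL_HOURS = 700_000  # headline (rounded) figure for version 1.4


def summarize(hours):
    """Return the total hours and each language's fraction of that total."""
    total = sum(hours.values())
    shares = {lang: h / total for lang, h in hours.items()}
    return total, shares


total, shares = summarize(HOURS_BY_LANGUAGE)
print(f"Sum of per-language hours: {total:,}")
print(f"English + Chinese share: {shares['English'] + shares['Chinese']:.1%}")
print(f"Growth over previous versions: {REPORTED_TOTAL_HOURS / PREVIOUS_TOTAL_HOURS:.1f}x")
```

The roughly 3.5x growth over the previous 200,000 hours uses the article's headline figure; the small gap between 720,000 and 700,000 is presumably rounding in the announcement.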
Key Features of Fish Speech 1.4
Fish Speech 1.4 offers several features aimed at a wide range of use cases. A key highlight is fast synthesis with very low latency, which makes the model suitable for real-time applications such as live broadcasting, gaming, and interactive voice response (IVR) systems. Keeping delay to a minimum is what allows these interactions to stay smooth and responsive.
In addition to its speed, the model supports instant voice cloning, allowing users to replicate a specific voice almost immediately. This feature has wide-reaching applications, from media production and content creation to customer service and personalized communication. Because it can replicate a voice accurately from minimal data, Fish Speech 1.4 offers a scalable and efficient approach to voice cloning.
Another benefit of Fish Speech 1.4 is its flexibility in deployment. Users can self-host the model on their own servers or use Fish Audio’s cloud service. This dual approach gives users control over their implementation: they can keep the model on local infrastructure for privacy and performance, or rely on the convenience and scalability of the cloud service.
Open-Source and Accessible
The fully open-source nature of Fish Speech 1.4 sets it apart from many other proprietary voice models. By providing open access to its model, Fish Audio empowers developers and researchers to innovate, experiment, and customize their TTS systems. The open-source model also facilitates the adoption of Fish Speech in educational and research settings, where access to high-performance technology is crucial for advancing voice-based applications.
Fish Audio has introduced a simple, flat-rate pricing model for users who opt for the cloud service. This pricing structure is designed to be straightforward and predictable, making it easier for businesses to plan and manage their voice technology expenses without unexpected costs or usage limits.
Conclusion
Fish Speech 1.4 is a landmark release in text-to-speech technology, combining expanded language support, faster performance, and open-source accessibility. With its cutting-edge features and commitment to making advanced voice technology available to all, Fish Audio is paving the way for more innovative and inclusive applications of TTS in industries ranging from media to customer service and beyond. The release of Fish Speech 1.4 reaffirms Fish Audio’s position as a leader in voice technology, continually pushing the boundaries of what is possible with text-to-speech solutions.
Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.