Show HN: Three new Kitten TTS models – smallest less than 25MB
Kitten TTS is an open-source series of tiny, expressive text-to-speech models for on-device applications. (We had a thread here last year: https://news.ycombinator.com/item?id=44807868 .)

Today we're releasing three new models with 80M, 40M, and 14M parameters. The largest model (80M) has the highest quality. The 14M variant reaches a new SOTA in expressivity among similarly sized models, despite being <25MB in size. This release is a major upgrade from the previous one and supports English text-to-speech in eight voices: four male and four female. Demo: https://www.youtube.com/watch?v=ge3u5qblqZA .

Most models are quantized to int8 + fp16, and they use ONNX at runtime. All three models are designed to run anywhere, e.g. Raspberry Pi, low-end smartphones, wearables, browsers, etc. No GPU required! This release aims to bridge the gap between on-device and cloud models for TTS applications. A multilingual model release is coming soon.

On-device AI is bottlenecked by one thing: a
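As a rough sanity check on the size figures above, here is a back-of-the-envelope sketch of how parameter count and precision translate into weight size. It is not taken from the Kitten TTS code: the helper names, the 50/50 int8/fp16 split, and the use of 1 MB = 10^6 bytes are all assumptions for illustration, and the estimate ignores ONNX graph metadata and any non-weight data in the file.

```python
# Back-of-the-envelope weight-size estimate. Hypothetical helpers, not part of
# Kitten TTS. Assumes 1 MB = 1,000,000 bytes and counts weights only.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_mb(num_params: int, dtype: str) -> float:
    """Size in MB if every parameter is stored in a single dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


def mixed_size_mb(num_params: int, int8_fraction: float) -> float:
    """Size in MB if a fraction of parameters is int8 and the rest is fp16.

    The actual int8/fp16 split of the released models is not stated in the
    post, so int8_fraction is an illustrative assumption.
    """
    if not 0.0 <= int8_fraction <= 1.0:
        raise ValueError("int8_fraction must be between 0 and 1")
    int8_params = round(num_params * int8_fraction)
    fp16_params = num_params - int8_params
    total_bytes = (int8_params * BYTES_PER_PARAM["int8"]
                   + fp16_params * BYTES_PER_PARAM["fp16"])
    return total_bytes / 1_000_000


if __name__ == "__main__":
    # Parameter counts as stated in the post.
    for name, params in [("80M", 80_000_000), ("40M", 40_000_000), ("14M", 14_000_000)]:
        print(
            f"{name}: int8 only {weight_size_mb(params, 'int8'):.0f} MB, "
            f"fp16 only {weight_size_mb(params, 'fp16'):.0f} MB, "
            f"assumed 50/50 mix {mixed_size_mb(params, 0.5):.0f} MB"
        )
```

Under these assumptions, 14M parameters stored purely in fp16 would slightly exceed 25 MB, while any meaningful share of int8 weights brings the total under it, which is consistent with the stated <25MB figure.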



