Voice AI company Speechify just launched a native Windows app that uses locally stored models to enable dictation across apps and to read articles, documents, or PDFs aloud using its library of voices.
The company is taking on the likes of Wispr Flow, Willow, and Superwhisper, which also provide dictation and transcription apps across platforms.
Speechify said the Windows app does voice processing entirely on-device on Copilot+ PCs (which have NPUs from AMD, Intel, and Qualcomm) and on other Windows 11 PCs that have GPUs from Intel and AMD.
The app has three models running on-device: neural text-to-speech, real-time voice activity detection, and Whisper-powered transcription. Users can configure the app to switch to cloud-based models or even change them during usage.
The company, which has over 50 million users, said that its text-to-speech model, VITS Neural, can generate audio at seven different speed presets when users have the app read documents or web pages aloud. The company uses the open-source Silero model for voice activity detection.
“Over a billion people on this planet use Windows. With this Windows launch, we’re making sure that reading, and now writing, is never a barrier, no matter what device you use or how you prefer to work. We’re especially excited about the opportunity in the enterprise given how many professionals have asked for Speechify on their PCs,” said Cliff Weitzman, founder and CEO of Speechify, in a statement.
Last month, the company launched Granola-like meeting transcription, but that feature was limited to browser-based meetings. Now that the company has apps across platforms, it will likely bring this feature over to its native apps to transcribe meetings in any app or browser.
Until a few years ago, Speechify largely concentrated on text-to-speech use cases such as reading out articles and emails, and generating podcasts out of documents. Lately, the company has been trying to become a full-stack voice app for users by launching dictation, meeting transcription, and a voice assistant.