Audio Chunking for Long-Form Transcription: Splitting and Stitching with ffmpeg + TypeScript

via Dev.to, by nareshipme

Speech-to-text APIs (Groq Whisper, OpenAI Whisper, and friends) all have one thing in common: a file size limit. Groq's hard cap is 25MB. A typical one-hour interview at decent quality can easily be 80–150MB. If you just try to send that, you'll get a 413 or a rate-limit error before the transcription even starts.

The fix is chunking: split the audio into manageable pieces, transcribe each one, then stitch the results back together with correct timestamps. That last part is where most implementations go wrong. Here's the approach I landed on, built around ffmpeg and TypeScript.

The Strategy

    if file < 24MB → send directly (fast path)
    else → chunk into 20-min segments at 32kbps mono → transcribe each → stitch

The 20-minute / 32kbps combination keeps each chunk under 5MB (32 kbit/s × 1,200 s ÷ 8 ≈ 4.8MB of audio data, plus a little container overhead), which gives plenty of headroom below the 25MB limit regardless of source format.
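The strategy above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: every name here (Segment, needsChunking, chunkStarts, ffmpegChunkArgs, stitch) is hypothetical, "24MB" is assumed to mean 24 × 1024 × 1024 bytes, and the stitching step assumes each chunk's transcript reports timestamps in seconds relative to the start of that chunk.

```typescript
// Hypothetical shape of one transcript segment (timestamps in seconds).
interface Segment {
  start: number;
  end: number;
  text: string;
}

// Assumption: "24MB" is read as 24 MiB, leaving headroom under the 25MB cap.
const SIZE_LIMIT_BYTES = 24 * 1024 * 1024;
const CHUNK_SECONDS = 20 * 60;

// Fast path check: files under the threshold are sent as-is.
function needsChunking(fileSizeBytes: number): boolean {
  return fileSizeBytes >= SIZE_LIMIT_BYTES;
}

// Approximate audio payload of one chunk, ignoring container overhead.
function estimatedChunkBytes(bitrateKbps: number, seconds: number): number {
  return (bitrateKbps * 1000 / 8) * seconds;
}

// Start offsets (in seconds) of each fixed-length chunk.
function chunkStarts(durationSec: number, chunkSec: number = CHUNK_SECONDS): number[] {
  const starts: number[] = [];
  for (let t = 0; t < durationSec; t += chunkSec) {
    starts.push(t);
  }
  return starts;
}

// ffmpeg arguments for extracting one chunk as 32kbps mono audio.
// -ss/-t placed before -i seek and limit the input; -vn drops any video stream.
function ffmpegChunkArgs(
  input: string,
  output: string,
  startSec: number,
  chunkSec: number = CHUNK_SECONDS,
): string[] {
  return [
    "-y",
    "-ss", String(startSec),
    "-t", String(chunkSec),
    "-i", input,
    "-vn",
    "-ac", "1",
    "-b:a", "32k",
    output,
  ];
}

// Shift each chunk's segments by that chunk's start offset, then concatenate.
function stitch(chunks: Segment[][], chunkSec: number = CHUNK_SECONDS): Segment[] {
  const out: Segment[] = [];
  chunks.forEach((segments, index) => {
    const offset = index * chunkSec;
    for (const s of segments) {
      out.push({ start: s.start + offset, end: s.end + offset, text: s.text });
    }
  });
  return out;
}
```

The argument array could be passed to something like Node's child_process.spawn("ffmpeg", args). The offset arithmetic in stitch is the part that restores correct timestamps: a segment at 0:01 in the second chunk ends up at 20:01 in the final transcript.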
