Audio Chunking for Long-Form Transcription: Splitting and Stitching with ffmpeg + TypeScript

Speech-to-text APIs (Groq Whisper, OpenAI Whisper, and friends) all have one thing in common: a file size limit. Groq's hard cap is 25MB. A typical one-hour interview at decent quality can easily be 80–150MB. If you just try to send that, you'll get a 413 or a rate-limit error before the transcription even starts.

The fix is chunking: split the audio into manageable pieces, transcribe each one, then stitch the results back together with correct timestamps. That last part is where most implementations go wrong. Here's the approach I landed on, built around ffmpeg and TypeScript.

The Strategy

    if file < 24MB → send directly (fast path)
    else           → chunk into 20-min segments at 32kbps mono → transcribe each → stitch

The 20-minute / 32kbps combination keeps each chunk at about 4.8MB (32,000 bits/s × 1,200 s ÷ 8), just under 5MB, which gives plenty of headroom below the 25MB limit regardless of source format.
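To make the strategy concrete, here is a minimal sketch of the three pure pieces: deciding whether to chunk, planning the 20-minute windows, and shifting each chunk's timestamps back onto the original timeline. Every name here (`needsChunking`, `planChunks`, `ffmpegArgs`, `stitch`, and the `Segment` shape) is hypothetical and chosen for illustration, not taken from the original implementation. The `Segment` shape assumes the API returns per-segment `start`/`end` times in seconds relative to the chunk. The ffmpeg flags used (`-ss`, `-t`, `-i`, `-vn`, `-ac`, `-b:a`) are standard ones, but the exact command the author runs may differ.

```typescript
// Illustrative sketch only; names and shapes are assumptions, not the author's code.

const DIRECT_LIMIT_BYTES = 24 * 1024 * 1024; // fast-path threshold from the strategy
const CHUNK_SECONDS = 20 * 60;               // 20-minute segments
const BITRATE_KBPS = 32;                     // 32kbps mono

interface ChunkPlan { index: number; startSec: number; durationSec: number; }

// Assumed transcript shape: times are seconds relative to the start of the chunk.
interface Segment { start: number; end: number; text: string; }

// Files under the threshold go straight to the API; everything else is chunked.
function needsChunking(fileBytes: number): boolean {
  return fileBytes >= DIRECT_LIMIT_BYTES;
}

// Rough size of a constant-bitrate chunk, ignoring container overhead.
function estimateChunkBytes(seconds: number, kbps: number): number {
  return (kbps * 1000 / 8) * seconds;
}

// Split the total duration into fixed windows; the last one may be shorter.
function planChunks(totalSec: number, chunkSec: number = CHUNK_SECONDS): ChunkPlan[] {
  const plans: ChunkPlan[] = [];
  for (let start = 0, index = 0; start < totalSec; start += chunkSec, index++) {
    plans.push({ index, startSec: start, durationSec: Math.min(chunkSec, totalSec - start) });
  }
  return plans;
}

// Argument list for one ffmpeg call: seek, limit duration, drop video,
// downmix to mono, and re-encode at the target bitrate.
function ffmpegArgs(input: string, plan: ChunkPlan, output: string): string[] {
  return [
    "-y",
    "-ss", String(plan.startSec),
    "-t", String(plan.durationSec),
    "-i", input,
    "-vn",
    "-ac", "1",
    "-b:a", `${BITRATE_KBPS}k`,
    output,
  ];
}

// Shift each chunk's segments by that chunk's start offset so the stitched
// transcript uses timestamps from the original recording.
function stitch(plans: ChunkPlan[], perChunk: Segment[][]): Segment[] {
  const out: Segment[] = [];
  for (let i = 0; i < plans.length; i++) {
    const offset = plans[i].startSec;
    for (const seg of perChunk[i] || []) {
      out.push({ start: seg.start + offset, end: seg.end + offset, text: seg.text });
    }
  }
  return out;
}
```

In a real pipeline the array from `ffmpegArgs` would be handed to something like Node's `child_process.execFile("ffmpeg", args)`, and `perChunk` would hold the segments returned by each transcription request, in the same order as `plans`. The important detail is that the offset comes from the planned start time of the chunk, not from the end of the previous chunk's last segment, so silence at a chunk boundary does not cause drift.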
