
I built a pay-per-use video transcription tool with Next.js and Whisper — here's the full breakdown
Why I built this

I kept running into the same frustration: I had a video file and I needed the text inside it. The available options were either too manual (typing it yourself), too unreliable (YouTube auto-captions), or too expensive for occasional use (most transcription SaaS tools charge a flat monthly fee regardless of how much you actually transcribe).

So I built Tonivox, a web app that accepts a video file, extracts the audio, runs it through a transcription model, and returns the full text. Pay per transcription, no subscription.

This post covers the technical decisions, the problems I ran into, and what I'd do differently.

The stack

Next.js 15 (App Router): frontend and API routes
Prisma + PostgreSQL: data layer
Better Auth: authentication (email/password + email verification)
Stripe: credit purchases via Checkout Sessions
OpenAI Whisper: transcription model
FFmpeg: audio extraction from video files
Tailwind CSS: styling

How it works

The flow is straightforward: User up
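
The pipeline described in the introduction (video in, audio extracted with FFmpeg, text out) starts with the extraction step. Here is a minimal sketch of how the FFmpeg arguments for that step might be assembled. The helper name `buildExtractAudioArgs` and the choice of 16 kHz mono WAV are assumptions for illustration, not the settings Tonivox actually uses.

```typescript
// Hypothetical helper (not from the original post): builds the argument list
// for an FFmpeg call that strips the video stream and writes mono 16 kHz WAV.
// The sample rate and codec are assumed values, chosen because small mono
// audio is a common input for speech models.
function buildExtractAudioArgs(inputPath: string, outputPath: string): string[] {
  return [
    "-y",               // overwrite the output file if it already exists
    "-i", inputPath,    // input video file
    "-vn",              // drop the video stream, keep audio only
    "-ac", "1",         // downmix to a single (mono) channel
    "-ar", "16000",     // resample to 16 kHz
    "-c:a", "pcm_s16le", // encode as 16-bit PCM (plain WAV)
    outputPath,         // output audio file
  ];
}
```

In a Node.js environment, the resulting array could be passed to `spawn("ffmpeg", args)` from `node:child_process`. Passing arguments as an array rather than a shell string avoids quoting problems with user-supplied file names.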




