TRANSCRIPTION TOOL

A desktop application built with Claude Code that records audio and transcribes it locally using OpenAI's Whisper model — no cloud API required.

HOW IT WORKS

The tool captures microphone audio through a tkinter GUI, saves recordings as WAV files, and runs them through a local Whisper speech-recognition model for transcription. Everything runs on your machine — no data leaves your desktop.
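The transcription step can be sketched roughly as follows. This is a minimal illustration, not the project's actual transcriber.py: the function names int16_to_float32 and transcribe are hypothetical, and it assumes mono int16 samples at the 16 kHz rate and the openai/whisper-base model named under ARCHITECTURE.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz; the sample rate this document lists for config.py


def int16_to_float32(samples: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples into float32 values in [-1.0, 1.0)."""
    return samples.astype(np.float32) / 32768.0


def transcribe(samples: np.ndarray, model_name: str = "openai/whisper-base") -> str:
    """Run a local Whisper pipeline over one mono recording (hypothetical helper)."""
    # Imported lazily so the heavy dependency loads only when transcription runs.
    from transformers import pipeline

    asr = pipeline("automatic-speech-recognition", model=model_name)
    # The pipeline accepts raw samples together with their sampling rate.
    result = asr({"raw": int16_to_float32(samples), "sampling_rate": SAMPLE_RATE})
    return result["text"].strip()
```

A real implementation would likely build the pipeline once and reuse it across recordings, since loading the model is the slow part.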

The application is structured as five Python modules that separate concerns cleanly:

ARCHITECTURE

main.py — Entry point. Wires together the recorder, transcriber, and UI, then launches the application window.
recorder.py — Audio capture using sounddevice. Manages a state machine (idle/recording/paused), streams microphone input into memory, and writes timestamped WAV files via scipy.
transcriber.py — Loads the openai/whisper-base model through Hugging Face Transformers on first use. Converts int16 audio to float32, runs the speech-recognition pipeline, and saves transcripts as text files.
ui.py — A tkinter interface with Record, Pause, and Stop buttons and a scrollable transcript display. Runs transcription in a background thread to keep the UI responsive.
config.py — Central settings: sample rate (16kHz), model name, and output directories for recordings and transcripts.
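The idle/recording/paused state machine in recorder.py might look something like the sketch below. The transition table, the action names ("record", "pause", "stop"), and the RecorderState class are assumptions made for illustration; the actual module may structure this differently.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    PAUSED = "paused"


# Allowed transitions keyed by (current state, action).
# Assumed for illustration, not taken from recorder.py.
TRANSITIONS = {
    (State.IDLE, "record"): State.RECORDING,
    (State.RECORDING, "pause"): State.PAUSED,
    (State.PAUSED, "record"): State.RECORDING,
    (State.RECORDING, "stop"): State.IDLE,
    (State.PAUSED, "stop"): State.IDLE,
}


class RecorderState:
    """Tracks the recorder's state and rejects invalid button presses."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def apply(self, action: str) -> State:
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action!r} while {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the transitions in one table makes it easy for the UI to enable or disable the Record, Pause, and Stop buttons based on which actions are valid in the current state.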

DEPENDENCIES

sounddevice — cross-platform audio I/O
transformers + torch — Hugging Face pipeline and PyTorch backend for Whisper
scipy + numpy — WAV file I/O and audio array processing
tkinter — GUI framework (included with Python)

RUNNING IT

pip install -r requirements.txt
python main.py

The Whisper model downloads automatically on first run (~150 MB). Recordings are saved to recordings/ and transcripts to transcripts/.
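One plausible way to pair each recording with its transcript is to derive both file names from the same timestamp. The sketch below assumes a YYYYMMDD_HHMMSS naming scheme and a hypothetical output_paths helper; the project's real file names may differ.

```python
from datetime import datetime
from pathlib import Path

# Directory names as given in this document.
RECORDINGS_DIR = Path("recordings")
TRANSCRIPTS_DIR = Path("transcripts")


def output_paths(now: datetime) -> tuple[Path, Path]:
    """Return (wav_path, transcript_path) sharing one timestamped stem."""
    stem = now.strftime("%Y%m%d_%H%M%S")  # assumed naming scheme
    return RECORDINGS_DIR / f"{stem}.wav", TRANSCRIPTS_DIR / f"{stem}.txt"


def ensure_output_dirs() -> None:
    """Create both output directories if they do not exist yet."""
    for directory in (RECORDINGS_DIR, TRANSCRIPTS_DIR):
        directory.mkdir(parents=True, exist_ok=True)
```

Sharing the stem means a transcript can always be traced back to the recording it came from.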