Case Study
An end-to-end system that transcribes interview recordings, extracts key insights, and generates structured analysis reports. It combines audio processing, speaker diarization, and LLM-powered summarization.
60 min of audio → report in under 2 min
90%+ speaker diarization accuracy
Structured scoring across 12 dimensions
Hiring teams spend hours manually reviewing interview recordings, taking notes, and comparing candidates. The process is slow, subjective, and inconsistent.
Audio upload → Whisper transcription with timestamps
Speaker diarization to separate interviewer from candidate
LLM chain for insight extraction: skills, red flags, culture fit
Structured JSON output for consistent scoring across candidates
FastAPI backend with async processing queue
React dashboard for side-by-side candidate comparison
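The structured JSON step above only yields comparable scores if every LLM response is checked against the same schema before it reaches the dashboard. The following is a minimal sketch of such a check, not the project's actual code: the dimension names, the `scores` key, and the 1 to 5 scale are all hypothetical placeholders (the real system scores 12 dimensions).

```python
import json

# Hypothetical dimension names for illustration; the real system uses 12.
DIMENSIONS = ("communication", "technical_depth", "problem_solving")
SCORE_MIN, SCORE_MAX = 1, 5  # assumed scoring scale


def parse_scorecard(raw: str) -> dict[str, int]:
    """Parse an LLM response and return validated per-dimension scores."""
    data = json.loads(raw)
    scores = data.get("scores") if isinstance(data, dict) else None
    if not isinstance(scores, dict):
        raise ValueError("response is missing a 'scores' object")

    missing = [name for name in DIMENSIONS if name not in scores]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")

    validated = {}
    for name in DIMENSIONS:
        value = scores[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name}: score must be an integer")
        if not SCORE_MIN <= value <= SCORE_MAX:
            raise ValueError(f"{name}: score {value} is out of range")
        validated[name] = value
    return validated
```

Rejecting a malformed response early, rather than storing a partial scorecard, keeps every candidate comparable on the same set of dimensions; a caller could retry the prompt when `ValueError` is raised.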
Speaker diarization accuracy started at ~70% and improved to 90%+ after combining pyannote with Whisper's word-level timestamps
Designed prompt chains that extract consistent structured data across different interview styles and formats
Handled long recordings (1 hr+) with chunked processing that carries context over from one chunk to the next
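One common way to combine diarization output with word-level timestamps is to give each word to the speaker whose segment overlaps it most, then merge consecutive words into turns. The sketch below illustrates that idea under stated assumptions and is not necessarily how this project did it: the `Segment` and `Word` types are hypothetical stand-ins for data that would be derived from pyannote and Whisper output.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """A diarization segment (hypothetical shape)."""
    start: float
    end: float
    speaker: str


class Word(NamedTuple):
    """A transcribed word with timestamps (hypothetical shape)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: list[Word], segments: list[Segment]) -> list[tuple[str, str]]:
    """Label each word with the speaker whose segment overlaps it most."""
    labeled = []
    for word in words:
        best_speaker, best_overlap = "unknown", 0.0
        for seg in segments:
            shared = overlap(word.start, word.end, seg.start, seg.end)
            if shared > best_overlap:
                best_speaker, best_overlap = seg.speaker, shared
        labeled.append((best_speaker, word.text))
    return labeled


def merge_turns(labeled: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Collapse consecutive words from the same speaker into one turn."""
    turns: list[tuple[str, str]] = []
    for speaker, text in labeled:
        if turns and turns[-1][0] == speaker:
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
    return turns
```

Words that overlap no segment are labeled `unknown` instead of being forced onto the nearest speaker, so gaps in the diarization stay visible downstream.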