Transcription

Convert speech to text using GPU-accelerated Whisper models with speaker diarization.

How It Works

EdgeNote AI uses OpenAI's Whisper models running locally via whisper.cpp for fast, accurate speech-to-text transcription. All processing happens on your device.

GPU Accelerated

Uses Metal (macOS), CUDA (NVIDIA), or Vulkan (AMD/Intel) for 5-10x faster transcription than CPU-only mode.

99 Languages

Large models support automatic language detection across 99 languages.

Speaker Diarization

Identify who spoke when with automatic speaker segmentation.

Whisper Models

Choose a model based on your accuracy needs and available RAM.

Model                         Size     RAM Required  Languages     Notes
Whisper Tiny                  77 MB    1 GB          English       Ultra-fast, lower accuracy
Whisper Small                 488 MB   2 GB          English       Fast with good accuracy
Whisper Medium (Recommended)  1.5 GB   4 GB          English       Excellent accuracy
Whisper Large V3 Turbo        1.6 GB   4 GB          99 languages  Fast + multilingual
Whisper Large V3              3.1 GB   6 GB          99 languages  Maximum accuracy
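
As a rough illustration of how the table can guide a choice, the sketch below picks the largest listed model that fits a given amount of free RAM. The `pick_model` helper and its matching rule (English-only needs map to the English rows, multilingual needs map to the 99-language rows) are assumptions made for this example, not EdgeNote AI's own selection logic.

```python
from typing import NamedTuple, Optional


class WhisperModel(NamedTuple):
    name: str
    ram_gb: int          # "RAM Required" column from the table
    multilingual: bool   # True for the 99-language rows


# Rows copied from the table, kept in table order.
MODELS = [
    WhisperModel("Whisper Tiny", 1, False),
    WhisperModel("Whisper Small", 2, False),
    WhisperModel("Whisper Medium", 4, False),
    WhisperModel("Whisper Large V3 Turbo", 4, True),
    WhisperModel("Whisper Large V3", 6, True),
]


def pick_model(available_ram_gb: float,
               need_multilingual: bool = False) -> Optional[WhisperModel]:
    """Hypothetical helper: last table row that fits in RAM and matches
    the language need, or None if nothing fits."""
    candidates = [
        m for m in MODELS
        if m.ram_gb <= available_ram_gb and m.multilingual == need_multilingual
    ]
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    print(pick_model(4))                          # English-only, 4 GB free
    print(pick_model(4, need_multilingual=True))  # multilingual, 4 GB free
```

With 4 GB of free RAM this rule returns Whisper Medium for English-only use and Whisper Large V3 Turbo when multilingual support is needed, which matches the recommendation in the table.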

Supported Languages

The Large models support automatic detection and transcription in 99 languages. Some common languages include:

English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Arabic, Russian, Hindi, + 87 more

Language Detection

Large models automatically detect the spoken language. You can also manually specify the language in Settings for faster processing when you know the language in advance.

Speaker Diarization

Speaker diarization identifies different speakers in a recording and labels who said what. This is especially useful for meetings with multiple participants.

Automatic Detection

EdgeNote AI automatically detects when speakers change, even without prior training.

Speaker Labels

Speakers are labeled as "Speaker 1", "Speaker 2", etc. You can rename them in the Speaker Management screen.
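
The renaming step can be pictured as a simple label substitution over the transcript segments. The sketch below assumes a hypothetical segment shape (a dict with `start`, `end`, `speaker`, and `text` keys); the app's internal data format is not documented here, so treat the field names as placeholders.

```python
def rename_speakers(segments, names):
    """Return new segments with default labels replaced by custom names.

    `segments` is a list of dicts with a "speaker" key (hypothetical shape).
    `names` maps default labels such as "Speaker 1" to display names.
    Labels without an entry in `names` are left unchanged.
    """
    return [
        {**seg, "speaker": names.get(seg["speaker"], seg["speaker"])}
        for seg in segments
    ]


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 4.2, "speaker": "Speaker 1", "text": "Shall we begin?"},
        {"start": 4.2, "end": 7.9, "speaker": "Speaker 2", "text": "Yes, go ahead."},
    ]
    for seg in rename_speakers(segments, {"Speaker 1": "Alice"}):
        print(seg["speaker"], "-", seg["text"])
```

Returning new dicts instead of editing in place keeps the original "Speaker N" labels available, which would make an undo or re-label step straightforward.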

[Figure: Transcription with Speaker Segments. Transcription showing different speakers with timestamps.]

Transcription Output

The transcription includes timestamps and speaker information for easy navigation.
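
To make the output structure concrete, here is a minimal sketch that renders timestamped, speaker-labeled lines from a list of segments. The segment fields (`start`, `speaker`, `text`) and the `[HH:MM:SS] Speaker: text` layout are assumptions chosen for illustration; the app's actual display and export formats may differ.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as HH:MM:SS (fractions are dropped)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments) -> str:
    """Render one line per segment as '[HH:MM:SS] Speaker: text'."""
    return "\n".join(
        f"[{format_timestamp(seg['start'])}] {seg['speaker']}: {seg['text']}"
        for seg in segments
    )


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "speaker": "Speaker 1", "text": "Welcome, everyone."},
        {"start": 65.5, "speaker": "Speaker 2", "text": "Thanks for joining."},
    ]
    print(render_transcript(segments))
```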

[Figure: Complete Transcription View. Full transcription with timestamps, speakers, and navigation.]

Advanced Settings

Fine-tune transcription behavior in Settings > Advanced:

CPU Threads

Increase the number of CPU threads for faster CPU-mode transcription. The default is auto-detected based on your system.
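
How the default is detected is not specified. One plausible heuristic, shown here purely as an assumption, is to start from the logical core count reported by the operating system and cap it, letting an explicit user setting take priority. The `resolve_threads` name and the cap of 8 are hypothetical.

```python
import os
from typing import Optional


def resolve_threads(user_setting: Optional[int] = None, max_auto: int = 8) -> int:
    """Hypothetical thread-count resolution.

    An explicit user setting wins (clamped to at least 1). Otherwise use
    the logical core count, capped at `max_auto` (an assumed limit).
    """
    if user_setting is not None:
        return max(1, user_setting)
    cores = os.cpu_count() or 1  # cpu_count() can return None
    return max(1, min(cores, max_auto))


if __name__ == "__main__":
    print("auto:", resolve_threads())
    print("manual:", resolve_threads(4))
```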

VAD (Voice Activity Detection)

Automatically detect and skip silent portions of audio for faster processing.
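
The general idea behind VAD can be shown with a simple energy threshold: split the audio into short frames, measure each frame's loudness, and keep only the spans that exceed a cutoff. This is a deliberately basic stand-in, not the detector EdgeNote AI ships; the frame length and threshold values are assumptions.

```python
import math


def speech_regions(samples, sample_rate, frame_ms=30, threshold=0.02):
    """Return (start_s, end_s) spans whose frame RMS energy meets `threshold`.

    `samples` are floats in [-1.0, 1.0]. Energy-threshold VAD is a
    simplified illustration; production detectors are more robust to noise.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    regions = []
    start = None  # sample index where the current speech span began
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms >= threshold:
            if start is None:
                start = i
        elif start is not None:
            regions.append((start / sample_rate, i / sample_rate))
            start = None
    if start is not None:  # audio ended while still in speech
        regions.append((start / sample_rate, len(samples) / sample_rate))
    return regions


if __name__ == "__main__":
    # 0.2 s silence, 0.3 s of signal, 0.1 s silence at a 1 kHz sample rate.
    audio = [0.0] * 200 + [0.5] * 300 + [0.0] * 100
    print(speech_regions(audio, sample_rate=1000, frame_ms=100))
```

Only the returned spans would then be passed on for transcription, which is where the time saving on recordings with long pauses comes from.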

Auto-Transcribe

Automatically start transcription when a recording ends. Disable this option to start transcription manually.