pyannote/segmentation

🧠 AI Model · pyannote

Pyannote voice activity detection model for speech segmentation and speaker turns.

The pyannote/segmentation model is a neural network for voice activity detection (VAD) and speaker segmentation. Built on PyTorch as part of the pyannote-audio toolkit, it takes raw audio waveforms and outputs frame-level speech and speaker activity, from which segment boundaries and speaker turns are derived. It marks when speakers are active but does not by itself identify who they are. The architecture may involve SincNet or similar convolutional layers, trained on large multi-speaker datasets. The model is gated: its license conditions must be accepted on Hugging Face before it can be downloaded. Key features include accurate handling of overlapping speech, real-time inference capability, and direct integration into diarization pipelines. It has been downloaded over 2.6 million times and has received 678 likes on Hugging Face.
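To illustrate how per-frame speech scores can be turned into segment boundaries, here is a minimal pure-Python sketch of threshold-based post-processing with hysteresis. It is not pyannote-audio's implementation: the function name `scores_to_segments`, its parameters, and the example scores and frame step are all assumptions made for illustration.

```python
from typing import List, Tuple


def scores_to_segments(
    scores: List[float],
    frame_step: float,
    onset: float = 0.5,
    offset: float = 0.5,
    min_duration_on: float = 0.0,
) -> List[Tuple[float, float]]:
    """Convert per-frame speech probabilities into (start, end) times in seconds.

    Hypothetical helper, not part of pyannote-audio.
    A segment opens when a score reaches `onset` and closes when a later
    score drops below `offset` (hysteresis). Segments shorter than
    `min_duration_on` seconds are discarded.
    """
    segments = []
    start = None  # index of the frame where the current segment opened
    for i, p in enumerate(scores):
        if start is None and p >= onset:
            start = i
        elif start is not None and p < offset:
            segments.append((start * frame_step, i * frame_step))
            start = None
    if start is not None:
        # Speech continues to the end of the scored region.
        segments.append((start * frame_step, len(scores) * frame_step))
    return [(s, e) for s, e in segments if e - s >= min_duration_on]


# Made-up scores, one per 0.5 s frame (an assumed frame step chosen for easy arithmetic).
example_scores = [0.1, 0.2, 0.8, 0.9, 0.7, 0.3, 0.1, 0.6, 0.9]
example_segments = scores_to_segments(example_scores, frame_step=0.5)
print(example_segments)
```

Setting `onset` above `offset` makes a segment harder to open than to keep open, which reduces flicker when scores hover near a single threshold.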

💡Highlights

├─2.6M+ downloads on Hugging Face
├─State-of-the-art voice activity detection
└─Part of pyannote-audio toolkit

🎯For

├─Audio researchers
├─Speech recognition engineers
└─Voice application developers

🔗Links

└─Model Hub