ibm-research/PowerMoE-3b

🧠 AI Model · ibm-research

IBM Research's efficient 3B Mixture-of-Experts model, based on the Granite architecture, for fast text generation.

PowerMoE-3b is an open-source Mixture-of-Experts (MoE) language model from IBM Research, built on the Granite architecture. It uses a sparse MoE design: of its 3B total parameters, only a subset of experts (about 800M parameters) is activated for each token, so inference costs far less compute than a dense model with the same total parameter count. The model works with the HuggingFace transformers library, ships weights in safetensors format, and is distributed under the permissive Apache 2.0 license. It is described in the paper arXiv:2408.13359, which introduces the Power learning rate scheduler used to train it. With over 1 million downloads on HuggingFace, PowerMoE-3b is widely used for text-generation workloads that need a balance between capability and computational efficiency.
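The idea of activating only a few experts per token can be illustrated with a small, self-contained sketch. This is a toy example of generic top-k routing, not PowerMoE-3b's actual router: the scalar "experts", the logits, the function names, and the choice to normalize weights over only the selected experts are all assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: weights are a softmax over the selected logits only;
    real routers may normalize differently.
    """
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the selected experts only."""
    out = 0.0
    for idx, weight in top_k_route(router_logits, k):
        out += weight * experts[idx](x)  # unselected experts are never called
    return out

# Hypothetical toy setup: 4 scalar experts, 2 active per token.
experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: x * x, lambda x: -x]
logits = [0.1, 2.0, 2.0, -1.0]

routes = top_k_route(logits, k=2)
y = moe_forward(3.0, experts, logits, k=2)
```

Here only two of the four experts run for the input, which is where the compute saving of a sparse MoE comes from: total parameter count grows with the number of experts, while per-token cost grows only with k.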

💡Highlights

├─3B sparse MoE on Granite arch
├─Apache 2.0 fully open source
├─1M+ HuggingFace downloads
└─Efficient expert routing

🎯For

├─AI researchers
├─NLP developers
└─Enterprise ML engineers

🔗Links

├─HuggingFace Model
└─Research Paper
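
For readers who want to try the model, the following is a minimal sketch of text generation through the HuggingFace transformers library. It assumes that a recent transformers release with Granite MoE support and PyTorch are installed, and that the weights can be downloaded; the `generate` helper and its default arguments are hypothetical names chosen for this example, not part of any official API.

```python
MODEL_ID = "ibm-research/PowerMoE-3b"  # repository name as listed on this page

def generate(prompt, max_new_tokens=50):
    """Generate a continuation of `prompt` with PowerMoE-3b (illustrative helper)."""
    # Imported lazily so that defining this helper does not require the libraries.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
    model.eval()

    # Tokenize the prompt into PyTorch tensors and generate new tokens.
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate("Write a short note about sparse models.")` would download the weights on first use and run on CPU by default; moving the model and inputs to a GPU is left out to keep the sketch short.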