lmstudio-community/gemma-4-E4B-it-MLX-8bit

🧠 AI Model · lmstudio-community

Optimized 8-bit MLX quantization of Google's Gemma 4-E4B model for efficient local execution on Apple Silicon.

This model is a port of Google's Gemma 4-E4B-it to the MLX format with 8-bit weight quantization. MLX is Apple's machine learning framework, designed for efficient execution on Apple Silicon (M-series chips). Quantizing the weights to 8 bits trades a small amount of output quality for a smaller footprint, so image-text-to-text tasks run with lower memory requirements than the unquantized weights would need. On Apple Silicon this memory is the unified memory shared by CPU and GPU rather than dedicated VRAM. The model is tagged for the 'any-to-any' pipeline, which covers multimodal input and output. The weights are packaged as safetensors, a format that loads quickly and does not execute arbitrary code on load, and that works with the Hugging Face ecosystem and local inference tools such as LM Studio. The release is aimed at researchers and developers who want multimodal capabilities in local applications without relying on cloud-based API endpoints.
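The memory saving from 8-bit quantization can be estimated with simple arithmetic. The sketch below is illustrative only: it assumes group-wise quantization with a group size of 64 and a 16-bit scale plus a 16-bit bias stored per group, which may not match this repository's actual configuration, and `EXAMPLE_PARAMS` is a hypothetical parameter count, not the model's real size.

```python
def effective_bits(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Bits per weight once per-group metadata is included.

    overhead_bits is an assumption: one 16-bit scale and one 16-bit bias
    shared by every group of `group_size` weights.
    """
    return bits + overhead_bits / group_size


def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GiB (no activations or KV cache)."""
    return num_params * bits_per_weight / 8 / 1024**3


# Hypothetical parameter count, chosen only for illustration.
EXAMPLE_PARAMS = 4_000_000_000

fp16_gib = weight_memory_gib(EXAMPLE_PARAMS, 16)
q8_gib = weight_memory_gib(EXAMPLE_PARAMS, effective_bits(8, 64))
ratio = q8_gib / fp16_gib

print(f"16-bit weights: {fp16_gib:.2f} GiB")
print(f"8-bit weights:  {q8_gib:.2f} GiB")
print(f"ratio:          {ratio:.3f}")
```

Under these assumptions the ratio depends only on the bit widths and group size, not on the parameter count, so the same calculation applies once the model's real size is substituted. Actual memory use at inference time is higher, because activations and the key-value cache are not counted here.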

💡Highlights

├─8-bit MLX optimized quantization
├─Supports any-to-any multimodal tasks
└─Native Apple Silicon acceleration

🎯For

├─AI Developers
└─Apple Silicon Users

🔗Links

└─Hugging Face Repository