unsloth/Qwen3.6-35B-A3B-GGUF

🧠 AI Model · unsloth

GGUF-quantized 35B MoE multimodal model by Qwen, optimized by Unsloth for efficient local inference.

This repository provides GGUF-format quantized builds of Qwen's Qwen3.6-35B-A3B, a large multimodal (image-text-to-text) Mixture-of-Experts (MoE) model. The 'A3B' designation indicates that roughly 3 billion parameters are active for each token, drawn from a total pool of about 35 billion, which balances capability with inference cost. Unsloth, known for fast and memory-efficient model conversions, has packaged multiple GGUF quantization levels (e.g., Q2_K through Q8_0) so users can trade memory usage against output quality. The model is licensed under Apache 2.0 and is tagged for the transformers ecosystem; the GGUF files themselves target llama.cpp, Ollama, LM Studio, and other GGUF-aware runtimes. Its multimodal capability enables vision-language understanding alongside text generation. Because only the active experts are computed for each token, inference is faster than with a dense 35B model, although the full set of weights still has to be stored in memory or offloaded.
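The memory-versus-quality trade-off can be made concrete with some back-of-the-envelope arithmetic. The sketch below is only a rough estimate, not a statement of the actual file sizes in this repository: it assumes exactly 35e9 total and 3e9 active parameters (read off the model name), uses the nominal bits per weight of a few llama.cpp block formats, and ignores that real GGUF files mix formats across tensors and need extra memory for the KV cache and any vision projector. The function names are hypothetical.

```python
# Rough weight-memory estimate per GGUF quantization level.
# Assumptions (not taken from the repository): parameter counts are read off
# the model name, and every tensor uses a single block format.

# Nominal bits per weight = bytes per block * 8 / weights per block.
BITS_PER_WEIGHT = {
    "Q2_K": 2.625,   # 84 bytes per 256 weights
    "Q4_K": 4.5,     # 144 bytes per 256 weights
    "Q6_K": 6.5625,  # 210 bytes per 256 weights
    "Q8_0": 8.5,     # 34 bytes per 32 weights
}

TOTAL_PARAMS = 35e9   # assumed from "35B"
ACTIVE_PARAMS = 3e9   # assumed from "A3B"


def estimate_weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def active_fraction(total: float, active: float) -> float:
    """Share of parameters used per token in the MoE forward pass."""
    return active / total


if __name__ == "__main__":
    for name, bpw in BITS_PER_WEIGHT.items():
        size = estimate_weight_gb(TOTAL_PARAMS, bpw)
        print(f"{name}: ~{size:.1f} GB of weights")
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"Active per token: ~{share:.1%} of all parameters")
```

The estimate covers storage for all experts, since every expert must be available even though only a small share is computed per token. Treat the numbers as a first filter for choosing a quantization level, then check the real file sizes listed in the repository.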

💡Highlights

├─35B MoE, ~3B active per token
├─Multimodal image-text-to-text
├─Multiple GGUF quant sizes
└─Apache 2.0, Unsloth optimized

🎯For

├─AI researchers
├─Local LLM users
└─Multimodal app developers

🔗Links

└─Hugging Face Model