unsloth/Qwen3.6-35B-A3B-GGUF
🧠 AI Model · unsloth
GGUF-quantized 35B MoE multimodal model by Qwen, optimized by Unsloth for efficient local inference.
This repository provides GGUF-format quantized builds of Qwen's Qwen3.6-35B-A3B, a multimodal (image-text-to-text) Mixture-of-Experts (MoE) model. The "A3B" suffix indicates that roughly 3 billion parameters are active in each forward pass, routed from a total pool of about 35 billion. Unsloth, known for fast and memory-efficient model conversions, packages the model at multiple GGUF quantization levels (e.g., Q2_K through Q8_0) so users can trade VRAM usage against output quality. The model is licensed under Apache 2.0 and is listed as supporting the transformers ecosystem; the GGUF builds themselves are meant for llama.cpp and other GGUF-aware runtimes such as Ollama and LM Studio.

Its multimodal capability adds vision-language understanding to text generation. Because only about 3B parameters are active per token, per-token compute is far lower than for a dense 35B model, so inference is typically faster while the model keeps the capacity of its larger parameter pool. All 35B parameters must still be loaded into memory, however, so the chosen quantization level largely determines the hardware required.
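To make the VRAM/quality trade-off concrete, the sketch below estimates weight storage for a 35B-parameter model quantized uniformly at each level. It rests on stated assumptions: the bits-per-weight values are approximate figures derived from llama.cpp's quantization block layouts, real GGUF files mix quantization types across tensors (so actual file sizes differ, typically upward for `_M`/`_L` variants), and the estimate excludes the KV cache, vision components, and runtime overhead. The `APPROX_BPW` table and `estimate_weight_gib` function are hypothetical helpers, not part of any library.

```python
# Approximate bits per weight (bpw) for common llama.cpp quantization types,
# derived from their block layouts (quantized values plus per-block scales).
# Assumption: real GGUF files mix types across tensors, so these only give
# a rough estimate of weight storage.
APPROX_BPW = {
    "Q2_K": 2.625,
    "Q3_K": 3.4375,
    "Q4_K": 4.5,
    "Q5_K": 5.5,
    "Q6_K": 6.5625,
    "Q8_0": 8.5,
}


def estimate_weight_gib(total_params: float, quant: str) -> float:
    """Estimate weight storage in GiB if every weight used `quant`.

    Excludes KV cache, vision encoder/projector, and runtime buffers.
    """
    total_bits = total_params * APPROX_BPW[quant]
    return total_bits / 8 / 2**30  # bits -> bytes -> GiB


if __name__ == "__main__":
    # For an MoE model, all 35B parameters must be resident in memory,
    # even though only ~3B are active for any given token.
    for quant in APPROX_BPW:
        print(f"{quant}: ~{estimate_weight_gib(35e9, quant):.1f} GiB of weights")
```

Treat the printed figures as a lower bound for planning: leave headroom for the context window's KV cache, which grows with sequence length.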
💡 Highlights
- ├─ 35B MoE, ~3B active per token
- ├─ Multimodal image-text-to-text
- ├─ Multiple GGUF quant sizes
- └─ Apache 2.0, Unsloth-optimized
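The "~3B active per token" figure can be turned into a rough compute comparison with a common rule of thumb: a forward pass costs about 2 FLOPs (one multiply, one add) per parameter that participates in computing a token. The sketch below applies it to the nominal 35B total and ~3B active figures. It is an approximation that ignores attention-score cost (which grows with context length) and router overhead, and `forward_flops_per_token` is a hypothetical helper.

```python
def forward_flops_per_token(active_params: float) -> float:
    """Rule-of-thumb forward-pass cost: ~2 FLOPs per participating parameter.

    Ignores attention-score computation and MoE routing overhead.
    """
    return 2.0 * active_params


TOTAL_PARAMS = 35e9   # nominal total across all experts ("35B")
ACTIVE_PARAMS = 3e9   # approximate parameters used per token ("A3B")

moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"MoE (~3B active): ~{moe_flops / 1e9:.0f} GFLOPs per token")
print(f"Dense 35B:        ~{dense_flops / 1e9:.0f} GFLOPs per token")
print(f"Ratio:            ~{dense_flops / moe_flops:.1f}x fewer FLOPs for the MoE")
```

Actual speed depends on memory bandwidth, quantization, and how the runtime schedules experts, so the FLOP ratio should not be read as a measured speedup.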
🎯 For
- ├─ AI researchers
- ├─ Local LLM users
- └─ Multimodal app developers