inclusionAI: Ling-2.6-flash

🧠 AI Modelinclusionai

Efficient 104B MoE model with 7.4B active params and 262k context for fast agents.

Ling-2.6-flash is a Mixture-of-Experts (MoE) style model from inclusionAI, optimized for agent workloads. With 104B total parameters and only 7.4B activated per token, it achieves a sweet spot between capability and inference speed. The model boasts a 262,144 token context window, making it suitable for long-form reasoning and memory-intensive tasks. It supports structured outputs, repetition penalties, and other inference controls. Benchmarks indicate strong performance on mathematical reasoning and general knowledge tasks. The pricing is extremely competitive: $0.01 per million input tokens and $0.03 per million output tokens, making it one of the most cost-effective models for high-volume agent deployments. Input and output modalities are text-only.

💡Highlights

├─104B total, 7.4B active per token
├─262K token context window
└─$0.01/M input, $0.03/M output

🎯For

├─AI agent developers
├─Cost-conscious enterprises
└─Researchers in efficient inference

🔗Links

└─OpenRouter Model Page