inclusionAI: Ling-2.6-flash
🧠 AI Modelinclusionai
Efficient 104B MoE model with 7.4B active params and 262k context for fast agents.
Ling-2.6-flash is a Mixture-of-Experts (MoE) style model from inclusionAI, optimized for agent workloads. With 104B total parameters and only 7.4B activated per token, it achieves a sweet spot between capability and inference speed. The model boasts a 262,144 token context window, making it suitable for long-form reasoning and memory-intensive tasks. It supports structured outputs, repetition penalties, and other inference controls. Benchmarks indicate strong performance on mathematical reasoning and general knowledge tasks. The pricing is extremely competitive: $0.01 per million input tokens and $0.03 per million output tokens, making it one of the most cost-effective models for high-volume agent deployments. Input and output modalities are text-only.
💡Highlights
- ├─104B total, 7.4B active per token
- ├─262K token context window
- └─$0.01/M input, $0.03/M output
🎯For
- ├─AI agent developers
- ├─Cost-conscious enterprises
- └─Researchers in efficient inference