Z.ai: GLM 4.6V

🧠 AI Modelz-ai

High-fidelity multimodal model with 128K context for vision and language.

GLM 4.6V is a powerful multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports a context length of up to 131,072 tokens, enabling complex analysis of large documents or multiple images. The model processes image, text, and video inputs and generates text outputs. It includes advanced features such as frequency penalty, reasoning, repetition penalty, response format, and seed control. Benchmarks demonstrate strong performance in visual question answering, document parsing, and multimodal reasoning tasks. This model is part of the GLM series by Z-AI, optimized for efficiency and accuracy.

💡Highlights

├─128K context window
├─Multimodal input: image, text, video
└─Pricing: $0.30/$0.90 per M tokens

🎯For

├─AI researchers
├─multimodal app developers
└─vision-language users

🔗Links

└─View on OpenRouter