mlc-ai/web-llm-chat

📦 Open Source Project · mlc-ai

Run large language models natively in your web browser using WebGPU for private, server-free AI chat.

Web-LLM-Chat uses the WebLLM engine to run generative AI models directly in the browser. Inference runs on the client's GPU through WebGPU, so the application needs no external API calls or server infrastructure to generate responses. Because model weights and conversation data stay on the user's machine, the design suits privacy-conscious developers and users. The project is built with TypeScript and Next.js, which will be familiar to developers who want to integrate local LLMs into web applications. It supports a range of model families, including Qwen, Phi-2, and TinyLlama, so a deployment can be matched to the user's hardware capabilities. Its modular design makes it a practical starting point for privacy-first AI interfaces that work offline once model weights have been downloaded and that run in modern browsers with WebGPU support.
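To illustrate what matching a model to the user's hardware could look like, here is a minimal TypeScript sketch. It is not taken from the project's code. The `ModelOption` type, the `CATALOG` entries, their size thresholds, and the `pickModel` and `chooseModel` helpers are all hypothetical. Using the adapter's `maxBufferSize` limit as a stand-in for hardware capability is also an assumption made for illustration. Only `requestAdapter()` and `limits.maxBufferSize` correspond to the standard WebGPU API, and they are declared here as minimal structural types so the sketch compiles without extra type packages.

```typescript
// Minimal structural types mirroring the parts of WebGPU this sketch touches.
interface AdapterLike {
  limits: { maxBufferSize: number };
}
interface GpuLike {
  requestAdapter(): Promise<AdapterLike | null>;
}

// Hypothetical catalog entry: a model family and the buffer size we assume it needs.
interface ModelOption {
  name: string;
  minBufferBytes: number;
}

const MiB = 1024 * 1024;

// Placeholder thresholds for illustration only; these are not measured requirements.
const CATALOG: ModelOption[] = [
  { name: "TinyLlama", minBufferBytes: 256 * MiB },
  { name: "Phi-2", minBufferBytes: 1024 * MiB },
  { name: "Qwen", minBufferBytes: 2048 * MiB },
];

// Return the most demanding model that still fits the given limit, or null if none fits.
function pickModel(
  maxBufferSize: number,
  catalog: ModelOption[] = CATALOG,
): ModelOption | null {
  let best: ModelOption | null = null;
  for (const option of catalog) {
    const fits = option.minBufferBytes <= maxBufferSize;
    if (fits && (best === null || option.minBufferBytes > best.minBufferBytes)) {
      best = option;
    }
  }
  return best;
}

// Feature-detect WebGPU, then choose a model name; null means "no local inference".
async function chooseModel(gpu: GpuLike | undefined): Promise<string | null> {
  if (!gpu) return null; // the browser does not expose WebGPU
  const adapter = await gpu.requestAdapter();
  if (!adapter) return null; // WebGPU is present but no suitable adapter was found
  const choice = pickModel(adapter.limits.maxBufferSize);
  return choice ? choice.name : null;
}
```

In a browser, the `gpu` argument would be `navigator.gpu`, which is undefined where WebGPU is unavailable. A `null` result gives the interface a clear signal to show an unsupported-browser message instead of attempting to load a model.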

💡Highlights

├─Native WebGPU model acceleration
├─Zero-server privacy architecture
└─Supports Llama, Mistral, and Gemma

🎯For

├─Web Developers
├─Privacy Advocates
└─AI Researchers

🔗Links

└─GitHub Repository