withcatai/node-llama-cpp

🏗️ Framework · withcatai

Run local LLMs in Node.js with llama.cpp bindings, featuring native JSON schema enforcement and GPU acceleration.

node-llama-cpp provides Node.js bindings for llama.cpp, a high-performance C++ inference engine. It runs GGUF models directly on local hardware, with GPU acceleration through CUDA, Metal, or Vulkan. Its standout feature is JSON schema enforcement during token generation: the model is restricted to output that matches the supplied schema, which avoids malformed responses. This constrains the format of the output, not the accuracy of the values in it. The library is a good fit for AI agents, automated data extraction pipelines, and local chatbots. It ships prebuilt binaries for easy installation and also supports embedding generation and function calling.
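As a rough illustration of schema-enforced generation, the sketch below extracts a structured record from free text. It assumes the v3 API of node-llama-cpp (`getLlama`, `loadModel`, `LlamaChatSession`, `createGrammarForJsonSchema`); the `contactSchema` fields, the `Contact` type, the `isContact` guard, and the `extractContact` helper are hypothetical names made up for this example, and the model path is whatever GGUF file you have locally.

```typescript
// Hypothetical schema for this example: the fields are illustrative only.
export const contactSchema = {
    type: "object",
    properties: {
        name: {type: "string"},
        email: {type: "string"},
        tags: {type: "array", items: {type: "string"}}
    }
} as const;

export interface Contact {
    name: string;
    email: string;
    tags: string[];
}

// Runtime check that a parsed value has the shape described by contactSchema.
export function isContact(value: unknown): value is Contact {
    if (typeof value !== "object" || value === null) return false;
    const v = value as Record<string, unknown>;
    return typeof v.name === "string"
        && typeof v.email === "string"
        && Array.isArray(v.tags)
        && v.tags.every((t) => typeof t === "string");
}

// Assumes node-llama-cpp v3; modelPath must point to a local GGUF file.
export async function extractContact(modelPath: string, text: string): Promise<Contact> {
    // Imported lazily so the native bindings load only when inference runs.
    const {getLlama, LlamaChatSession} = await import("node-llama-cpp");

    const llama = await getLlama();
    const model = await llama.loadModel({modelPath});
    const context = await model.createContext();
    const session = new LlamaChatSession({contextSequence: context.getSequence()});

    // The grammar restricts token sampling to output matching the schema.
    const grammar = await llama.createGrammarForJsonSchema(contactSchema);
    const response = await session.prompt(
        `Extract the contact details from this text: ${text}`,
        {grammar}
    );

    const parsed: unknown = grammar.parse(response);
    if (!isContact(parsed)) throw new Error("Model output did not match the expected shape");
    return parsed;
}
```

A call might look like `await extractContact("./models/model.gguf", "Reach Ada at ada@example.com")`, where the path is a placeholder. The extra `isContact` check is a defensive choice for this sketch: the grammar shapes the output, and the guard keeps the TypeScript type honest if the schema and the interface ever drift apart.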

💡Highlights

├─Native JSON schema enforcement
├─CUDA, Metal, and Vulkan support
└─Prebuilt binaries for easy setup

🎯For

├─Node.js Backend Developers
└─AI Engineers

🔗Links

└─GitHub Repository