Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A
Wrapper for simplified use of Llama2 GGUF quantized models.
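Such wrappers typically sit on top of llama-cpp-python; the sketch below shows what loading and querying a GGUF model on CPU looks like at that level. The model path, thread count, and prompt are placeholders, not details taken from the project above.

```python
# Minimal sketch of CPU inference with a GGUF-quantized Llama 2 model via llama-cpp-python.
# The model path and prompt are illustrative placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama-2-7b-chat.Q4_K_M.gguf",  # any local GGUF file
    n_ctx=2048,    # context window
    n_threads=8,   # CPU threads to use
)

out = llm("Q: What is CPU-only inference good for?\nA:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```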
Privacy-focused RAG chatbot for network documentation. Chat with your PDFs locally using Ollama, Chroma & LangChain. CPU-only, fully offline.
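A fully offline PDF Q&A pipeline of that shape can be assembled from the named pieces roughly as follows. This is a hedged sketch rather than the project's actual code; the PDF name and the Ollama model names are illustrative assumptions.

```python
# Sketch of an offline PDF RAG pipeline: Ollama for the LLM and embeddings,
# Chroma as the vector store, LangChain for orchestration.
# "network-guide.pdf" and the model names are placeholders.
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.embeddings import OllamaEmbeddings
from langchain_community.vectorstores import Chroma
from langchain_community.llms import Ollama
from langchain.chains import RetrievalQA

docs = PyPDFLoader("network-guide.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)

vectordb = Chroma.from_documents(chunks, OllamaEmbeddings(model="nomic-embed-text"))
qa = RetrievalQA.from_chain_type(llm=Ollama(model="llama2"),
                                 retriever=vectordb.as_retriever())

print(qa.invoke({"query": "Which VLAN carries management traffic?"})["result"])
```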
Self-hosted Anthropic API Compatible Inference Server with Claude Code support, Interleaved Thinking, and HuggingFace Spaces deployment
CPU-first, turn-aware local voice assistant with multiprocessing, streaming STT→LLM→TTS, and interruption-safe orchestration.
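The orchestration pattern behind such an assistant, independent of any particular STT, LLM, or TTS engine, reduces to a chain of worker processes passing turns through queues. The stage functions below are placeholders standing in for real engines, not the project's components.

```python
# Skeleton of a turn-aware STT -> LLM -> TTS pipeline using multiprocessing queues.
# The stage bodies are placeholders; a real assistant would plug in actual
# speech-to-text, LLM, and text-to-speech engines and add interruption handling.
import multiprocessing as mp

def stt_stage(audio_q: mp.Queue, text_q: mp.Queue):
    while (chunk := audio_q.get()) is not None:
        text_q.put(f"transcript of {chunk}")   # placeholder transcription
    text_q.put(None)

def llm_stage(text_q: mp.Queue, reply_q: mp.Queue):
    while (utterance := text_q.get()) is not None:
        reply_q.put(f"reply to: {utterance}")  # placeholder LLM response
    reply_q.put(None)

def tts_stage(reply_q: mp.Queue):
    while (reply := reply_q.get()) is not None:
        print("speaking:", reply)              # placeholder audio output

if __name__ == "__main__":
    audio_q, text_q, reply_q = mp.Queue(), mp.Queue(), mp.Queue()
    stages = [mp.Process(target=stt_stage, args=(audio_q, text_q)),
              mp.Process(target=llm_stage, args=(text_q, reply_q)),
              mp.Process(target=tts_stage, args=(reply_q,))]
    for p in stages:
        p.start()
    for chunk in ["chunk-1", "chunk-2"]:  # simulated microphone frames
        audio_q.put(chunk)
    audio_q.put(None)                     # end-of-turn sentinel
    for p in stages:
        p.join()
```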
🤖 AI Text Completion App built with Streamlit and Llama-3.2-1B. Generate creative text completions with an intuitive web interface. GPU & CPU optimized, easy to deploy, perfect for content creation and AI experimentation.
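An app of this kind usually amounts to a Streamlit form feeding a Hugging Face text-generation pipeline; a rough sketch follows. The model identifier and UI text are assumptions (Llama-3.2-1B is gated on Hugging Face and requires an accepted license), and this is not the app's actual source.

```python
# Rough sketch of a Streamlit text-completion app running a small Llama model on CPU.
# Model name and UI strings are illustrative placeholders.
import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once per server process
def load_generator():
    return pipeline("text-generation", model="meta-llama/Llama-3.2-1B", device=-1)  # -1 = CPU

st.title("AI Text Completion")
prompt = st.text_area("Prompt", "Once upon a time")
if st.button("Complete"):
    gen = load_generator()
    result = gen(prompt, max_new_tokens=100, do_sample=True)[0]["generated_text"]
    st.write(result)
```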
A RAG system for chatting with local documents using Foundry and LLM models on CPU.
Personal project: a local RAG chatbot using Mistral v0.2 / TinyLlama with TF-IDF retrieval and a Streamlit interface, for CPU-optimized inference without GPU requirements.
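TF-IDF retrieval keeps the whole stack CPU-friendly because it needs no embedding model; a minimal retrieval step with scikit-learn might look like the sketch below, where the document chunks are illustrative placeholders.

```python
# Minimal TF-IDF retrieval step for a CPU-only RAG chatbot: rank document chunks
# by cosine similarity to the question, then pass the top hits to the LLM as context.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

chunks = [
    "TinyLlama is a 1.1B-parameter model that runs comfortably on CPU.",
    "Mistral 7B v0.2 supports a 32k context window.",
    "TF-IDF retrieval needs no embedding model or GPU.",
]

vectorizer = TfidfVectorizer()
chunk_vectors = vectorizer.fit_transform(chunks)

def retrieve(question: str, k: int = 2):
    scores = cosine_similarity(vectorizer.transform([question]), chunk_vectors)[0]
    return [chunks[i] for i in scores.argsort()[::-1][:k]]

print(retrieve("Which model runs without a GPU?"))
```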
FastAPI service for car damage detection and damage type classification using PyTorch
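Serving a PyTorch classifier behind FastAPI generally follows the pattern sketched below; here a stock torchvision ResNet stands in for the actual damage-detection weights, and the endpoint path and response shape are assumptions.

```python
# Generic sketch of a FastAPI endpoint serving a PyTorch image classifier on CPU.
# A pretrained torchvision ResNet stands in for the car-damage model.
import io
import torch
from fastapi import FastAPI, File, UploadFile
from PIL import Image
from torchvision import models

app = FastAPI()
weights = models.ResNet18_Weights.DEFAULT
model = models.resnet18(weights=weights).eval()
preprocess = weights.transforms()

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    image = Image.open(io.BytesIO(await file.read())).convert("RGB")
    with torch.no_grad():
        logits = model(preprocess(image).unsqueeze(0))
    return {"class_index": int(logits.argmax(dim=1))}
```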
High-performance facial landmark detection and tracking library by Deepixel. CPU-only, real-time inference using TensorFlow Lite, OpenCV, and DeepCore. Outputs 106 facial landmarks with head pose estimation and Python API support.
🚀 Advanced offline AI assistant with one-click installation. Complete privacy, no GPU required. Built by RaxCore.