This project is an interactive chatbot built as part of an interview exercise, showcasing a modern Contextual RAG (Retrieval-Augmented Generation) pipeline.
It demonstrates document ingestion, vector storage, local LLM inference, dynamic routing, evaluation, and monitoring — all running locally using open-source tools.
- Supports PDFs, DOCX, and other document formats.
- Stored in PostgreSQL (`ragdb`) with the PGVector extension for embeddings.
- LlamaIndex used to build retrievers and query engines.
- Ollama models (`nomic-embed-text` + `llama3.2:3b`) for embeddings and generation.
- Dynamic Router: Chooses between RAG and direct LLM responses (for chit-chat / non-document queries); a minimal sketch follows this list.
- FastAPI backend with a `/chat/completions` endpoint.
- Compatible with Open WebUI for interactive chatbot usage.
- RAGAs: Evaluates precision, recall, faithfulness, and answer relevance.
- Arize Phoenix: Observability for prompts, RAG pipeline monitoring, and agent tracing.
- Ready for Crew.AI-based prompt optimization, rerankers, and agent orchestration.
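
One way the routing step can be wired with LlamaIndex is a `RouterQueryEngine` that lets the LLM pick between a document-backed query engine and a plain chit-chat responder. The sketch below is illustrative only; the function and tool descriptions are assumptions, not the project's actual code:

```python
# Illustrative dynamic-routing sketch (assumed structure, not the project's
# actual implementation): an LLM selector picks the RAG engine for document
# questions and a plain chit-chat engine for everything else.
from llama_index.core.query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.tools import QueryEngineTool

def build_router(rag_query_engine, chitchat_query_engine):
    rag_tool = QueryEngineTool.from_defaults(
        query_engine=rag_query_engine,
        description="Answer questions grounded in the ingested documents.",
    )
    chitchat_tool = QueryEngineTool.from_defaults(
        query_engine=chitchat_query_engine,
        description="Handle greetings and small talk without retrieval.",
    )
    return RouterQueryEngine(
        selector=LLMSingleSelector.from_defaults(),
        query_engine_tools=[rag_tool, chitchat_tool],
    )
```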
- LLM & Embeddings: Ollama (`nomic-embed-text`, `llama3.2:3b`)
- RAG Framework: LlamaIndex
- Database: PostgreSQL with PGVector
- Backend: FastAPI
- Evaluation: RAGAs
- Tracing & Monitoring: Arize Phoenix
- Chatbot interface: Open WebUI
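
A minimal sketch of how these pieces fit together at ingestion time, assuming default local credentials, a placeholder data folder, and a placeholder table name (the project's actual scripts may wire this differently):

```python
# Illustrative wiring of Ollama + LlamaIndex + PGVector. Table name, user,
# password, and the "data" directory are placeholders, not the project's config.
from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.ollama import OllamaEmbedding
from llama_index.llms.ollama import Ollama
from llama_index.vector_stores.postgres import PGVectorStore

embed_model = OllamaEmbedding(model_name="nomic-embed-text")
llm = Ollama(model="llama3.2:3b", request_timeout=120.0)

vector_store = PGVectorStore.from_params(
    database="ragdb",
    host="localhost",
    port="5432",
    user="postgres",          # placeholder credentials
    password="postgres",
    table_name="rag_chunks",  # placeholder table name
    embed_dim=768,            # nomic-embed-text produces 768-dim vectors
)

documents = SimpleDirectoryReader("data").load_data()  # PDFs, DOCX, ...
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
    embed_model=embed_model,
)
query_engine = index.as_query_engine(llm=llm)
```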
- Clone the repository

  ```bash
  git clone https://github.com/pmaske-aihub/rag-application.git
  cd rag-application
  ```
- Install Prerequisites

  Pull the Ollama models:

  ```bash
  ollama pull llama3.2:3b
  ollama pull nomic-embed-text
  ```

  Ensure PostgreSQL is installed, then create the database and enable `pgvector`:

  ```sql
  CREATE DATABASE ragdb;
  \c ragdb
  CREATE EXTENSION IF NOT EXISTS vector;
  ```

  Ensure that Open WebUI is installed and running, either locally or via Docker Desktop. See How to install Open WebUI.
- Setup Environment

  ```bash
  python -m venv venv
  .\venv\Scripts\activate   # Windows
  source venv/bin/activate  # Linux/Mac
  pip install -r requirements.txt
  ```
- Run the FastAPI backend

  ```bash
  uvicorn src.api:app --host 0.0.0.0 --port 5601 --workers 4
  ```
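
The backend exposes an OpenAI-style `/chat/completions` route. The snippet below is only a simplified sketch of what such a handler looks like, not the project's actual `src/api.py`:

```python
# Simplified shape of an OpenAI-compatible chat endpoint (illustrative only;
# the real src/api.py wires the dynamic router / RAG pipeline in here).
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Message(BaseModel):
    role: str
    content: str

class ChatRequest(BaseModel):
    model: str
    messages: list[Message]

def run_pipeline(query: str) -> str:
    # Placeholder: the real app routes to the RAG query engine or straight to the LLM.
    return f"(answer to: {query})"

@app.post("/chat/completions")
def chat_completions(req: ChatRequest):
    answer = run_pipeline(req.messages[-1].content)
    return {
        "model": req.model,
        "choices": [
            {"index": 0, "message": {"role": "assistant", "content": answer}}
        ],
    }
```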
In the web browser, open http://localhost:3000 to reach the Open WebUI interface. Go to Admin Panel > Settings > Connections and add the locally running FastAPI app as a connection.
Create a new Workflow and select `llama3.2:3b` as the model.
Select New Chat and switch to the Custom RAG Pipeline.
For a quick test, use the Swagger UI at http://localhost:5601/docs.
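
The endpoint can also be exercised from Python (assuming the backend is running on port 5601 as started above):

```python
# Quick smoke test against the local backend (assumes port 5601 as configured above).
import requests

payload = {
    "model": "llama3.2:3b",
    "messages": [
        {
            "role": "user",
            "content": "Based on the penalties section, what are the different levels of disciplinary actions?",
        }
    ],
}
resp = requests.post("http://localhost:5601/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```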
Example Usage
POST `/chat/completions`

```json
{
  "model": "llama3.2:3b",
  "messages": [
    {
      "role": "user",
      "content": "Based on the penalties section, what are the different levels of disciplinary actions?"
    }
  ]
}
```

To run the RAGAs evaluation:

```bash
python src/evaluate_rag.py
```

This outputs per-sample metrics: Context Precision / Recall, Faithfulness, Answer Relevancy.
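
For reference, a RAGAs run over a hand-built sample looks roughly like the sketch below; the actual `src/evaluate_rag.py` may build its dataset from the live pipeline and configure the judge models differently:

```python
# Rough shape of a RAGAs evaluation run (illustrative; the sample data below is
# made up, and the judge models point at local Ollama to keep evaluation offline).
from datasets import Dataset
from langchain_community.chat_models import ChatOllama
from langchain_community.embeddings import OllamaEmbeddings
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_precision, context_recall, faithfulness

samples = Dataset.from_dict({
    "question": ["What are the different levels of disciplinary actions?"],
    "answer": ["The policy lists a verbal warning, a written warning, and termination."],
    "contexts": [["Penalties section: verbal warning, written warning, termination."]],
    "ground_truth": ["Verbal warning, written warning, termination."],
})

result = evaluate(
    samples,
    metrics=[context_precision, context_recall, faithfulness, answer_relevancy],
    llm=ChatOllama(model="llama3.2:3b"),
    embeddings=OllamaEmbeddings(model="nomic-embed-text"),
)
print(result)              # aggregate score per metric
print(result.to_pandas())  # per-sample breakdown
```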
In the web browser, visit http://localhost:6006 to open the Phoenix dashboard, which shows query pipeline traces, latency breakdowns, and prompt optimizations.
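
Phoenix is typically started and wired into LlamaIndex from Python; a minimal setup sketch follows (the project's actual instrumentation may differ):

```python
# Start a local Phoenix instance (UI at http://localhost:6006) and instrument
# LlamaIndex so retrieval/LLM spans show up as traces.
# Assumes: pip install arize-phoenix openinference-instrumentation-llama-index
import phoenix as px
from openinference.instrumentation.llama_index import LlamaIndexInstrumentor
from phoenix.otel import register

px.launch_app()                       # serve the Phoenix dashboard
tracer_provider = register()          # point OpenTelemetry exports at Phoenix
LlamaIndexInstrumentor().instrument(tracer_provider=tracer_provider)
```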
- Crew.AI: Multi-agent prompt optimization.
- Rerankers (e.g., Cohere / bge-reranker).
- Docker deployment for portability.
This project was built as part of a technical interview exercise, showcasing an end-to-end RAG application designed with monitoring, evaluation, and extensibility in mind.