RAGOps: Production-Grade RAG Platform

Personal Project

2024 – 2025
01

Problem

Naive RAG systems built on embedding-only retrieval suffer from poor recall on out-of-distribution queries, context pollution from irrelevant chunks, and zero visibility into why a given answer was produced. There was no systematic way to measure or improve retrieval quality over time.

02

Constraints

  • Heterogeneous document formats (PDF, markdown, HTML) required a unified ingestion pipeline
  • Query latency budget: end-to-end response under 3 seconds including reranking
  • Cost per query had to remain viable for self-hosted, single-tenant use
  • Evaluation required ground-truth labels — 150 QA pairs curated manually
  • No managed vector database; pgvector on PostgreSQL to keep the stack minimal (see the schema sketch after this list)
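
A minimal schema sketch for the pgvector choice, assuming a chunks table, a 768-dimensional embedding model, and psycopg2 as the driver; the names and sizes are illustrative, not the exact production schema:

```python
# Illustrative pgvector setup: table name, columns, and the 768-dim embedding size
# are assumptions; the dimension must match whatever embedding model is used.
import psycopg2

DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS chunks (
    id        BIGSERIAL PRIMARY KEY,
    doc_id    TEXT NOT NULL,
    content   TEXT NOT NULL,
    embedding vector(768)
);

-- Approximate nearest-neighbour index for cosine distance; IVFFlat keeps the stack simple.
CREATE INDEX IF NOT EXISTS chunks_embedding_idx
    ON chunks USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
"""

def init_schema(dsn: str) -> None:
    # Runs the DDL once at deploy time; the surrounding transaction commits on exit.
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(DDL)
```
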
03

Approach

Replaced single-stage dense retrieval with a three-stage pipeline: (1) dual retrieval combining pgvector ANN search with BM25-style lexical matching, (2) score fusion to merge the two candidate lists, and (3) a cross-encoder reranker applied to the top-k candidates before the context is passed to the LLM. The chunking strategy was switched from fixed-size windows to semantic boundaries to improve chunk coherence, and a fallback gate rejects low-confidence queries rather than hallucinating an answer. Evaluation was embedded into the development loop: every pipeline change was measured against the 150-query benchmark before merging. A minimal sketch of the query path follows.
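
The sketch below uses reciprocal-rank fusion (RRF) for the score-fusion stage and a public MS MARCO cross-encoder as the reranker; both are illustrative choices, as are the candidate counts, function names, and the rejection threshold on the fallback gate.

```python
# Sketch of the three-stage query path: dense + lexical retrieval, RRF fusion,
# cross-encoder reranking, and a low-confidence rejection gate.
from sentence_transformers import CrossEncoder

RERANKER = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # loaded once at startup

def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge candidate lists: each chunk scores sum(1 / (k + rank)) over the lists it appears in."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

def retrieve(query: str, dense_search, lexical_search, chunk_text: dict[str, str],
             top_k: int = 10) -> list[str]:
    dense_ids = dense_search(query, limit=50)      # stage 1a: pgvector ANN candidates
    lexical_ids = lexical_search(query, limit=50)  # stage 1b: BM25-style candidates
    shortlist = rrf_fuse([dense_ids, lexical_ids])[:top_k * 3]  # stage 2: fusion
    if not shortlist:
        return []

    # Stage 3: cross-encoder reranking of the fused shortlist.
    scores = RERANKER.predict([(query, chunk_text[cid]) for cid in shortlist])

    # Fallback gate: decline rather than pass weak context to the LLM.
    if max(scores) < 0.2:  # threshold is an assumption, tuned against the benchmark in practice
        return []

    ranked = [cid for _, cid in sorted(zip(scores, shortlist), reverse=True)]
    return ranked[:top_k]
```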

04

Architecture

System architecture (diagram): document ingestion through hybrid retrieval, reranking, and LLM generation, with observability tracing of each response.

Ingestion → Chunker → Embedder → pgvector + BM25 index → Fusion → Cross-encoder reranker → LLM generation → Traced response
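
One way the "Traced response" step can be realized is a per-query trace record capturing stage latencies and the retrieved chunk IDs; the field names and the JSON-line output below are assumptions for illustration, not the project's actual trace schema.

```python
# Sketch of a per-query trace: stage timings plus the chunk IDs that reached the LLM.
import json
import time
import uuid
from contextlib import contextmanager

@contextmanager
def stage(trace: dict, name: str):
    """Record wall-clock latency for one pipeline stage into the trace."""
    start = time.perf_counter()
    try:
        yield
    finally:
        trace.setdefault("stages", {})[name] = round(time.perf_counter() - start, 4)

def answer_with_trace(query: str, retrieve, generate) -> dict:
    trace = {"trace_id": str(uuid.uuid4()), "query": query}
    with stage(trace, "retrieval"):
        chunk_ids = retrieve(query)
    trace["retrieved_chunk_ids"] = chunk_ids
    with stage(trace, "generation"):
        trace["answer"] = generate(query, chunk_ids)
    print(json.dumps(trace))  # shipped to the observability store in practice
    return trace
```
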
05

Metrics

Metric                       Baseline   Achieved
Recall@10                    ~58%       ~81%
Answer precision (manual)    62%        84%
Irrelevant context rate      31%        11%
Avg query latency            1.1 s      2.4 s (reranker added)
Benchmark queries            0          150 QA pairs

06

Product Impact

RAGOps functions as a self-hostable knowledge-base Q&A system for domain-specific document corpora. The observability dashboard lets an operator debug retrieval failures without re-running experiments manually. The evaluation framework enables confident iteration — any retrieval change is quantified before deployment, treating the LLM application as infrastructure rather than a prototype.
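
A sketch of what the benchmark loop behind the Recall@10 numbers might look like, assuming a JSONL file of QA pairs with labelled relevant chunk IDs; the file format, field names, and the regression gate are assumptions for illustration.

```python
# Benchmark loop sketch: for each curated QA pair, count a hit when at least one
# labelled-relevant chunk appears in the retriever's top-k results.
import json

def recall_at_k(benchmark_path: str, retrieve, k: int = 10) -> float:
    with open(benchmark_path) as f:
        examples = [json.loads(line) for line in f]  # {"question": ..., "relevant_chunk_ids": [...]}
    hits = 0
    for ex in examples:
        retrieved = set(retrieve(ex["question"])[:k])
        if retrieved & set(ex["relevant_chunk_ids"]):
            hits += 1
    return hits / len(examples)

# Example development-loop gate: reject a pipeline change that regresses recall.
# assert recall_at_k("benchmark.jsonl", retrieve) >= 0.81
```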

07

Tech Stack

Python
FastAPI
PostgreSQL
pgvector
Redis
Celery
Next.js
TypeScript
BM25
Cross-encoder reranker
LLM API
08

Links