Back to implementations

Case Study - Agentic RAG System

Most AI document chatbots work in demos. They break on real data.

When a business deploys a RAG chatbot on their documents, it answers conceptual questions well and quietly fails on anything specific. Ask it about pricing trends across a product category and it shines. Ask it for the status of order #10718 and it hallucinates, apologizes, or returns the wrong result entirely. The reason is architectural, not a model limitation, and it's fixable. This project demonstrates exactly that fix: an agentic RAG system that knows which kind of question it's dealing with, picks the right retrieval strategy, and returns a sourced, accurate answer either way.

Demo walkthrough ~1 min

Why RAG alone is not enough

Retrieval-Augmented Generation works by converting text into vectors, numerical representations of meaning, then finding the chunks most semantically similar to a query. This is powerful for open-ended questions. It struggles the moment precision is required.

Vector search finds what is conceptually close. The number "10718" as a vector sits near other numbers, other identifiers, not necessarily near the document chunk that actually contains order 10718. The model searches six times with rephrased queries, retrieves 57 unrelated chunks, and correctly concludes it cannot find the answer. Not a hallucination, a retrieval architecture mismatch.

This failure mode is predictable. Any query involving an exact identifier, order IDs, invoice numbers, customer codes, product SKUs, will hit it. Which means a business deploying a standard RAG chatbot over its operational documents will encounter this constantly.

"What is the status of order 10718?" - 6 search attempts, 57 chunks retrieved, zero containing the correct document. Standard RAG architecture, working exactly as designed. That is the problem.

The fix: let the model choose its retrieval strategy

The solution is not to replace vector search. It is to give the agent a second tool and let it reason about which one to use.

No hardcoded routing rules. No keyword detection. The LangGraph agent receives the user's query, reasons about what kind of retrieval it requires, selects the appropriate tool, and executes. If the exact search returns too many matches, it asks the user to narrow down rather than guessing.

The same query that failed completely now resolves in a single pass. And the semantic search path still handles conceptual queries just as well. The two tools complement rather than replace each other.

Tool 1

Semantic search

Vector retrieval via Qdrant. Best for conceptual questions, natural language queries, topics distributed across multiple documents.

Tool 2

Exact text search

Regex search over the raw corpus. Best for order IDs, customer codes, product names, any precise identifier.

"What is the status of order 10718?" - one find to fetch cycle. Complete result: date, shipping address, customer, all line items with quantities and prices, inferred delivery status. Every detail sourced and verifiable against the original document.

How a request moves through the system

The frontend receives a custom SSE event stream with meaningful intermediate states: searching_documents, documents_found, generating_answer. Users see real-time progress rather than a spinner. This was built deliberately outside the Vercel AI SDK to maintain direct control over what gets surfaced and when.

User query->LangGraph agent->Tool selection->Qdrant or raw text->LLM synthesis->Streamed answer + sources

Fun fact

The same architecture can be applied to any document corpus. It can be used for internal knowledge bases, product documentation, or even public-facing content. The key is the agent's ability to reason about the type of query and select the appropriate retrieval strategy.

Technical decisions and why

Qdrant over PGVectorNo existing Postgres in the stack. A purpose-built vector DB is simpler to deploy and scale independently for this use case.
LangChain over LlamaIndexMore direct control over agent loop and tool definitions, important when the tool selection behavior itself is part of the design.
Custom SSE over Vercel AI SDKThe SDK protocol does not natively surface intermediate retrieval states. Custom protocol adds complexity on the frontend but gives full control over UX.
OpenRouter for LLM accessProvider-agnostic. Production deployments can swap the underlying model without touching orchestration logic.
No auth, persistence, or uploadsDeliberate scope decision. The core retrieval behavior is what this demonstrates. These are known production additions, not gaps.

What is not in this build and why it was left out

Multi-user auth and hybrid BM25 + vector search were intentionally skipped. A portfolio project that simulates production infra adds complexity without proving the core claim. The retrieval architecture and agent design are the demonstration. Everything else is a known layer on top.

The limitation worth noting: exact text search on very large corpora gets slow without an index. Production deployment at scale would require BM25 or full-text indexing alongside the vector store, not regex over raw text.

Stack

LangChainLangGraphQdrant CloudOpenRoutertext-embedding-3-smallNext.jsTypeScriptTailwind CSSShadcn/uiCustom SSE protocol