What Building AI Chatbots Taught Me About Simplicity

by FormulatedBy | Technology


I spent months overengineering an AI chatbot. Then I threw most of it away and got better results in two weeks.

This is an account of what actually worked when I built production RAG systems at scale, and why the lessons surprised me.

The Complexity Trap

I was tasked with building a chatbot grounded in a set of important team documents containing business domain knowledge. The goal was to retrieve the relevant information for each customer or user query and return the most accurate possible response. When I first started building this conversational AI system, I did what any engineer would do: I read the papers, studied the frameworks, and built something impressive. Semantic chunking with overlap windows. Multi-vector retrieval with re-ranking. Hybrid search combining keyword and dense vector retrieval. A beautiful, sophisticated architecture.

But it hallucinated constantly on our documents.

The problem wasn’t that the techniques were wrong. The problem was that I’d built a generic system for a specific problem. Our documents had structure: numbered sections, hierarchical headings, procedural steps. My fancy semantic chunking was actively destroying the very information users needed.

The fix was embarrassingly simple: hierarchical chunking that respected document structure. Instead of treating every document like a wall of text, I preserved the natural hierarchy. Headers stayed with their content. Procedures remained intact. Parent-child relationships between sections were maintained.
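The idea can be sketched in a few lines. This is a minimal illustration, not the production code: it assumes Markdown-style `#` headings and tracks a stack of ancestor headings so every chunk carries its full breadcrumb.

```python
import re

def hierarchical_chunks(text):
    """Split a document on headings, keeping each heading (and its
    ancestors) attached to the content beneath it."""
    chunks = []
    path = []  # stack of (level, heading) pairs, root to leaf
    body = []
    heading_re = re.compile(r"^(#+)\s+(.*)")

    def flush():
        if body:
            breadcrumb = " > ".join(h for _, h in path)
            chunks.append({"path": breadcrumb, "text": "\n".join(body).strip()})
            body.clear()

    for line in text.splitlines():
        m = heading_re.match(line)
        if m:
            flush()
            level = len(m.group(1))
            # Pop siblings and deeper levels so only ancestors remain.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, m.group(2)))
        else:
            body.append(line)
    flush()
    return chunks
```

A chunk under "## Install" inside "# Setup" comes out with the path "Setup > Install", so the retriever never sees a procedure stripped of the heading that gives it meaning.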

Accuracy jumped. Hallucinations dropped. And I learned my first hard lesson: understand your data before you engineer around it.

The Prompt Hierarchy

Here’s something that took me too long to accept: your model is only as good as your prompts.

I had been treating prompts as an afterthought. A thin wrapper around the “real” work happening in retrieval and embedding. But when I started experimenting with few-shot examples and structured outputs, everything changed.

Few-shot prompting means including a handful of example input-output pairs in the prompt itself. Instead of hoping the model would figure out our format, I showed it exactly what I wanted. Three examples of input-output pairs, and suddenly responses followed a predictable structure. Quality control became possible because outputs were predictable.
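The mechanics are almost trivially simple. A rough sketch of the prompt assembly (the Q/A framing here is illustrative, not the exact format I used):

```python
def few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: worked input/output pairs first,
    then the real query in the same shape."""
    parts = ["Answer in the exact format shown by the examples.\n"]
    for inp, out in examples:
        parts.append(f"Q: {inp}\nA: {out}\n")
    # The model completes the final "A:" in the pattern it just saw.
    parts.append(f"Q: {query}\nA:")
    return "\n".join(parts)
```

The whole trick is that the real query arrives in exactly the same shape as the examples, so the model continues the pattern instead of inventing a format.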

Structured outputs eliminated an entire category of bugs. JSON schemas meant no more parsing failures. No more responses that were technically correct but impossible to process downstream. The model understood not just what to say, but how to say it.
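The downstream side of that contract is a validation step. A minimal sketch, assuming a response shape I've invented for illustration (`answer`, `sources`, `confidence`):

```python
import json

# Hypothetical schema: field name -> required Python type.
REQUIRED = {"answer": str, "sources": list, "confidence": float}

def parse_response(raw):
    """Parse a model reply that was instructed to emit JSON, and
    reject anything that doesn't match the expected shape."""
    data = json.loads(raw)
    for key, typ in REQUIRED.items():
        if not isinstance(data.get(key), typ):
            raise ValueError(f"missing or mistyped field: {key}")
    return data
```

A malformed reply fails loudly at the boundary instead of silently corrupting whatever consumes it downstream, which is exactly the category of bug that disappeared.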

This sounds obvious written down. It wasn’t obvious when I was knee-deep in embedding optimization, convinced that retrieval quality was my bottleneck. Sometimes the leverage is in the last mile, not the foundation.

Intent Classification Changed Everything

The biggest architecture win came from a simple insight: classify intent first.

My early systems tried to handle everything in one flow. User message goes in, retrieval happens, response comes out. But users ask wildly different types of questions. Some want facts. Some want procedures. Some are complaining. Some are confused about their own question.

Treating them identically made no sense.

I rebuilt the system with an LLM-powered intent classifier at the front. Not keyword matching, which is too brittle, but a lightweight LLM call with structured output that categorized the query and extracted key entities. The classifier told me what kind of response the user actually needed before I committed to a retrieval strategy.

The result was cleaner code, faster responses, and dramatically better user satisfaction. Each intent type got its own optimized flow. Procedural questions hit the hierarchical chunks. Factual queries used dense retrieval. Complaints got routed differently entirely.
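The shape of that routing layer, sketched with a stand-in `llm` callable and invented intent labels (your categories and handlers will differ):

```python
import json

def classify_intent(query, llm):
    """Ask a small model for a structured intent label before any
    retrieval happens. `llm` is a stand-in for your model client."""
    prompt = (
        "Classify the user query. Reply with JSON only: "
        '{"intent": "factual|procedural|complaint", "entities": [...]}\n'
        f"Query: {query}"
    )
    return json.loads(llm(prompt))

# Each intent gets its own optimized flow; these handlers are placeholders.
HANDLERS = {
    "procedural": lambda q, e: f"procedure lookup for {e}",
    "factual":    lambda q, e: f"dense retrieval for {e}",
    "complaint":  lambda q, e: "escalate to support flow",
}

def route(query, llm):
    result = classify_intent(query, llm)
    handler = HANDLERS.get(result["intent"], HANDLERS["factual"])
    return handler(query, result.get("entities", []))
```

All the intelligence lives in one cheap classification call; everything after it is an ordinary dictionary dispatch, which is why the downstream code got so much simpler.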

A small amount of intelligence at the routing layer saved enormous complexity downstream.

The Deterministic/Non-Deterministic Split

The most important architectural decision I made was drawing a clear line: deterministic actions get functions, non-deterministic decisions get the LLM.

What does this mean in practice? Database lookups, API calls, calculations, status checks: these are deterministic. The answer is knowable, consistent, and shouldn’t vary. I wrapped these in functions the LLM could call, but the LLM didn’t execute them. It decided when to call them and what arguments to pass. The actual execution was reliable code.
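The split looks roughly like this. The tool, its data, and the call format are all illustrative, but the principle is the one described above: the model emits a structured decision, and plain code executes it.

```python
# Deterministic work lives in ordinary functions; the model only
# decides which one to call and with what arguments.
def get_order_status(order_id: str) -> str:
    # Illustrative stand-in for a real database lookup.
    orders = {"A-17": "shipped", "A-18": "processing"}
    return orders.get(order_id, "not found")

TOOLS = {"get_order_status": get_order_status}

def execute_tool_call(call):
    """`call` is the model's structured decision, e.g.
    {"name": "get_order_status", "args": {"order_id": "A-17"}}.
    Execution itself is reproducible code, never the model."""
    fn = TOOLS[call["name"]]
    return fn(**call["args"])
```

Because execution never touches the model, the same call always returns the same answer, and a wrong status can only mean a wrong lookup or a wrong argument, both of which are ordinary bugs.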

The LLM handled what LLMs are good at: understanding intent, generating natural language, synthesizing information, making judgment calls when data was ambiguous. I stopped trying to make it do math or remember precise numbers.

This separation made the system debuggable. When something went wrong, I could immediately identify whether it was a function error (deterministic, reproducible) or a model error (needs prompt tuning). Before this split, errors disappeared into a fog of probabilistic behavior.

Clean boundaries between deterministic and non-deterministic components turned chaos into engineering.

Simple Beats Clever

Looking back, every major improvement came from simplification, not sophistication. Respecting document structure instead of fighting it. Using the model’s strengths instead of compensating for weaknesses. Drawing clear boundaries instead of building monolithic flows.

The frameworks and papers have their place. But they’re solutions to general problems. Your problem is specific. The best architecture is the one that fits your data, your users, and your constraints, not the one that impresses other engineers.

I still read the papers. I still experiment with new techniques. But now I start simple and add complexity only when I’ve proven it’s necessary. The code I’m proudest of isn’t the most sophisticated. It’s the code that works reliably, fails predictably, and can be understood by the next engineer who inherits it.

That’s what building AI chatbots taught me. Not how to be clever with vectors and embeddings, but how to be disciplined about simplicity.

The best RAG system isn’t the most advanced one. It’s the one that actually helps your users.

Author: Utkarsh Bajaj
