From my impromptu series on real-life AI agents, here’s the recipe we followed to uncork the ability to query databases in plain English. More than most, this post will assuredly age like milk given the pace of change in tech. Should you give it a sniff? Well, it’s free as in beer. Maybe try a sip?
For the throw-it-back type:
Our August-vintage database AI agent now runs queries for 40 people (and growing) whom our overburdened BI team could not serve before. With 6,700 entities (tables, views, stored procedures) spread across many databases to self-serve from, these newly minted data mixologists are now answering questions we did not know to ask just a few weeks ago.
For the sipping type, let’s get into the craft:
First, JustAnswer has a huge thirst for learning. In 2024, we tested 525 product hypotheses to statistical significance on our online professional services marketplace. The tests range from simple UI tweaks to brand new business models and everything in between. Yes, that’s a LOT of tests! I’ll drink to that.
Second, after two decades of testing, you won’t be surprised that our database schema is volatile and extremely complex. Our main database has 1,000 tables and views. Also unsurprising is that our documentation can’t keep up as we struggle to document each schema change perfectly. Net-net, few humans can speak to our full data catalog. It really makes your head spin.
This is a major headache because learning from past tests is crucial to formulating good hypotheses for future investments. Per our traditional menu, our BI and Analytics bartenders wade through the schema and serve canned reports for the rest of us to consume. That works, but they can’t keep up with demand. To increase the flow, we wanted something more like a self-serve tap. Hence our foray into AI agents.
First, we used LLMs to autogenerate documentation for our tables, views and stored procedures. Feeding it data samples and our pre-existing documentation, sparse and occasionally outdated as it was, gave it a jump start. Even that early witches’ brew met the needs of some.
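If you want a taste of how that pass might look, here’s a minimal sketch in Python. The prompt wording, helper names and model choice are my illustrative stand-ins, not our exact recipe:

```python
from openai import OpenAI

client = OpenAI()

PROMPT = """You are documenting a database entity for analysts.
Entity: {name} ({kind})
Columns: {columns}
Sample rows (values may be stale):
{samples}
Existing notes (sparse, possibly outdated):
{notes}

Write concise documentation: purpose, key columns, caveats."""

def document_entity(name, kind, columns, samples, notes):
    """Ask the LLM to draft docs for one table/view/stored procedure."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # model choice is illustrative
        messages=[{"role": "user", "content": PROMPT.format(
            name=name, kind=kind, columns=columns,
            samples=samples, notes=notes)}],
    )
    return resp.choices[0].message.content
```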
Then, we vectorized this documentation, schema and stored procedures and made them available to query via a RAG interface. That is so much faster than searching and reading the 10 million words of underlying documentation, even for grep experts. Bottoms up!
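Conceptually, the retrieval layer boils down to: embed each entity’s doc once, then embed the question and pour off the closest matches. A back-of-the-napkin sketch, assuming an in-memory index and an off-the-shelf embedding model (any vector store would do):

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative choice
        input=texts,
    )
    return np.array([d.embedding for d in resp.data])

def build_index(docs):
    """docs: {entity_name: generated_documentation}. Embed everything once."""
    names = list(docs)
    return names, embed([docs[n] for n in names])

def top_k(question, names, vectors, k=5):
    """Return the k entities whose docs best match the question."""
    q = embed([question])[0]
    scores = vectors @ q / (np.linalg.norm(vectors, axis=1) * np.linalg.norm(q))
    return [names[i] for i in np.argsort(scores)[::-1][:k]]
```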
Encouraged, we could then connect the pieces and enable users to ask questions in plain language: “How many users converted on our UK site during lunch hour in June?” The agent first converts English to SQL and then runs the resulting query against an empty DB to validate its syntax. Upon a failure (LLM-powered SQL generation is still imperfect), the agent enlists another LLM to repair the query. Guardrails are in place to prevent the system from spinning out of control. Finally, the resulting data are formatted into a coherent answer to the original question. It’s an entire cocktail of LLM calls.
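Distilled to its essence, the cocktail looks something like this sketch. The prompts, the retry cap and the in-memory SQLite shadow DB are illustrative stand-ins; in practice the shadow DB carries the real schema, just no data:

```python
import sqlite3
from openai import OpenAI

client = OpenAI()

def llm(prompt):
    resp = client.chat.completions.create(
        model="gpt-4o",  # illustrative
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

MAX_REPAIRS = 3  # guardrail: cap the repair loop

def answer(question, retrieved_docs, prod_db):
    shadow = sqlite3.connect(":memory:")  # empty validation DB
    # (in practice the shadow DB holds the real schema, no rows)
    sql = llm(f"Schema notes:\n{retrieved_docs}\n\n"
              f"Write a SQL query answering: {question}\nSQL only.")
    for _ in range(MAX_REPAIRS):
        try:
            shadow.execute(sql)  # syntax check; never touches prod
            break
        except sqlite3.Error as err:
            sql = llm(f"This SQL failed with '{err}'. Fix it:\n{sql}")
    else:
        return "Couldn't produce a valid query; routing to a human expert."
    rows = prod_db.execute(sql).fetchall()   # run for real
    return llm(f"Question: {question}\nRows: {rows}\n"
               "Answer the question in plain English.")
```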
Obviously, there’s a non-negligible error rate, so we still rely on experts when in doubt. But it’s been so useful that we’ve ported the concept to other databases. This week’s version of the agent now works with over 6,700 entities (tables, views, stored procedures) across many databases.
Oh, and our production SQL schema sees nine changes a day on average. To keep our users from souring on a stale agent, the documentation and RAG embeddings are regenerated automatically for every one of those changes.
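Sketched out, the refresh loop is simple: when a change lands, re-document and re-embed just the affected entity, reusing the helpers from the earlier sketches. The change feed and polling cadence here are hypothetical; the real trigger could be a DDL event, a migration hook or a schema-diff job:

```python
import time
import numpy as np

def watch_schema(change_feed, docs, names, vectors):
    """Keep docs and embeddings in sync with schema changes (sketch)."""
    while True:
        # change_feed.poll() is a hypothetical source of changed entities,
        # each a dict with the fields document_entity() expects
        for entity in change_feed.poll():
            doc = document_entity(**entity)      # re-run the doc pass
            docs[entity["name"]] = doc
            vec = embed([doc])[0]                # re-embed just this entity
            if entity["name"] in names:
                vectors[names.index(entity["name"])] = vec
            else:                                # brand-new table/view/proc
                names.append(entity["name"])
                vectors = np.vstack([vectors, vec])
        time.sleep(60)  # ~9 changes a day, so gentle polling is plenty
```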
In this early going, about 40 people use the agent regularly, to the tune of 80 queries per week. Not huge volume by any stretch, but average report creation lead time has shrunk dramatically as a result, and we now have a better sense of our employees’ baseline appetite for business insights. That helps us establish the value of continued investment in this AI agent.
Cheers to that.