Building an AI chatbot with real business value: notes from a non-developer
Conor Normile
Product strategy & UX leader | Solving real problems with user-centred AI
Amidst the AI revolution, there's a sweet spot between high-level theory and deep technical documentation that often goes unfilled. After spending months building an AI chatbot that halved report creation time for a financial services firm, I wanted to share what I learned – the successes and the occasional stumbles along the way.
What this guide is and isn't
This is a guide for people like me: tech-literate designers, consultants and leaders who are navigating the world of AI and chatbots, trying to figure out how to unlock value for their businesses.
Most beginner guides to AI chatbots are either too high-level, or too detailed for people who want to know what's involved without necessarily building a solution. This post is intended to sit somewhere in the middle. Developers will find it too basic. For everyone else, I hope it gives you a sense of what's required to bring an AI-powered chatbot to life.
I’ll cover:
Background to the project
Over recent months, I've been exploring how to take a user-centred approach to solving business problems with generative AI. While chatbots haven't been the only focus, they're part of the mix. So when a client was brave enough to join me on a journey of experimentation – with real business challenges and real stakes – I was all in.
I had two main goals in mind when taking on the project:
My prior experience with chatbots
With a background in design, I came into this project with no coding experience (unless you're counting basic HTML, which seems generous). I had experimented with drag-and-drop chatbot tools at the UX Design Institute, creating an AI tutor to answer student questions on Slack. While it was a fun prototype, it wasn't robust enough for real-world use.
Since then, I've explored several 'low-code' chatbot development platforms. In theory you can use these tools without knowing any code. But I quickly realised that a basic grasp of Python could help me get more out of them. So I completed two beginner-level Python courses: Python for Absolute Beginners (Udemy) and AI Python for Beginners (DeepLearning.AI).
While I'm certainly not a Python programmer, I can now set up environments, install packages, and decipher simple snippets of code. This proved invaluable when configuring the low-code tool I used to build the chatbot.
The process, step-by-step
1. Identifying the use case
The first step was figuring out the right problem to solve. Drawing on the service design and UX toolkit, I led a workshop with the client team to explore potential use cases. We focused on use cases that met four criteria:
We settled on a use case of speeding up the process for creating financial planning reports. These are documents that outline wealth management and investment strategies, tailored to the unique needs of each of the firm's clients.
The existing process involved drawing on information from hundreds of internal documents, emails, and PDFs – a task that could take up to 10 hours per report. Our goal was to reduce the preparation time by at least 20%.
2. Mapping out the journey
I worked closely with the team to map out the end-to-end process for creating a report. This step was critical. Understanding the current workflow would allow us to design a solution that fit seamlessly into the team’s day-to-day operations, increasing the chances of adoption.
The report creation process was simple enough to map out. The real value was in the conversations it spurred with the client team, which uncovered important nuances about how they work.
A few examples:
1. Preserving the personal touch: Initially I had ideas about using AI to create templated reports. But the team emphasised the importance of providing a bespoke service to their clients. A cookie-cutter approach would be neither appropriate for this business, nor welcomed by the people doing the work.
2. Maintaining the advisor's voice: The team were rightly proud of the corpus of knowledge and insight that they had built up, and how it was expressed in each carefully-tailored report to their clients. This would have an impact on the level of expressiveness we allowed the AI in delivering its responses.
3. Making it useful on the go: The firm’s advisors often had their best insights right after client meetings. This led us to add speech-to-text capabilities to the chatbot through OpenAI's Whisper API, allowing them to capture and process these thoughts on the go.
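For the curious, here's a rough sketch of what a Whisper transcription call looks like using OpenAI's Python SDK. The function and helper names are my own, and the list of supported formats is a subset of what the API accepts; treat it as an illustration rather than the project's actual code.

```python
# Formats commonly accepted by the Whisper API (subset; see OpenAI's docs)
SUPPORTED_FORMATS = {".mp3", ".mp4", ".m4a", ".wav", ".webm"}

def is_supported_audio(path: str) -> bool:
    """Check that a voice note is in a format Whisper can ingest."""
    return any(path.lower().endswith(ext) for ext in SUPPORTED_FORMATS)

def transcribe_voice_note(audio_path: str) -> str:
    """Send a recorded voice note to the Whisper API and return the transcript.

    Requires the `openai` package and an OPENAI_API_KEY environment variable.
    """
    from openai import OpenAI  # imported lazily so the sketch loads without the SDK
    if not is_supported_audio(audio_path):
        raise ValueError(f"Unsupported audio format: {audio_path}")
    client = OpenAI()
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    return result.text
```

The transcript can then be fed into the same pipeline as a typed query, which is what made the on-the-go capture feel seamless for the advisors.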
The lesson? Just as with any service design or UX project, understanding how people actually work is critical for designing an AI system that's not just useful, but used.
3. Choosing RAG as an approach
The use case lent itself well to a Retrieval-Augmented Generation (RAG) system, which would query an internal knowledge base of documents to provide accurate, well-written answers.
RAG combines two key capabilities: finding relevant information from a knowledge base (Retrieval) and generating tailored responses (Generation). The system first searches through the data to surface the most relevant snippets, then passes those snippets to an AI model that crafts a clear and informative answer.
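To make the two steps concrete, here's a toy sketch of the retrieve-then-generate flow. It uses a crude bag-of-words similarity in place of real embeddings, and stops at assembling the prompt rather than calling an LLM; everything in it is illustrative.

```python
from collections import Counter
import math

def vectorise(text: str) -> Counter:
    """Crude bag-of-words 'embedding' -- real RAG systems use dense vectors."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, snippets: list[str], k: int = 2) -> list[str]:
    """Retrieval: rank knowledge-base snippets by similarity to the query."""
    q = vectorise(query)
    return sorted(snippets, key=lambda s: cosine(q, vectorise(s)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Generation: the retrieved snippets are handed to an LLM as context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

# Hypothetical mini knowledge base
kb = [
    "Pension contributions attract tax relief at the marginal rate.",
    "Our office is open Monday to Friday.",
    "Investment strategies should reflect the client's risk tolerance.",
]
query = "What tax relief applies to pension contributions?"
context = retrieve(query, kb)
prompt = build_prompt(query, context)
```

In a production system, the retrieval step runs over embedded document chunks in a vector store, and the prompt is sent to a model rather than printed, but the shape of the flow is the same.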
Conceptually, RAG is fairly straightforward. But getting to a high-quality output requires a systematic process of testing and optimisation, which I'll get into later.
4. Building the solution
I designed the solution using a tool called Flowise, which is based on LangChain – a popular framework for connecting the different elements of an LLM-based product. Flowise provides a graph-based interface for configuring and connecting the various components.
Here’s a simplified version of my Flowise setup with the key components labelled:
Each component above represents a step in the process. If you're curious about the technical details, here's what's involved:
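The same components that appear on a Flowise canvas can be expressed in plain LangChain code. The sketch below is hypothetical: import paths and class names are assumed from recent LangChain releases (they move between versions), the model name and chunk sizes are placeholders, and it is not the project's actual configuration.

```python
def build_report_assistant(doc_texts: list[str]):
    """Wire together the typical RAG components a Flowise canvas represents.

    Requires langchain, langchain-openai, langchain-community and faiss-cpu;
    exact import paths vary between LangChain versions.
    """
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.vectorstores import FAISS
    from langchain_openai import OpenAIEmbeddings, ChatOpenAI
    from langchain.chains import RetrievalQA

    # 1. Split source documents into overlapping chunks
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=150)
    chunks = splitter.create_documents(doc_texts)

    # 2. Embed the chunks and index them in a vector store
    store = FAISS.from_documents(chunks, OpenAIEmbeddings())

    # 3. Connect a retriever and an LLM into a question-answering chain
    return RetrievalQA.from_chain_type(
        llm=ChatOpenAI(model="gpt-4o-mini", temperature=0),
        retriever=store.as_retriever(search_kwargs={"k": 4}),
    )
```

Flowise's value is that each of these lines becomes a draggable node with a settings panel, which is what made the pipeline approachable for a non-developer like me.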
5. Testing the pipeline
With everything set up, I submitted my first test queries – that moment when you hit 'send' and hold your breath. Barring some teething problems that were relatively simple to resolve, the system worked.
The initial responses were not bad. But I knew that getting consistent results in a real-world scenario would require extensive testing and optimisation.
6. Evaluating and optimising response quality
This was the longest and most labour-intensive part of the project. Evaluating the RAG responses and adjusting the various levers involved (chunking, embeddings, prompts, etc.) is crucial for optimising quality and accuracy. I followed a four-step approach.
1. Compile a set of common queries for testing
In a workshop, I asked the client team to provide a list of 30 queries that they might ask the chatbot for information on. We sought a mix of queries that covered the breadth of concepts contained in the source documents.
2. Define evaluation metrics
I used a subset of the evaluation metrics defined by RAGAs, a framework for evaluating RAG systems: Faithfulness, Answer Relevancy, Context Precision and Context Recall.
These four metrics allow you to test the two critical aspects of an effective RAG system: retrieval of data, and generation of high quality responses based on that data. Context Recall is hard to evaluate manually, so I focused primarily on the other three metrics, double-checking my scores later with the client team.
3. Test the query set against various ‘recipes’
There are several levers you can play with when optimising a RAG system, all of which have an impact on the responses. I systematically tested the query set against different configurations (what I termed 'recipes'), scoring each response against the RAGAs metrics.
The main levers I experimented with:
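To make one of those levers concrete, here's a minimal sketch of chunking: splitting text into fixed-size, overlapping pieces, where chunk size and overlap are exactly the knobs a 'recipe' varies. The sizes below are illustrative, not the values I settled on.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks with overlap, so sentences that
    straddle a boundary still appear intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# Hypothetical recipes: each one pairs a chunking configuration to test
recipes = [
    {"name": "recipe-01", "chunk_size": 500, "overlap": 50},
    {"name": "recipe-02", "chunk_size": 1000, "overlap": 150},
]
```

Smaller chunks tend to retrieve more precisely but lose surrounding context; larger chunks preserve context but dilute relevance, which is why it pays to test the same query set against several configurations.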
4. Record and score responses
I recorded the queries, the responses and the metric scores in a spreadsheet, using a separate worksheet for each recipe. I tested 13 recipes in total.
Since I didn't have access to a production-grade testing pipeline, I performed the evaluation and optimisation manually. While time-consuming, this hands-on approach gave me an intimate understanding of how the different elements affected the output. It does, of course, inject a degree of subjectivity into the process.
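A manual evaluation like this can still be lightly automated. The sketch below mimics my spreadsheet: one CSV 'worksheet' of scores per recipe, plus a per-metric average for comparing recipes side by side. The metric names follow RAGAs conventions; the queries, scores and helper names are invented for illustration.

```python
import csv
import io

METRICS = ("faithfulness", "answer_relevancy", "context_precision")

def record_scores(recipe: str, results: list[dict], out) -> None:
    """Write one worksheet-style CSV of query, response and metric scores."""
    writer = csv.writer(out)
    writer.writerow(["recipe", "query", "response", *METRICS])
    for r in results:
        writer.writerow([recipe, r["query"], r["response"], *(r[m] for m in METRICS)])

def average(results: list[dict], metric: str) -> float:
    """Mean score for one metric -- handy for comparing recipes."""
    return sum(r[metric] for r in results) / len(results)

# Hypothetical scored responses for one recipe
results = [
    {"query": "What is the ISA allowance guidance?", "response": "...",
     "faithfulness": 0.9, "answer_relevancy": 0.8, "context_precision": 0.7},
    {"query": "Summarise the pension transfer rules.", "response": "...",
     "faithfulness": 0.7, "answer_relevancy": 0.9, "context_precision": 0.8},
]
buffer = io.StringIO()
record_scores("recipe-01", results, buffer)
```

Even a simple structure like this makes it obvious which recipe is winning on which metric, and keeps the subjective judgments auditable.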
7. Pilot phase and results
You can only truly test a system through day-to-day use. I ran a two-month pilot phase with the client team, having them use the chatbot weekly to help draft reports. This allowed us to see how the system performed in real-world scenarios and gauge user adoption.
The results exceeded our expectations. Instead of the 20% reduction we had targeted, the chatbot was delivering more than a 50% time saving – with no impact on the quality or bespoke nature of the final reports.
Equally impressive was the level of user adoption. The team embraced the chatbot and were using it on a daily basis, which was a testament to how well the solution fit into their existing workflows.
8. Keeping content up to date
Before deploying the solution into production (a process that merits its own blog post), I had one more challenge to overcome. How do you keep the content in the knowledge base from going stale?
I took the simple approach of providing a way for the client to upload or delete documents through a document loader on the Flowise back-end. I connected a Postgres database as a record manager (thanks to this guide on YouTube, one of many helpful video tutorials I relied on from Leo Van Zyl). The record manager checks any proposed changes against the current knowledge base to avoid issues like duplication of data.
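The core idea behind a record manager can be sketched in a few lines: key each document by a hash of its content, so re-uploading an unchanged file is skipped rather than duplicated in the index. This toy in-memory version stands in for the Postgres-backed one; the class and method names are my own.

```python
import hashlib

class RecordManager:
    """Toy stand-in for a Postgres record manager: content-hash keyed,
    so identical re-uploads are skipped instead of duplicated."""

    def __init__(self):
        self._records: dict[str, str] = {}  # doc_id -> content hash

    @staticmethod
    def _digest(content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()

    def upsert(self, doc_id: str, content: str) -> str:
        """Return 'added', 'updated' or 'skipped' for the proposed change."""
        new_hash = self._digest(content)
        old_hash = self._records.get(doc_id)
        if old_hash == new_hash:
            return "skipped"  # identical content is already indexed
        self._records[doc_id] = new_hash
        return "updated" if old_hash else "added"

    def delete(self, doc_id: str) -> bool:
        """Remove a document's record; True if it existed."""
        return self._records.pop(doc_id, None) is not None
```

In the real setup, the hashes live in Postgres and the 'added'/'updated' outcomes drive which chunks get re-embedded and re-indexed in the vector store.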
Lessons learned
While I don’t claim to have it all figured out, these are the insights that stayed with me.
Final thoughts
The results speak for themselves: a 50% reduction in the time needed to create tailored financial planning reports, with genuine user adoption across the team.
Perhaps the biggest lesson was that tried-and-tested design principles – understanding user needs, iterating based on feedback, and focusing on real business problems – remain crucial when working with AI. The technology is powerful, but success still hinges on understanding how people actually work.