Shakespeare meets AI

In this installment of the RAG prototype series, I've created a web UI for the application and added another document set - the complete works of Shakespeare - to simulate the much larger document collections likely to be found within any enterprise. My goals were to see how well the RAG solution performed over this larger set, whether the prototype could still select the correct data source based on the seeker's question, and what practical differences there were between GPT 3.5-turbo and GPT 4o.

Summary of findings

I encourage you to take a look at the video below, as it demonstrates the prototype in action, but if you don't have the time, here are my key findings (in no particular order):

1. Use AI whenever possible to save time writing code

For starters, I went "old-school" and Googled for examples of JavaScript chatbots to find sample code. I found a variety of GitHub libraries and several good posts on Stack Overflow. I also found a lot of sales-ware masquerading as open source, which wasted my time. This is how I would have found sample code pre-GPT (old habits die hard). I finally came to my senses, asked ChatGPT for sample code, and within seconds had what I needed. The prompt I entered was "I need code for a simple Bootstrap chatbot UI", and GPT provided a fully functional template to get me started.

From there, GitHub Copilot in Visual Studio took over, filling in code snippets based on my comments, both in JavaScript on the front end and in C# on the back end. As widely reported, I saved about 50% of the time and probably ended up with a higher-quality solution. That's not to say there wasn't plenty left for me to do, but for all the developers out there: if you're not using AI in various forms to assist you with coding, you are wasting your valuable time and, eventually, you will be replaced by those who have adopted AI-assisted coding.

2. GPT 4o is a bit faster and generates higher-quality answers

GPT 4o does "feel" faster, and the phrasing of its answers seems more nuanced and complete. I am unsure how important this will be for many RAG solutions, where the data comes from well-defined local sources; this definitely requires more study. One clear advantage of GPT 4o is the amount of data it can reason over, as measured in tokens. You can send more data for GPT 4o to analyze when it is formulating a response, which, all other things being equal, should lead to better results.
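To make the token-budget point concrete, here is a rough sketch of packing retrieved RAG passages into a model's context window. The context sizes match the limits OpenAI published at the time of writing, and the ~4-characters-per-token estimate is a crude stand-in for a real tokenizer such as tiktoken - treat all the numbers as illustrative.

```python
# Sketch: packing retrieved RAG passages into a model's context window.
# Token counts are rough estimates (~4 characters per token for English);
# a production system would use a real tokenizer such as tiktoken.

GPT_35_TURBO_CONTEXT = 16_385   # tokens, per published limits at time of writing
GPT_4O_CONTEXT = 128_000        # tokens

def estimate_tokens(text: str) -> int:
    """Very rough token estimate: ~4 characters per token."""
    return max(1, len(text) // 4)

def pack_context(passages: list[str], budget: int) -> list[str]:
    """Keep passages (already sorted by relevance) until the budget is spent."""
    selected, used = [], 0
    for passage in passages:
        cost = estimate_tokens(passage)
        if used + cost > budget:
            break
        selected.append(passage)
        used += cost
    return selected

# Reserve room for the question and the answer, then fill with passages.
budget = GPT_4O_CONTEXT - 2_000
```

The practical upshot: with GPT 4o's larger window, `pack_context` can admit far more retrieved text per question before hitting the budget, which is where the "more data to reason over" advantage shows up in a RAG pipeline.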

3. GPT 4o costs 10x GPT 3.5-turbo, so consider using 3.5-turbo when appropriate

Simply put, GPT 4o costs 10 times what 3.5-turbo does for the same amount of uploaded data (questions, RAG data, other prompt content) and the same length of generated response. This can add up fast! However, the responses from GPT 4o are likely to be more accurate, complete, and nuanced. This may lead to fewer clarifying or follow-on questions from the seeker, which could offset the additional cost to some extent. And for highly demanding knowledge domains such as law, medicine, or engineering - where both the consequences of an incorrect reply and the value of the seeker's time are high - the 10x cost differential may be justified. One question I have is whether RAG solutions benefit as much from this increased accuracy, since the source data is provided to GPT in the prompt rather than coming from the LLM's knowledge base, and that data is well understood and curated. This is an area for further study.
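The 10x figure is easy to verify with back-of-the-envelope arithmetic. The per-million-token prices below are roughly the published rates when this was written; always check OpenAI's current pricing page before budgeting, as rates change.

```python
# Back-of-the-envelope cost comparison. Prices are USD per 1M tokens and are
# illustrative (roughly the published rates at time of writing) -- check
# OpenAI's current pricing page before relying on them.

PRICES = {
    "gpt-3.5-turbo": {"input": 0.50, "output": 1.50},
    "gpt-4o":        {"input": 5.00, "output": 15.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for one chat request."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A typical RAG turn: ~3,000 tokens of question + retrieved passages,
# ~500 tokens of generated answer.
for model in PRICES:
    print(model, round(request_cost(model, 3_000, 500), 5))
```

At these example rates a single RAG turn costs about $0.00225 on 3.5-turbo versus $0.0225 on GPT 4o - small per call, but at tens of thousands of questions per month the difference becomes a real line item.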

4. Selecting the right data source is critical to returning the correct answer

This is of course a "duh!" statement, but my testing of this version of the prototype reinforced the point. When I asked the generic question "who are you?" I expected my chatbot to search GPT's public knowledge base, but instead it classified the question as being about a customer, product, or sales rep and so selected my local "VECTOR" data source. I hadn't seen this problem before adding the "SHAKESPEARE" data source, and most likely I need to tweak the prompt I use when asking GPT to select the correct source. More generally, it is probably a good idea to re-test the classification prompt every time a new data source is added, to maximize the probability that the application pulls its RAG data from the correct source.
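One way to sketch this source-selection step: build a classification prompt that lists every registered data source, then parse the model's one-word reply, falling back to a catch-all bucket. The "VECTOR" and "SHAKESPEARE" names come from the prototype; the "GENERAL" catch-all and all of the descriptions are hypothetical stand-ins, not the prototype's actual prompt - an explicit catch-all is just one plausible fix for "who are you?" landing in the wrong source.

```python
# Hypothetical sketch of a source-classification prompt and reply parser.
# Source descriptions are illustrative, not the prototype's actual config.

SOURCES = {
    "VECTOR": "questions about our customers, products, or sales reps",
    "SHAKESPEARE": "questions about Shakespeare's plays, sonnets, or characters",
    "GENERAL": "anything else, including questions about the assistant itself",
}

def build_classification_prompt(question: str) -> str:
    """Build the prompt sent to the LLM to pick a data source."""
    lines = ["Classify the question into exactly one source. "
             "Reply with the source name only."]
    for name, desc in SOURCES.items():
        lines.append(f"- {name}: {desc}")
    lines.append(f'Question: "{question}"')
    return "\n".join(lines)

def parse_source(reply: str) -> str:
    """Map the model's reply onto a known source, defaulting to GENERAL."""
    cleaned = reply.strip().upper()
    return cleaned if cleaned in SOURCES else "GENERAL"
```

Because `SOURCES` drives both the prompt and the parser, adding a new data source automatically updates the classification prompt - which also makes it easy to re-run a suite of test questions every time a source is added, as suggested above.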

5. How you split text for storage in the vector database is important

Splitting text data so it can be written to a vector database for optimal query performance is an art as well as a science. Cleaning up the text to remove punctuation or other non-information-bearing characters is the first step. Deciding how to split up the text (by paragraph, topic, fixed number of characters, fixed number of words, or some other scheme) can have a significant impact. It's also recommended that text blocks overlap (perhaps by 30%) to increase the chance that relevant neighboring content gets selected. I did none of these things.

For this version, I placed entire sonnets and entire acts from plays into their own records in the vector database. This means that there could be a widely varying number of characters in each record - some acts may be quite short, while others quite long. And I didn't overlap my entries to include any text from neighboring blocks. For these reasons I'm fairly sure the performance of my "SHAKESPEARE" RAG data source could be significantly improved - a question for an upcoming installment.
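As a sketch of what improved splitting might look like - a simple word-based chunker with roughly 30% overlap. The chunk size and overlap ratio are illustrative defaults, not tuned values, and this is not what the prototype currently does.

```python
# Minimal word-based splitter with ~30% overlap between neighboring chunks.
# Chunk size and overlap are illustrative, untuned defaults.

def split_with_overlap(text: str, chunk_words: int = 200,
                       overlap: float = 0.30) -> list[str]:
    words = text.split()
    # Advance ~70% of a chunk each step so neighbors share ~30% of their words.
    step = max(1, int(chunk_words * (1 - overlap)))
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_words]))
        if start + chunk_words >= len(words):
            break  # the final chunk reached the end of the text
    return chunks
```

Applied to the Shakespeare corpus, this would replace one-record-per-act storage with uniformly sized, overlapping records, so a query landing near a chunk boundary still retrieves the neighboring context.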

6. There's an excellent testbench for comparing performance of different OpenAI models

For developers and others working with the OpenAI API, be sure to check out the Chat Playground at https://platform.openai.com/playground/chat.

This is a great tool to help you understand how different generations of models stack up against each other and to test out different prompts to see which works best with your data.

Conclusion

I had fun creating the web UI and found that ChatGPT and GitHub Copilot made building it much easier. I also explored the differences between GPT 3.5-turbo and GPT 4o and found less difference than I expected. Along the way I uncovered several topics for further study, including the best way to split the Shakespeare text when adding it to the vector database, and how to improve question classification so that my RAG application uses the correct data source.

Onward to the next layer of the onion!
