Rapid prototyping GenAI apps using Amazon Q
As I build the GenAI Advisor app, I am now ready to start rapid prototyping and evaluating various models and platforms. At the same time, I want to nail down the basic input prompts and output formats. A good place to start is reviewing recent trend reports, and the timing is right: at the beginning of the year, many excellent reports are out.
App design and problem discovery recap
Before I get started, here is how I got here. I started with the vision in the AI for Everyone article. Then I discovered the problem to solve with the GenAI Advisor app, and followed up by starting to create a dataset for the app. Along the way, I outlined my mental model for problem discovery patterns and the app design flow for GenAI apps. Here is a visual recap.
Governance via knowledge base
As I move to the next stage of building GenAI Advisor, I am excited to try out a bunch of models and platforms, but I do not want to lose sight of what a successful outcome looks like. I review the dataset created in the last step and expand my top-priority sources to review reports, papers, and other documents for inspiration. As I drill down from each trend source type (for example, industry analysts) to the top five management consultants, and then to recent reports from each of these consultants, I am creating a knowledge base that also traces data back to its source. I can further add metadata about the artifacts, such as views, license, and date, to enable ranking of these artifacts as well as data governance.
While writing this section, I am also conscious of the slippery slope that is GenAI when it comes to copyright, including the recent copyright infringement lawsuit by The New York Times and the Authors Guild class action lawsuit. Baking data governance into the knowledge base will help me determine which data artifacts can be used for what purpose.
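To make this concrete, here is a minimal sketch of what one knowledge base entry could look like. The field names and the governance rule are my own illustrative assumptions, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class KnowledgeBaseEntry:
    """One trend artifact, traceable back to its source."""
    title: str
    source_type: str  # e.g. "industry analyst" or "research paper"
    publisher: str    # e.g. "McKinsey", "Microsoft Research"
    url: str          # traceability back to the original source
    published: date   # enables recency-based ranking
    views: int        # popularity signal for ranking
    license: str      # e.g. "public", "cc-by", "all-rights-reserved"

def usable_for(entry: KnowledgeBaseEntry, purpose: str) -> bool:
    """Toy governance rule: only openly licensed artifacts may be
    used beyond internal prototyping."""
    if purpose == "prototype":
        return True
    return entry.license in {"public", "cc-by"}
```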
Prompt evaluation dataset
The first artifact I review for inspiration is the McKinsey report, The economic potential of generative AI: The next productivity frontier, which presents insights over 68 pages in a variety of forms, including graphs, trends, narratives, and use cases.
As I read the report, I note via its links that McKinsey has published many others, and I wonder whether these reports themselves have been part of the pre-training corpora used by recent LLM releases. Before I go deep in any one direction, I should try a quick experiment.
Experiment #1: Can I rely on world knowledge of an LLM and prompt it to generate latest trends and insights without pointing it to a retrieval source?
I study the insights presented in the McKinsey report and come up with simple questions or prompts which may lead to those insights. This way I have a baseline prompt-response set which I can compare with a variety of LLM responses.
Here is what the prompt evaluation dataset looks like. It includes columns for the prompt used to generate an insight, the insight data points from the McKinsey source, and the data points and sources cited by the various LLMs that are part of the evaluation experiment.
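As a sketch, each row of this dataset can be written out as a CSV record; the column names below are my own shorthand for the description above, not a fixed schema.

```python
import csv

COLUMNS = [
    "prompt",                # prompt intended to surface the insight
    "mckinsey_data_points",  # baseline data points from the McKinsey report
    "llm_name",              # which of the four LLMs produced this row
    "llm_data_points",       # data points returned by that LLM
    "llm_sources_cited",     # sources the LLM cited, if any
]

with open("prompt_eval.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=COLUMNS)
    writer.writeheader()
```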
Note that I am only capturing the data points and sources of the insights for comparison, to make it easy to evaluate a broad matrix of 6 prompts x 4 LLMs. Running these prompt evaluations on the Amazon Bedrock Chat playground is a breeze, with side-by-side comparison of multiple LLMs.
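The playground is the quickest way to compare models side by side, but the same sweep can also be scripted against the Bedrock API. Here is a minimal sketch with boto3, assuming the Claude v2 and Llama 2 model IDs that were current at the time of writing.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

def ask_claude(prompt: str) -> str:
    # Claude v2 on Bedrock expects the Human/Assistant prompt format.
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 512,
    })
    resp = bedrock.invoke_model(modelId="anthropic.claude-v2", body=body)
    return json.loads(resp["body"].read())["completion"]

def ask_llama(prompt: str) -> str:
    body = json.dumps({"prompt": prompt, "max_gen_len": 512})
    resp = bedrock.invoke_model(modelId="meta.llama2-70b-chat-v1", body=body)
    return json.loads(resp["body"].read())["generation"]

prompt = "Which business functions capture most of the economic value of generative AI?"
print("Claude:", ask_claude(prompt))
print("Llama:", ask_llama(prompt))
```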
Analyzing evaluation results
For the broader evaluation, I chose two LLMs (Claude and Llama on Bedrock) that were operating from parametric knowledge with a training cutoff at some point in 2023. I also chose two assistants (ChatGPT running GPT-4, and Perplexity) that had access to search tool integrations for more recent knowledge.
Here are a few observations from my analysis of the evaluation results.
Mock design and information flow
The second artifact I review is the Microsoft New Future of Work Report, which is published annually by a Microsoft Research team led by their Chief Scientist and Technical Fellow, Jaime Teevan.
This is also a popular report, packed with insights across 40+ pages, and it follows an easy-to-grok structure. Here is my first attempt to mock one of its pages (for example, page 6) as an LLM-generated outcome. I use proven LLM capabilities (summarize, classify) as verbs, and my text mock resembles a set of simple chat prompts.
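For illustration, the text mock reduces to verb-led prompt templates like these; the wording is my own approximation rather than anything taken from the report.

```python
# Verb-led prompt templates mirroring proven LLM capabilities.
MOCK_PROMPTS = {
    "summarize": "Summarize the key findings of {paper_title} in three bullet points.",
    "classify": "Classify this finding under one of these topics: {topics}.\n\nFinding: {finding}",
}

prompt = MOCK_PROMPTS["summarize"].format(paper_title="a research paper cited on page 6")
```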
I also create a simple information flow from the source of trend data to the outcome. I recognize that creating such a report must have taken a lot of research, curation, analysis, and writing by some very smart folks. However, just as one would teach a child, I want to simplify my assumptions about the techniques used to get from source to outcome into atomic elements.
Many pages of the report refer to multiple research papers, so that is a definitive source. I am assuming a simple way to filter from tens of thousands of research papers down to a few is to develop a topic outline for the report, then search for papers by topic; now we are getting somewhere. Maybe even rank the papers by views and date to tighten the search radius (a sketch follows below). So, once I have narrowed down a set of research papers related to the topics of my GenAI Advisor generated report, I need a prompt or a set of prompts to extract those insights.
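Here is a minimal sketch of that filter-then-rank idea, assuming each paper carries topic tags, a view count, and a publication date as metadata (the entry below is illustrative).

```python
from datetime import date

papers = [
    {"title": "Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence",
     "topics": {"productivity", "generative ai"},
     "views": 120_000, "published": date(2023, 3, 1)},  # illustrative metadata
    # ...tens of thousands more entries in practice
]

def shortlist(papers: list[dict], topic: str, top_k: int = 5) -> list[dict]:
    """Filter papers by topic, then rank by views and recency."""
    matches = [p for p in papers if topic in p["topics"]]
    matches.sort(key=lambda p: (p["views"], p["published"]), reverse=True)
    return matches[:top_k]

top_papers = shortlist(papers, topic="productivity")
```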
Rapid prototyping using Amazon Q
One of the research papers cited in the Microsoft report is Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence by Shakked Noy and Whitney Zhang, which I will use for my next experiment.
Experiment #2: Can I prompt an LLM to extract the relevant insights from the research paper uploaded as a retrieval source?
For this experiment, I am going to use the recently announced Amazon Q (preview) for three reasons. These features satisfy my production application requirements for the GenAI Advisor: to be flexible (take many data sources) and to ensure right-fit governance as my application scales.
I am going to start by performing the one-off setup for a new application, which takes about 10 minutes. Once the initial setup is out of the way, iteratively prototyping and building an application on Amazon Q takes just a couple of actions in most scenarios.
Step 1: I start the setup by visiting Amazon Q Applications on my AWS Console and clicking the Create application button.
Step 2: The next screen only requires a unique application name and a service role name to enable secure access.
Step 3: Amazon Q helps build an assistant based on the retrieval-augmented generation (RAG) capability of an LLM. I can choose a native retriever and use a connector to a data source of my choice, like an Amazon S3 bucket. I can also hook into an existing Amazon Kendra enterprise search service I may have set up in the past. I also note the sheer number of documents the native retriever allows. For now, I will choose the default options and move to the next step.
Step 4: Now I have a choice of data sources with which I can populate the RAG index for my assistant. I want to prototype GenAI Advisor on the analyst reports and research papers I downloaded for this article, so Amazon S3 seems like the easiest data source to use. Before moving further, I need to create an empty S3 bucket so I can point the Amazon Q application to it. I will do this in the next step and then come back to this screen.
Step 5: I search for S3 in my AWS Console search bar and open it in a new tab. I leave the default settings, add a unique name for my S3 bucket, and create the empty bucket (a scripted equivalent is sketched below). I then switch tabs to continue setting up the Amazon Q application.
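For those who prefer scripting over the console, here is a minimal boto3 equivalent of this step; the bucket name is a placeholder and must be globally unique.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Placeholder name; S3 bucket names must be globally unique.
s3.create_bucket(Bucket="genai-advisor-sources-example")

# Note: outside us-east-1, create_bucket also requires
# CreateBucketConfiguration={"LocationConstraint": "<region>"}.
```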
Step 6: Back in the Amazon Q application setup, just as I did for the application itself, I provide a new IAM role to protect the data source connection. I also select the name of the recently created empty S3 bucket.
Step 7: The last step in configuring the data source is to select the sync mode. I choose to switch to "New, modified, or deleted content sync". This will save time by incrementally updating my retriever index instead of rebuilding it every time I upload a new document to my data source. I also select Run on demand to sync the data source with my index, as I only plan to add documents manually for now. That's it. I have configured my data source and I complete this step.
Step 8: I return to the Applications console and notice a new application has been created. I click on the application link and then select Preview web experience under the customize web experience step. On the next screen, I configure the name of the AI assistant, add a tagline, and voila! My Amazon Q assistant is ready to test!
Step 9: Now I upload the Noy and Zhang paper I mentioned earlier. It is a 15-page PDF file. Within a second, I can query this paper with a simple prompt. The results look promising! I can continue prototyping the assistant by trying out other documents and prompts.
Step 10: Now I go back to the S3 bucket I created earlier and upload a bunch of analyst reports and research papers (a scripted equivalent is sketched below). Then I return to my Amazon Q application and sync the data source to build the retriever index. It takes several minutes for Amazon Q to read all the unstructured documents, chunk them, generate embeddings, and store them in a vector database. All this happens behind the scenes. Now I am ready to query the GenAI Advisor application using knowledge from multiple documents.
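The upload portion of this step can also be scripted; a minimal sketch, reusing the placeholder bucket name from Step 5.

```python
import pathlib

import boto3

s3 = boto3.client("s3")
bucket = "genai-advisor-sources-example"  # the bucket created in Step 5

# Upload every PDF from a local folder of reports and papers.
for pdf in pathlib.Path("reports").glob("*.pdf"):
    s3.upload_file(str(pdf), bucket, f"sources/{pdf.name}")

# Then trigger the on-demand sync in the Amazon Q console so the
# retriever index picks up the new documents.
```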
That's it! In 10 steps, I have built a prototype of GenAI Advisor using Amazon Q. It has several features out of the box.
Before I leave you and get back to playing with my newly minted GenAI Advisor, one last thing! Amazon Q has a cool out-of-the-box feature that enables traceability of responses back to the retrieved data sources.
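For readers who would rather hit an API than the web experience, the same traceability surfaces as source attributions in the chat response. Here is a sketch assuming the Amazon Q Business SDK that followed the preview; the client name, operation, and response fields may differ from what I used during the preview.

```python
import boto3

q = boto3.client("qbusiness")

resp = q.chat_sync(
    applicationId="<application-id>",  # from the Amazon Q Applications console
    userMessage="What productivity gains do Noy and Zhang report?",
)

print(resp["systemMessage"])
for src in resp.get("sourceAttributions", []):
    print("cited:", src.get("title"), src.get("url"))
```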
Next in series, I will continue sharing my explorations with Amazon Q. I will try out AWS PartyRock. I may also dare myself to launch an incarnation of the GenAI Advisor on the new GPT Builder store! What would you like to read more about? Please comment or DM.
The author writes about generative AI to share his personal interest in this rapidly evolving field. The author's opinions are his own and do not represent the views of any employer or other entity with which the author may be associated.
Thanks for reading the AI for Everyone newsletter. Share and subscribe for free to receive new posts and support my work.