Rapid prototyping GenAI apps using Amazon Q
Created by Manav Sehgal using DALL.E


As I build the GenAI Advisor app, I am now ready to start rapid prototyping and evaluating various models and platforms. At the same time, I want to nail down the basic input prompts and output formats. A good place to start is reviewing recent trend reports, and the timing is right at the beginning of the year, when many excellent reports are out.

App design and problem discovery recap

Before I get started, here is how I got here. I started with the vision in the AI for Everyone article. Then I discovered the problem to solve with the GenAI Advisor app. I followed this by starting to create a dataset for the app. Along the way, I outlined my mental model for problem discovery patterns and the app design flow for GenAI apps. Here is a visual recap.

Mental model for GenAI app design and problem discovery

Governance via knowledge base

As I move to the next stage of building GenAI Advisor, I am excited to try out a bunch of models and platforms, but I do not want to lose sight of what a successful outcome looks like. I review the dataset created in the last step and expand my top-priority sources to review reports, papers, and other documents for inspiration. As I drill down from each trend source type (for example, industry analysts), to the top five management consultants, to recent reports from each of these consultants, I am creating a knowledge base which also provides traceability of data back to its source. I can further add metadata about the artifacts, like views, license, and date, to enable ranking of these artifacts as well as data governance.

Data governance baked into the knowledge base

While writing this section, I am also conscious of the slippery slope that GenAI presents when it comes to copyright, including the recent copyright infringement lawsuit by The New York Times and the Authors Guild class action lawsuit. Baking data governance into the knowledge base will help me determine which data artifact can be used for what purpose.
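To make this concrete, here is a minimal sketch of how a knowledge base entry and a governance check might look in Python. The field names and the license-to-purpose mapping are my own illustration, not a prescribed schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class TrendArtifact:
    """One knowledge base entry, traceable back to its source."""
    title: str
    source_type: str   # e.g. "industry analyst"
    publisher: str     # e.g. "McKinsey"
    url: str
    published: date
    views: int         # popularity signal used for ranking
    license: str       # illustrative labels, e.g. "public", "subscription", "no-derivatives"

def usable_for(artifacts: list[TrendArtifact], purpose: str) -> list[TrendArtifact]:
    """Governance check: keep only artifacts whose license allows the given purpose."""
    allowed = {
        "internal-analysis": {"public", "subscription", "no-derivatives"},
        "published-summary": {"public"},
    }
    return [a for a in artifacts if a.license in allowed.get(purpose, set())]
```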

Prompt evaluation dataset

The first artifact I review for inspiration is the McKinsey report, The economic potential of generative AI: The next productivity frontier, which presents insights over 68 pages in a variety of forms including graphs, trends, narratives, and use cases.

As I read the report, I note via links within it that McKinsey has published many others, and I wonder whether these reports themselves have been part of the pre-training corpus used by recent LLM releases. Before I go deep in a particular direction, I should try a quick experiment.

Experiment #1: Can I rely on world knowledge of an LLM and prompt it to generate latest trends and insights without pointing it to a retrieval source?

I study the insights presented in the McKinsey report and come up with simple questions, or prompts, which may lead to those insights. This way I have a baseline prompt-response set which I can compare with a variety of LLM responses.

  1. How much in USD will generative AI add to the economy by 2026?
  2. Which business functions will benefit the most from Generative AI?
  3. Which sectors will benefit the most from Generative AI?
  4. What is the percentage of knowledge related work that can be automated by Generative AI by 2026?
  5. What is the expected impact on labor productivity from generative AI by 2026?
  6. How much funding did generative AI startups receive in 2023?

Here is what the prompt evaluation dataset looks like. It includes columns for the prompt used to generate an insight, the insight data points from the McKinsey source, and the data points and sources cited by the various LLMs that are part of the evaluation experiment.

Snapshot of prompt evaluation dataset

Note that I am only capturing the data points and sources of the insights for comparison, to make it easy for me to evaluate a broad matrix of 6 prompts x 4 LLMs. Running these prompt evaluations on the Amazon Bedrock Chat playground is a breeze with side-by-side comparison of multiple LLMs.

Amazon Bedrock side-by-side comparison of multiple LLM responses
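The playground is the quickest way to run this matrix by hand. The same 6 x 4 evaluation can also be scripted; here is a minimal sketch using the Bedrock Converse API via boto3, under the assumption that these model IDs are enabled in your account and region (swap in the ones you actually use). The CSV columns mirror my evaluation dataset but the names are my own.

```python
import boto3
import csv

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Prompts derived from the McKinsey report (the numbered list above).
prompts = [
    "How much in USD will generative AI add to the economy by 2026?",
    "Which business functions will benefit the most from Generative AI?",
    "Which sectors will benefit the most from Generative AI?",
    "What is the percentage of knowledge related work that can be automated by Generative AI by 2026?",
    "What is the expected impact on labor productivity from generative AI by 2026?",
    "How much funding did generative AI startups receive in 2023?",
]

# Placeholder model IDs; use the models enabled in your account and region.
model_ids = [
    "anthropic.claude-v2",
    "meta.llama2-70b-chat-v1",
]

rows = []
for prompt in prompts:
    for model_id in model_ids:
        response = bedrock.converse(
            modelId=model_id,
            messages=[{"role": "user", "content": [{"text": prompt}]}],
            inferenceConfig={"maxTokens": 512, "temperature": 0.2},
        )
        answer = response["output"]["message"]["content"][0]["text"]
        rows.append({"prompt": prompt, "model": model_id, "response": answer})

# Save the matrix so data points and cited sources can be compared by hand.
with open("prompt_evaluation.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["prompt", "model", "response"])
    writer.writeheader()
    writer.writerows(rows)
```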

Analyzing evaluation results

For the broader evaluation I chose two LLMs (Claude and Llama on Bedrock) which were operating from parametric knowledge with a cutoff at a certain date in 2023. I also chose two LLMs (ChatGPT-4 and Perplexity) which had access to search tool integration for more recent knowledge.

Prompt evaluation across multiple models

Here are a few observations from my analysis of the evaluation results.

  1. Rationalizing data points from multiple sources. In the trends analysis domain, it will be hard to tell whether variations in responses are due to different analysts having their own views, due to knowledge cutoffs, or just hallucinations. Llama responded with multiple variants in the same response when analysts did not agree on the same data point. Others with search tool access came back with multiple citations to sources for a single data point.
  2. Web corpus and search rankings effect. Search rankings, and how much content is out there from a given source, have an impact on LLM responses. Even for a relatively small prompt evaluation dataset, many responses came back with McKinsey as a data source. The prompt questions are derived from the McKinsey article, however that is a coincidence at best; the questions by themselves do not point to content only available in that article.
  3. Copyright infringement. Surprisingly, at least one LLM response was word for word the same as the McKinsey article, despite a fairly generic prompt. The response referred to a use-case-based data point, which was specific to the McKinsey article, but did not make sense as a response to a generic question with no use cases in context.
  4. GenAI shows promise for dynamic information. On the funding question (#6), the McKinsey report was only updated through H1 2023. ChatGPT and Perplexity, with their access to search tools, came back with all of 2023's investments when queried in Jan 2024.
  5. Slowly changing information performs well on parametric knowledge. This is the reverse of the prior point. Certain information does not change much over time or changes on longer cycles, like the functions or sectors impacted by GenAI (questions #2 and #3). These kinds of insights perform as well, if not better, on the parametric (world) knowledge of LLMs when compared with LLMs relying on search tools.
  6. Adjacent question (mis)understanding. In some cases, LLMs misunderstood a question when they were trained on information from an adjacent topic. For example, a question on the percentage of knowledge work that can be automated with GenAI (#4) was misinterpreted as the percentage of GenAI adoption in enterprises, which is not the same thing.

Mock design and information flow

The second artifact I review is the Microsoft New Future of Work Report, which is published annually by a Microsoft Research team led by their Chief Scientist and Technical Fellow, Jaime Teevan.

This is also a popular report, packed with insights across 40+ pages, and it follows an easy-to-grok structure. Here is my first attempt to mock one of the pages (for example, page 6) of this report into an LLM-generated outcome. I try using proven LLM capabilities (summarize, classify) as verbs, and my text mock resembles a set of simple chat prompts.

Mock and information flow for GenAI Advisor generated report
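The actual mock is shown above; purely as an illustration of the idea, a set of such verb-driven chat prompts might look like the following. The topics and wording are my own placeholders, not taken from the Microsoft report.

```python
# Hypothetical chat prompts that mirror one mocked report page.
# Each maps a proven LLM capability (the "verb") to one block of the page.
page_mock_prompts = {
    "summarize": "Summarize the key findings on {topic} from the retrieved papers in three bullet points.",
    "classify": "Classify each finding on {topic} as an opportunity, a risk, or an open question.",
    "cite": "List the papers each finding came from, with title and year.",
}

topic = "generative AI and productivity"
for verb, template in page_mock_prompts.items():
    print(f"[{verb}] {template.format(topic=topic)}")
```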

I also create a simple information flow from the source of trend data to the outcome. I recognize that creating such a report must have taken a lot of research, curation, analysis, and writing by some very smart folks. However, to train an AI like a child, I want to simplify my assumptions about the techniques used to get from source to outcome into atomic elements.

Many pages of the report refer to multiple research papers, so that is a definitive source. I am assuming a simple way to filter from tens of thousands of research papers down to a few is to develop a topic outline for the report, then search for papers by topic, and we are getting somewhere. Maybe even rank the papers by views and date to tighten the search radius. So, once I have narrowed down a set of research papers related to the topics for my GenAI Advisor generated report, I need a prompt or a set of prompts to extract those insights.
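Here is a minimal sketch of that filter-and-rank step, assuming each paper is represented as a dictionary with topic, views, and date fields; the field names and sample records are mine, not from the report.

```python
from datetime import date

# Hypothetical paper records; in practice these would come from a paper search or index.
papers = [
    {"title": "Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence",
     "topics": {"productivity", "writing tasks"}, "views": 120_000, "published": date(2023, 3, 2)},
    {"title": "An unrelated robotics paper",
     "topics": {"robotics"}, "views": 5_000, "published": date(2021, 6, 1)},
]

def shortlist(papers, report_topics, top_k=5):
    """Keep papers matching the report's topic outline, then rank by views and recency."""
    matched = [p for p in papers if p["topics"] & report_topics]
    matched.sort(key=lambda p: (p["views"], p["published"]), reverse=True)
    return matched[:top_k]

report_topics = {"productivity", "future of work"}
for paper in shortlist(papers, report_topics):
    print(paper["title"])
```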

Rapid prototyping using Amazon Q

One of the research papers cited in the Microsoft report is Experimental Evidence on the Productivity Effects of Generative Artificial Intelligence by Shakked Noy and Whitney Zhang, which I will use for my next experiment.

Experiment #2: Can I prompt an LLM to extract the relevant insights from the research paper uploaded as a retrieval source?

For this experiment I am going to use the recently announced Amazon Q (preview) for three reasons, listed below. These capabilities satisfy my production application requirements for the GenAI Advisor: to be flexible (take many data sources) and to ensure the right governance as my application scales.

  1. Amazon Q can be tailored by connecting it to company data, information, and systems, made simple with more than 40 built-in connectors.
  2. Users can have tailored conversations, solve problems, generate content, take actions, and more.
  3. Amazon Q is aware of which systems the users can access, so they can ask detailed, nuanced questions and get tailored results that include only information they are authorized to see.

I am going to start by performing a one-off setup for a new application, which takes about 10 minutes. Once the initial setup is out of the way, iteratively prototyping and building an application on Amazon Q takes just a couple of actions in most scenarios.

Step 1: I start the setup by visiting Amazon Q Applications on my AWS Console and click on the Create application button.

Amazon Q Applications on AWS Console

Step 2: The next screen requires only a unique application name and a service role name for enabling secured access.

Creating Amazon Q Application

Step 3: Amazon Q helps build an assistant based on the retrieval-augmented generation (RAG) capability of an LLM. I can choose a native retriever and use a custom connector to a data source of my choice, like an Amazon S3 bucket. I can also hook into an existing Amazon Kendra enterprise search service I may have set up in the past. I also note the sheer number of documents a native retriever allows. For now, I will choose the default options and move to the next step.

Choose the retriever method

Step 4: Now I have a choice of data sources which I can use to populate the RAG index for my assistant. I want to prototype GenAI Advisor on the analyst reports and research papers I downloaded for this article, so Amazon S3 seems like the easiest data source to use. Before moving further, I do need to create an empty S3 bucket so I can point the Amazon Q application to it. I will do this in the next step and then come back to this screen.

Selecting a data source

Step 5: I search for S3 in the AWS Console search bar and open it in a new tab. I keep the default settings, add a unique name for my S3 bucket, and create an empty bucket. I then switch tabs to continue setting up the Amazon Q application.

Create an S3 bucket
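The console works fine for this one-off step. For completeness, the same bucket can be created with boto3; the bucket name below is a placeholder and must be globally unique.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")

# Placeholder name; S3 bucket names must be globally unique.
bucket_name = "genai-advisor-knowledge-base-demo"

# In us-east-1 no LocationConstraint is needed; other regions require
# CreateBucketConfiguration={"LocationConstraint": "<region>"}.
s3.create_bucket(Bucket=bucket_name)
```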

Step 6: Back in the Amazon Q application setup, just like for my application, I provide a new IAM role to protect the data source connection. I also select the name of the recently created empty S3 bucket.

Configuring the data source

Step 7: The last step in configuring the data source is to select the sync mode. I choose to switch to "New, modified, or deleted content sync". This will save time by incrementally updating my retriever index instead of rebuilding it every time I upload a new document to my data source. I also select Run on demand to sync the data source with my index, as I only plan to manually add documents for now. That's it. I have configured my data source and I complete this step.

Continue configuring the data source

Step 8: I return to the Applications console and notice a new application has been created. I click on the application link and then select Preview web experience under the Customize web experience step. On the next screen I configure the name of the AI assistant, add a tagline, and voila! My Amazon Q assistant is ready to test!

Preview web experience leads to AI assistant

Step 9: Now I upload the Noy and Zhang paper I mentioned earlier. It is a 15-page PDF file. Within seconds I can query this paper with a simple prompt. The results look promising! I can continue prototyping the assistant by trying out other documents and prompts.

Amazon Q assistant answering a question based on a document
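The web experience is the quickest way to test. The same question can also be asked programmatically; this is a sketch assuming the boto3 qbusiness client's chat_sync call with these parameter names, so verify it against the SDK version and identity configuration you are using.

```python
import boto3

qbusiness = boto3.client("qbusiness", region_name="us-east-1")

# The application ID is a placeholder; copy it from the Amazon Q console.
# Depending on your identity setup, a user identifier may also be required.
response = qbusiness.chat_sync(
    applicationId="<your-application-id>",
    userMessage="What productivity gains did Noy and Zhang measure for generative AI?",
)

print(response["systemMessage"])
# Source attributions give traceability back to the retrieved documents.
for source in response.get("sourceAttributions", []):
    print("-", source.get("title"))
```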

Step 10: Now I go back to the S3 bucket I created earlier and upload a bunch of analyst reports and research papers. Then I return to my Amazon Q application and sync the data source to build the retriever index. It takes several minutes for Amazon Q to read all the unstructured documents, chunk them, generate embeddings, and store them in a vector database. All this happens behind the scenes. Now I am ready to query the GenAI Advisor application based on knowledge from multiple documents.

Prototyping GenAI Advisor built on Amazon Q
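Uploading the documents can also be scripted with boto3; the local folder and bucket name below are placeholders. The sync itself is the Sync now action on the data source in the Amazon Q console (or its equivalent API call).

```python
import boto3
from pathlib import Path

s3 = boto3.client("s3")
bucket_name = "genai-advisor-knowledge-base-demo"  # placeholder; the bucket created earlier

# Upload every analyst report and research paper PDF from a local folder.
for pdf in Path("reports").glob("*.pdf"):
    s3.upload_file(str(pdf), bucket_name, f"documents/{pdf.name}")
    print("uploaded", pdf.name)

# After the upload, trigger Sync now on the data source in the Amazon Q console
# so the retriever index picks up the new documents.
```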

That's it! In 10 steps, I have built a prototype of GenAI Advisor using Amazon Q. It has several features out of the box.

  1. Secure: Hosted on my AWS account.
  2. Scalable: I can keep adding more knowledge.
  3. Steerable: I can control which documents it refers to when responding.
  4. Simple: I did not write a single line of code to build it.
  5. Speedy: It took me around 10 minutes to set up, and it takes me less than a minute to add a new document to its knowledge and evaluate it.

Before I leave you and get back to playing with my newly minted GenAI Advisor, one last thing! Amazon Q has a cool out-of-the-box feature to enable traceability of responses to the retrieved data source.

Inline citations for traceability of responses to retriever data sources

Next in the series, I will continue sharing my explorations with Amazon Q. I will try out AWS PartyRock. I may also dare myself to launch an incarnation of the GenAI Advisor on the new GPT Builder store! What would you like to read more about? Please comment or DM.


The author writes about generative AI to share his personal interest in this rapidly evolving field. The author's opinions are his own and do not represent the views of any employer or other entity with which the author may be associated.


Thanks for reading the AI for Everyone newsletter. Share and subscribe for free to receive new posts and support my work.


