GenAI app design flow and creating a dataset
Created by Manav Sehgal using DALL.E

This article is the second in the series Creating GenAI Advisor. In the last article I identified the problem to solve and suggested six problem discovery patterns. Now I am going to share techniques to achieve an app design flow that lets me rapidly prototype the solution and iteratively scale it into a production-ready system. This article will then expand on one of these techniques to create a custom dataset for the GenAI Advisor app.

App design flow

Here are a few techniques which I will use to create my GenAI Advisor app flow. I hope these will help me deliver results rapidly yet scale elegantly over time.

  1. Awesome list into dataset. Most GenAI apps rely on a dataset or a set of documents for retrieval, fine tuning, model evaluation, and in some advanced cases for continued pre-training. I start with a simple list of data sources, iterate on how my app behaves with these, and refine into a comprehensive dataset. Building a dataset for GenAI apps is a continuous function. Sometimes, it may be a good idea to use an LLM to build the dataset for your GenAI app.
  2. Inspire, improvise, and innovate. I am inspired by real-world artifacts (documents, data sources, reports, etc.) to move quickly, like investigating how an analyst report presents trends, then creating a prompt-response pairing to simulate similar results. Then I will improvise to deliver better results based on LLM capabilities, like using prompt engineering techniques. Finally, I will innovate once I have a significant body of knowledge for solving a particular task, like switching to fine tuning or agentic systems.
  3. Simple before specializing. For example, I will use simple and obvious prompts, like talking to a human advisor, before exploring more sophisticated prompt engineering techniques.
  4. English over code. I will use coding as the last resort to build my generative AI app. Code tends to bind the system deterministically while LLM responses are based on probabilities. Code tends to fix user interactions in place while an LLM frees them. Coded features more or less remain static over time while LLM capabilities will evolve with every training step, every retrieval, and every prompt engineering technique. Coded apps can be customized at build time and configured at run time, while an LLM can be customized (fine tuning, continued pre-training) and configured (parameter tuning) at both build and run time. Code mostly relies on the abilities of the programmer(s) while an LLM is collaborative and also relies on the user's ability to prompt.
  5. Atomic into system. Where possible I will explore atomic LLM capabilities first to learn from first principles, like copying a document section into chat to explore in-context learning and prompting, before building a system, like using document-upload-based retrieval within chat, followed by retrieval over multiple documents, followed by creating a knowledge base, and so on.
  6. Hard-wiring before extracting patterns. I will hard-wire the application to rapidly prototype and learn before extracting reusable patterns to make the application more generalizable.

Awesome list into dataset

When creating a GenAI app, I find it useful to start with an inspiring artifact created by a human expert, which I want the AI to consume or generate. Studying the ideal artifact can help me, for example, clone certain sections into the desired prompts which might generate similar results using an LLM.

Such artifacts could include reports, documents, databases, transcripts, or workflows, and relate to the problem domain I discovered in the prior step. I am building GenAI Advisor and the problem domain includes research papers and analyst reports, among others. At the back of my mind, I have a related, more ambitious goal: to create a dataset for my GenAI app, which I can use for retrieval, fine tuning, model evaluation, or continued pre-training as the need arises.

Starting with a simple list

So where do I start? How do I know if I am starting with the best artifact in the first place? Maybe a good place to start is in the middle: not a specific artifact, not an entire dataset, but a simple list of the types of data sources I will need.

If I am familiar with the problem domain and have experience performing the task, then I usually have a go-to list handy. As I monitor technology and business trends for my work, I do have a list in this particular case.

You will notice that my list requires a varying number of steps to get to the actual artifact reporting on trends. For example, I need to drill down further and list top analysts, then go to their website or blog to get the report which I monitor. However, in the case of ChatGPT or Bard I may get the trends or insights I need in a single prompt. So my list needs a hierarchy, prioritization, and categorization of some sort to be even more useful. I will park that thought for now.

I also make a mental note that this list in itself may lead to identifying the data sources I want to connect for retrieval-augmented generation (RAG) in GenAI Advisor. So in some ways this is a list of microfeatures for my app. Worth the time spent iterating through this.

List of trend sources and types I read in alphabetical order

So, back to the list. When I am new to the problem domain, I usually seek advice from experts, follow influencers, or search for "awesome list of X" on GitHub and the web in general. I am pretty confident I have a good list nailed down, so I won't do that in this case.

Generating ideas using chat playground

This is also a good place to query my favorite LLM for ideas. I like using the Amazon Bedrock Chat playground for generating ideas as it gives me options to (1) select and compare responses from multiple LLMs, including proprietary Claude and open Llama, and (2) configure the model response based on my needs. For example, in this case I went with Claude 2.1 and the default configuration for Temperature (which generates more creative responses), etc. I did change the response length to 1,000 tokens to ensure all 20 rows of the table are returned.

Using Bedrock Chat playground to generate ideas

Note that I am using a simple but specific prompt to generate the response I desire.

Prompt: Create a table with columns for 20 types of sources of GenAI trends and description.
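Once the playground experiment looks right, the same prompt can be scripted. Here is a minimal sketch using boto3 and the Bedrock runtime to send this prompt to Claude 2.1; the region and parameter values are illustrative assumptions, not the exact playground settings.

    import json
    import boto3

    # Bedrock runtime client; the region is an illustrative assumption.
    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    prompt = ("Create a table with columns for 20 types of sources "
              "of GenAI trends and description.")

    # Claude 2.x on Bedrock expects the Human/Assistant text-completion format.
    body = json.dumps({
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 1000,  # enough room for all 20 table rows
        "temperature": 1.0,            # favors more creative responses
    })

    response = bedrock.invoke_model(modelId="anthropic.claude-v2:1", body=body)
    print(json.loads(response["body"].read())["completion"])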

Here is the complete table generated by Claude. There are some new and interesting trend sources not on my go-to list, like job postings, hackathons, web traffic, and search autocomplete. I may want to add these to my list.

Complete list of ideas generated

Challenges in generating recent data

While generating ideas for trend sources is relatively easy, using an LLM to list actual reports is much trickier. The reason is the LLM knowledge cutoff date. Trend sources can be generated from the LLM's parametric knowledge based on pre-training data. Listing the latest reports and links requires the LLM to use agentic capabilities to call a search tool and then generate a response. If the search results are not deep enough (number of search pages, crawling of deep links, searching within content), then the context available to the LLM will be limited, which in turn will generate sub-optimal responses.
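To make the mechanism concrete, here is a rough sketch of the search-then-generate pattern described above. The web_search helper and the llm callable are hypothetical placeholders; the point is that the depth of the search results directly bounds the context the LLM can reason over.

    from typing import Callable, List

    def web_search(query: str, pages: int = 1) -> List[str]:
        """Hypothetical search tool returning text snippets.
        A deeper implementation would crawl linked pages and search within content."""
        raise NotImplementedError

    def list_recent_reports(llm: Callable[[str], str], topic: str) -> str:
        # Gather context the LLM cannot know from pre-training (post knowledge cutoff).
        snippets = web_search(f"latest {topic} analyst reports", pages=3)
        context = "\n".join(snippets)
        # Ask the LLM to compile the list strictly from the retrieved context.
        prompt = (
            f"Using only the search results below, list recent {topic} reports "
            f"with report title, publisher, and a direct link.\n\n{context}"
        )
        return llm(prompt)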

ChatGPT 4 does this with Bing search integration; however, there are several limitations even with a simple prompt like this one.

ChatGPT 4 prompt and response

It took me 10 attempts at prompt variations to get the desired results. Asking for more than a few columns (like description) in the same prompt resulted in an "I can't generate..." response. All responses returned fewer than 10 reports. The links returned pointed to the source websites instead of the reports themselves. And so on. Even this 10th attempt had issues; for example, I don't know why SC Media is included. One of the primary reasons the results are sub-optimal is that the quick search used by ChatGPT is not deep enough. Yet the response still takes some time to generate.

Extracting a feature for GenAI Advisor

However, this quick experiment gives me a cool feature idea for GenAI Advisor which may address some of the limitations observed.

Feature: GenAI Advisor can use an LLM to continuously update its retrieval knowledge base of analyst reports, papers, etc. based on a human-curated dataset of trend sources. For now I will call this feature Source Curator.
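Here is a minimal sketch of what Source Curator could look like. The llm callable and knowledge_base object are hypothetical placeholders, and the page fetching is naive; this captures the shape of the feature, not an implementation.

    from dataclasses import dataclass
    from typing import Callable, List

    import requests

    @dataclass
    class TrendSource:
        name: str      # e.g. "Analyst reports"
        url: str       # landing page to monitor
        priority: int  # lower number means higher priority

    def curate(sources: List[TrendSource],
               llm: Callable[[str], str],
               knowledge_base) -> None:
        """Walk the human-curated source list and refresh the retrieval knowledge base."""
        for source in sorted(sources, key=lambda s: s.priority):
            page_text = requests.get(source.url, timeout=30).text
            prompt = ("List the latest reports or papers mentioned on this page, "
                      f"with title and link:\n\n{page_text}")
            artifacts = llm(prompt)  # LLM extracts candidate artifacts from the page
            knowledge_base.add(source=source.name, documents=artifacts)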

This feature validates our journey so far - the trend sources list is important. With this feature in mind, now is a good time to iterate further on the trend sources list and start maturing it into a dataset.

Maturing the dataset

As I think about maturing the trend sources list, my first step is thinking through how I will classify and prioritize the trend sources as well as the artifacts these sources deliver. This is important to prioritize microfeatures for the GenAI Advisor. I won't add all the trend sources at once to the app. How do I know where to start? What is the 80/20 rule here? I need a taxonomy to mature my list into a dataset.

To create the taxonomy, I think about how I use the sources myself. What do I value the most? Trust trumps everything else. Have I come to trust the source more than the others? Trust has priority levels, from Origin through Mixed, and I use numeric order to indicate priority from high (1) to low (6). Next, I value Complexity (or ease), Accuracy, Frequency, and finally Quantity (density). I debated whether Accuracy or Complexity should come first. For me, if I do not understand the data source easily (for example, technical papers), I will not be able to assess its accuracy anyway, so Complexity comes before Accuracy.

Taxonomy for GenAI trend sources dataset

Now that I have the taxonomy, I classify and prioritize my go-to list like so. You will note that it is not a perfect multi-column sort, but you get the idea of where I am going with this. Certain rankings were obvious, like industry leader content or events. Certain new ones ranked surprisingly high for me, like job postings and Statista. It is also interesting that ChatGPT and Bard are in the middle order, which indicates there is an addressable problem that GenAI Advisor can solve and which is harder for generalized LLMs to satisfy.

Classification and prioritization applied to trend sources
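To show how the taxonomy could drive the multi-column sort, here is a small sketch. The ratings below are made-up illustrations, not my actual scores; 1 is the highest priority and 6 the lowest for every measure.

    from dataclasses import dataclass

    @dataclass
    class SourceRating:
        """Qualitative taxonomy for a trend source; 1 = high priority, 6 = low."""
        trust: int       # Origin (1) through Mixed (6)
        complexity: int  # ease of understanding, assessed before accuracy
        accuracy: int
        frequency: int
        quantity: int    # density of trend signal

        def sort_key(self):
            # Trust trumps everything, then complexity, accuracy, frequency, quantity.
            return (self.trust, self.complexity, self.accuracy,
                    self.frequency, self.quantity)

    ratings = {  # illustrative values only
        "Industry leader content": SourceRating(1, 2, 1, 3, 2),
        "Job postings":            SourceRating(2, 1, 2, 1, 4),
        "ChatGPT / Bard":          SourceRating(3, 1, 4, 1, 1),
    }

    for name, rating in sorted(ratings.items(), key=lambda item: item[1].sort_key()):
        print(name, rating.sort_key())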

Intuitively, this relatively simple taxonomy of five qualitative measures may also provide some side benefits. I want to explore if these can help in model performance evaluation, creating content sourcing guardrails, or controlling agentic actions. More on this in future articles. Another thought I parked earlier was to add hierarchy to my dataset so that I can traverse to the actual artifact (document, report, database) which my GenAI Advisor will consume or generate. I will explore solving this (or if it needs to be solved) in a future article.

Here is a summary of my progress so far. In the prior article I identified the problem to solve and suggested six problem discovery patterns. In this article I introduced six techniques to create an app design flow. I applied one of these techniques to create a dataset for GenAI Advisor, starting from a simple list and using an LLM playground to generate ideas. I then introduced a qualitative taxonomy to mature the list into a classified and prioritized dataset which, among other benefits, will help me establish RAG sources for the GenAI Advisor app. Along the way I validated the GenAI Advisor idea, learned from the limitations of state-of-the-art LLMs, and extracted a differentiated feature, the Source Curator.


The author writes about generative AI to share his personal interest in this rapidly evolving field. The author's opinions are his own and do not represent the views of any employer or other entity with which the author may be associated.


Thanks for reading the AI for Everyone newsletter. Share and subscribe for free so you don't miss the next article and also support my work.

