AI Researcher - What it is and why we built it!
Note: Next week we plan to release a Prompt Chain Builder to make it easier to build AI Apps. Please stay tuned! In the meantime, here I talk about our AI Researcher - what it is and why we built it!
What is the Meerkats AI Researcher, and why did we build it?
To leverage the power of GenAI for content creation, feeding it factual and detailed information is key, and that only comes from good research.
Without good research, all the downstream tasks fall apart. To write an insightful article, you need great research - probably the most time-consuming part of writing anything.
I found Perplexity's answers were accurate but lacked depth. AI should be able to figure out which related queries need to be researched to fetch detailed insights.
For example, if I want to research "Latest SEO algorithm leak by Google," the AI should figure out sub-queries such as: what does it mean for SEO agencies, what are the key points to note, which points matter for better SEO, and so on.
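To make this concrete, here is a minimal sketch of what that decomposition step could look like: build a prompt asking the model for related sub-queries, then parse its reply into a list of searches. The prompt wording and function names are illustrative, not our exact implementation.

```javascript
// Build chat messages asking the model to propose sub-queries for a topic.
// (Hypothetical sketch - prompt wording and message shape are illustrative.)
function buildSubQueryMessages(topic, maxSubQueries = 5) {
  return [
    {
      role: "system",
      content: "You are a research planner. Reply with one search query per line.",
    },
    {
      role: "user",
      content: `Propose up to ${maxSubQueries} sub-queries worth researching for the topic: "${topic}"`,
    },
  ];
}

// Turn the model's line-separated reply into a clean list of queries.
function parseSubQueries(modelReply) {
  return modelReply
    .split("\n")
    .map((line) => line.trim())
    .filter(Boolean);
}
```

Each parsed sub-query can then be run as its own search, which is what gives the final report its depth.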
Also, if I want answers from specific sites that I trust, it should be able to aggregate answers from those sites, e.g. Yahoo, TechCrunch, Bloomberg, etc.
Once you have high-quality research, it's easier to repurpose it into an engaging blog or LinkedIn post by adding a unique perspective - high-quality research is the first step.
Key Goals of the AI Researcher:
Usability goals:
Let's dive into the tech stack and structure we used.
The key components are the Google SERP API, a web scraper, LangChain agents, and OpenAI. Everything was written in Node.js, as the core backend is in Node.js.
At a high level, the process can be broken down into three steps:
Find relevant sources of information using Google Search
Scrape the content in the links
Assemble all the information into a report using OpenAI.
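The three steps above can be sketched as a single orchestration function. The concrete search, scrape, and assemble implementations are injected here, so this is a simplified shape of the pipeline rather than our exact code (all names are illustrative).

```javascript
// High-level sketch of the three-step pipeline. The search/scrape/assemble
// functions are injected, keeping the orchestration independent of the
// concrete APIs behind them.
async function runResearch(query, { search, scrape, assemble }) {
  // Step 1: find relevant sources via Google Search
  const urls = await search(query);
  // Step 2: scrape the content behind each link
  const pages = await Promise.all(urls.map(scrape));
  // Step 3: assemble everything into a report with OpenAI
  return assemble(query, pages);
}
```

In practice each step has its own error handling and timeouts (more on that below in the challenges section).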
Google Search API
We used the Google Search API to perform multiple searches, aggregated the URLs Google returned, and marked them for scraping.
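A sketch of that aggregation step is below: take the responses from several searches, deduplicate the links, and mark each one for scraping. The response shape (`organicResults` with `link`/`title`) is a simplified assumption, not the exact SERP API schema.

```javascript
// Aggregate URLs from multiple search responses, deduplicating as we go.
// The response shape here is a simplified stand-in for the SERP API output.
function aggregateSearchResults(responses) {
  const seen = new Set();
  const toScrape = [];
  for (const response of responses) {
    for (const item of response.organicResults ?? []) {
      if (!seen.has(item.link)) {
        seen.add(item.link);
        // mark each unique URL for the scraping step
        toScrape.push({ url: item.link, title: item.title, scraped: false });
      }
    }
  }
  return toScrape;
}
```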
Cheerio
We used Cheerio to scrape the documents, stripping out all the unnecessary information. We eliminated JavaScript and CSS, focused on the HTML tags, and returned the content in Markdown format.
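To illustrate the idea of that cleanup step, here is a minimal sketch using only the standard library; our actual module does this with Cheerio's DOM traversal, which is more robust than regex handling.

```javascript
// Minimal sketch of HTML cleanup: drop scripts/styles, keep the content,
// emit lightweight Markdown. Illustrative only - the real module uses Cheerio.
function htmlToMarkdown(html) {
  return html
    // drop script and style blocks entirely (JS and CSS are noise for the LLM)
    .replace(/<script[\s\S]*?<\/script>/gi, "")
    .replace(/<style[\s\S]*?<\/style>/gi, "")
    // map common headings and paragraphs to Markdown
    .replace(/<h1[^>]*>([\s\S]*?)<\/h1>/gi, "\n# $1\n")
    .replace(/<h2[^>]*>([\s\S]*?)<\/h2>/gi, "\n## $1\n")
    .replace(/<p[^>]*>([\s\S]*?)<\/p>/gi, "\n$1\n")
    // strip any remaining tags
    .replace(/<[^>]+>/g, "")
    // collapse excess blank lines
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```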
LangChain to find relevant Documents
The LLM-ready Markdown data from the scrapes was converted into embeddings, which were then sent to OpenAI for assembling the report. A key step was identifying the most relevant documents and sections from the scraped URLs, as many were not relevant to answering the query. We used the LLMContextRetriever to find the most relevant documents.
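The core of that relevance step can be sketched with plain cosine similarity: embed the query and each chunk, then keep the top-scoring chunks. In our pipeline the vectors come from an embedding model and retrieval goes through LangChain; the code below is a simplified stand-in for that machinery.

```javascript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Keep only the k chunks most similar to the query vector - irrelevant
// documents never reach the report-assembly step.
function topRelevantChunks(queryVector, chunks, k = 3) {
  return [...chunks]
    .sort(
      (x, y) =>
        cosineSimilarity(queryVector, y.vector) -
        cosineSimilarity(queryVector, x.vector)
    )
    .slice(0, k);
}
```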
Open AI
We used function calling to invoke tools, aggregate all the information, and generate a detailed report.
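On the application side, function calling boils down to declaring tool schemas and dispatching the model's tool calls to handlers. The sketch below shows that dispatch side; the tool names and schemas are illustrative, and the model's arguments arrive as a JSON string.

```javascript
// Illustrative tool schemas in the JSON-schema style used for function calling.
const toolSchemas = [
  {
    name: "search_google",
    description: "Run a Google search for a query",
    parameters: {
      type: "object",
      properties: { query: { type: "string" } },
      required: ["query"],
    },
  },
  {
    name: "scrape_url",
    description: "Scrape a page and return Markdown",
    parameters: {
      type: "object",
      properties: { url: { type: "string" } },
      required: ["url"],
    },
  },
];

// Route one tool call from the model to the matching handler.
// The model sends arguments as a JSON string, so parse before dispatching.
function dispatchToolCall(toolCall, handlers) {
  const handler = handlers[toolCall.name];
  if (!handler) throw new Error(`Unknown tool: ${toolCall.name}`);
  return handler(JSON.parse(toolCall.arguments));
}
```

The results of each call are fed back to the model, which decides whether to call more tools or write the final report.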
Challenges we ran into
In initial versions, the reports came with significant hallucinations.
The key challenges were getting the scraped pages ready for the LLM by stripping out the JavaScript and outputting Markdown.
Another challenge was making the queries run in parallel so that the time the user waits for the report is minimized. Timeouts and retries were causing long delays in report generation, and we spent quite a bit of time skipping certain document types so we didn't waste time extracting information that was slow to process.
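A common shape for this kind of parallelism is to wrap each scrape in a timeout and a small retry budget, then use `Promise.allSettled` so one slow or failing page never blocks the whole report. The helpers below are a simplified sketch of that pattern, not our exact implementation.

```javascript
// Reject if a promise doesn't settle within ms; clear the timer either way.
function withTimeout(promise, ms, label) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Retry a failing async operation a fixed number of times.
async function withRetries(fn, attempts = 2) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
    }
  }
  throw lastErr;
}

// Scrape all URLs in parallel; keep only the successes so one bad page
// never blocks the report.
async function scrapeAllInParallel(urls, scrapeOne) {
  const settled = await Promise.allSettled(
    urls.map((url) =>
      withRetries(() => withTimeout(scrapeOne(url), 10_000, url))
    )
  );
  return settled
    .filter((r) => r.status === "fulfilled")
    .map((r) => r.value);
}
```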
Side note: managing JavaScript worker threads also caused timeouts and inconsistencies that were time-consuming to debug.
We tried Beautiful Soup as a scraper, but Cheerio turned out to be easier and more accurate at extracting Markdown data. Hence, we decided to implement the entire scraping module with Cheerio.
Future Plans
A key feature we are planning to add is the ability to provide a table of contents to the researcher, so it can research individual subtopics and create a detailed report of up to 5K words that can be used for creating high-quality blogs, newsletters, etc. For this we're using LangChain and increasing the number of AI agents to handle the complexity.
Also, we are currently working on providing an easy way to include your POV and customize the research content into a high-quality, SEO-ready article.
I am attaching the link to the tool here: https://apps.meerkats.ai/playground/crux-ai
Please do try out Crux AI researcher and let me know your thoughts. Next week we talk about our Prompt Chain Builder!