AI Researcher - What it is and why we built it!

Note: Next week we plan to release a Prompt Chain Builder to make it easier to build AI Apps. Please stay tuned! In the meantime, here I talk about our AI Researcher - what it is and why we built it!

What is the Meerkats AI Researcher, and why did we build it?

To leverage the power of GenAI for content creation, you have to feed it factual, detailed information - and that only comes from good research.

Without good research, all the downstream tasks fall apart. To write an insightful article, you need great research - probably the most time-consuming part of writing anything.

I found Perplexity's answers accurate but lacking in depth. AI should be able to figure out which related queries need to be researched to fetch detailed insights.

For example: if I want to research “Latest SEO algorithm leak by Google,” the AI should figure out sub-queries such as: what does it mean for SEO agencies, what are the key points to note, what should be picked up for better SEO, and so on.

Also, if I want answers from specific sites that I trust - e.g. Yahoo, TechCrunch, Bloomberg, etc. - it should be able to aggregate answers from those sites.

Once you have high-quality research, it's easier to repurpose it into an engaging blog or a LinkedIn post by adding a unique perspective - high-quality research is the first step.

Key Goals of the AI Researcher:

  1. Generated documents should be accurate
  2. Research should be fast
  3. Users should have a simple, easy-to-use interface to get the desired results.
  4. Finally, it should be scalable

Usability goals:

  1. While the report is being generated, keep the user updated on its progress, including the sites being scanned for the answer.
  2. The generated document should be in Markdown format and easily editable in a Notion-like interface - e.g. expand text, shorten text, or rewrite a section using AI itself.
  3. Integrations to publish the article on LinkedIn or a blog easily.

Let's dive into the tech stack and structure

The key components are the Google SERP API, a web scraper, LangChain agents, and OpenAI. Everything was written in Node.js, as the core backend is in Node.js.

At a high level, the process can be broken down into 3 steps:

  1. Find relevant sources of information using Google Search
  2. Scrape the content in the links
  3. Assemble all the information into the report using OpenAI

Google Search API

We used the Google Search API to perform multiple searches, aggregated the URLs that Google returned, and marked them for scraping.
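As a minimal sketch of this step - assuming SerpAPI as the SERP provider (the post doesn't name one) and Node 18+ for the built-in fetch - gathering and de-duplicating URLs across the main query and its sub-queries could look like this:

```javascript
// Hypothetical sketch: search Google via SerpAPI and collect result URLs.
async function searchGoogle(query, apiKey) {
  const params = new URLSearchParams({ engine: "google", q: query, api_key: apiKey });
  const res = await fetch(`https://serpapi.com/search.json?${params}`);
  if (!res.ok) throw new Error(`SERP request failed: ${res.status}`);
  const data = await res.json();
  // Collect the organic result URLs and mark them for scraping.
  return (data.organic_results ?? []).map((r) => r.link);
}

// Aggregate URLs across the main query and the AI-generated sub-queries,
// de-duplicating before scraping.
async function collectUrls(queries, apiKey) {
  const results = await Promise.all(queries.map((q) => searchGoogle(q, apiKey)));
  return [...new Set(results.flat())];
}
```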

Cheerio

We used Cheerio to scrape the documents, stripping out all the unnecessary information. We eliminated JavaScript and CSS, focused just on the HTML tags, and returned the result in Markdown format.
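Here's a minimal sketch of that cleanup step. The post doesn't say how the HTML-to-Markdown conversion was done; the turndown library is assumed here for illustration:

```javascript
import * as cheerio from "cheerio";
import TurndownService from "turndown";

function htmlToMarkdown(html) {
  const $ = cheerio.load(html);
  // Strip JavaScript, CSS, and other non-content tags so only the
  // meaningful HTML structure remains.
  $("script, style, noscript, iframe, svg").remove();
  const body = $("body").html() ?? "";
  // Convert the cleaned HTML into LLM-ready Markdown.
  return new TurndownService().turndown(body);
}
```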

LangChain to find relevant documents

The LLM-ready Markdown data from the scrapes was converted into embeddings and finally sent to OpenAI for assembling the report. A key step was identifying the most relevant documents and sections from the scraped URLs, as many were not relevant to answering the query. We used the LLMContextRetriever to find the most relevant documents.
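A rough sketch of this step using LangChain's Node.js packages follows; the LLMContextRetriever mentioned above is approximated here with a plain vector-store similarity retriever (import paths vary by LangChain version), so treat this as an illustration rather than the exact implementation:

```javascript
import { OpenAIEmbeddings } from "@langchain/openai";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Split the scraped Markdown into chunks, embed them, and retrieve only
// the chunks relevant to the query - many scraped pages won't be relevant.
async function findRelevantChunks(markdownPages, query) {
  const splitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 100 });
  const docs = await splitter.createDocuments(markdownPages);
  const store = await MemoryVectorStore.fromDocuments(docs, new OpenAIEmbeddings());
  // Keep only the top-k most similar sections for report assembly.
  const retriever = store.asRetriever(8);
  return retriever.getRelevantDocuments(query);
}
```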

OpenAI

We used function calling to call tools, aggregate all the information and generate a nice detailed report.
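A minimal sketch of what this looks like with the official OpenAI Node SDK; the web_research tool name, schema, and model choice are illustrative assumptions, and a real loop would execute each tool call and feed the results back before the final report is written:

```javascript
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical tool definition the model can call to gather information.
const tools = [
  {
    type: "function",
    function: {
      name: "web_research",
      description: "Search Google and scrape the resulting pages for a sub-query",
      parameters: {
        type: "object",
        properties: { query: { type: "string" } },
        required: ["query"],
      },
    },
  },
];

async function generateReport(topic) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini", // assumption: any function-calling-capable model works
    messages: [
      {
        role: "system",
        content:
          "You are a researcher. Use the tools to gather facts, then write a detailed report in Markdown.",
      },
      { role: "user", content: topic },
    ],
    tools,
  });
  // The message either contains tool calls to execute or the final report.
  return response.choices[0].message;
}
```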

Challenges we ran into

In the initial versions, the reports came with significant hallucinations.

The key challenges were related to getting the scraped pages ready for the LLM by stripping out the JavaScript tags and outputting Markdown.

Another challenge we had to address was making the queries run in parallel so that the time the user has to wait for the report is minimized. Timeouts and retries were causing long delays in report generation, and we spent quite a bit of time filtering out certain document types so that we didn't waste time extracting information that was slow to extract.
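As an illustration of the parallelization pattern (not our exact code), each scrape can be raced against a hard timeout and retried once, with stragglers dropped rather than allowed to block the whole report:

```javascript
// Race a promise against a timeout so one slow site can't stall the report.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Run all scrapes concurrently with a bounded retry, dropping failures.
async function scrapeAll(urls, scrapeOne, { timeoutMs = 10000, retries = 1 } = {}) {
  const attempt = async (url, left) => {
    try {
      return await withTimeout(scrapeOne(url), timeoutMs);
    } catch {
      if (left > 0) return attempt(url, left - 1);
      return null; // drop slow or unscrapable pages instead of blocking
    }
  };
  const results = await Promise.all(urls.map((u) => attempt(u, retries)));
  return results.filter(Boolean);
}
```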

Side note: managing JavaScript worker threads was also causing timeouts and inconsistencies that were time-consuming to debug.

We tried using Beautiful Soup as a scraper, but Cheerio turned out to be easier and more accurate at extracting Markdown data. Hence, we decided to implement the entire scraping module in Cheerio.

Future Plans

A key feature we are planning to add is the ability to provide a Table of Contents to the researcher so it can research individual subtopics and create a detailed report of up to 5K words that can be used for creating high-quality blogs, newsletters, etc. For this we're using LangChain and increasing the number of AI agents to handle the complexity.

We are also working on providing an easy way to include your POV and customize the research content into a high-quality, SEO-ready article.

I am attaching the link to the tool here: https://apps.meerkats.ai/playground/crux-ai

Please do try out the Crux AI Researcher and let me know your thoughts. Next week we'll talk about our Prompt Chain Builder!

Dave Holmes-Kinsella

Builder | Analytics & Data Leader: Strategy, Architecture, Build, Launch | From pre-A to post-IPO | 2 Exits | Former Synctera, Facebook

8 months ago

I love the idea of the table of contents. Much as in an academic paper, it would be great to see attributions and sources.
