The Fracking of Information
An oil rig pumping oil from a stack of paper in a grassy field on a summer day via Midjourney

Large language models enable fracking of documents. Historically, extracting value from unstructured text files has been difficult. But LLMs do this beautifully, pumping value from one of the hardest places to mine.

We have a collection of thousands of notes researching startups. We are tinkering with deploying large language models on top of them.

Here are some quick observations about our initial experiments:

The Future is Constellations of Models. When faced with a search box, a user might ask quantitative questions. For example, how many people from Google have I met in the last month?

Unfortunately, large language models, at least the ones that we have tested, do not answer quantitative questions of this kind.

That’s problematic because users don’t stop to think about the type of query (quantitative, classification, segmentation, prediction, etc.) before they type it into a search box.

To solve this, knowledge management systems will likely employ a constellation of different models. Perhaps the first model will classify the query, then route it to the right machine learning model to answer.
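To make the classify-then-route idea concrete, here is a minimal sketch. It is not what we run: the keyword cues stand in for a real classification model, and the handler functions (`answer_quantitative`, `answer_summary`, `answer_general`) are hypothetical placeholders for the downstream models.

```python
from typing import Callable, Dict

# Assumption: crude keyword cues stand in for a trained query classifier.
QUANT_CUES = ("how many", "how much", "average", "total number")
SUMMARY_CUES = ("summarize", "summary", "overview")


def classify_query(query: str) -> str:
    """Label a query so it can be routed to a suitable model."""
    q = query.lower()
    if any(cue in q for cue in QUANT_CUES):
        return "quantitative"
    if any(cue in q for cue in SUMMARY_CUES):
        return "summarization"
    return "general"


# Hypothetical handlers: each would wrap a different model or query engine.
def answer_quantitative(query: str) -> str:
    return f"[structured query engine] {query}"


def answer_summary(query: str) -> str:
    return f"[summarization LLM] {query}"


def answer_general(query: str) -> str:
    return f"[general LLM] {query}"


ROUTES: Dict[str, Callable[[str], str]] = {
    "quantitative": answer_quantitative,
    "summarization": answer_summary,
    "general": answer_general,
}


def route(query: str) -> str:
    """First classify the query, then hand it to the matching handler."""
    return ROUTES[classify_query(query)](query)


if __name__ == "__main__":
    print(route("How many people from Google have I met in the last month?"))
```

The user never picks a query type. The first step does it for them, and the quantitative question lands on something other than the LLM.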

Summarization works out of the box. We have been researching the robotic process automation (RPA) space. Here is a subset of the model's output on that topic. One could imagine replacing the background or introduction sections in an investment memo or producing a blog post from this in about 2 minutes on a laptop. Editing remains essential.

Question: write a summary of the RPA space & the opportunities & challenges within it
Answer (took 119.74 s.): The context mentions that there are problems with maintaining consistency and quality in process discovery documents, which can cause issues for business continuity. There may be multiple factors contributing to these difficulties such as high attrition rates or lack of experience among personnel involved during development phase due diligence on large enterprises. However the most important factor…
source_documents/redacted.txt the Rpa market is converging with the process discovery market, so understanding which process is to automate and then automating them are really important.Rpa is a bottoms-up business and process discovery is tops down

Source Identification Matters. LLMs now link to the source text. In the example above, the model cites the file (whose name I’ve redacted) & the location of the contributing passage.

This behavior matters for two reasons. First, it builds trust & credibility in the model. Questions will inevitably arise from summaries. Drilling down to the root answer should assuage those doubts.

Second, this pattern should limit hallucinations, in which models “invent” answers without basis in the source or training data.
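Here is a rough sketch of how an answer can carry its sources, assuming a simple word-overlap retriever in place of the embedding search a real system would likely use. The `Chunk` type, the function names, & the file names in the example are hypothetical, not taken from our setup.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Chunk:
    source: str  # file the passage came from
    text: str    # the passage itself


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: List[Chunk], k: int = 2) -> List[Chunk]:
    """Return up to k chunks that share the most words with the question."""
    q = tokenize(question)
    scored = [(len(q & tokenize(c.text)), c) for c in chunks]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


def format_with_sources(answer: str, sources: List[Chunk]) -> str:
    """Append each contributing file and passage beneath the answer."""
    lines = [f"Answer: {answer}"]
    lines.extend(f"{c.source} {c.text}" for c in sources)
    return "\n".join(lines)


if __name__ == "__main__":
    notes = [
        Chunk("notes/example_a.txt", "RPA is converging with process discovery"),
        Chunk("notes/example_b.txt", "Quarterly lunch menu"),
    ]
    hits = retrieve("What is happening in RPA?", notes)
    print(format_with_sources("(model answer goes here)", hits))
```

Because the passages are retrieved before the answer is written, a reader can check every claim against a named file.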

Ubiquity means being everywhere. Our business maintains a single knowledge repository but outputs will appear in email, presentations, investment memos, blog posts, & search results.

New knowledge management systems will find a way to be integrated into all those outputs while respecting permissions, governance, & other policies that matter to a business.

If data is the new oil, then LLMs are the environmentally friendly fracking rigs, blasting value from unstructured text shale formations.

Abhi Mahule

Tech executive with expertise in building large scale systems. Building a GenAI platform to help ecommerce brands with video ads using LLMs. 2x founder | 1 IPO | Ex-Roku | Holder of O-1A extraordinary ability visa

1 year ago

Nice! Sounds like a perfect use case for deploying a LangChain based solution. Have you looked into it? https://docs.langchain.com/docs/category/use-cases

Pedro Cortés

SaaS Company? I’ll rewrite your vague landing page into a clear, conversion-focused page in 7 business days.

1 year ago

Great post, Tomasz Tunguz! It's exciting to see how these technologies are shaping the future of information retrieval and analysis.

Mark Ogne

Founder, CEO @ Symplexity.AI, ABM Consortium | B2B AI innovator | Fractional CMO | High-Performance Account-Based Strategy | I Help B2B Companies Find Their 2X Revenue Growth

1 year ago

Always smart and to the point, Tomasz! Love this quote... "If data is the new oil, then LLMs are the environmentally friendly fracking rigs, blasting value from unstructured text shale formations."

Matthias Kuhn

Co-Founder Converta | ex-Ultimate

1 year ago

We've experienced first hand how important this is: "the first model will classify the query, then route it to the right machine learning model to answer."
