
Making data smoothies

John Yoder

Co-Founder at Constructable

Published: December 17, 2024

This article is written for the technically inclined General Contractor interested in how AI is leveraged in modern cloud software, and specifically for construction projects.

In a previous article, I talked about how Constructable uses semantic search to find the needle in the haystack of construction data. In this article, I’ll talk in more depth about how we prepare and maintain data for effective search.

Embeddings are like smoothies

As a quick review, semantic search is a method of searching data by meaning rather than keyword. At the heart of semantic search is the concept of the embedding, a numeric representation of a chunk of data. In Constructable, we convert all of our customer data, whether it’s in a PDF or structured data in a database like daily logs, into embeddings that can be searched by meaning and then fed to an LLM to answer questions, summarize content, or take actions.

Making an embedding is kind of like making a data smoothie. When you make a smoothie, you take a bunch of ingredients (pieces of data), you throw them in the blender (large language model), and what comes out is a delicious uniform liquid that has the flavor of all of the original ingredients, but in a completely different form (a tasty embedding). If you taste the smoothie (compare the distance between embeddings), you can tell that, for example, a banana was one of the ingredients even though the banana no longer exists in its true banana form. A semantic search in a database is like tasting a bunch of smoothies and finding the ones that taste the most like a banana (assuming you happened to be searching for a banana).
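To make the "tasting" step concrete, here is a minimal sketch of ranking embeddings by similarity to a query. It assumes cosine similarity as the comparison (the article does not say which distance measure Constructable uses) and uses made-up three-number vectors; real embeddings come from a model and have hundreds or thousands of dimensions. The names `cosine_similarity`, `rank_by_similarity`, and the smoothie vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, embeddings):
    """Sort (name, score) pairs from most to least similar to the query."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors invented for illustration; the first dimension stands in for "banana-ness".
smoothies = {
    "banana-strawberry": [0.9, 0.4, 0.0],
    "kale-apple": [0.1, 0.2, 0.9],
    "banana-mango": [0.8, 0.1, 0.3],
}
banana_query = [1.0, 0.0, 0.0]

ranking = rank_by_similarity(banana_query, smoothies)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Both banana smoothies score far higher than the kale one, even though none of the vectors is identical to the query.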

Mixing in more ingredients

Search is only as good as the data that is put into the blender. At Constructable, we found that the accuracy of AI search can be improved by enriching embeddings with contextual information, similar to putting more related ingredients into one smoothie. For example, a paragraph of text in a PDF is useful for search, but it’s even more useful if we combine it with the title of the document, the name of the person who uploaded the PDF, the title of the section from which the chunk is taken, and a whole host of other information. This way, if you search for information about the compressive strength of the concrete provided by Tom at Concrete Pros, our search will automatically boost the relevance of paragraphs that not only talk about concrete strength but also come from documents associated with our contact Tom at the Concrete Pros subcontractor. It’s like putting jalapeños in a banana smoothie: you are more likely to choose this smoothie if you are searching for a spicy tropical smoothie.
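One simple way to enrich a chunk is to prepend its context to the text before it goes into the embedding model. The sketch below illustrates that idea only and is not Constructable's actual format: the `PdfChunk` fields, the `build_embedding_input` function, and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PdfChunk:
    """A paragraph of PDF text plus the context we want blended in."""
    text: str
    document_title: str
    section_title: str
    uploaded_by: str
    company: str


def build_embedding_input(chunk: PdfChunk) -> str:
    """Combine the context fields and the paragraph into one string to embed."""
    lines = [
        f"Document: {chunk.document_title}",
        f"Section: {chunk.section_title}",
        f"Uploaded by: {chunk.uploaded_by} ({chunk.company})",
        "",  # blank line separating the context from the paragraph itself
        chunk.text,
    ]
    return "\n".join(lines)


# Sample values invented for illustration.
chunk = PdfChunk(
    text="The specified compressive strength is reached after curing.",
    document_title="Concrete Mix Submittal",
    section_title="Strength Requirements",
    uploaded_by="Tom",
    company="Concrete Pros",
)
enriched_text = build_embedding_input(chunk)
print(enriched_text)
```

The string returned here, rather than the bare paragraph, is what would be handed to the embedding model, so a query mentioning Tom or Concrete Pros lands closer to this chunk.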

This works great for unstructured data, and it works wonders for structured data too. It’s even easier to provide context in embeddings for structured data, because the relationships between different pieces of data are explicitly defined. Take daily logs for example. In our database, daily logs can have notes, weather notes, attached photos, comments from other people in the system, connections to markup on the drawings, and connections to topics of conversation between people collaborating on the project. All of this information represents context that can be used to enrich the basic notes that were entered for a project on a particular day. If we throw all of this information into the blender along with the notes for any given daily log, suddenly our search can handle a much broader range of queries.
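For structured data, the related records can be walked and flattened into one piece of text before embedding. The following is a rough sketch under assumed names: the `DailyLog` shape, its fields, and `daily_log_to_text` are hypothetical, and photos are represented by captions only because a plain-text example needs something to embed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DailyLog:
    """A daily log and the related records that give it context."""
    date: str
    notes: str
    weather_notes: str = ""
    photo_captions: List[str] = field(default_factory=list)
    comments: List[str] = field(default_factory=list)
    topics: List[str] = field(default_factory=list)


def daily_log_to_text(log: DailyLog) -> str:
    """Flatten a daily log and its related records into one string to embed."""
    parts = [f"Daily log for {log.date}", f"Notes: {log.notes}"]
    if log.weather_notes:
        parts.append(f"Weather: {log.weather_notes}")
    related = (
        ("Photo", log.photo_captions),
        ("Comment", log.comments),
        ("Topic", log.topics),
    )
    for label, items in related:
        parts.extend(f"{label}: {item}" for item in items)
    return "\n".join(parts)


# Sample values invented for illustration.
log = DailyLog(
    date="2024-12-16",
    notes="Poured the east footing.",
    weather_notes="Light rain in the morning.",
    comments=["Inspector signed off at noon."],
    topics=["Footing schedule"],
)
log_text = daily_log_to_text(log)
print(log_text)
```

With the comment and weather lines included, a query about rain delays or inspections can match this log even though the original notes mention neither.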

Keeping your smoothies from going bad

While this is a pretty easy and effective way to boost your search accuracy, there is one main challenge. The data in a system is not static; it changes every day. For example, someone might edit a comment associated with a daily log that we previously generated an embedding for. Or maybe they attach new photos to the daily log, or delete weather information. Now our smoothie for this daily log has spoiled. In other words, we will prioritize the wrong things in a search based on stale information. So we need a way to ensure that we re-blend our smoothies any time there is an update to one of the ingredients. This requires that we track all of the dependencies for each embedding, and regenerate embeddings dynamically.

Fortunately, we were able to come up with a way to automatically track these dependencies (and keep them up to date as we build new features). Any time we create an embedding, we tell our system to go out and find all of the related pieces of information that we want to include. We use unique identifiers (UUIDs) for absolutely everything in our system, so we simply note the ID of every piece of information that contributed to the context of an embedding in our database. Then, any time something in our system changes, we perform an efficient search of this list for the ID. Anywhere it appears, we know that we have a new smoothie to blend, so we can blend and serve these new smoothies up in a timely fashion and our search system stays healthy and accurate.
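As an illustration of this kind of dependency tracking, here is a minimal in-memory sketch. It assumes a reverse index from each source ID to the embeddings that used it; the article only says the lookup is efficient, so the data structure, the `EmbeddingDependencyIndex` class, and its method names are hypothetical, and a real system would keep this in a database rather than in Python dictionaries.

```python
from collections import defaultdict
from uuid import uuid4


class EmbeddingDependencyIndex:
    """Tracks which source records contributed to which embeddings."""

    def __init__(self):
        # source ID -> IDs of the embeddings that included that source
        self._embeddings_by_source = defaultdict(set)
        # embedding ID -> IDs of the sources it was built from
        self._sources_by_embedding = {}

    def record(self, embedding_id, source_ids):
        """Store the sources for an embedding, replacing any earlier entry."""
        new_sources = set(source_ids)
        for old_source in self._sources_by_embedding.get(embedding_id, ()):
            self._embeddings_by_source[old_source].discard(embedding_id)
        self._sources_by_embedding[embedding_id] = new_sources
        for source_id in new_sources:
            self._embeddings_by_source[source_id].add(embedding_id)

    def stale_embeddings(self, changed_id):
        """Return the embeddings that must be regenerated after a change."""
        return set(self._embeddings_by_source.get(changed_id, ()))


# Example: one daily-log embedding built from three records.
daily_log_id, comment_id, photo_id = (str(uuid4()) for _ in range(3))
embedding_id = str(uuid4())

index = EmbeddingDependencyIndex()
index.record(embedding_id, [daily_log_id, comment_id, photo_id])

# Someone edits the comment, so look up what needs re-blending.
to_regenerate = index.stale_embeddings(comment_id)
print(to_regenerate == {embedding_id})
```

Calling `record` again after regenerating an embedding replaces its old dependency list, so a deleted ingredient, such as removed weather notes, stops triggering re-blends.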

Get in touch!

At Constructable, we are passionate about freeing customer data from the confines of PDFs and databases so it can be used to accomplish amazing things using AI. If you’d like a demo, please fill out our contact form or email me directly at [email protected]. I’d love to meet you, and if you are local to the California central coast, maybe we could go grab a smoothie sometime :).
