Making data smoothies
Source: Somewhere in the latent space of DALL-E


This article is written for the technically inclined General Contractor who is interested in how AI is leveraged in modern cloud software, specifically for construction projects.

In a previous article, I talked about how Constructable uses semantic search to find the needle in the haystack of construction data. In this article, I’ll talk in more depth about how we prepare and maintain data for effective search.

Embeddings are like smoothies

As a quick review, semantic search is a method of searching data by meaning rather than by keyword. At the heart of semantic search is the embedding: a list of numbers (a vector) that represents the meaning of a chunk of data. In Constructable, we convert all of our customer data into embeddings. This includes data in PDFs and structured database records such as daily logs. Once the data is embedded, it can be searched by meaning, and the best matches can then be fed to an LLM to answer questions, summarize content, or take actions.

Making an embedding is kind of like making a data smoothie. When you make a smoothie, you take a bunch of ingredients (pieces of data), you throw them in the blender (large language model), and what comes out is a delicious uniform liquid that has the flavor of all of the original ingredients, but in a completely different form (a tasty embedding). If you taste the smoothie (compare the distance between embeddings), you can tell that, for example, a banana was one of the ingredients even though the banana no longer exists in its true banana form. A semantic search in a database is like tasting a bunch of smoothies and finding the ones that taste the most like a banana (assuming you happened to be searching for a banana).
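For the technically curious, here is a minimal Python sketch of "tasting" smoothies: ranking stored embeddings by how similar they are to a query embedding. It uses cosine similarity, one common way to compare embeddings, where a higher score means a closer match. The article doesn't say which measure Constructable uses. The three-number vectors and ids below are made up for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_a * norm_b)

def search(query: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[tuple[str, float]]:
    """Return the top_k (id, score) pairs, most similar first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical toy embeddings: imagine the dimensions mean (banana, berry, spicy).
index = {
    "banana-smoothie": [0.9, 0.1, 0.0],
    "berry-smoothie": [0.1, 0.9, 0.1],
    "banana-jalapeno-smoothie": [0.8, 0.1, 0.6],
}
query = [1.0, 0.0, 0.0]  # "tastes like banana"

for item_id, score in search(query, index):
    print(f"{item_id}: {score:.3f}")
```

This prints the plain banana smoothie first (about 0.994) and the banana-jalapeño smoothie second (about 0.796). Searching with `[1.0, 0.0, 1.0]` instead ("spicy banana") flips the order and puts the jalapeño smoothie on top.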

Mixing in more ingredients

Search is only as good as the data that goes into the blender. At Constructable, we found that we can improve the accuracy of AI search by enriching embeddings with contextual information, much like putting more related ingredients into one smoothie. For example, a paragraph of text in a PDF is useful for search on its own. It's even more useful if we combine it with other context, such as:

- the title of the document
- the name of the person who uploaded the PDF
- the title of the section the chunk was taken from
- a whole host of other information

Suppose you search for information about the compressive strength of the concrete provided by Tom at Concrete Pros. Our search will automatically boost paragraphs that talk about concrete strength and that also come from documents provided by our contact Tom at the subcontractor Concrete Pros. It's like putting jalapeños in a banana smoothie: you are more likely to pick that smoothie if you are searching for a spicy tropical one.
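Here is a rough Python sketch of that idea. The `PdfChunk` fields and the `enriched_text` helper are hypothetical names. The bag-of-words `toy_embed` function is only a stand-in for a real LLM embedding model: word counts capture shared words, not meaning. Even so, it is enough to show how adding context pulls a chunk closer to a query that mentions Tom and Concrete Pros. The sample text is invented.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass

@dataclass
class PdfChunk:
    # Hypothetical fields: one paragraph plus the context it came from.
    text: str
    document_title: str
    uploaded_by: str
    section_title: str

def enriched_text(chunk: PdfChunk) -> str:
    """Combine the paragraph with its surrounding context before embedding it."""
    return "\n".join([
        f"Document: {chunk.document_title}",
        f"Uploaded by: {chunk.uploaded_by}",
        f"Section: {chunk.section_title}",
        chunk.text,
    ])

def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

chunk = PdfChunk(
    text="The 28-day compressive strength of the concrete mix shall be 4000 psi.",
    document_title="Mix Design Submittal",
    uploaded_by="Tom at Concrete Pros",
    section_title="Concrete Strength",
)
query = toy_embed("compressive strength of concrete from Tom at Concrete Pros")

plain_score = similarity(query, toy_embed(chunk.text))
enriched_score = similarity(query, toy_embed(enriched_text(chunk)))
print(f"plain: {plain_score:.2f}, enriched: {enriched_score:.2f}")
```

With these made-up inputs, the enriched chunk scores about 0.64 against the query, versus about 0.39 for the bare paragraph. A real embedding model would also reward related meanings, not just repeated words.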

This works great for unstructured data, but it works wonders for structured data. It's even easier to add context to embeddings for structured data, because the relationships between different pieces of data are explicitly defined in the database. Take daily logs, for example. In our database, a daily log can have:

- notes
- weather notes
- attached photos
- comments from other people in the system
- connections to markup on the drawings
- connections to topics of conversation between people collaborating on the project

All of this information is context that can enrich the basic notes entered for a project on a particular day. If we throw all of it into the blender along with the notes for a given daily log, our search can suddenly handle a much broader range of queries.
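As a rough illustration, here is how a daily log and its related records might be flattened into one block of text before embedding. The `DailyLog` and `Comment` classes, their fields, and the sample values are hypothetical. Representing photos by short text captions is also an assumption made for this sketch; it is not Constructable's schema.

```python
from dataclasses import dataclass, field

@dataclass
class Comment:
    author: str
    body: str

@dataclass
class DailyLog:
    # Hypothetical schema: the daily log's own notes plus its related records.
    date: str
    notes: str
    weather_notes: str = ""
    photo_captions: list[str] = field(default_factory=list)  # assumes photos carry text descriptions
    comments: list[Comment] = field(default_factory=list)
    drawing_markups: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)

def daily_log_context(log: DailyLog) -> str:
    """Flatten a daily log and its related records into one text to embed."""
    parts = [f"Daily log for {log.date}", f"Notes: {log.notes}"]
    if log.weather_notes:
        parts.append(f"Weather: {log.weather_notes}")
    parts += [f"Photo: {caption}" for caption in log.photo_captions]
    parts += [f"Comment from {c.author}: {c.body}" for c in log.comments]
    parts += [f"Drawing markup: {markup}" for markup in log.drawing_markups]
    parts += [f"Topic: {topic}" for topic in log.topics]
    return "\n".join(parts)

# Made-up example record.
log = DailyLog(
    date="2024-03-14",
    notes="Poured level 2 slab, east half.",
    weather_notes="Light rain in the afternoon.",
    photo_captions=["Slab pour at grid line C"],
    comments=[Comment("Tom", "Cylinders taken for 7- and 28-day breaks.")],
    topics=["Concrete testing"],
)
print(daily_log_context(log))
```

Empty fields are simply skipped, so a bare log still produces a short, clean text. Once embedded, a query about rain during concrete testing has something to match in this log. The day's own notes mention neither topic.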

Keeping your smoothies from going bad

While this is a pretty easy way to boost search accuracy, there is one main challenge. The data in a system is not static: it changes every day. For example, someone might edit a comment on a daily log that we previously generated an embedding for. Or maybe they attach new photos to the daily log, or delete its weather information. Now the smoothie for this daily log has spoiled. In other words, search will prioritize the wrong things based on stale information. So we need a way to re-blend a smoothie any time one of its ingredients is updated. This requires tracking all of the dependencies of each embedding and regenerating embeddings whenever those dependencies change.

Fortunately, we came up with a way to track these dependencies automatically, and to keep that tracking up to date as we build new features. Any time we create an embedding, we tell our system to go out and find all of the related pieces of information that we want to include. We use unique identifiers (UUIDs) for absolutely everything in our system. That means we can simply record in our database the id of every piece of information that contributed to the context of an embedding. Then, any time something in our system changes, we perform an efficient search of these records for its id. Anywhere the id appears, we know we have a new smoothie to blend. We can then blend and serve the fresh smoothies promptly, which keeps our search system healthy and accurate.
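Here is a minimal in-memory sketch of that bookkeeping in Python. The class and function names are hypothetical. The reverse index (source id to dependent embeddings) is one common way to make the lookup efficient, but it is a sketch under stated assumptions, not Constructable's actual implementation. In practice these records would live in a database table rather than Python dictionaries.

```python
import uuid
from collections import defaultdict
from typing import Callable, Iterable

class DependencyTracker:
    """Hypothetical in-memory record of which source records fed which embedding."""

    def __init__(self) -> None:
        # embedding id -> ids of every record that contributed to it
        self._sources: dict[uuid.UUID, set[uuid.UUID]] = {}
        # reverse index: source id -> embeddings that depend on it
        self._dependents: defaultdict = defaultdict(set)

    def record(self, embedding_id: uuid.UUID, source_ids: Iterable[uuid.UUID]) -> None:
        """Store the sources for a freshly blended embedding, replacing any old list."""
        self.forget(embedding_id)
        self._sources[embedding_id] = set(source_ids)
        for source_id in self._sources[embedding_id]:
            self._dependents[source_id].add(embedding_id)

    def forget(self, embedding_id: uuid.UUID) -> None:
        """Remove an embedding and its entries from the reverse index."""
        for source_id in self._sources.pop(embedding_id, set()):
            self._dependents[source_id].discard(embedding_id)
            if not self._dependents[source_id]:
                del self._dependents[source_id]

    def stale_embeddings(self, changed_id: uuid.UUID) -> set[uuid.UUID]:
        """Every embedding that included the changed record: one dictionary lookup."""
        return set(self._dependents.get(changed_id, set()))

def on_record_changed(tracker: DependencyTracker, changed_id: uuid.UUID,
                      reblend: Callable[[uuid.UUID], None]) -> None:
    """Re-blend each smoothie that used the changed ingredient."""
    for embedding_id in tracker.stale_embeddings(changed_id):
        reblend(embedding_id)

# Example: one daily-log embedding built from the log, a comment, and a photo.
log_id, comment_id, photo_id, unrelated_id = (uuid.uuid4() for _ in range(4))
smoothie_id = uuid.uuid4()

tracker = DependencyTracker()
tracker.record(smoothie_id, [log_id, comment_id, photo_id])

reblended: list[uuid.UUID] = []
on_record_changed(tracker, comment_id, reblended.append)    # someone edits the comment
on_record_changed(tracker, unrelated_id, reblended.append)  # nothing depends on this record
print(len(reblended))  # 1
```

In a real system, the `reblend` step would rebuild the context, call the embedding model, and call `record` again with the fresh list of source ids, so newly attached photos or comments get tracked too. This sketch assumes that attaching something new is reported as a change to the parent record (for example, the daily log's own id). Otherwise, a brand-new id would have no dependents to look up.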

Get in touch!

At Constructable, we are passionate about freeing customer data from the confines of PDFs and databases so it can be used to accomplish amazing things using AI. If you’d like a demo, please fill out our contact form or email me directly at [email protected]. I’d love to meet you, and if you are local to the California central coast, maybe we could go grab a smoothie sometime :).


More articles by John Yoder

  • Desktop is the new cloud
  • Construction is a scavenger hunt
  • Alive at work
  • Ship Your Code Before You Write It
  • Code Like a Fighter Pilot
  • Are You Finding the Right Software Engineers For the Job?
  • Programming in Paradise at AppFolio
