An interesting, if frustrating, real-world AI use case

Jay Heavner

Published: June 7, 2024

I'm buying some land that adjoins a national forest. My plan is to use it for camping and recreation in the short term, but in the longer term, I'd like to potentially develop it into a bigger recreational area that I could share with others. The problem is, the lot I'm buying is only 2 acres, and I would need more than that to develop it. Luckily, my dad is a land surveyor and had some info, including an archaic website that allowed me to pull all the historic deed records for the area. Unfortunately, the output is a mess. I've got a bunch of barely readable PDF files. Historically, this would have been a time for a highlighter and red marker combo, but I decided to try and solve my problem with various AI tools.

First up, AWS Textract. I wrote a simple script to run the PDF files through Textract to get something I could read. I then dumped that output back into ChatGPT 4.0 and started asking some simple questions. Fun! Not super helpful though. I wanted tabular data that I could sort, filter, and run queries against.
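A minimal sketch of what a script like that might look like. It is not the code used for this project: `lines_from_textract`, `read_page`, and the sample rows are made up for illustration. It assumes `boto3` is installed with AWS credentials configured, and it uses the synchronous `detect_document_text` call, which handles one page at a time (multi-page PDFs go through Textract's asynchronous API via S3 instead).

```python
def lines_from_textract(response):
    """Return the text of every LINE block in a Textract response dict."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def read_page(path):
    """Send one single-page PDF or image to Textract and return its lines.

    Assumption: boto3 is installed and AWS credentials are configured.
    """
    import boto3  # imported lazily so the parsing helper works without AWS

    client = boto3.client("textract")
    with open(path, "rb") as f:
        response = client.detect_document_text(Document={"Bytes": f.read()})
    return lines_from_textract(response)


# Offline demo: a hand-made response in Textract's block format (invented data).
sample = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "GRANTOR  GRANTEE  BOOK  PAGE"},
        {"BlockType": "WORD", "Text": "GRANTOR"},
        {"BlockType": "LINE", "Text": "SMITH J  DOE R  112  45"},
    ]
}
print(lines_from_textract(sample))
```

Keeping only the `LINE` blocks gives readable text; the `WORD` blocks repeat the same content one word at a time.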

Some of the data was already tabular-ish. There were blocks of data, but they weren't joined together in meaningful ways. There might be 10 rows with 8 columns in one spot, and the other 6 columns of the data would be somewhere else. Back to ChatGPT!

Long story short, ChatGPT could see the problem but was unable to combine the data no matter how hard I prompted. I'll confess, at one point, I opened it up in Excel and started manually copying and pasting and moving things around. After about five minutes of that, I realized it would take 60-90 minutes to complete, and I wasn't going to do that. Much better to spend 12 hours writing code to automate a 90-minute task.
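The merge itself, gluing a block of leading columns to the block holding the remaining columns, can be sketched in a few lines. This is an illustration under a big assumption, namely that both blocks list the same records in the same order; `join_blocks` and the sample rows are hypothetical, not taken from the real deed data.

```python
import csv
import io


def join_blocks(left, right):
    """Glue two column blocks together row by row.

    Assumes both blocks hold the same records in the same order,
    so row i of `left` and row i of `right` describe one record.
    """
    if len(left) != len(right):
        raise ValueError(f"row count mismatch: {len(left)} vs {len(right)}")
    return [l + r for l, r in zip(left, right)]


# Invented sample data: the first columns in one block, the rest in another.
left = [["1998-03-02", "SMITH J"], ["2004-11-17", "DOE R"]]
right = [["112", "45"], ["187", "9"]]

rows = join_blocks(left, right)

# Write the joined rows as CSV so a spreadsheet can open them.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

The row-count check matters more than it looks: if one block is missing a row, a silent zip would shift every later record onto the wrong neighbor.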

One rule about my code: it all had to be written by AI. I would give it the prompts, and I would run the code it outputted. To make it even more...pointless(?), I made it create an environment from scratch where I could run the code. To make sure I didn't cheat too much, I made it write Python code. I don't know Python, so I could orchestrate commands, but I couldn't write the code myself.

Off I went! I bounced back and forth between ChatGPT 4.0 and Claude Opus. I have a personal license for both tools. The beauty of these tools is that they seem to have good and bad days like a normal human. Overall, ChatGPT did better work for me.

I finally got data into a spreadsheet! It took a long time, and my wife told me more than once to stop fighting with the AI. But wait, there's more, and it's not great. Within these records, there is a Description field, and this is where a lot of the good information is. Things like the last price paid and lot size, plus acronyms like TD BK, which I'm sure are important. I must have the data in this field represented in my spreadsheet.

This is where the magic of AI finally materialized. So far, I had spent far more time wrangling AI into automating boring tasks than it would have taken to just do them by hand (Textract, which is amazing, being the exception). I went back to ChatGPT and fed it some of the descriptions and asked it to explain them to me. It did. Turns out TD BK is Trust Deed Book (and I still don't know what that is, but I feel empowered. I did ask it to tell me what it was, but the answer was boring, and I didn't care).

From there, I told it to pull all the important information from the description with industry-appropriate labels and put it into a list. Next, I told it to create a sample JSON object so I could use it programmatically. Next, and this is the important part, I told it to generate a prompt that I could use programmatically with AWS Bedrock. AWS Bedrock is a programmatic wrapper for a lot of different AI models. It doesn't have ChatGPT or Gemini, but it has Claude (including Opus), Llama3, Mistral, and others.

Let's be clear: a person without a programming background would not have gotten to this point, and that last paragraph might be word salad to a lot of people.
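For readers who want the word salad made concrete, here is a rough sketch of the "sample JSON object plus reusable prompt" idea. Everything in it is an assumption for illustration: the field names in `SAMPLE_RECORD`, the prompt wording, and the example description are invented, not the labels or prompt ChatGPT actually produced.

```python
import json

# Hypothetical field labels; the real list came from ChatGPT and is not shown here.
SAMPLE_RECORD = {
    "sale_price": 12500,
    "lot_size_acres": 2.0,
    "trust_deed_book": "112",
}


def build_prompt(description):
    """Wrap one deed description in instructions a model can follow."""
    shape = json.dumps(SAMPLE_RECORD, indent=2)
    return (
        "Extract the fields below from the deed description.\n"
        "Reply with one JSON object and nothing else. "
        "Use null for any field that is not present.\n\n"
        f"Example of the expected shape:\n{shape}\n\n"
        f"Description: {description}\n"
    )


# Invented description text, loosely in the style of a deed index entry.
prompt = build_prompt("2 AC TD BK 112 $12,500")
print(prompt)
```

Showing the model an example object, rather than only describing the fields, tends to keep the key names stable from one record to the next, which is what makes the output usable in a spreadsheet.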

I went back to ChatGPT and asked it to write me some code using Bedrock and Mistral. Mistral is the least expensive option. IMO, there's a reason it's the least expensive. It would kind of do what it was supposed to do, but it couldn't stop inserting things that I didn't need, or it would leave off an important control character. I probably spent 4 hours and up to 10 cents realizing that it wasn't going to work.
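One common way to cope with a model that wraps its answer in extra chatter is to pull out the first parseable JSON object and treat anything else as a failure. The sketch below shows that idea; `extract_json` and the sample replies are hypothetical, and nothing here suggests it would have rescued the Mistral attempt.

```python
import json


def extract_json(reply):
    """Return the first JSON object found in a model reply, or None."""
    decoder = json.JSONDecoder()
    start = reply.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            obj, _end = decoder.raw_decode(reply, start)
            return obj
        except json.JSONDecodeError:
            start = reply.find("{", start + 1)
    return None


# Invented replies: one padded with chatter, one missing its closing brace.
chatty = 'Sure! Here is the JSON:\n{"sale_price": 12500, "lot_size_acres": 2.0}\nHope that helps!'
broken = '{"sale_price": 12500, "lot_size_acres": 2.0'

print(extract_json(chatty))
print(extract_json(broken))
```

The chatty reply parses cleanly, while the one missing its closing brace comes back as `None`, which gives the calling code a clear signal to retry or flag the record instead of writing garbage into the spreadsheet.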

On to Llama3. Llama3 basically nailed it. Yes, it's 2.5x the cost of Mistral, and at scale, that matters, but my total cost, including some debugging, was less than 15 cents.
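For reference, calling Llama 3 through Bedrock from Python might look roughly like the sketch below. The assumptions are labeled in the comments: the model ID is one of the Llama 3 identifiers Bedrock has offered and may not match the one used here, the helper names are made up, and `boto3` with Bedrock model access is assumed for the live call. The demo at the bottom runs offline against a fake response.

```python
import io
import json

# Assumption: an example Bedrock model ID for Llama 3; check which models your account can use.
MODEL_ID = "meta.llama3-8b-instruct-v1:0"


def build_llama_body(prompt, max_gen_len=512):
    """Build the JSON request body in the format Bedrock's Llama models accept."""
    return json.dumps(
        {"prompt": prompt, "max_gen_len": max_gen_len, "temperature": 0.0}
    )


def read_generation(response):
    """Pull the generated text out of an invoke_model response."""
    return json.loads(response["body"].read())["generation"]


def ask_llama(prompt, model_id=MODEL_ID):
    """Send one prompt to Bedrock. Needs boto3, credentials, and model access."""
    import boto3  # imported lazily so the helpers above work without AWS

    client = boto3.client("bedrock-runtime")
    response = client.invoke_model(modelId=model_id, body=build_llama_body(prompt))
    return read_generation(response)


# Offline demo: a fake response shaped like the one invoke_model returns.
fake = {"body": io.BytesIO(json.dumps({"generation": '{"lot_size_acres": 2.0}'}).encode())}
print(read_generation(fake))
```

A temperature of 0 is a deliberate choice in this sketch: for extraction work, repeatable output is usually worth more than variety.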

So now I have this wonderful spreadsheet with useful information that I can sort and filter and do all kinds of things with, and I made AI do it all. Total time spent was 24 hours or so.

Why did I need this thing to begin with?

General Notes/TL;DR

There are a lot of AI tools out there, and picking the right tool for the job is important. Cost matters, but so does your time.

Using tools like ChatGPT 4.0 to help orchestrate bare-metal tools like Textract or Bedrock is amazing. You can open Google Code Playground and start getting results immediately with ChatGPT telling you what to do.

There are a lot of things that GenAI is terrible at. With some persistence, it can close the gap, but knowing when to pivot is critical. My general sense is that if a competent, but lazy, 8th grader could perform the given task with Google, then it has a good chance of producing a result. Although, it also might lie because it doesn't want to work.

Python is a terrible programming language, but it's weirdly interesting. If it were a food, it would be Cheetos.

As a side project, this was all done on my personal time using my personal resources, but I will apply what I learned to my day job.

Rick Kowalski

Bridging tech and human insight, I turn data trends into stories and AI into action.

4 months ago

Thanks for sharing your account. I've had similar experiences using ChatGPT, Perplexity and Copilot, where the time wrangling the AI has not given me good ROI. ChatGPT has not been great with tabular data, but I could coax it to get acceptable output eventually. It has helped a lot with R scripts - much better than searching stackoverflow. I'm still game for experimenting with it in the hopes of an ROI breakeven point. On another note: I don't know Python but I do like Cheetos; therefore I should try Python?

Patrick Pannett

Tech public affairs & strategic comms counselor @ #CES #CTA

4 months ago

Epic journey so far, thanks for sharing!

Jamie Koppersmith

Realtor for DC and the Inner DMV at McEnearney Associates

4 months ago

Jay, that was a fascinating read. It has me thinking about many real estate analysis questions that are just not easily answerable given standard MLS data and how it is organized.

Gary Shapiro

4 months ago

Well said! Love your curiosity, diligence, humor and writing style!

April Speight

AI @ Microsoft | Responsible AI | Author

4 months ago

Jay, my favorite part was “and I still don't know what that is, but I feel empowered. I did ask it to tell me what it was, but the answer was boring, and I didn't care”. I miss that Jay personality of yours lol
