Observations Using LLMs Every Day for Two Months

(Image: a robot writing a book with a quill on an old pigeonhole desk in a workshop, watercolor, via Midjourney)

I’ve been using large language models (LLMs) most days for the past few months for three major use cases: data analysis, writing code, & web search [1].

Here’s what I’ve observed:

First, coding incrementally works better than describing a full task all at once.

Second, coding LLMs struggle to solve problems of their own creation; they turn in circles, & debugging can require significant work.

Third, LLMs could replace search engines for summarization searches, provided their indexes contain recent or evergreen data, but not for exhaustive searches.

Let me share some examples:

This weekend, I wanted to clean up some HTML image links in older blog posts & modernize them to markdown format. That requires uploading images to Cloudinary’s image hosting service & using the new link. I typed this description into ChatGPT (see the transcript here):

create a ruby script to go through every markdown file in a folder & find html image tags & rewrite them as markdown image tags. but replace the url with a url from cloudinary. to get the cloudinary url, create a function to hit the cloudinary api to send the image there, then parse the response to retrieve the url for the markdown update.

The script failed to update the files. Subsequent iterations didn’t solve the issue. The engine becomes “blind” to the error & reformulates the solution with a similar fundamental flaw with each regeneration.
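For illustration, here’s a minimal Ruby sketch of just the tag-rewriting step the prompt describes, with the upload abstracted into a caller-supplied block. The stub & URL below are hypothetical for demonstration — they are not the script ChatGPT generated, & a real version would call the Cloudinary API inside the block:

```ruby
# Rewrite HTML <img> tags in markdown text as markdown image syntax,
# mapping each original src through a caller-supplied block
# (in the real script, the block would upload the image to Cloudinary
# & parse the new URL out of the API response).
def rewrite_img_tags(text)
  text.gsub(/<img[^>]*\bsrc=["']([^"']+)["'][^>]*\/?>/i) do
    original_url = Regexp.last_match(1)
    new_url = yield(original_url)
    "![](#{new_url})"
  end
end

# Stubbed "upload": just prefixes a hypothetical Cloudinary host.
markdown = rewrite_img_tags('before <img src="/images/chart.png"> after') do |src|
  "https://res.cloudinary.com/demo#{src}"
end
```

Separating the pure text-rewrite from the network call also makes the failure mode easier to isolate — one of the things the generated script never managed on its own.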

But, if I guide the computer through each step in a program, as I did for the recent Nvidia analysis, the engine succeeds both in accurately formatting the data & in writing a function to replicate the analysis for other metrics [2].

For web search, I created a little script to open ChatGPT instead of Google each time I type in a query. Typing in queries feels very much like using Google for the first time on the high school library’s computer: I’m iterating through different query syntaxes to yield the best result.
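That redirect script can be a few lines of Ruby. A sketch, assuming a hypothetical `q` query parameter on the chat URL — the real interface may not accept one, so treat both the URL & the parameter as placeholders:

```ruby
require "cgi"

# Placeholder destination; the ?q= parameter is an assumption for
# illustration, not a documented ChatGPT feature.
CHAT_URL = "https://chat.openai.com/"

def search_url(query)
  "#{CHAT_URL}?q=#{CGI.escape(query)}"
end

# On macOS, `open` launches the default browser:
# system("open", search_url(ARGV.join(" ")))
```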

The summarization techniques often produce formulaic content. On a recent rainy day, I asked what to do in San Francisco, Palo Alto, & San Jose. Each of the responses contained a local museum, shopping, & spa recommendation. Search results MadLibs!

The challenge is that these “search results pages” don’t reveal how extensive the search was: how many of the TripAdvisor top 20 recommendations were consulted? Might a rarer indoor activity like rock climbing be of interest? There’s a user-experience challenge, & even a new-product opportunity, in solving that problem.

Recency matters: ChatGPT is trained on web data through 2021, which turns out to be a significant issue because I often search for newer pages. An entire generation of web3 companies doesn’t yet exist in the minds of many LLMs. So, I query Google Bard instead.

These early rough edges are to be expected. Early search engines, including Google, also required specialized inputs/prompts & suffered from lower-quality results in some categories. With so many brilliant people working in this domain, new solutions will certainly address these early challenges.


[1] I’ve written about using LLMs for image generation in a post called Rabbits on Firetrucks, & my impressions there remain the same: it’s great for consumer use cases but hard to drive the precision needed for B2B applications.

[2] To analyze the NVDA data set, I use comments - which start with # - to tell the computer how to clean up a data frame before plotting it. Once achieved, I tell the computer to create a function called make_long() to do the same.

# read in the tsv file nvda
nvda = read_tsv("nvda.tsv")
# pull out the third row & call it revenue
revenue = nvda[2,]; head(revenue)
# set colnames equal to sequence of 2004 to 2023 by 1
colnames(revenue) = c("field", seq(2004, 2023, 1))
# make revenue long
revenue_long = gather(revenue, year, value)
# set colnames to year and revenue
colnames(revenue_long) = c("year", "revenue")
...
# plot revenue by year on a line chart with the caption tomtunguz.com and the line color red with a size of 2
ggplot(revenue_long, aes(x = year, y = revenue/1e3, group = 1)) +
  geom_line(color = "red", size = 2) +
  labs(title = "Nvidia Revenue Grew 2.5x in 2 Years",
       subtitle = "Revenue Has Been Flat in the Last Year but AI Growing Fast",
       x = "", y = "Revenue, $b", caption = "tomtunguz.com") +
  scale_y_continuous(limits = c(0, 30), breaks = seq(0, 30, by = 5))

# create a function to take a row from the nvda data set, make it long, convert both columns to numeric
# and delete where there is na
make_long = function(row) {
  colnames(row) = c("field", seq(2004, 2023, 1))
  row = gather(row, year, value)
  colnames(row) = c("year", "value")
  row$value = as.numeric(row$value)
  row$year = as.numeric(row$year)
  row = row[!is.na(row$value),]
  row = row[!is.na(row$year),]
  return(row)
}
Mark Ralls

President at Auvik

1 year ago

"how many of the TripAdvisor top 20 recommendations were consulted? Might a rarer indoor activity like rock climbing be of interest? There’s a user-experience - even a new product opportunity - in solving that problem." Agree on the problem, but wonder if it's hard to solve. If I ask BingGPT for some ideas on what to do I get the usual fare of museums and such. But if I ask it to interview me first on likes & dislikes and then make a recommendation then it does a better job. But clearly room for more interactivity and less "you asked a very specific question, let me give you an answer to that" in LLMs in general.

Yash Piplani

LinkedIn Top Voice 2024 | Helping Companies Generate 10+ Profitable Meetings Every Week | Meet the Right Person at The Right Time | Appointment Setting | Personal Branding | Thought Leadership

1 year ago

LLMs have become indispensable tools for various tasks like data analysis, coding, and web search. Their versatility and efficiency are truly remarkable.

CHESTER SWANSON SR.

Next Trend Realty LLC./wwwHar.com/Chester-Swanson/agent_cbswan

1 year ago

Thanks for Sharing.
