How Gemini and GPT-4 completely messed up a standard task that Claude 3 handled easily.
Balaji Viswanathan Ph.D.
CEO, Brahmasumm. Building document AI at scale -- organizing, searching and summarizing enterprise data.
I wanted to try the best-in-class LLMs on understanding a moderately complex table. This is a fairly standard handwritten invoice. It doesn't have nested tables or anything monstrous.
Let's see what each of them did. Gemini Advanced very confidently took on the task and made a complete mess of it. The number of items, their names, and their prices are all wrong.
ChatGPT 4 tried a few times, put in a lot of effort, and after 3 retries simply gave up.
The same task in Claude 3, launched just a few hours ago.
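For anyone who wants to reproduce this kind of test, here is a minimal sketch of sending an invoice image to Claude 3 for table extraction. It assumes the Anthropic Python SDK; the file name, model string, and prompt wording are illustrative, not from the post.

```python
# Sketch: ask a vision LLM to extract an invoice table as JSON.
# Assumptions: Anthropic Python SDK, a local file "invoice.jpg",
# and an illustrative model name -- adjust all three to your setup.
import base64


def build_table_prompt(image_b64: str, media_type: str = "image/jpeg") -> list:
    """Build a messages payload asking for the invoice line items as JSON."""
    return [{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64",
                        "media_type": media_type,
                        "data": image_b64}},
            {"type": "text",
             "text": "Extract every line item from this invoice as JSON with "
                     "fields: name, quantity, unit_price, total."},
        ],
    }]


# Usage (needs ANTHROPIC_API_KEY set; commented out so the sketch is self-contained):
# import anthropic
# with open("invoice.jpg", "rb") as f:
#     img = base64.b64encode(f.read()).decode()
# client = anthropic.Anthropic()
# resp = client.messages.create(model="claude-3-opus-20240229",
#                               max_tokens=1024,
#                               messages=build_table_prompt(img))
# print(resp.content[0].text)
```

Asking for a fixed JSON schema, rather than free-form prose, also makes it easy to diff each model's output against the ground-truth invoice.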
Product Manager - Innovation Lead, EY
1 year ago: Arghya, Praveen FYI please
Sr Technical/Delivery Manager (.NET) - Digital Innovations & Transformation | Technologist | Cloud Computing | Automation | Data & AI Practitioner
1 year ago: Hi Dr. Balaji Viswanathan, thanks for sharing this. I have been struggling for the last few weeks on a similar use case. I will check on this now.
Building Real-World AI use-cases
1 year ago: Well, for the last 3 months I have been trying to find/train LLMs to generate insights from tabular data with fewer than 100 rows, and it's extremely difficult. I find LLMs cannot comprehend the multi-modality of the data, and they make stupid mistakes like taking the average of averages to summarize the data. Will give Claude 3 a try to see if it can comprehend it.
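The "average of averages" pitfall mentioned in the comment above is easy to show with a toy example (the numbers here are made up for illustration): group means weight every group equally, regardless of how many rows each group has.

```python
# Toy data: two groups of unequal size.
group_a = [10, 10, 10, 10]   # 4 rows, mean 10
group_b = [100, 100]         # 2 rows, mean 100

# Naive "average of averages" -- treats both groups as equally large.
mean_of_means = (sum(group_a) / len(group_a) + sum(group_b) / len(group_b)) / 2

# Correct row-weighted average over all rows.
true_mean = sum(group_a + group_b) / len(group_a + group_b)

print(mean_of_means)  # 55.0
print(true_mean)      # 40.0
```

The naive summary overstates the overall mean because the small group counts as much as the large one; an LLM summarizing a table row-group by row-group can make exactly this mistake.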