Improving A.I. Chats with Multi-Modal Integration
Image Credit - True Agency | https://unsplash.com/@trueagency


A.I. chats have historically been limited to asking questions of text-based content, where A.I. responses are drawn from the words in PDFs, DOCs, and similar files. However, many documents also contain charts, graphs, tables, and images that hold valuable information and are an important part of the material in the document.

The way to improve chats on these documents is to include analysis of this non-text information in the results of an A.I. conversation, and the multi-modal technology to do exactly that already exists!

Example Where the Answer Resides in a Chart

Consider an A.I. chat where the question "What has been the year-by-year data for the S&P 500 from 2018 to 2022?" is asked of content sitting in a PowerPoint presentation. With a text-only approach, the A.I. could not respond, because the text on the presentation slide (below) didn't contain the "words" required to answer the question.

However, with multi-modal support, the graph itself is assessed by A.I. and its analysis becomes part of the knowledge base used to answer questions about the information in the chart. The multi-modal response provides year-by-year figures, as shown in the screenshot below.
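The article doesn't show how the chart analysis is wired up internally, but as a rough illustration of the idea, a chart image can be sent to a vision-capable chat model and the returned summary stored as text alongside the rest of the document. This is a minimal sketch assuming the OpenAI Python SDK; the model name and file name are placeholders, not details from the article.

```python
import base64
from openai import OpenAI  # assumes the OpenAI Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical file name: a chart image extracted from the presentation slide
with open("sp500_chart.png", "rb") as f:
    chart_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable chat model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Summarize the year-by-year values shown in this chart."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{chart_b64}"}},
        ],
    }],
)

# The returned summary can be indexed with the rest of the document so later
# chat questions about the chart can be answered from it.
print(response.choices[0].message.content)
```

Once the chart's description is stored as text, it participates in retrieval just like any paragraph of the document.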

Similarly, when a question is asked about data that resides in a table embedded in a document, such as "What is the correlation between Stock Indices, Cryptocurrency Prices, and Commodity Prices as it relates to the S&P 500?", the A.I. chat response is assisted by multi-modal content analysis. A.I. captures the data from the rows and columns of the table (like the one below) and uses that data to generate the chat response that follows.
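One way table content like this can reach the chat model is by flattening the extracted rows and columns into plain text that gets indexed with the rest of the document. The sketch below uses made-up cell values purely for illustration; it is not the article's table or a specific product API.

```python
# Hypothetical cells as a layout analyzer might return them: (row, column, text)
cells = [
    (0, 0, "Asset"),   (0, 1, "Correlation to S&P 500"),
    (1, 0, "Bitcoin"), (1, 1, "0.42"),
    (2, 0, "Gold"),    (2, 1, "-0.15"),
]

# Group cells by row, then join each row into a pipe-delimited line of text.
rows = {}
for r, c, text in cells:
    rows.setdefault(r, {})[c] = text

lines = [" | ".join(row[c] for c in sorted(row)) for _, row in sorted(rows.items())]
table_as_text = "\n".join(lines)

# This flattened text is what gets indexed, so the chat model can quote
# values from the table when answering a question.
print(table_as_text)
```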

Concept of Split Skill Analysis

The supporting technology that helps Microsoft A.I. segment content into text vs. charts vs. tables vs. images is part of Split Skill Analysis and Document Intelligence.

These technologies scan a document (PDF, DOCX, HTML, TXT, PPTX, JPG, etc.), identify where the content "changes" (for example, from text to something else, like a table or chart), and handle each separate piece of the document in the manner that provides the best analysis.

So for a table embedded in a document, it looks at rows and columns.

For a chart embedded in a document, it looks at the X and Y axes and the lines/bars.

For images, it vectorizes the image to determine what it depicts, which helps describe the graphic for later recall.
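As a rough sketch of this kind of segmentation (using the azure-ai-formrecognizer Python SDK and its prebuilt layout model rather than the exact skillset wiring described above), a document can be analyzed once and its text paragraphs and tables handled separately; the endpoint, key, and file name below are placeholders.

```python
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for an Azure Document Intelligence resource
client = DocumentAnalysisClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# Analyze the whole document once with the prebuilt layout model
with open("quarterly_report.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

# Plain text paragraphs are handled as regular text for the knowledge base
for paragraph in result.paragraphs:
    print("TEXT:", paragraph.content)

# Tables come back with explicit row/column positions, so each cell's value
# can be captured the way the article describes
for table in result.tables:
    for cell in table.cells:
        print(f"TABLE r{cell.row_index} c{cell.column_index}: {cell.content}")
```

Extracted images would then go through a separate description or vectorization step, as noted above, so their content can also be recalled during a chat.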

Wrap-up

There was a time when A.I. chats depended solely on text-based information to make up the knowledge available for questions. However, through the inclusion of split skills and multi-modal analysis, valuable information sitting in charts, graphs, tables, and images can now be analyzed and its data included in the A.I. chat response.
