Improving A.I. Chats with Multi-Modal Integration
A.I. chats have historically answered questions from text-based content, with responses drawn from the words in PDFs, DOCs, and the like. However, many documents also contain charts, graphs, tables, and images that hold valuable information and form an important part of the material.
The way to improve chats on these documents is to include analysis of this non-text information in the results of an A.I. conversation, and the multi-modal technology for that integration exists!
Example Where the Answer Resides in a Chart
Consider an A.I. chat where the question "What has been the year by year data for the S&P 500 from 2018 to 2022" is asked of content sitting in a PowerPoint presentation. With a text-only approach, A.I. would not have been able to respond, because the text in the presentation slide (below) didn't have the "words" required to answer the question.
However, with multi-modal support, the graph itself is assessed by A.I. and its analysis becomes part of the knowledge base used to answer questions about the information in the chart. The multi-modal response then provides a year by year answer, as shown in the screenshot below:
Similarly, consider a question about data that resides in a table embedded in a document, like "What is the correlation between Stock Indices, Cryptocurrency Prices, and Commodity Prices as it relates to the S&P 500?" Here the A.I. chat response is assisted by multi-modal content analysis, where A.I. captures data from the rows and columns of a table (like the one below):
That data was then used to generate the chat response as follows:
Concept of Split Skill Analysis
The supporting technology that helps Microsoft A.I. segment content into text, charts, tables, and images is part of Split Skill Analysis and Document Intelligence.
These technologies scan a document (PDF, DOCX, HTML, TXT, PPTX, JPG, etc.), identify where the content has "changed" (gone from text to something else, such as a table or chart), and handle each separate piece of the document in the manner that provides the best analysis.
So for a table embedded in a document, it looks at rows and columns.
For a chart embedded in a document, it looks at the X and Y axes and the lines/bars.
For images, it vectorizes the image to determine what the image is, which helps describe the graphic for later recall.
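To make the idea of handling each piece in its own way more concrete, here is a minimal sketch in Python. It is not Microsoft's implementation and does not call Document Intelligence or any real service. The `Segment` class, the handler functions, and `build_knowledge` are all hypothetical names invented for illustration, and the sketch assumes the document has already been split into typed segments. The sample values are placeholders, not real market data.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Segment:
    """One contiguous piece of a document (hypothetical structure)."""
    kind: str                       # "text", "table", "chart", or "image"
    payload: dict = field(default_factory=dict)


def describe_text(seg: Segment) -> str:
    # Plain text passes through unchanged.
    return seg.payload["text"]


def describe_table(seg: Segment) -> str:
    # Read the table row by row, pairing each cell with its column header.
    headers = seg.payload["headers"]
    return "\n".join(
        ", ".join(f"{h}: {v}" for h, v in zip(headers, row))
        for row in seg.payload["rows"]
    )


def describe_chart(seg: Segment) -> str:
    # Pair each X-axis value with the Y-axis value plotted for it.
    x_label, y_label = seg.payload["x_label"], seg.payload["y_label"]
    points = zip(seg.payload["x"], seg.payload["y"])
    return "\n".join(f"{x_label} {x}: {y_label} {y}" for x, y in points)


def describe_image(seg: Segment) -> str:
    # Assumption: a real pipeline would ask an image model for a description.
    # This sketch only reuses a caption that is already in the payload.
    return "Image: " + seg.payload.get("caption", "no description available")


# Route each segment type to the handler suited to it.
HANDLERS: Dict[str, Callable[[Segment], str]] = {
    "text": describe_text,
    "table": describe_table,
    "chart": describe_chart,
    "image": describe_image,
}


def build_knowledge(segments: List[Segment]) -> List[str]:
    """Turn every segment into text that a chat can search."""
    return [HANDLERS[seg.kind](seg) for seg in segments]


if __name__ == "__main__":
    # Placeholder values for illustration only, not real index data.
    demo = [
        Segment("text", {"text": "Market overview"}),
        Segment("chart", {"x_label": "Year", "y_label": "Value",
                          "x": [2018, 2019], "y": [100, 110]}),
    ]
    for entry in build_knowledge(demo):
        print(entry)
```

The point of the design is that once a chart or table has been turned into text like this, a question about "year by year" values can be answered from the same knowledge base as ordinary paragraphs.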
Wrap-up
There was a time when A.I. chats depended solely on text-based information to make up the knowledge available for answering questions. However, through the inclusion of split skills and multi-modal analysis, valuable information sitting in charts, graphs, tables, and images can now be analyzed and have its data included in the A.I. chat response.