GPT-4 Accepts Image Inputs, Here’s What That Means for IDP
OpenAI, the artificial intelligence (AI) research laboratory behind the popular generative AI tools DALL-E and ChatGPT, just announced GPT-4. In casual conversation, the company says, the newest iteration of its generative pre-trained transformer (GPT) language model is only subtly different from last year's GPT-3.5. As the tasks thrown at GPT-4 become more complex, however, the contrast with the older model becomes starker.
To demonstrate this, OpenAI's researchers used a variety of benchmarks, including exams originally designed to test human knowledge across various subjects (e.g., AP Calculus BC, the Uniform Bar Exam, GRE Writing, and the LSAT). Unsurprisingly, GPT-4 outperforms GPT-3.5 in most instances. What's more intriguing is that another new feature of the model, its ability to accept image inputs, leads to even greater performance gains.
GPT-4 is multimodal, meaning it accepts different modalities of data. Specifically, the model can generate text outputs, including natural language and code, from inputs that combine text and images. According to OpenAI, it shows similar levels of proficiency across a diverse range of domains, including documents that contain text alongside photographs, diagrams, or screenshots, as it does on text-only inputs. By contrast, GPT-3.5 was limited to a single modality: text.
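To make this concrete, here is a minimal sketch of what a combined text-and-image request might look like, using the image-input format OpenAI documented for its chat completions API. The model name, document URL, and prompt are illustrative placeholders, and image input access was limited at launch:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical IDP-style request: extract key fields from a scanned invoice.
response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # illustrative vision-capable model name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract the vendor name, invoice number, and total amount from this document."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample-invoice.png"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```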
For those anticipating a multimodal GPT-4 that would support audio and video inputs (and potentially diverse forms of output), this development might be disappointing. But as a company primarily focused on Intelligent Document Processing (IDP), we're incredibly excited. We've written about ChatGPT and the future of IDP before, where we speculated that Large Language Models (LLMs) had the potential to improve data extraction accuracy, respond to natural language queries about critical business information, and simplify the creation of AI applications. What's been revealed about GPT-4 reinforces those ideas.
In the GPT-4 developer livestream, OpenAI demonstrated the model's potential for documents. For example, Greg Brockman, President and Co-Founder of OpenAI, showed how GPT-4 could not only extract information from a rough, hand-drawn website mockup but also convert it into working HTML. This showcases an impressive ability to recognize handwritten characters and demonstrates an understanding of context and intent that has broad applicability in document processing and analysis.
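A rough recreation of that demo is sketched below, assuming the same chat completions image format as above; the file name and prompt are hypothetical, and local images are passed as base64-encoded data URLs:

```python
import base64

from openai import OpenAI

client = OpenAI()

# Hypothetical recreation of the livestream demo: encode a photo of a
# hand-drawn mockup as a data URL and ask GPT-4 to turn it into HTML.
with open("mockup.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # illustrative vision-capable model name
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Convert this hand-drawn mockup into a single working HTML file."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
)

# Save the model's reply, which should contain the generated HTML.
with open("mockup.html", "w") as f:
    f.write(response.choices[0].message.content)
```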
Riding the exponential wave
Moore's law observes that the number of transistors on a microchip doubles every two years, meaning the speed and capability of our computers increase every two years while costs decrease. By comparison, the compute used to train state-of-the-art AI models is doubling roughly every 3.5 months. This is why we built a platform to harness the best models rather than attempt to build and maintain proprietary ones (and compete with industry giants like Microsoft and Amazon). super.AI customers get rapid access to new models like GPT-4, optimized for processing complex documents.
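To put the two doubling rates side by side, here is a quick back-of-the-envelope comparison over a single two-year window (the 3.5-month figure is the one cited above):

```python
# Growth over a two-year window under each doubling period.
months = 24

moore_doublings = months / 24   # transistors: ~one doubling per 2 years
ai_doublings = months / 3.5     # AI training compute: ~one doubling per 3.5 months

print(f"Moore's law growth over 2 years: {2 ** moore_doublings:.0f}x")  # ~2x
print(f"AI compute growth over 2 years:  {2 ** ai_doublings:.0f}x")     # ~116x
```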
Expect to see:

- More accurate data extraction from complex, unstructured documents
- Natural language querying of the critical business information locked inside documents
- Simpler, faster creation of document-focused AI applications
These are just some of the initial ideas we have for the enhanced functionality GPT-4 can bring to the super.AI platform. In the coming weeks and months, we will continue to discover new use cases for the latest and greatest large language models. Stay tuned for updates, and don't hesitate to reach out if you're interested in learning how our platform can benefit your document automation use case.