Getting started with Azure GPT-4-Turbo Vision in Industry-based scenarios
Bing Image Creator: "a 3d, cute robot with a wonderful and beautiful landscape reflected in its eyes"

Getting started with Azure GPT-4-Turbo Vision in Industry-based scenarios

On December 12, Microsoft announced the Public Preview of the GPT-4-turbo Vision model in Azure.?

GPT-4-turbo vision is a Large Multimodal Model (LMM) developed by OpenAI, which can take as input texts and images. Additionally, within the Azure AI Studio, you can also integrate the model capabilities with Azure AI Services for vision, in order to:

  • Enhance OCR
  • Enhance object detection
  • Enhance video analysis

In this article, we will explore x scenarios of application of the GPT-4-turbo vision within the following contexts: Education, Manufacturing, Healthcare and Life Science, Cultural Heritage, and Fashion.

Education

LLMs have already proved exceptional capabilities within education. One success story is that of Khan Academy, a learning platform that has explored the capabilities of GPT-4 with a personalized learning assistant called Khanmigo that helps students learn given topics.

Incorporating the model’s vision capabilities, we can bring this approach to the next level. Let’s explore a simple example with a physics problem.

This is the prompt I’ll be using:

You are an AI assistant that helps students do their homework. Your goal is to let the student get to the problem’s solution providing hints, not the final solution. Be sure to accompany the student through the learning process, providing examples and also checking here and there the student’s understanding.

The scenario will be the following physics problem:

Source:

I’ll ask my Homework Assistant for some support in solving this problem:?

Let’s see the response:

Let’s say that I really cannot make it, and I just want the solution to the problem:

As you can see, the assistant is very well aligned with our system message, and it won’t give the final solution to the problem. Since I cannot see and shortcut from here, I’ll do my math and find the solution. Let’s check with our assistant whether it is correct!

Cool! The Assistant was able to guide me through the learning process of getting the right result. Plus, it was also able to confirm to me the final solution. That is a great example of a model’s alignment with human instructions (system message).

Manufacturing

Manufacturing is one of the most important sectors of the economy, as it contributes to employment, innovation, and trade. According to the World Bank, the global value added by the manufacturing industry was about 16% of the world's GDP in 2019.

Generative AI has the potential to impact the manufacturing industry in various ways, such as accelerating product development, automating repetitive processes, improving quality control, enhancing innovation and creativity, and optimizing supply chain and logistics.

According to a recent survey made by BCG to manufacturers, those use cases can be grouped into three main areas of innovation: assistance systems, recommendation systems, and autonomous systems?—?that correspond to maturity levels in the “factory of the future”.

GPT4-turbo vision can accelerate this innovation process in two ways: enhancing existing Computer Vision tasks (such as a quality check) or introducing new AI assistants, such as a copilot for plant operators to produce remediation tutorials given a picture of the current environment.

Let’s consider an example where we feed the model with a picture of a damaged cross-section of an electric cable. In this case, I’m also incorporating the Azure AI services for vision, so that we can enhance the model with Object detection capabilities.

First, let’s set our system message:

You are an AI assistant for manufacturers that helps in tasks like defect detection, plant operator’s assistant, and remediation tutorials.

Then, let’s see some interactions:

As you can see, in this scenario the model invoked the Object Detection AI service, which is visible due to the bounding boxes it produced and referenced.

Let’s now ask whether there are some defects within this cable:

And finally, we can ask to generate a remediation tutorial as follows:

Now, imagine you have specific documentation about your machinery. You could embed your knowledge base and build an RAG-based application, incorporating the vision capabilities of the new GPT-4-turbo Vision. This would create the perfect AI assistant for plant operators!

Healthcare and Life?Science

Generative AI has the potential to impact the HCLS industry across all segments, from drug discovery in Pharmaceutical firms, to patient care for healthcare providers (such as Hospitals or private clinics). According to a recent research from BCG, there are already many use cases HCLS companies are experimenting, some already validated (such as accelerating drug discovery and design, as Insilico Medicine did), while others still under validation or conceptual.

Leveraging the vision capabilities of the GPT-4 turbo, we can unlock even further scenarios. In this paragraph, I’m going through a couple of examples in the field of healthcare providers.

Prompt used:

You are a medicine expert. Your role is to support the doctor in its exams analysis and diagnoses. The doctor can provide you with pictures, x-rays, blood exams, and other data.? The doctor might want to brainstorm with you, so use all the knowledge you have to answer.

Let’s start with an orthopedic scenario. In this case, we have an X-rays scan of a post-surgery right knee exhibiting a hardware system due to a tibial plateau fracture.

Imagine I’m an orthopedic who receives this patient for the first time, who shares with me these pictures showing his clinical history. I might be curious to interact with my AI assistant to reason about the surgery my patient went through in the past:

Let’s now see another example with blood tests. In this case, this application might be extremely useful for patients with poor or no knowledge in the field, who might want to understand the meaning of their test results.

Here I’m providing a sample blood test that indicates severe anemia with iron deficiency.

As you can see, the model was able to perfectly identify the disease, specifying also that in this case, it is an iron-deficiency anemia. Now, since I’m a non-expert and I don’t know what anemia is, neither should I treat it. Let’s see how it works:

Ideally, such a patient assistant could be implemented by Healthcare structures, so that patients can be fully informed about their disease and also book their appointments with doctors directly via the assistant. Namely, in the above example, with a fully integrated solution, I might be suggested to accomplish step one, with full visibility of the doctor’s calendar and the possibility of booking an appointment as soon as possible.

Fashion

Let’s now see an application within the Fashion industry. Within this context, Computer Vision is not new: many companies have invested in AI models that are able to perform brand detection in pictures. However, in this case, we still suffer from the “traditional AI curse”: the lack of generalization.?

What if I want a model that can examine a whole outfit, identify brands, and share suggestions about possible improvements??

Let’s say we are attending a fashion show and we have to write a review for each outfit. We take a picture and feed it to our model (taken from a Gucci’s show in 2015):

Also in this case, we used the Vision enhancements from Azure AI Services (as you can see from the bounding boxes). Let’s see the response:

Now I want to see whether it can recognize the brand:

It did it! Note that there is no evident logo on top of the bag, so the model was able to retrieve it just by inferring the style and fabric of the item. Finally, let’s ask the model a suggestion about what to change in the outfit:

Great, now I have all the elements to write my article about the fashion show:

Ready to be published in the most popular fashion magazines!

Cultural Heritage

The last example I want to provide is in the context of Cultural Heritage, the sector that deals with the preservation, promotion, and transmission of cultural heritage, such as monuments, artifacts, traditions, languages, and arts.?

In recent years, digital innovation has already impacted this sector in various ways, such as providing new methods and tools to document, record, reconstruct, display, interpret, and preserve different forms of heritage (especially those that are at risk of disappearance or damage) or enhancing cross-sector, cross-border cooperation and capacity building among different stakeholders (such as museums, heritage sites, governmental bodies, academic institutions and communities).

In this example, we are going to leverage the GPT-4-turbo vision to further engage a tourist or citizen while visiting iconic places such as monuments or museums. To do so, we will use the following prompt:

You are an expert touristic guide. You answer user’s questions about monuments, museums, historical places, and similar.? You can provide historical context and share suggestions on how to enjoy the experience at the best. Feel free to suggest additional activities users can do to fully experience what they are visiting.

Let’s start by sharing a picture of the well-known Duomo di Milano:

What if I’m interested in knowing more about the cathedral’s spires?

Now let’s do a different exercise. I’m studying Caravaggio’s paintings and there is one item, “The Cardsharps”, I'm particularly curious about. Let’s engage GPT-4 vision in a conversation about that.

We can also further investigate where to find this painting and what is the message the artist wanted to convey:

Now, imagine being a tour operator providing an application with such capabilities. As a tourist, I’d be super happy and engaged by having this assistant, which also allows me great flexibility in terms of timing, style, availability etc.

Conclusion

GPT-4-turbo vision and, in general, Large Multimodal Models, are unlocking a new wave of scenarios across different industries. The above examples are just a sample of what we can achieve with this new model, and I’m looking forward to witnessing the digital transformation this will bring to the market.

References

With Vision GPT does an excellent job in recognizing old manuscripts or symbols which makes a visit to a museum more fun.

回复
Ingo Hampe

Solution Architect @ NTT Germany AG & Co. KG | Modern Workplace Consultant

1 年

I wonder in the manufacturing example: Wouldn't it be more cost effective to use a custom trained ai model to detect the defects.? Anyway - very impressive ??

回复
Leo Wang

AI and Automation, Business Intelligence, Enterprise Mobility and always in Web3.

1 年

Can’t wait to see some exciting applications to be built using GPT4-Vision!

回复
Rémi Dyon

Principal Solution Architect | Microsoft Copilot Studio | Power CAT

1 年

Excellent article Valentina! The potential of GPT-4-Turbo vision is incredible!

Anurag(Anu) Karuparti

AI Leader @Microsoft | Author - Generative AI for Cloud Solutions | Responsible AI Advisor | Ex-PwC, EY | Global Guest Lecturer | Marathon Runner

1 年

Thank you for writing this blog! It provided an enlightening perspective on the diverse applications of multimodal LLMs across various industry sectors!

要查看或添加评论,请登录

Valentina Alto的更多文章

  • Getting started with Azure AI Studio

    Getting started with Azure AI Studio

    During the opening keynote of the Microsoft Ignite event, on November 15th, Artificial Intelligence was the undiscussed…

    3 条评论
  • Computer Vision: Feature Matching with OpenCV

    Computer Vision: Feature Matching with OpenCV

    Computer vision is a field of study which aims at gaining a deep understanding from digital images or videos. Combined…

  • Building your first chatbot with Python

    Building your first chatbot with Python

    Today, if you are about to order some foods on a restaurant's website or you need assistance because your router is not…

    1 条评论
  • The Bias-Variance trade-off

    The Bias-Variance trade-off

    Machine Learning models' ultimate goal is making reliable predictions on new, unknown data. With this purpose in mind…

    1 条评论
  • Streaming analysis with Kafka, InfluxDB and Grafana

    Streaming analysis with Kafka, InfluxDB and Grafana

    If you are dealing with the streaming analysis of your data, there are some tools which can offer performing and…

  • Decision Tree and Information Gain

    Decision Tree and Information Gain

    Decision trees are some of the most popular ML algorithms used in industry, as they are quite interpretable and…

  • Natural Language Processing with TextBlob

    Natural Language Processing with TextBlob

    Natural Language Processing (NPL) is a field of Artificial Intelligence whose purpose is finding computational methods…

  • Features Engineering: behind the scenes of ML algorithms

    Features Engineering: behind the scenes of ML algorithms

    The majority of people (including me) tend to think that the core activity of building a Machine Learning algorithm is,…

  • Neural Networks: parameters, hyperparameters and optimization strategies

    Neural Networks: parameters, hyperparameters and optimization strategies

    Neural Networks (NNs) are the typical algorithms used in Deep Learning analysis. NNs can take different shapes and…

    1 条评论
  • Unsupervised Learning: PCA and K-means

    Unsupervised Learning: PCA and K-means

    Machine Learning algorithms can be categorized mainly into two bunches: supervised learning: we are provided with data…

    2 条评论

社区洞察

其他会员也浏览了