Data efficiency: solving a problem of visual analytics
Artificial intelligence methods can select and treat data based on their information content.

A consistent pattern in AI research is the need for large volumes of training data. This is a particular problem for visual AI, which typically needs thousands of manually labeled images to learn how to identify and classify the objects they contain. However, there are ways to improve data efficiency. Here is the story of how Samsung SDS researched and developed a different AI model that is far more efficient with its data.

Identifying the problem

Computer vision and visual analytics are becoming increasingly valuable across many industries. Observing and identifying objects in images and video has traditionally been a human-driven job, requiring people to work long hours staring at screens. It is routine, repetitive work that an AI engine could do. Indeed, in the world of surveillance, supply chains, and asset diagnostics, visual AI has been deployed, successfully in many cases.

The main remaining issue is data efficiency. Visual information is difficult to interpret without the mental schemas that humans develop naturally. Machines are still able to read images and identify subjects within them, or rather, mark out features of objects based on learned data. To accomplish this, they need data to learn from, in the form of images that depict the subjects the AI is meant to identify or demarcate.

These images also need to be labeled. The AI can’t simply attach meaning or identification to abstract blobs, even if humans are aware that the blob in question is a “cat”. The features of that cat have to be highlighted or marked in some manner and identified as such. This creates a new challenge: labeling the data, like analyzing the images in the first place, is repetitive, time-consuming, and expensive.

Learning Methods

This new challenge stems from supervised learning. The AI platform or neural network requires direct human input to train it. In the case of computer vision, that input takes the form of manually labeled images or videos. For other types of AI, it is huge spreadsheets, vast seas of data, much of it carefully curated to ensure that the AI processes the information and is then capable of working with new, unstructured data in the future.
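To make the supervised setup concrete, here is a minimal sketch, assuming scikit-learn is available: every training example must carry a human-provided label before the model can learn anything. The small digits dataset stands in for a labeled image collection; it is an illustration, not the method described later in this article.

```python
# Minimal supervised-learning sketch: the model only learns from
# examples that already have human-provided labels (y_train).
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # small labeled image set (8x8 digit images)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=2000)
clf.fit(X_train, y_train)            # training requires every label up front
accuracy = clf.score(X_test, y_test)
```

The cost hidden in this sketch is the creation of `y_train`: for real-world visual AI, each of those labels is a human looking at an image and annotating it.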

Naturally, this method of learning is incredibly data- and time-intensive. It is also inefficient with the data itself. It takes a great many images to train a visual AI platform, all of which have to be checked and supervised by a technician. That is expensive, defeating much of the cost and time savings of using a machine in the first place.

But what if the data used could be more efficient?

Data efficiency

By experimenting with active learning and Gaussian process (GP) classifiers, Samsung SDS has identified a way to improve the data efficiency of visual AI. This method focuses on the relative information that can be obtained from each subsequent data entry.

Each new image that the AI is given provides progressively less new information to learn from. The first image of a cat in the above example teaches the AI a large amount: cats have fur, four legs, two eyes, whiskers, and so on. The next image may diversify that description, covering different breeds or sizes of cats. But as the images go on and the visual data becomes increasingly repetitive, the AI learns less and less useful information about cats. Because bulk image input is largely random, rather than selective about the data being applied, this diminishing return isn’t being accounted for. They’re just images of cats, after all.
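One standard way to quantify how informative an unlabeled image would be is the entropy of the model's predicted class probabilities: a confident prediction carries little new information, while an ambiguous ("confusing") one carries a lot. This is a generic sketch of that idea, not necessarily the exact scoring used by Samsung SDS.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of a class-probability vector; higher = more 'confusing'."""
    p = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

# An image the model already classifies confidently adds little...
confident = predictive_entropy(np.array([0.98, 0.01, 0.01]))   # low entropy
# ...while an ambiguous image is the most informative to label next.
ambiguous = predictive_entropy(np.array([0.34, 0.33, 0.33]))   # near the maximum, log(3)
```

Ranking the unlabeled pool by a score like this is what lets an active learner spend its labeling budget where it matters most.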

First, the researchers assembled a pool of roughly 14,000 images, priming the AI with this body of images for pre-classification. Then, around 140 images were selected at random from the pool as the initial active training set, each labeled correctly. The GP classifier is trained on these images and set against the pool of 14,000; this kind of classifier is better at handling imbalanced image sets, where some classes have few representatives. The trained GP then evaluates how much new information was obtained from the training set, along with the accuracy of the classifier. The results can be compared back against the full pool, where the GP classifier reveals which images it finds most novel or confusing. The next set of images fed into the classifier are the ones flagged as most “confusing,” since these provide the highest degree of new information to the platform. This process is repeated until there is high certainty in the accuracy of the classifier.
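The loop described above can be sketched as follows. This is an illustrative reconstruction using scikit-learn's `GaussianProcessClassifier`, with the digits dataset standing in for the ~14,000-image pool; the pool size, initial set size, and batch size here are stand-ins, not the actual Samsung SDS configuration.

```python
# Pool-based active learning with a GP classifier (illustrative sketch).
import numpy as np
from sklearn.datasets import load_digits
from sklearn.gaussian_process import GaussianProcessClassifier

X_pool, y_pool = load_digits(return_X_y=True)   # stands in for the unlabeled pool
rng = np.random.default_rng(0)

# Start from a small random, correctly labeled subset of the pool.
labeled = rng.choice(len(X_pool), size=100, replace=False).tolist()

for _ in range(3):                               # a few acquisition rounds
    gp = GaussianProcessClassifier().fit(X_pool[labeled], y_pool[labeled])
    probs = gp.predict_proba(X_pool)             # classifier's view of the whole pool
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    entropy[labeled] = -np.inf                   # never re-select already-labeled images
    most_confusing = np.argsort(entropy)[-50:]   # flag the most informative images
    labeled.extend(most_confusing.tolist())      # "label" them and retrain
```

In a real deployment, the `labeled.extend(...)` step is where a human annotator labels only the flagged batch, which is what keeps the total labeling effort small.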

The result of this process is that instead of manually labeling thousands of images, Autolabel is taught using only 10% of the total. By selecting the most “confusing”, or rather the most informative, images to label and feed back in, the AI’s training is accelerated. This maximizes data efficiency and avoids the copious time, data, and headaches that the previous methods required.

Where to next?

The applications of this method extend across all of visual analytics. Training time can be cut drastically, allowing cost-effective and quick onboarding of the platform. This benefits the end-user of visual AI by giving them a functioning algorithm as soon as possible.

The implications for AI learning algorithms and methodologies are interesting as well. With further experimentation, it may be possible to develop semi-supervised learning, where the platform learns not just the patterns in images but the schemas needed to interpret new images after only limited supervised input.

This data efficiency also allows for far more complex visual AI. By maximizing data efficiency, additional data sets can be used to teach more and more concepts. Instead of merely cats, why not dogs, or sheep? From a time and data perspective, this allows more flexibility and greater interpretive power for each platform within the same training time as before.

Want to read more?

PatchNet: Unsupervised Object Discovery based on Patch Embedding

Highly Efficient Representation and Active Learning Framework for Imbalanced Data and its Application to COVID-19 X-Ray Classification



Credits: This article was written by Ryan Cann.
