Data Annotators: The Unsung Heroes Of Artificial Intelligence Development
Bertalan Meskó, MD, PhD
Director of The Medical Futurist Institute (Keynote Speaker, Researcher, Author & Futurist)
The wonders of AI in healthcare are undeniable, revolutionising diagnostics and prevention with astonishing advancements. From detecting diabetic retinopathy to identifying skin cancer and predicting cardiovascular risks, AI's achievements are reshaping patient care.
But underpinning these headline-grabbing breakthroughs is an invisible army whose meticulous work powers the algorithms that save lives. They're the unsung heroes of the AI healthcare revolution, often underpaid and underappreciated. So, who are these hidden figures? What exactly do they do?
Have you ever wondered how to create a smart algorithm? Where and how do you get the data for it? What makes a pattern-recognizing program work well, and what are the challenges? Nowadays, everyone seems to be building artificial intelligence-based software, in healthcare as much as anywhere else. Still, hardly anyone talks about one of the most important aspects of the work: data annotation, and the people who carry out this time-consuming, rather monotonous task without any of the flair that usually surrounds AI.
Without their dedicated work, it is impossible to develop algorithms, so we all need to know and talk about the superheroes of algorithm development: data annotators.
How to make algorithms dream of cats?
The method for creating and teaching an algorithm depends on the question it aims to solve. Let’s say you want the algorithm to spot lung tumors in chest X-rays. For that, you will need tools for pattern recognition – the question doesn’t differ much from spotting cats on Instagram.
At first, it sounds easy. Until you start thinking about how to explain to a computer what a cat is. Our usual human clues - fur, ears, eyes, whiskers, four legs, cuteness, and grace - mean nothing to an algorithm that only sees pixels.
“You will need millions of cat photos, appropriately labeled as having a cat. That way, a neural network or a multilayered deep neural network can be trained using supervised learning to recognize pictures with cats in them”, David Albert, M.D., Co-Founder and President of AliveCor, the company that has been developing a medical-grade, pocket-sized device to measure an EKG anywhere in less than 30 seconds, explained to The Medical Futurist.
So, you won’t tell the algorithm what a cat is; rather, you show it millions of examples and let it figure it out by itself. That’s why data and data annotation are critical for building smart algorithms.
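For readers curious what "showing millions of labeled examples" looks like in practice, here is a minimal sketch of supervised image classification in Python with PyTorch. The folder layout, file names and training settings are illustrative assumptions, not details from AliveCor's work or any real dataset.

```python
# A minimal supervised-learning sketch: train a small convolutional network on
# labeled photos. Assumes a hypothetical folder "photos/" with one sub-directory
# per label, e.g. photos/cat and photos/no_cat -- that is where the annotations live.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms, models

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("photos", transform=transform)  # labels come from folder names
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = models.resnet18(weights=None)            # an untrained convolutional network
model.fc = nn.Linear(model.fc.in_features, 2)    # two classes: cat / no cat

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(3):
    for images, labels in loader:                # the human-made labels drive the learning
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
```

The model never receives a definition of "cat"; it only ever sees pixel values paired with the labels annotators attached to them, which is exactly why the quality of those labels matters so much.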
What is data annotation?
Annotating data is time-consuming and tedious, without any of the flair promised by sci-fi visions of artificial intelligence with thinking, talking computers or robots. In healthcare, building algorithms mostly means working with existing databases of imaging files, CT or MR scans, pathology samples and the like. Data annotation, in turn, means drawing lines around tumors, pinpointing cells or labeling ECG rhythm strips. Thousands, tens of thousands of them. No magic, no self-aware computers.
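To make "annotation" concrete, this is roughly what a single annotated data point might look like once stored: a label plus the hand-drawn outline. The field names and values below are invented for illustration and do not follow any particular annotation tool or standard.

```python
# A hypothetical annotation record: one hand-drawn outline around a suspected
# lesion on one image, made by one annotator.
tumor_annotation = {
    "image": "chest_ct_0042.dcm",        # the scan being annotated (made-up file name)
    "annotator": "radiologist_07",       # who drew the outline
    "label": "lung_tumor",               # what the outline is claimed to contain
    # The outline itself, as (x, y) pixel coordinates of a polygon.
    "polygon": [(120, 88), (131, 84), (140, 95), (133, 108), (121, 102)],
}
```

Multiply this by tens of thousands of images, each needing careful clicks from a trained expert, and the scale of the work becomes clear.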
That’s what Dr. Albert has been doing. “You need accurately labeled and annotated data to develop these deep neural diagnostic solutions. But it's an awful lot of work.
For example, I may annotate or diagnose ten thousand ECGs over several weeks, then another expert goes through the same ten thousand, and then we see where we disagree. After that, a third person, the adjudicator, comes in and says: okay, for these five hundred where you disagree, this is what I think the answer is.
So it takes at least three people and weeks of work to give you a reasonably confident answer. For deep neural networks to perform correctly and take advantage of big data, they require a tremendous amount of annotation work”.
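The workflow Dr. Albert describes, two independent annotators plus an adjudicator for the disagreements, can be sketched in a few lines of Python. The ECG identifiers and labels below are made up; only the shape of the process is taken from his account.

```python
# A simplified sketch of the two-annotators-plus-adjudicator workflow.
def merge_annotations(annotator_a, annotator_b, adjudicate):
    """annotator_a / annotator_b: dicts mapping ECG id -> label.
    adjudicate: callable that resolves a single disagreement."""
    final_labels = {}
    disagreements = []
    for ecg_id, label_a in annotator_a.items():
        label_b = annotator_b[ecg_id]
        if label_a == label_b:
            final_labels[ecg_id] = label_a      # consensus: keep the shared label
        else:
            disagreements.append(ecg_id)        # send to the third expert
    for ecg_id in disagreements:
        final_labels[ecg_id] = adjudicate(ecg_id)
    return final_labels, disagreements

# Toy example with invented labels:
a = {"ecg_001": "afib", "ecg_002": "normal"}
b = {"ecg_001": "afib", "ecg_002": "afib"}
labels, reviewed = merge_annotations(a, b, adjudicate=lambda ecg_id: "normal")
```

Even in this toy form it is obvious why the process is expensive: every data point is handled by at least two experts, and the contested ones by three.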
Counting cells and drawing precise lines around tumors
Katharina von Loga is a consultant pathologist at The Royal Marsden NHS Foundation Trust. A while ago, she explained how she uses software-based image analysis to monitor how immune cells within cancerous tumors change during therapy. The computer helps her count the cells after she carefully designates the set of cells she’s looking for.
“I have an image of a stain in front of me, where I can click on a specific cell and annotate that it's a tumor cell. Then I click on another cell and say that’s a subtype of an immune cell. It needs a minimum number of each of the types I specified; only after that can I apply it to the whole image. Then I look at the output to see if I agree with the ones that I didn't annotate but the computer classified. It’s a process you can repeat indefinitely,” she explained.
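Her annotate-classify-review loop is, in spirit, a small supervised learning cycle. Below is a rough Python sketch using scikit-learn; the per-cell feature vectors, class names and the choice of a random forest are stand-ins for whatever the actual image-analysis software does internally.

```python
# Sketch of the annotate -> classify -> review loop: a few hand-labeled cells
# train a classifier, which then proposes labels for every other cell on the slide.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# A handful of manually annotated cells (invented feature vectors + labels).
annotated_features = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.7], [0.2, 0.9]])
annotated_labels = ["tumor_cell", "tumor_cell", "immune_cell", "immune_cell"]

# All remaining, unannotated cells detected on the slide.
unannotated_features = np.array([[0.85, 0.15], [0.15, 0.8], [0.5, 0.5]])

clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(annotated_features, annotated_labels)

# The software proposes a class for every cell the pathologist did not annotate;
# she reviews the proposals, corrects mistakes, and the corrections feed the next round.
proposed = clf.predict(unannotated_features)
```

Each pass through the loop adds corrected examples, which is why the process "can be repeated indefinitely": the classifier keeps improving as the expert keeps reviewing.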
The hardships of data annotation
Although it sounds perfect in theory that you can train an algorithm to support medical work in pathology, the practice is much more complicated. As medical data archives were (obviously) not created with mathematical algorithms in mind, it’s gargantuan work to standardize existing sampling processes or to gather enough “algorithm-ready” samples.
It matters how the sample was processed from the moment the specimen is taken from the patient until it’s under the microscope. The staining method, the age of the sample, the department where the sample was produced – these are all factors to consider when selecting samples for successful algorithmic training.
Beyond the massive variability in the samples, there is another issue: the lack of experts for data annotation, as well as the difficulty of finding databases at scale. Usually, the precision of an algorithm depends on the size of the dataset – the bigger, the better. However, hospitals or medical centers, even very well-resourced ones, don’t have enough data or enough annotations. It takes companies like Google, Amazon, or Tencent, with vast financial resources and a global footprint, to reach the kind of scale needed to develop accurate AI.
What is more, the human resources problem is getting worse. There are only 30-35,000 cardiologists in the United States, all very busy. They don't have time to mark up ECGs. Similarly, there are only about 50,000 radiologists, and they don't have time to read more chest X-rays. So we have to do something.
From medical students through online annotators to AI building its own AI
Experts often mention the option of employing medical or pre-med students for simpler annotation tasks – at least to ease the human resources problem. David Albert has toyed with the idea of building online courses to train prospective annotators, who would then receive financial incentives for annotating millions of data points. Medical facilities could essentially crowdsource data annotation through platforms such as Amazon Mechanical Turk, harnessing the “wisdom of the crowds”.
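In practice, the "wisdom of the crowds" usually boils down to collecting several independent labels per item and taking a majority vote, with weak majorities escalated to an expert. Here is a minimal sketch; the votes and item names are invented for illustration.

```python
# Aggregate crowd-sourced labels by majority vote.
from collections import Counter

def majority_label(votes):
    """Return the most common label and how many annotators agreed on it."""
    label, count = Counter(votes).most_common(1)[0]
    return label, count

crowd_votes = {
    "xray_001": ["tumor", "tumor", "no_tumor"],
    "xray_002": ["no_tumor", "no_tumor", "no_tumor"],
}
consensus = {item: majority_label(v) for item, v in crowd_votes.items()}
# Items with only a narrow majority could be routed to a medical expert for review.
```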
Another option would be to employ algorithms for the annotation tasks as well – basically building AI to teach other smart software. We've seen deep learning-based tools that can do a fully automatic first pass of annotation, so the user only has to correct the cases where the automatic process did not work well.
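Such model-assisted annotation can be captured in a tiny loop: a previously trained model proposes a label and a human only accepts or corrects it. The `pretrained_model` and `human_review` callables below are placeholders for illustration, not any specific tool's API.

```python
# Sketch of model-assisted annotation: automatic proposal, human correction.
def pre_annotate(samples, pretrained_model, human_review):
    final = {}
    for sample_id, sample in samples.items():
        suggestion = pretrained_model(sample)                    # automatic first-pass label
        final[sample_id] = human_review(sample_id, suggestion)   # expert accepts or corrects
    return final

# Toy usage with stand-in callables:
samples = {"slide_1": "pixel data...", "slide_2": "pixel data..."}
labels = pre_annotate(
    samples,
    pretrained_model=lambda s: "tumor",            # the model's guess (placeholder)
    human_review=lambda sid, guess: guess,         # here the expert simply accepts
)
```

The expert's time then goes into reviewing edge cases rather than labeling every item from scratch, which is where the real savings come from.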
Katharina von Loga also mentioned that international and national committees are working on the standardization of the various sampling processes, which could really ease annotation work and accelerate the building of algorithms. All of this could lead to bigger and better datasets, more streamlined data annotation and more efficient AI in every medical subfield.
What will the future bring for smart algorithms in healthcare?
We'll see the widespread appearance of smart algorithms in the next five to ten years, and much more sophisticated artificial intelligence for healthcare. It will augment doctors and allow them to return to being physicians, not just documenters. We all know how much administrative tasks add to physician burnout, so such AI solutions are much needed.
Artificial intelligence will not replace physicians; combining its work with that of medical professionals should be the direction to take for the future. However, we also see that doctors who don’t use algorithms might get replaced by the ones who do.
While there will be (and should be) countless debates about the ways of cooperation between artificial intelligence and physicians, one thing is certainly clear. We will never have smart algorithms in healthcare without data annotators.
That’s why we felt the need to talk about and appreciate the experts who right now might be sitting in dark hospital rooms in front of computers, annotating radiology or ophthalmology images so that someone else, somewhere else, can create a potentially lifesaving medical application from them. Without the data annotation heroes, we'll never have artificial intelligence in healthcare.
Kudos to all data annotators out there!