4 tips on DICOM handling in Machine Learning for Healthcare
<abstract> This post follows up on a question, asked under a post about cancer detection ML models, about handling DICOM healthcare images for AI with TensorFlow. It gathers links to technical resources and a few caveats we want to avoid. The list is not exhaustive, so please feel free to add your pro tips as comments. </abstract>
Ibrahima GORY, thanks for asking about DICOM and TensorFlow.
Here are some pieces you may want to explore to start your journey in healthcare AI.
1 Technical resources
TensorFlow DICOM handling : https://www.tensorflow.org/io/tutorials/dicom
DLTK tutorials : https://blog.tensorflow.org/2018/07/an-introduction-to-biomedical-image-analysis-tensorflow-dltk.html
You may want to join the TensorFlow community groups : https://www.tensorflow.org/community/groups
2 Literature
Also, a book that brings nice context (Chapters 7 and 8) : Deep Learning for the Life Sciences: Applying Deep Learning to Genomics, Microscopy, Drug Discovery, and More
By Bharath Ramsundar, Peter Eastman, Patrick Walters, Vijay Pande
https://g.co/kgs/1LC2MF
3 Data preparation
Handling the DICOM format is key: you want to ensure you're working on the diagnostic image pixels, not the header data.
https://www.postdicom.com/en/blog/handling-dicom-medical-imaging-data
Here is something that happened in the past: a team left the hospital name on the images of its input dataset and trained a cancer detection model on it. The model reached a 99% precision score.
Basically, the model learned that when a specific hospital name appeared on the sample image, it was cancer. In reality, that hospital had an oncology service, and the training dataset was composed of cancer images from this oncology service only and non-cancer images from various other hospitals.
By ensuring you work on the diagnostic image pixels, you ensure you're working on signal and not introducing noise.
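The hospital-name pitfall can be screened for before training. The sketch below is only an illustration, not the method used by the team in the story: it assumes each sample can be reduced to a (site, label) pair, and the function name, the `min_samples` threshold and the toy data are all hypothetical. It flags every site whose samples all carry the same label, which is the situation in which a model can learn the site instead of the disease.

```python
from collections import defaultdict


def find_label_leaking_sites(samples, min_samples=5):
    """Return {site: label} for sites whose samples all share one label.

    samples: iterable of (site, label) pairs (hypothetical layout).
    min_samples: ignore sites with fewer samples than this (assumed threshold).
    """
    labels_by_site = defaultdict(list)
    for site, label in samples:
        labels_by_site[site].append(label)

    suspicious = {}
    for site, labels in labels_by_site.items():
        # A site with a single label value lets the model use the site as a shortcut.
        if len(labels) >= min_samples and len(set(labels)) == 1:
            suspicious[site] = labels[0]
    return suspicious


# Hypothetical toy dataset: Hospital A only contributes cancer images.
toy_samples = (
    [("Hospital A", "cancer")] * 6
    + [("Hospital B", "healthy")] * 4
    + [("Hospital B", "cancer")] * 2
    + [("Hospital C", "healthy")] * 6
)

leaks = find_label_leaking_sites(toy_samples)
print(leaks)
```

Any site reported here deserves a closer look: either balance the dataset across sites, or make sure nothing in the pixels (burned-in text, for instance) identifies the site.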
Ensure you know what a DICOM image is and that you use a correctly sized tech stack. DICOM images can be very big, both in bytes on disk at rest and in network bandwidth when they are transferred. With the right tech stack, perhaps SSDs and/or a cloud platform, you will be able to repeat training jobs in the most efficient way. Iterating on training jobs is also key to optimizing the models.
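To get a feel for the sizing question, here is a back-of-the-envelope sketch. The parameter names mirror the DICOM attributes Rows, Columns, Bits Allocated, Samples per Pixel and Number of Frames, but the function names and every figure (512 x 512 slices, 300 slices per study, 1,000 studies, a 1 Gbit/s link) are assumptions chosen for illustration. Headers, compression and protocol overhead are ignored.

```python
def uncompressed_pixel_bytes(rows, columns, bits_allocated,
                             samples_per_pixel=1, frames=1):
    """Size in bytes of the raw pixel data, assuming whole bytes per sample."""
    bytes_per_sample = bits_allocated // 8
    return rows * columns * samples_per_pixel * bytes_per_sample * frames


def transfer_seconds(num_bytes, link_bits_per_second):
    """Ideal transfer time over a link, with no protocol overhead."""
    return num_bytes * 8 / link_bits_per_second


# Assumed example: a 512 x 512, 16-bit grayscale slice.
slice_bytes = uncompressed_pixel_bytes(512, 512, 16)

# Assumed example: 300 slices per study, 1,000 studies in the dataset.
study_bytes = slice_bytes * 300
dataset_bytes = study_bytes * 1000

# Assumed example: reading the whole dataset once over a 1 Gbit/s link.
seconds = transfer_seconds(dataset_bytes, 1e9)

print(slice_bytes, study_bytes, dataset_bytes, round(seconds / 60, 1))
```

Under these assumptions a single pass over the data takes about 21 minutes of pure transfer time, which is paid again on every training iteration unless the data sits close to the compute.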
4 AI Explanation
Salient pixels map (saliency map) : it colors the important pixels green, or at least in a flashy color, so you can see which zone of the image the model used to classify. It helps you understand whether the model is recognizing the right patterns!
This is really well explained in this book : Machine Learning Design Patterns, Chapter 7, Design Pattern 29 : Explainable Predictions
Book by Michael Munn, Sara Robinson, and Valliappa Lakshmanan
https://g.co/kgs/dZrdPG
And an illustration of a saliency map:
img source: https://www.researchgate.net/figure/Example-classification-result-for-a-WSI-with-a-consensus-label-of-ADH-a-Original-image_fig7_326529524
This publication may also help: https://arxiv.org/pdf/1911.11293.pdf
Credits : Efficient Saliency Maps for Explainable AI
T. Nathan Mundhenk, Barry Y. Chen, and Gerald Friedland
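To make the idea of a saliency map concrete, here is a minimal sketch using occlusion sensitivity: each pixel is masked in turn and the change in the model score is recorded. This is a deliberately simple stand-in, not the method from the book or from the paper cited above. The `toy_score` function is a hypothetical "model" that only looks at the top-left 2 x 2 patch, and images are plain nested lists to keep the example free of dependencies.

```python
def occlusion_saliency(image, score_fn, fill_value=0.0):
    """Return a map of |score change| when each pixel is replaced by fill_value."""
    baseline = score_fn(image)
    saliency = []
    for r, row in enumerate(image):
        saliency_row = []
        for c in range(len(row)):
            occluded = [list(line) for line in image]  # copy, keep the input intact
            occluded[r][c] = fill_value
            saliency_row.append(abs(baseline - score_fn(occluded)))
        saliency.append(saliency_row)
    return saliency


def toy_score(image):
    """Hypothetical model: mean intensity of the top-left 2 x 2 patch."""
    return sum(image[r][c] for r in range(2) for c in range(2)) / 4.0


toy_image = [[1.0] * 4 for _ in range(4)]
saliency_map = occlusion_saliency(toy_image, toy_score)

for row in saliency_map:
    print(row)
```

Only the four top-left pixels get a non-zero value, which shows where this toy model is looking. On a real classifier, high saliency over a corner where a hospital name is burned in would be a strong hint of the shortcut described in section 3.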
I hope this helps. As stated, this is not meant to be an exhaustive list, so please feel free to add, as comments, the important points you keep in mind while working on DICOM images in the ML space.
Student at Universidad Tecnológica Metropolitana
1 yr ago: Thank you very much, this is sacred
Traffic Acquisition Manager - Prisma Media
4 yr ago: The progress of Machine Learning is definitely a source of inspiration for many fields, congratulations!
Consulting Project Manager, FTTA Hors ZTD
4 yr ago: Great piece, very inspiring, thanks for sharing