Buying and Selling Image Data: A Practical Solution for VNA, AI, and Clinical Research
Kyle Henson, DM
Helping CEOs & CIOs fix every IT issue in their healthcare system. If it’s broken, call me and your problem is solved.
Every now and then I am asked about, or read an article about, someone selling massive amounts of data to one of the big companies involved in AI or clinical research. And when you have a lot of data, the obvious thought is: I want some of that free money!
As a thought exercise, let's look at some of the realities of moving more than a petabyte of image data. A petabyte is 1,024 terabytes, or 1,048,576 gigabytes. Many, but not all, VNAs store data in a proprietary format that is like DICOM but is not a straight .dcm file. This means that to get data out you can't simply copy the files; you have to perform a DICOM transaction. Whether the data is stored as a DICOM file or not, both transfer types require the additional step of de-identification, so although DICOM transactions add time, they are not the end of the world.
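To make the de-identification step concrete, here is a minimal sketch using the open-source pydicom library, applied once a study has been pulled out of the VNA. The handful of tags blanked here is purely illustrative, not a complete de-identification profile; a real project would implement the full DICOM PS3.15 confidentiality profile and an identifier-mapping strategy agreed upon with the buyer.

    import pydicom

    def deidentify(path_in, path_out):
        """Blank a few direct identifiers in one DICOM file (illustrative subset only)."""
        ds = pydicom.dcmread(path_in)
        for keyword in ("PatientName", "PatientID", "PatientBirthDate",
                        "ReferringPhysicianName", "InstitutionName"):
            if keyword in ds:
                setattr(ds, keyword, "")   # blank the identifier
        ds.remove_private_tags()           # private vendor tags often hide PHI
        ds.save_as(path_out)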
In my experience a single server tops out at somewhere around 15,000 studies per day, which is approximately 500 GB. That is less than five hundredths of one percent of our petabyte moved per day. Doing the simple math, ten servers dedicated to nothing but copying this data, ignoring any penalty for de-identification or additional compression, would still need roughly 210 days to move 1 PB. I submit that this is not practical and that there is a better way.
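For the skeptical reader, the arithmetic is easy to check in a few lines of plain Python (the 500 GB per server per day figure is my estimate from experience, not a benchmark):

    PB_IN_GB = 1024 * 1024               # 1 PB = 1,048,576 GB
    gb_per_server_per_day = 500          # ~15,000 studies/day per server
    servers = 10

    daily_fraction = gb_per_server_per_day / PB_IN_GB
    days_for_farm = PB_IN_GB / (gb_per_server_per_day * servers)
    print(f"{daily_fraction:.4%} of 1 PB moved per server per day")  # 0.0477%
    print(f"{days_for_farm:.1f} days with {servers} servers")        # 209.7 days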
First, we are looking at the problem from the wrong end. Whether for clinical research or for training an artificial intelligence (AI) engine, the buyer likely does not want ALL the data; they are looking for very specific use cases. In particular, what diagnosis are they trying to research or train for? Instead of dumping billions of images on the buyer and letting them sort through it all, I propose preparing a system that can provide a targeted data set rather than answer a generic query like "send me all chest x-rays"; that capability will go a long way toward cultivating a long-term relationship. To achieve this targeted approach, we must begin at the report level, not with the images.
To initiate this targeted method, we would build a database that holds all reports (not images) for the enterprise. Start by pulling an extract from the EMR for all existing reports, then add an HL7 or FHIR connection to capture all new reports as they are created. With the reports parsed into the database, any future question or required data parameter can be answered with relative ease. The database can then be queried for the specific data set desired; the output of that query would be accession number, patient ID, date of service, and procedure description. There should, of course, be a 1:1 relationship between the accession number on the report and the images in the VNA, but the additional output fields will help if there is an accession number mismatch. A minimal sketch of such a report index follows.
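Here is what that report index might look like, using SQLite for brevity. The column names, and the idea of storing parsed demographics such as smoking status alongside the report text, are my assumptions for illustration; a production system would sit on an enterprise database fed by the EMR extract and an HL7/FHIR interface engine.

    import sqlite3

    conn = sqlite3.connect("reports.db")
    conn.execute("""
        CREATE TABLE IF NOT EXISTS reports (
            accession_number TEXT PRIMARY KEY,  -- joins back to the study in the VNA
            patient_id       TEXT,
            date_of_service  TEXT,
            procedure_desc   TEXT,
            patient_sex      TEXT,              -- parsed from the HL7/FHIR feed
            patient_age      INTEGER,
            smoking_status   TEXT,              -- e.g. 'never', 'former', 'current'
            report_text      TEXT
        )
    """)
    conn.commit()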
Now, armed with this export, a savvy VNA team can, instead of drowning the buyer in millions of "chest x-rays", provide the several thousand "non-smoking males between the ages of 15 and 30 with a lung cancer diagnosis" studies they actually want; a sample query in that spirit is sketched below. I am not a researcher, but I suspect that this type of fine-tuned data capture would be more beneficial to them, as well as much easier to service from the VNA; in effect, a win-win for all involved.
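Continuing the SQLite sketch above, the targeted pull might look like this. The LIKE clause on the report text is a stand-in for a real NLP or diagnosis-coding step, and the columns are the hypothetical ones from the previous sketch; the point is simply that the output is a short worklist of accession numbers to feed the VNA export, not a petabyte.

    import sqlite3

    conn = sqlite3.connect("reports.db")
    rows = conn.execute("""
        SELECT accession_number, patient_id, date_of_service, procedure_desc
        FROM reports
        WHERE patient_sex = 'M'
          AND patient_age BETWEEN 15 AND 30
          AND smoking_status = 'never'
          AND report_text LIKE '%lung cancer%'
    """).fetchall()

    for accession, patient_id, dos, procedure in rows:
        print(accession, patient_id, dos, procedure)  # the worklist for the VNA export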
To view all of my articles and posts, visit me at kylehenson.org.
AI Architect (Model & ML Eng.) with depth and breadth for real world ML solution in healthcare and life science, strong science and engineering discipline with creative and curious mind
6y Thanks Kyle Henson, this is actually pretty common, and it's good you brought it up. Data scientists (researchers) have a very good reason to say "I need all the data": it is about data discovery, a kind of "chicken and egg" thing. Data scientists often do not have enterprise experience; data complexity (schema, dependencies, quality issues, and historical embedding) and cost (money and time) are not part of the "need all data" thinking. I used to build "data scientist friendly" pipelines, or layers, to support their work before I became one of them. At the same time, I have some trouble buying into the notion that one can handle 1,000 PB of data daily, or that one knows how to handle the data. Volume is not necessarily the key; enterprise complexity is 1,000 PB more difficult than web traffic or user logs… one should learn from vertical search.
Healthcare IT Analyst and Solutions Specialist
6y Hi Kyle, I deleted that earlier comment because I believe I may have framed it incorrectly, and thinking out loud on any social platform just sometimes shows ignorance. So let me back up and first say thank you for writing the article, as predictive analytics and ML with respect to DICOM imaging is not something I come across very often. I'm curious to understand it better, as I have seen what you've described before, and I expect we will be seeing a lot more of it as we start to collect more and more healthcare image data. What I am aiming to understand, related to your article, is: do we get more accurate predictions if we standardize image training data sets before applying a statistical model, taking the type of image data (US, XA, CT, MR, etc.) completely out of the equation, or does it not matter? For example, let's say we wanted to try to predict whether an acre of land would have more evergreens vs. hardwoods (and vice versa) in a particular region of a particular forest. Would our resultant predictive model be more or less accurate if we... threw in some images of cats and birds (something other than images of evergreen or hardwood trees)? Please forgive my newbie understanding, but as I stated, I want to understand it better.
Instructor at MTMI specializing in PACS Administration and AI
6y Great article! Thanks Kyle
Team Lead, Product Management - Cloud and Technical Platforms
6y The other big challenge, outside of the EMR/report data, is the segmentation of a positive finding to teach the AI the imaging characteristics!
The AI-guy, Assisting in AI technology deployment, entrepreneur, expert trainer/consultant on PACS, interoperability, standards.
6y Yes, I agree. A PACS archive or VNA is optimized to support the clinical workflow of diagnosing and reviewing medical images, and it is definitely not optimal as a source for data analytics, decision support, or deep learning (AI). You'll need to copy the relevant information into a data warehouse that can serve that purpose, joined, as you mention, with HL7 and EMR info. This might change when FHIR gets widely deployed, as it would allow resources to be queried for that purpose.