
The cyclical nature of enrichment and analysis

Matthew Hardman

Hybrid Cloud | High Performance Applications | Data Ops | Strategy | Leadership

Published: 6 April 2019

When most people think about data analytics, they usually picture a linear journey: gathering data from a variety of sources, cleansing it, blending it, visualising it, and finally extracting some sort of insight to present to the user.

The insight gained, while "insightful", is never really related back to the data it came from. Crudely, it's like making a sandwich: if you choose to put lettuce in it, you don't usually go back to the head of lettuce and leave a note saying that this particular head was used for a sandwich. It just doesn't seem necessary.

The thing is, if we continue down this linear path of understanding and developing insights, then unless they are published to a wider community, the insights stay with the person who uncovered them, and the growth of shared knowledge is held back. There is, however, a way to start addressing this, and metadata is key to making it happen.

If you have read any of my articles, you will know of my fondness for metadata, and how it is just as valuable and useful as the data itself. Put simply, metadata is data that describes data: the GPS coordinates of where a photo was taken, the date a file was created, the level of radiation used to produce a cranial scan. Metadata need not only describe the data it is attached to, though; it can be any data that enriches a particular piece of data, such as how it has been used or additional "insights" about it.
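To make the distinction concrete, here is a minimal Python sketch, assuming a simple in-memory record. The `DataObject` class, its fields, and the `enrich` method are hypothetical names chosen for illustration, not any particular product's API. Descriptive metadata (GPS coordinates, creation date) sits alongside enrichment metadata that records how the data has been used: the note the head of lettuce never gets.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical record: a piece of data plus the metadata that travels with it."""
    location: str                                     # where the data lives (path or URI)
    descriptive: dict = field(default_factory=dict)   # data that describes the data
    enrichment: dict = field(default_factory=dict)    # insights added later about the data

    def enrich(self, key: str, value) -> None:
        """Record an insight about this object so later users need not rediscover it."""
        self.enrichment[key] = value


photo = DataObject(
    location="photos/IMG_0001.jpg",
    descriptive={"gps": (51.5074, -0.1278), "created": "2019-04-06"},
)
photo.enrich("used_in", "travel-blog-post")
```

Keeping the two kinds of metadata in separate fields is just one design choice; the point is that the enrichment is stored with the data rather than with the person who produced it.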

A Practical Example

Let's imagine for a second the analysis flow for a medical research project. As a simple sequence of steps, you might do the following:

1. Process a series of medical scans to identify a set of particular scans that contain the necessary information for your analysis.
2. The actual analysis you want to run.

If you stop and think purely about the first step, it might involve identifying 100 particular scans from a repository of a million. Finding the correct images could take a massive amount of processing time. That might be a tax you are willing to pay and plan for; after all, you want the correct scans for your project. But what happens when you want to run the same process again? Unless you copied all of the identified images to your own repository, you would have to go through the whole process again, consuming unnecessary time.

Maybe you could look at it in reverse: once you have identified all the images that are right for your study, you delete all the other images from the repository so you can focus only on what is important. Clearly that is not going to work. Just because you don't see value in those other scans doesn't mean another researcher won't.

Enter Metadata

Metadata provides the value here. Once your scans have been identified, they could be tagged with information indicating that they are suitable data for your analysis. In fact, the tag need not say a scan is good for "your" analysis; it could mark it as suitable for any future project based around a similar analysis. Such a tag could help other researchers avoid spending massive amounts of time trying to find the right sources of information for their projects as well.

This is where the cyclical nature of enrichment and data analysis is realised. Once we have identified some sort of insight from a type of data, we can tag the source information with that insight, reducing the need for additional processing at a later date. Key to this is an index that maintains both the locations of the data and the information about the data.
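A minimal Python sketch of that cycle follows, assuming an in-memory index keyed by data location. `MetadataIndex`, its `select` method, the tag string `suitable:study-A`, and the `looks_suitable` classifier are all hypothetical stand-ins; a real system would persist the index and run genuine image analysis. The first call pays the full processing tax and writes its findings back as tags; later calls with the same criterion simply read those tags.

```python
from typing import Callable, Dict, Iterable, List, Set


class MetadataIndex:
    """Hypothetical index of data locations and the tags recorded against them."""

    def __init__(self) -> None:
        self._tags: Dict[str, Set[str]] = {}  # location -> tags recorded for it
        self._evaluated: Set[str] = set()     # criteria already run over the repository

    def tag(self, location: str, tag: str) -> None:
        """Enrich a data location with an insight."""
        self._tags.setdefault(location, set()).add(tag)

    def find(self, tag: str) -> List[str]:
        """Return every location carrying `tag`, in sorted order."""
        return sorted(loc for loc, tags in self._tags.items() if tag in tags)

    def select(self, repository: Iterable[str], criterion: str,
               is_suitable: Callable[[str], bool]) -> List[str]:
        """Return locations meeting `criterion`, running the costly check at most once."""
        if criterion in self._evaluated:
            return self.find(criterion)        # later runs: read the enrichment
        for location in repository:            # first run: the expensive full pass
            if is_suitable(location):
                self.tag(location, criterion)  # write the insight back as metadata
        self._evaluated.add(criterion)
        return self.find(criterion)


# Stand-in classifier that counts how often it is called (placeholder for image analysis).
calls = {"count": 0}


def looks_suitable(location: str) -> bool:
    calls["count"] += 1
    return location.endswith("7")


repository = [f"scan-{i:04d}" for i in range(1000)]
index = MetadataIndex()
first = index.select(repository, "suitable:study-A", looks_suitable)
second = index.select(repository, "suitable:study-A", looks_suitable)
```

Here the classifier runs 1,000 times in total, all during the first call and none during the second, which returns the same 100 scans straight from the index. Naming the tag after the selection criterion rather than a single project is what would let another researcher with a similar analysis reuse it.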

This continual cycle of analysis and enrichment not only speeds up the identification of relevant information for the individual, but can help accelerate the time to insight for an entire body of researchers in an organisation or a community itself.

Coming back to our example, we can reach a point where the first step is eliminated, meaning that organisations can focus on the analysis and research that come from having good data, rather than spending expensive cycles on identifying the good data.

In Closing

As we continue to consume a greater variety of data from a more diverse set of data sources, metadata will be the key to ensuring that we identify the right data to drive the important outcomes.

Thanks for stopping by, and I hope you all have a great weekend!
