What is Metadata?

I like crosswords -- they keep my brain's semantic web on its toes (a 'meta'phor about 'meta'tarsals). One trick toward a faster solve-rate is to look at prefixes, suffixes and conjugations based on the clue. All of these are metadata that drive the word architecture and the possible solutions. The classic definitions of metadata are "data about data" or "description about data." Biology has quite a few words that begin with 'meta', for example (in no specific order): metabolic, metastasize, metacarpal, metatarsal, metagenomics, metaplasm, metazoic...

Metadata Types

To understand metadata, we need to classify the classifier itself. NISO, the National Information Standards Organization, succinctly classifies metadata into the following types:

[Figure: NISO classification of metadata types]

Taxonomy, Ontology and the Semantic Web

A taxonomy is a glossary of terms arranged in a hierarchical classification, while an ontology is a model that describes the names, type definitions, properties and relationships of the entities it covers.
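The distinction can be sketched in a few lines of Python. This is a toy illustration only: the terms, the `TAXONOMY` and `ONTOLOGY` names, and the `is_a` / `part_of` predicates are assumptions chosen for the example, not part of any standard vocabulary.

```python
from __future__ import annotations
from dataclasses import dataclass

# Taxonomy: every term points to exactly one broader term, forming a tree.
TAXONOMY = {
    "metacarpal": "hand bone",
    "metatarsal": "foot bone",
    "hand bone": "bone",
    "foot bone": "bone",
    "bone": None,  # root term
}


def broader_terms(term: str) -> list[str]:
    """Walk up the hierarchy from a term to the root."""
    path = []
    parent = TAXONOMY[term]
    while parent is not None:
        path.append(parent)
        parent = TAXONOMY[parent]
    return path


# Ontology: entities plus *typed* relationships between them.
@dataclass(frozen=True)
class Relation:
    subject: str
    predicate: str
    obj: str


ONTOLOGY = [
    Relation("metacarpal", "is_a", "bone"),
    Relation("metacarpal", "part_of", "hand"),
    Relation("metatarsal", "is_a", "bone"),
    Relation("metatarsal", "part_of", "foot"),
]


def related(subject: str, predicate: str) -> list[str]:
    """Return every object linked to a subject by the given relationship."""
    return [r.obj for r in ONTOLOGY
            if r.subject == subject and r.predicate == predicate]


print(broader_terms("metatarsal"))
print(related("metacarpal", "part_of"))
```

The taxonomy can only answer "what is broader than this term?", while the ontology can answer questions about any named relationship.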

Speaking of markup languages, the first thesis of Sir Tim Berners-Lee, the primary inventor of the World Wide Web and the URL (uniform resource locator), was the "semantic web." In their 2001 paper, Berners-Lee, Hendler and Lassila introduced the concepts of expressing meaning, knowledge representation, ontologies and agents. These lead to links, connections and knowledge graphs expressed in the Resource Description Framework (RDF), through which knowledge then evolves within the semantic web.
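A minimal sketch of the idea, using plain Python tuples rather than an RDF library: knowledge is stored as subject-predicate-object triples, and a simple "agent" derives new triples from existing ones. `rdf:type` and `rdfs:subClassOf` are real RDF/RDFS terms; the `ex:` names and the `match` and `infer_types` helpers are hypothetical and exist only for this example.

```python
# A tiny knowledge graph as a set of (subject, predicate, object) triples.
GRAPH = {
    ("ex:aspirin", "rdf:type", "ex:Drug"),
    ("ex:Drug", "rdfs:subClassOf", "ex:Substance"),
    ("ex:Substance", "rdfs:subClassOf", "ex:Thing"),
    ("ex:aspirin", "ex:interactsWith", "ex:warfarin"),
}


def match(graph, s=None, p=None, o=None):
    """Return all triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def infer_types(graph):
    """Add the rdf:type triples implied by rdfs:subClassOf until none are new."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for inst, _, cls in match(inferred, p="rdf:type"):
            for _, _, parent in match(inferred, s=cls, p="rdfs:subClassOf"):
                new_triple = (inst, "rdf:type", parent)
                if new_triple not in inferred:
                    inferred.add(new_triple)
                    changed = True
    return inferred


closure = infer_types(GRAPH)
print(sorted(o for _, _, o in match(closure, s="ex:aspirin", p="rdf:type")))
```

Nothing in the original graph says that aspirin is a substance; that link appears only after inference, which is the sense in which knowledge "evolves" in such a graph.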

The compositional grammar of the WHO's (World Health Organization) International Classification of Diseases, 11th Revision (ICD-11) has a semantic structure that can be derived from its RDF representation via the Web Ontology Language (OWL).

Observation and Algorithms

I have written and spoken before about Drug-Drug Interactions (DDI) and Adverse Event Reporting (AER). Here is a representation of the semantic causal ontology of an adverse event during and after a clinical trial:

[Figure: semantic causal ontology of an adverse event during and after a clinical trial]

It is not software, open-source software, cloud, multi-cloud or hybrid-cloud that will eat the world, but data and metadata. Data and metadata need to be managed as a corporate asset with long-term implications. Classifying and valuing this data (and metadata), and identifying the critical data (and metadata) elements, are key to access control, authorization, audit, identity management, confidentiality, privacy and reliability -- all of which determine risk. Measuring data quality by "observational and algorithmic" (machine learning) methods can build a continuous risk score at the enterprise level to identify and mitigate these risks. A brief few words on observational ontologies: Read the Odyssey book! (The Book of OHDSI, that is.)
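One way such a score might be computed is sketched below. Everything here is an assumption made for illustration: the `DataElement` fields, the equal blend of incompleteness and anomaly rate, and the double weight for critical elements are not taken from any standard or from a specific product.

```python
from dataclasses import dataclass


@dataclass
class DataElement:
    name: str
    critical: bool        # flagged as a critical data element
    completeness: float   # observed fraction of non-missing values, 0..1
    anomaly_rate: float   # fraction of values flagged by some ML detector, 0..1


def risk_score(elements, critical_weight=2.0):
    """Weighted average quality gap across elements; 0 is best, 1 is worst."""
    weighted_gap = 0.0
    total_weight = 0.0
    for e in elements:
        weight = critical_weight if e.critical else 1.0
        # "Observational" signal (completeness) blended with an
        # "algorithmic" signal (anomaly rate); equal blend is an assumption.
        gap = ((1.0 - e.completeness) + e.anomaly_rate) / 2.0
        weighted_gap += weight * gap
        total_weight += weight
    return weighted_gap / total_weight if total_weight else 0.0


batch = [
    DataElement("patient_id", critical=True, completeness=1.0, anomaly_rate=0.0),
    DataElement("free_text_notes", critical=False, completeness=0.5, anomaly_rate=0.1),
]
print(round(risk_score(batch), 3))
```

Recomputing the score on every new batch of data is what would make it "continuous"; the weights would need to come from the organization's own classification and valuation of its data.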

Federation and Inferencing on the Edge

How do we federate metadata across decentralized edge devices or edge meshes? The National Institute of Standards and Technology (NIST), in NISTIR 8112, "Attribute Metadata: A Proposed Schema for Evaluating Federated Attributes," describes metadata about the attributes of an individual during an online transaction. This metadata records how each attribute and its value are obtained, determined and vetted, so that a party outside the originating data domain can gauge its confidence in them when making access-control (rules and policies) and authorization decisions about protected resources in a federated infrastructure, and so that the attribute can be promoted to a federated system. It is important to note the metadata categories: Provenance, Accuracy, Currency, Privacy, Classification.
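As a rough sketch of how these five categories could travel with an attribute, consider the following. The five category names come from the text above; the specific field names, the `is_acceptable` policy and its thresholds are simplified assumptions for illustration and should not be read as the NISTIR 8112 schema itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class AttributeValueMetadata:
    origin: str                  # Provenance: who issued the value
    verifier: Optional[str]      # Accuracy: who verified it (None = unverified)
    last_update: date            # Currency: when the value was last refreshed
    individual_consented: bool   # Privacy: did the person consent to release
    classification: str          # Classification: sensitivity label


@dataclass
class Attribute:
    name: str
    value: str
    metadata: AttributeValueMetadata


def is_acceptable(attr, today, max_age_days=365, allowed=("unclassified",)):
    """Hypothetical relying-party policy: trust only verified, fresh,
    consented attributes with an allowed classification."""
    m = attr.metadata
    return (
        m.verifier is not None
        and (today - m.last_update).days <= max_age_days
        and m.individual_consented
        and m.classification in allowed
    )


role = Attribute(
    name="role",
    value="clinician",
    metadata=AttributeValueMetadata(
        origin="hr-system",
        verifier="hr-office",
        last_update=date(2020, 1, 1),
        individual_consented=True,
        classification="unclassified",
    ),
)
print(is_acceptable(role, today=date(2020, 6, 1)))
```

The point of the design is that the relying party never needs access to the issuer's internal systems; it decides on the metadata that arrives with the attribute.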

Clinical Studies and Trials

As we stand on soap-boxes and describe "Precision Population Health" and "Continuous Clinical Trials" with a lot of hand-waving, two major "flat world" cliches remain:

1. We continue to make clinical decisions in a two-dimensional, rectangular world of structured (row-column) data (the CDISC -- Clinical Data Interchange Standards Consortium -- standard still uses this process), while the real world is far from classical A/B testing (the Randomized Controlled Trial, RCT) or a nice binomial distribution (most probabilistic inferencing).

2. The clinical trials workflow is very siloed and uses "throw-over-the-wall" or "baton-handover" techniques: IRB -> Protocol -> Approval -> Cohort Enrolment -> Data Entry -> Data Review -> Statistical Analytics -> Reporting and Review -> Data Lock -> Filings and Registrations -> Regulatory Approvals -> Post-Market Process -> AER.

According to CDISC, this data flow process happens as: Organize -> Plan -> Collect -> Organize -> Analyze -> Submit, Publish, Report (and Exchange).
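To make the hand-over problem concrete, here is a small sketch in which metadata travels with the data from stage to stage instead of being lost at each wall. The stage names are a subset of the workflow listed above; the `TrialDataset` class, the `hand_over` function and the fields of each lineage entry are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrialDataset:
    records: list
    # Lineage metadata that accompanies the data across every hand-over.
    lineage: list = field(default_factory=list)


def hand_over(dataset, stage, note):
    """Record which stage touched the data, why, and how many records it saw."""
    dataset.lineage.append(
        {"stage": stage, "note": note, "record_count": len(dataset.records)}
    )
    return dataset


STAGES = ["Data Entry", "Data Review", "Statistical Analytics"]

dataset = TrialDataset(records=[{"subject": 1}, {"subject": 2}])
for stage in STAGES:
    hand_over(dataset, stage, note="completed")

for entry in dataset.lineage:
    print(entry["stage"], entry["record_count"])
```

With the lineage attached, a downstream stage can check what happened upstream rather than receiving only a bare rectangle of rows and columns.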

The integration of business, functional and technical metadata will drive the real innovations in clinical trials. This is the cross-world we are entering, and the answer is metadata...

© 2020 Sanjay Joshi

A related article: What is Data?

Arun Batchu

Research VP | Artificial Intelligence in Software Engineering

5 years ago

Thanks for writing this, Sanjay Joshi. A lot of food for thought. I am looking forward to your thoughts on how this and ML, especially deep learning, can complement each other. Re: data, software, world: recently I have likened it to the classic rock, paper, scissors. What will eat what? It depends :-)
