What Researchers Should Know About Efficient Data Enrichment


For researchers today, a central challenge is enriching siloed and unstructured scientific data.

There are hundreds of publicly available databases containing information about the life sciences and healthcare. For example, PubMed Central® (PMC) is a free full-text archive of biomedical and life sciences journal literature at the U.S. National Institutes of Health's National Library of Medicine (NIH/NLM). Europe PMC is the European version of PMC.

There are specialized databases for almost every therapeutic area, and for some therapeutic areas there are aggregators of public domain databases, such as the Cancer Research Data Commons.

But while there is an abundance of data, much of it is siloed and is available “as is,” meaning that there’s been little value added on top of these disparate, often unstructured databases.

Another way of putting this is that the databases are “unenriched.”

The Many Forms of Data Enrichment

Data enrichment can take many forms:

  • Data can be classified and tagged to make searching easier and facilitate the clustering of results.
  • Sentiment scores can be added to individual records.
  • A variety of machine learning techniques can be used to identify the data most likely to interest a particular researcher.

A key data enrichment challenge is applying these techniques consistently to disparate datasets from different sources.
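As a rough illustration of tagging and clustering, the sketch below assigns categories to records using a small keyword dictionary. It then groups record IDs by tag so that results can be browsed by category. The category names, keywords, and record texts are hypothetical. Keyword matching stands in here for the machine-learning classifiers a production tagger would use.

```python
from collections import defaultdict

# Hypothetical category -> keyword dictionary; a real tagger would learn these, not hard-code them.
CATEGORY_KEYWORDS = {
    "oncology": {"tumor", "cancer", "carcinoma"},
    "cardiology": {"cardiac", "heart", "arrhythmia"},
    "immunology": {"antibody", "t-cell", "cytokine"},
}

def tag_record(text: str) -> set[str]:
    """Return every category whose keywords appear in the text (case-insensitive)."""
    tokens = set(text.lower().replace(",", " ").replace(".", " ").split())
    return {cat for cat, words in CATEGORY_KEYWORDS.items() if tokens & words}

def cluster_by_tag(records: dict[str, str]) -> dict[str, list[str]]:
    """Group record IDs under each tag so search results can be browsed by category."""
    clusters: dict[str, list[str]] = defaultdict(list)
    for record_id, text in records.items():
        for tag in sorted(tag_record(text)):
            clusters[tag].append(record_id)
    return dict(clusters)

records = {
    "doc1": "Cytokine release in tumor microenvironments.",
    "doc2": "Cardiac arrhythmia after chemotherapy for breast cancer.",
    "doc3": "Antibody engineering for T-cell engagers.",
}
print(cluster_by_tag(records))
```

A record can land in several clusters (doc1 is both oncology and immunology). This overlap is usually what researchers want when browsing by topic.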

Applying Consistent, Structured Metadata

Foundation is a service that aggregates about 20 life sciences databases, most of them unstructured and in the public domain. We enrich the content in these databases by adding a consistent layer of structured metadata. This layer supports granular searching and information discovery and helps uncover hidden connections. Our proprietary, machine-learning-driven tagging engine generates the tags and categories assigned to each piece of content.
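A consistent metadata layer implies mapping each source's fields onto one shared schema before tagging. The minimal sketch below assumes two hypothetical source formats and a stand-in `toy_tagger`. The field names (`pmid`, `nct_id`, and so on) and the record IDs are illustrative, not Foundation's actual schema. The point is that one tagger runs identically on every normalized record, whatever its source.

```python
from dataclasses import dataclass, field

@dataclass
class EnrichedRecord:
    """Shared metadata layer applied to every source (fields are illustrative)."""
    source: str
    record_id: str
    title: str
    text: str
    tags: list[str] = field(default_factory=list)

# Hypothetical per-source field mappings: each source names the same things differently.
FIELD_MAPS = {
    "pubmed": {"id": "pmid", "title": "article_title", "text": "abstract"},
    "clinical_trials": {"id": "nct_id", "title": "brief_title", "text": "summary"},
}

def normalize(source: str, raw: dict) -> EnrichedRecord:
    """Map a raw record from one source onto the shared schema."""
    mapping = FIELD_MAPS[source]
    return EnrichedRecord(
        source=source,
        record_id=str(raw[mapping["id"]]),
        title=raw.get(mapping["title"], ""),
        text=raw.get(mapping["text"], ""),
    )

def toy_tagger(text: str) -> set[str]:
    """Stand-in for a real tagging engine: flags one hypothetical keyword."""
    return {"oncology"} if "cancer" in text.lower() else set()

def enrich(record: EnrichedRecord, tagger=toy_tagger) -> EnrichedRecord:
    """Attach tags; the same tagger runs on records from every source."""
    record.tags = sorted(tagger(f"{record.title} {record.text}"))
    return record

raw_items = [
    ("pubmed", {"pmid": "pm-001", "article_title": "Cancer screening outcomes",
                "abstract": "A cohort study of screening uptake."}),
    ("clinical_trials", {"nct_id": "ct-001", "brief_title": "Statin dosing study",
                         "summary": "Dose comparison in adults."}),
]
enriched = [enrich(normalize(source, raw)) for source, raw in raw_items]
for rec in enriched:
    print(rec.source, rec.record_id, rec.tags)
```

Keeping normalization separate from tagging means adding a new source only requires a new field mapping. The tagger itself stays unchanged.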

As data enrichment methods evolve, we work to stay current with them. This lets us continue to support advanced scientific research effectively and meet the changing needs of the scientific community.

An early enhancement in this area was adding PubChem tagging to several of our key databases: PubMed, Patents, Clinical Trials, Grants, and Tech Transfer. This enabled more fine-tuned searches for documents that mention a specific chemical compound. It also let researchers quickly get basic information about any compound mentioned in a document.
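Compound tagging of this kind can be sketched as a dictionary lookup in three steps:
  • detect compound names and synonyms in each document;
  • filter documents by compound;
  • link out for basic compound information.

The synonym table and document IDs below are hypothetical. The link follows the URL pattern of PubChem's PUG REST service for compound properties. The sketch only builds that URL and does not fetch it.

```python
import re
from urllib.parse import quote

# Hypothetical synonym dictionary mapping surface forms to a preferred compound name.
COMPOUND_SYNONYMS = {
    "aspirin": "aspirin",
    "acetylsalicylic acid": "aspirin",
    "caffeine": "caffeine",
}

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def tag_compounds(text: str) -> set[str]:
    """Return preferred names of every dictionary compound mentioned in the text."""
    lowered = text.lower()
    found = set()
    for synonym, preferred in COMPOUND_SYNONYMS.items():
        # Word boundaries stop a synonym from matching inside a longer word.
        if re.search(rf"\b{re.escape(synonym)}\b", lowered):
            found.add(preferred)
    return found

def search_by_compound(docs: dict[str, str], compound: str) -> list[str]:
    """Return IDs of documents tagged with the given compound."""
    return sorted(doc_id for doc_id, text in docs.items() if compound in tag_compounds(text))

def compound_info_url(name: str) -> str:
    """Build (but do not fetch) a PUG REST URL for basic properties of a named compound."""
    return f"{PUG_REST}/compound/name/{quote(name)}/property/MolecularFormula,MolecularWeight/JSON"

docs = {
    "patent-1": "Low-dose aspirin formulation.",
    "trial-7": "Acetylsalicylic acid versus placebo.",
    "grant-3": "Caffeine and sleep.",
}
print(search_by_compound(docs, "aspirin"))
print(compound_info_url("aspirin"))
```

Mapping synonyms to one preferred name is what lets a search for "aspirin" also find the trial that only says "acetylsalicylic acid."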

Customer response to this capability was highly positive, so we looked for more metadata to enrich our corpus.

We were already familiar with the Unified Medical Language System (UMLS) and the National Center for Biomedical Ontology (NCBO). Our goal was to use them to annotate our key datasets further. At the same time, we wanted to build a process for tagging our customers' proprietary datasets with these more specific taxonomies.
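Annotating text against a controlled vocabulary is commonly done with dictionary matching that prefers the longest term. That way, "breast cancer" is tagged as one concept rather than just "cancer." The sketch below assumes a tiny hypothetical ontology with made-up `EX:` identifiers. These are not real UMLS CUIs or NCBO ontology IDs. Services such as the NCBO BioPortal Annotator perform this kind of matching against real ontologies. Because the function takes plain text, the same code applies both to public records and to a customer's proprietary documents.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int       # character offset where the match begins
    end: int         # character offset just past the match
    text: str        # matched surface text
    concept_id: str  # identifier of the (hypothetical) ontology concept

# Hypothetical mini-ontology: lowercase term -> concept ID (not real UMLS or NCBO IDs).
ONTOLOGY = {
    "breast cancer": "EX:0001",
    "cancer": "EX:0002",
    "tamoxifen": "EX:0003",
}
MAX_TERM_WORDS = max(len(term.split()) for term in ONTOLOGY)

def annotate(text: str) -> list[Annotation]:
    """Greedy longest-match annotation: prefer 'breast cancer' over 'cancer'."""
    words = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    annotations, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, shrinking one word at a time.
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(text[s:e].lower() for s, e in words[i:i + n])
            concept = ONTOLOGY.get(candidate)
            if concept:
                start, end = words[i][0], words[i + n - 1][1]
                annotations.append(Annotation(start, end, text[start:end], concept))
                i += n
                break
        else:
            i += 1  # no term starts at this word; move on
    return annotations

for ann in annotate("Tamoxifen therapy in breast cancer."):
    print(ann)
```

Greedy longest-match is a deliberately simple choice. It ignores negation, abbreviations, and ambiguous terms, which real annotators handle with additional rules or models.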

To learn more about fast and secure data enrichment at scale, read the full blog here: https://bit.ly/47yOmlr

Discover the Power & Potential of Enriched Data With Our Product Suite

Harness the capability to transform your research process, uncover new insights, and unlock hidden potential in your information assets. We invite you to explore a live demo of our comprehensive suite of products and experience firsthand the dramatic improvements in research productivity our solutions provide.

Schedule a demo today and propel your organization towards data-informed decision making.

