
How do we use structured queries to tackle unstructured (big) data?

Mira Nair

Senior B2B commercial leader; 12+yrs of marketing & sales in life sciences+software

Published: October 3, 2016

I recently attended a talk by Linguamatics CTO David Milward on Structured Queries for Unstructured Data, delivered to the Data Insights Cambridge Meetup group.

The data science community wants to know:

How can we deliver insights from big data?
What are the optimal approaches to ‘handle’ (store, capture) and analyze (query, structure, repurpose) big data?

We can now generate and store many times more data than we could just 10 years ago. SQL database technology handles structured data well and has not changed significantly since the 1980s. For basic queries, it is much easier to deliver insights from structured data than from unstructured data in free-text sources.
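To make the contrast concrete, here is a minimal Python sketch using the standard-library `sqlite3` module. The `patients` table, its `smoker` column and the sample notes are hypothetical, invented for illustration. When smoking status is stored as a structured field, the question becomes a one-line query; when it lives only in free-text notes, a naive keyword match also returns patients whose notes say they do not smoke.

```python
import sqlite3

# Hypothetical table: smoking status stored both as a structured flag
# (1 = smoker, 0 = non-smoker) and as a free-text clinical note.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (id INTEGER PRIMARY KEY, smoker INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO patients (id, smoker, note) VALUES (?, ?, ?)",
    [
        (1, 0, "Patient is a non-smoker."),
        (2, 1, "Smokes 10 cigarettes daily."),
        (3, 0, "She does not smoke."),
    ],
)

# Structured data: the question maps directly onto a precise SQL query.
structured = [row[0] for row in conn.execute(
    "SELECT id FROM patients WHERE smoker = 1 ORDER BY id")]

# Free text: a keyword match cannot tell "smokes" from "does not smoke".
# (SQLite's LIKE is case-insensitive for ASCII letters.)
naive = [row[0] for row in conn.execute(
    "SELECT id FROM patients WHERE note LIKE '%smok%' ORDER BY id")]

print(structured)  # [2]
print(naive)       # [1, 2, 3]
```

The keyword query matches every note, including the two non-smokers, which is exactly the gap that NLP-based text mining aims to close.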

Unstructured data is the new frontier for data science

What drew so many people to David’s talk was the promise of the ‘data insights’ locked away in unstructured data. The audience spanned many industries: some attendees work with astronomical data, others with financial data sources, and many with unstructured health and life science data. Many industries rely heavily on data to inform their day-to-day business decisions. For healthcare and life sciences, where Linguamatics is the text mining leader, transforming how we understand and improve population health and patient outcomes will depend primarily on extracting insights from unstructured data sources.

Effectively mining unstructured data requires Natural Language Processing (NLP) technology

Unstructured data is hard to analyze for insights critical to business and health outcomes: the same information can appear in many different syntactic constructions and patterns, unlike the fixed fields of structured data. This makes it difficult to identify entities and the relations between them within a text, and to connect relationships across different documents.

David illustrated how the upcoming version of Linguamatics’ NLP-driven text mining tool, I2E, addresses these challenges by normalizing data values: I2E maps equivalent expressions to the same concept, no matter how they are phrased (e.g., “non-smoker” is treated the same as “does not smoke”).
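A minimal sketch of the normalization idea in Python, assuming a hand-written synonym table. This is not I2E's API or method, which the talk did not detail here; real NLP systems rely on curated ontologies and linguistic analysis rather than a short list of regular expressions. The `SYNONYMS` table, the concept labels and the `normalize` function are hypothetical names. Each canonical concept lists several surface phrasings, and any text matching one of them is tagged with that concept.

```python
import re

# Hypothetical synonym table: each canonical concept lists surface
# phrasings that should all map to it.
SYNONYMS = {
    "NON_SMOKER": [r"non-?smoker", r"does not smoke", r"never smoked"],
    "SMOKER": [r"current smoker", r"smokes \d+ cigarettes"],
}

# One case-insensitive pattern per concept, joining its phrasings with "|".
PATTERNS = {
    concept: re.compile("|".join(phrases), re.IGNORECASE)
    for concept, phrases in SYNONYMS.items()
}


def normalize(text: str) -> set[str]:
    """Return the canonical concepts mentioned anywhere in text."""
    return {concept for concept, pattern in PATTERNS.items() if pattern.search(text)}


for note in ["Patient is a non-smoker.", "She does not smoke.", "Smokes 10 cigarettes daily."]:
    print(f"{note!r} -> {normalize(note)}")
```

Once both “non-smoker” and “does not smoke” resolve to `NON_SMOKER`, a query can target the concept instead of every possible wording.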

If you query a large body of unstructured data with a relatively focused question (Which? What? Who?), I2E can take you directly to the answers that matter. If you ask a broader question, such as “Tell me everything about X,” I2E clusters the facts extracted from all documents and returns only the most relevant documents mentioning X. This text mining approach lets users search for key information (e.g., a particular date, mutation or measurement) in unstructured source data, regardless of how that information is expressed or formatted in the text.
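The two kinds of question can be sketched as structured queries over facts that an NLP step has already extracted from text. This Python sketch assumes a hypothetical `Fact` record of (document, subject, relation, object) and uses placeholder entities rather than real data. Simple grouping by relation and ranking documents by fact count stand in for I2E's clustering, which the talk summary does not describe in detail. Each answer carries its `doc_id`, so a result can link back to its source document.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One relation extracted from a document (hypothetical schema)."""
    doc_id: str
    subject: str
    relation: str
    obj: str


# Placeholder facts standing in for the output of an extraction step.
FACTS = [
    Fact("doc1", "drug_A", "treats", "disease_X"),
    Fact("doc2", "drug_B", "treats", "disease_X"),
    Fact("doc2", "drug_B", "inhibits", "protein_P"),
    Fact("doc3", "drug_A", "inhibits", "protein_P"),
    Fact("doc3", "drug_A", "causes", "rash"),
]


def which(relation: str, obj: str) -> list[tuple[str, str]]:
    """Focused question, e.g. 'Which drugs treat disease_X?'.
    Returns (answer, source document) pairs."""
    return [(f.subject, f.doc_id) for f in FACTS if f.relation == relation and f.obj == obj]


def everything_about(entity: str) -> dict[str, list[tuple[str, str]]]:
    """Broad question: group every fact mentioning entity by relation type."""
    grouped = defaultdict(list)
    for f in FACTS:
        if f.subject == entity:
            grouped[f.relation].append((f.obj, f.doc_id))
        elif f.obj == entity:
            grouped[f.relation].append((f.subject, f.doc_id))
    return dict(grouped)


def relevant_docs(entity: str) -> list[str]:
    """Rank documents by how many extracted facts mention entity."""
    counts = Counter(f.doc_id for f in FACTS if entity in (f.subject, f.obj))
    return [doc for doc, _ in counts.most_common()]


print(which("treats", "disease_X"))  # [('drug_A', 'doc1'), ('drug_B', 'doc2')]
print(everything_about("drug_A"))
print(relevant_docs("drug_A"))       # ['doc3', 'doc1']
```

Because the facts are already structured, both the narrow and the broad question reduce to ordinary filtering and grouping; the hard work lies in the extraction step that produces them.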

Linguamatics’ upcoming release introduces Normalized Values and Advanced Range Search, which enable range searches over unstructured data. For example, a range search for "between 0.5kg and 2kg" will find weights expressed in the source text in different units, e.g., 1.5lb, 600g or 1.5kg.
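A simplified sketch of unit-normalized range search in Python, assuming weights appear as a number followed by the abbreviations `kg`, `g` or `lb` (spelled-out units such as "grams" are not handled). The `TO_KG` table and the `find_weights_in_range` function are hypothetical names, not part of I2E. Every mention is converted to kilograms before the range test, so weights written in different units are compared on one scale.

```python
import re

# Hypothetical conversion table to one base unit (kilograms).
# 1 lb is defined as exactly 0.45359237 kg.
TO_KG = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}

# A number (optionally with decimals) followed by a unit abbreviation.
WEIGHT = re.compile(r"(\d+(?:\.\d+)?)\s*(kg|lb|g)\b", re.IGNORECASE)


def find_weights_in_range(text: str, low_kg: float, high_kg: float) -> list[tuple[str, float]]:
    """Return (mention as written, value in kg) for each weight in text
    whose normalized value lies within [low_kg, high_kg]."""
    hits = []
    for match in WEIGHT.finditer(text):
        value, unit = float(match.group(1)), match.group(2).lower()
        kg = value * TO_KG[unit]
        if low_kg <= kg <= high_kg:
            hits.append((match.group(0), round(kg, 3)))
    return hits


text = "Samples weighed 1.5lb, 600g, 1.5kg, 3kg and 200g."
print(find_weights_in_range(text, 0.5, 2.0))
# [('1.5lb', 0.68), ('600g', 0.6), ('1.5kg', 1.5)]
```

Normalizing to a single base unit before comparing is what lets one range query cover every unit; supporting another unit only means adding an entry to `TO_KG` and the pattern.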

All of these insights from unstructured data sources are presented as structured results that draw your attention to the answer while linking back to the raw source text.

David presented excellent examples of how structured querying can enable us to tap into the gold nuggets hidden within unstructured text. I look forward to seeing more examples of how NLP-based text mining is being applied at the upcoming Linguamatics Text Mining Summit, October 17-19 in Cape Cod.
