Using machine learning to turn verbal conversations into structured contextual data records
Robert Svebeck
Driving Responsible AI Implementation in Region Stockholm / Karolinska University Hospital
Introduction
Many businesses, organizations, and researchers benefit from extracting insights from conversations between people. However, manually transcribing and analysing these conversations is time-consuming and error-prone. Just consider how much time and effort goes into creating meeting minutes for every meeting that takes place globally! There is a significant amount of time and money to be saved here, benefiting everyone involved and creating new business opportunities for innovative application developers.
In recent years, advances in Machine Learning (ML) have made it possible to transcribe audio conversations into text automatically and reliably. However, this alone does not completely solve the problem, since the conversations in our meetings are not structured.
With the latest achievements in Large Language Models (LLMs), it's now possible to extract relevant information from that text and create structured meeting notes. Moreover, it's possible to take that structured text even further and transform it into a tabular data structure. With this capability, the possibilities and use cases are truly endless!
It's important to keep in mind that ML algorithms are biased and can make serious errors, which can lead to inaccurate data. Therefore, to successfully implement this automation technology, we must establish strong approval processes around it, ensuring that the output of the algorithm is approved by all parties involved.
Once the conversation is stored in a structured way in a database, it can be approved by those who participated in the conversation, and then used to create more insights by other ML models.
In this blog post, I'll explore how to use machine learning to turn audio conversations into structured data while addressing the problems of bias and error. I will do this by showing you an example where a fictional doctor meets a fictional patient, and the goal is to get this conversation into a well-designed database, automatically.
I will be using ChatGPT for this example, but the concept can be implemented with many other available models.
Step 1: Transcribing Audio to Text
The first step in turning audio conversations into structured data is to transcribe the audio into text.
There are several frameworks available for transcribing audio to text, ranging from commercial cloud services to open-source models.
These frameworks vary in terms of their features, accuracy, and ease of use. It's important to evaluate each framework based on your specific needs before deciding which one to use.
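As an illustration, here is a minimal Python sketch of calling Google Cloud Speech-to-Text with speaker diarization. The file name, sample rate, and speaker count are assumptions for this example, not part of the original setup:

```python
from google.cloud import speech  # pip install google-cloud-speech

client = speech.SpeechClient()  # uses GOOGLE_APPLICATION_CREDENTIALS from the environment

# Hypothetical recording of the conversation, converted to 16 kHz mono WAV.
with open("conversation.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_automatic_punctuation=True,
    # Ask the API to separate the two speakers in the conversation.
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=2,
    ),
)

# Note: recognize() handles short clips; longer recordings would use
# long_running_recognize() instead.
response = client.recognize(config=config, audio=audio)

# With diarization enabled, the last result carries word-level speaker tags,
# which can be grouped into "person 1" / "person 2" turns.
for word in response.results[-1].alternatives[0].words:
    print(f"person {word.speaker_tag}: {word.word}")
```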
In my example, I used the Google Cloud Speech API to transcribe the audio, and this is part of the result (the complete text file of the conversation can be found here, and an audio recording in MP3 format is here).
person 1: how are you doing today?
person 2: I am not doing to well.
person 1: oh, tell me about it.
person 2: last night I had chest pain. It woke me up and I couldn't go back to sleep.
person 1: what time did it wake you up?
person 2: It was around 2 in the morning.
person 1: do you still have the pain?
person 2: no, it went away after about 40 minutes.
...
Step 2: Extracting Context from Text
After transcribing the audio into text, the next crucial step is to extract the relevant information from it. This can be achieved through the use of LLM algorithms, which are capable of identifying and extracting key information such as names, dates, locations, and even sentiment. Although there are many options available on the market, it's important to choose a framework that has been trained on your specific market, language, and context to minimize errors and bias.
In my example, I used ChatGPT with the following prompt:
Consider this text as a conversation between a patient and a doctor. First task is to figure out who is the patient and who is the doctor. Then I want you to make a resume of the conversation: [Pasted the full text conversation here]
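For completeness, here is a rough sketch of how the same prompt could be sent programmatically through the OpenAI Python client rather than the chat interface. The model name and the transcript file name are assumptions for illustration:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical transcript produced in Step 1.
with open("conversation.txt") as f:
    transcript = f.read()

prompt = (
    "Consider this text as a conversation between a patient and a doctor. "
    "First task is to figure out who is the patient and who is the doctor. "
    "Then I want you to make a resume of the conversation:\n\n" + transcript
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # model choice is an assumption
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```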
The result from ChatGPT is surprisingly good, out of the box:
ChatGPT: Person 1 is the doctor and Person 2 is the patient.
Yay! Correct!
ChatGPT: Resume of the conversation:
The doctor asked the patient how they were feeling, and the patient responded that they were not feeling well. The patient explained that they had chest pain the previous night, which woke them up and lasted for around 40 minutes. The doctor asked the patient to describe the pain and if they had any other symptoms. The patient said that the pain felt like someone was stabbing them on the left side of their chest and they felt sick to their stomach. They also mentioned that their left arm hurt a little. The doctor asked if the patient had experienced this before, and the patient said that they had experienced it once when they were running in a 5k race about 6 months ago. The doctor asked if the patient had any medical problems, and the patient mentioned that they take high blood pressure medication and a cholesterol medication called Zocor. The patient also had a hernia repair 25 years ago when they were playing college sports. The doctor asked if the patient was allergic to any medications, smoked or drank alcohol. The patient responded that they were not allergic to any medications and that they only have a few beers while watching the Cowboys. The doctor reassured the patient that they need to get more information before making any conclusions about their health condition.
Again! Really impressive summary!
It's important to note here that ChatGPT is not specifically trained on medical conversations, so we should expect mistakes.
Step 3: Structuring Data for further automation and analysis
The final step in turning audio conversations into structured data is to organize the extracted information into a format that can be easily analysed and processed.
This can be done by asking the LLM to reshape the conversation again, but this time into an even more structured format, suitable for a database. The structured data can then be used for further analysis, or to train other machine learning models on it, for example to learn more about a disease.
This third step is the most difficult part of the process, and it will most likely need several sub-steps to first make the AI understand which database table the data should be posted to, because a real conversation might not cover just one single area like my simple example here.
To try this, I gave ChatGPT the following prompt:
Consider this conversation again, and create a SQL INSERT QUERY based on the structure of this table: patient_id, conversation_summary, patient_age, pain_type, primary_pain_description, pain_location, pain_event_date, pain_duration, secondary_pain_description, other_symptoms_text, prior_pain, prior_pain_context, prior_pain_time, heart_problems, lung_problem, medical_history, medication_history, allergy_history, smoking_history, alcohol_history
ChatGPT: Unfortunately, the conversation does not provide enough information to fill out all the fields in the table you have provided. However, here is an example SQL insert query based on the information that is available to me:
INSERT INTO patient_info (patient_id, conversation_summary, patient_age, pain_type, primary_pain_description, pain_location, pain_event_date, pain_duration, secondary_pain_description, other_symptoms_text, prior_pain, prior_pain_context, prior_pain_time, heart_problems, lung_problem, medical_history, medication_history, allergy_history, smoking_history, alcohol_history) VALUES (1, 'Patient had chest pain in the middle of the night, on the left side of the chest. Pain lasted for about 40 minutes and was accompanied by nausea and left arm pain. Patient has taken hydrochlorothiazide for high blood pressure and Zocor for cholesterol. No known allergies or heart problems. Had similar pain once 6 months ago while running in a 5k. Had hernia repair surgery 25 years ago while playing college sports. Drinks alcohol occasionally while watching football.', 50, 'Sharp', 'Stabbing', 'Left side of chest', '03/18/2023', '40 minutes', NULL, 'Nausea and left arm pain', 'Similar pain once 6 months ago while running in a 5k', 'During running', '6 months ago', 'No', NULL, 'Hernia repair surgery 25 years ago', 'Hydrochlorothiazide for high blood pressure, Zocor for cholesterol', 'None', 'Never', 'Occasionally while watching football');
So now if I had a database and a table as I requested, I could just run this query and the job would be done! I am impressed. It did a really good job, considering again that ChatGPT is not specifically trained on medical data.
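To make "running the query" concrete, here is a minimal sketch using a local SQLite database. The database file, the table creation, and the idea of reading the LLM-generated statement from a file are all assumptions for illustration; in practice the query would go through the review step described below before being executed:

```python
import sqlite3

conn = sqlite3.connect("patients.db")  # hypothetical local database

# Create the target table with the columns requested in the prompt above.
conn.execute("""
CREATE TABLE IF NOT EXISTS patient_info (
    patient_id INTEGER,
    conversation_summary TEXT,
    patient_age INTEGER,
    pain_type TEXT,
    primary_pain_description TEXT,
    pain_location TEXT,
    pain_event_date TEXT,
    pain_duration TEXT,
    secondary_pain_description TEXT,
    other_symptoms_text TEXT,
    prior_pain TEXT,
    prior_pain_context TEXT,
    prior_pain_time TEXT,
    heart_problems TEXT,
    lung_problem TEXT,
    medical_history TEXT,
    medication_history TEXT,
    allergy_history TEXT,
    smoking_history TEXT,
    alcohol_history TEXT
)
""")

# The INSERT statement returned by the LLM, saved to a file in an earlier step
# (hypothetical file name). It should only be executed after human review.
with open("llm_generated_insert.sql") as f:
    generated_sql = f.read()

conn.execute(generated_sql)
conn.commit()
conn.close()
```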
Admittedly, today's LLMs are perhaps not fully ready for this workflow to be smooth enough for general conversations. Perhaps the doctor in this case would need to follow a strict pattern of questions to make sure everything is mapped correctly. But the amount of time that could be saved by this workflow is mind-blowing.
Step 4: Review the data
Once the data has been posted to the database, along with the original audio recording and text transcription (for auditing purposes), the data should be reviewed and verified by all parties involved in the conversation to ensure its accuracy. If any errors or discrepancies are identified, a feedback loop can be established to improve the AI and reduce the likelihood of future errors. This feedback can be used to retrain the machine learning models used for transcription and information extraction. This continuous improvement process ensures that the accuracy and quality of the collected data improve over time.
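One way to support this review step, sketched here with SQLite, is to store the original audio reference, the transcript, and explicit approval flags next to the structured record. The table and column names below are my own invention for illustration, not part of the example above:

```python
import sqlite3

conn = sqlite3.connect("patients.db")  # hypothetical database from the previous step

# Keep the source material and sign-off status alongside the structured record,
# so every generated row can be audited and approved by both participants.
conn.execute("""
CREATE TABLE IF NOT EXISTS conversation_audit (
    patient_id INTEGER,
    audio_file_path TEXT,
    transcript TEXT,
    approved_by_patient INTEGER DEFAULT 0,
    approved_by_doctor INTEGER DEFAULT 0,
    correction_notes TEXT
)
""")
conn.commit()

# When a participant signs off (or flags an error), the record is updated;
# correction notes can later feed evaluation and retraining of the models.
conn.execute(
    "UPDATE conversation_audit SET approved_by_doctor = 1 WHERE patient_id = ?",
    (1,),
)
conn.commit()
conn.close()
```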
Process overview
This is a simplified process map that describes the concept. Blue rectangles represent individuals performing manual tasks, while green rhomboids represent Machine Learning algorithms automating certain tasks.
Conclusion
Using AI to turn audio conversations into structured data can save time, reduce errors and enable more insights. However, it's important to address AI bias and errors in each step of the process, from transcription to NLP to structuring data. By using diverse data, testing for fairness, and reviewing for errors, businesses, organizations, and researchers can gain valuable insights from conversations that are accurate and unbiased. Additionally, it is important to implement an approval process that allows everyone involved in the conversation to verify the results. This can ensure that the data is not only accurate and unbiased but also reflects the intended meaning of the conversation. By taking these steps, businesses, organizations, and researchers can leverage AI to unlock powerful insights from conversations that were previously untapped.
I want to clarify that the conversation presented in the example is entirely fictional. The conversation mentioned has never taken place. The audio from the dialogue was generated by another AI model that converted a conversation from a text file. I am grateful to David Carnahan for providing me with the conversation example and the idea to use ML to process it.