Using machine learning to turn verbal conversations into structured contextual data records
Robert Svebeck
Driving Responsible AI Implementation in Region Stockholm / Karolinska University Hospital
Introduction
Many businesses, organizations, and researchers benefit from extracting insights from conversations between people. However, manually transcribing and analysing these conversations is time-consuming and error-prone. Just consider how much time and effort goes into creating meeting minutes for every meeting that takes place globally! There is a significant amount of time and money to be saved here, benefiting everyone involved and creating new business opportunities for innovative application developers.
In recent years, advances in Machine Learning (ML) have made it possible to transcribe audio conversations into text automatically and reliably. However, this alone does not completely solve the problem, since the conversations in our meetings are not structured.
With the latest achievements in Large Language Models (LLMs), it's now possible to extract relevant information from that text and create structured meeting notes. Moreover, it's possible to take that structured text even further and transform it into a tabular data structure. With this capability, the possibilities and use cases are truly endless!
It's important to keep in mind that ML algorithms are biased and can make serious errors, which can lead to inaccurate data. Therefore, to successfully implement this automation technology, we must establish strong approval processes around it, ensuring that the output of the algorithm is approved by all parties involved.
Once the conversation is stored in a structured way in a database, it can be approved by those who participated in the conversation, and then used to create more insights by other ML models.
In this blog post, I'll explore how to use machine learning to turn audio conversations into structured data while addressing the problems of bias and error. I will do this by showing you an example where a fictional doctor meets a fictional patient, and the goal is to get this conversation into a well-designed database, automatically.
I will be using ChatGPT for this example, but the concept can be implemented with many other available models.
Step 1: Transcribing Audio to Text
The first step in turning audio conversations into structured data is to transcribe the audio into text.
There are several frameworks available for transcribing audio to text, ranging from commercial cloud services to open-source models.
These frameworks vary in terms of their features, accuracy, and ease of use. It's important to evaluate each framework based on your specific needs before deciding which one to use.
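As an illustration, here is a minimal Python sketch of calling Google Cloud Speech-to-Text with speaker diarization. The file name, sample rate, and speaker count are assumptions for this example, not part of the original setup:

```python
from google.cloud import speech  # pip install google-cloud-speech

client = speech.SpeechClient()  # uses GOOGLE_APPLICATION_CREDENTIALS from the environment

# Hypothetical recording of the conversation, converted to 16 kHz mono WAV.
with open("conversation.wav", "rb") as f:
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    enable_automatic_punctuation=True,
    # Ask the API to separate the two speakers in the conversation.
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=2,
    ),
)

# Note: recognize() handles short clips; longer recordings would use
# long_running_recognize() instead.
response = client.recognize(config=config, audio=audio)

# With diarization enabled, the last result carries word-level speaker tags,
# which can be grouped into "person 1" / "person 2" turns.
for word in response.results[-1].alternatives[0].words:
    print(f"person {word.speaker_tag}: {word.word}")
```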
In my example, I used the Google Cloud Speech API to transcribe the audio, and this is part of the result (the complete text file of the conversation can be found here, and an audio recording in MP3 format is here).
person 1: how are you doing today?
person 2: I am not doing to well.
person 1: oh, tell me about it.
person 2: last night I had chest pain. It woke me up and I couldn't go back to sleep.
person 1: what time did it wake you up?
person 2: It was around 2 in the morning.
person 1: do you still have the pain?
person 2: no, it went away after about 40 minutes.
...
Step 2: Extracting Context from Text
After transcribing the audio into text, the next crucial step is to extract the relevant information from it. This can be achieved through the use of LLM algorithms, which are capable of identifying and extracting key information such as names, dates, locations, and even sentiment. Although there are many options available on the market, it's important to choose a framework that has been trained on your specific market, language, and context to minimize errors and bias.
In my example, I used ChatGPT with the following prompt:
Consider this text as a conversation between a patient and a doctor. First task is to figure out who is the patient and who is the doctor. Then I want you to make a resume of the conversation: [Pasted the full text conversation here]
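For completeness, here is a rough sketch of how the same prompt could be sent programmatically through the OpenAI Python client rather than the chat interface. The model name and the transcript file name are assumptions for illustration:

```python
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical transcript produced in Step 1.
with open("conversation.txt") as f:
    transcript = f.read()

prompt = (
    "Consider this text as a conversation between a patient and a doctor. "
    "First task is to figure out who is the patient and who is the doctor. "
    "Then I want you to make a resume of the conversation:\n\n" + transcript
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # model choice is an assumption
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```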
The result from ChatGPT is surprisingly good, out of the box:
ChatGPT: Person 1 is the doctor and Person 2 is the patient.
Yay! Correct!
ChatGPT: Resume of the conversation:
The doctor asked the patient how they were feeling, and the patient responded that they were not feeling well. The patient explained that they had chest pain the previous night, which woke them up and lasted for around 40 minutes. The doctor asked the patient to describe the pain and if they had any other symptoms. The patient said that the pain felt like someone was stabbing them on the left side of their chest and they felt sick to their stomach. They also mentioned that their left arm hurt a little. The doctor asked if the patient had experienced this before, and the patient said that they had experienced it once when they were running in a 5k race about 6 months ago. The doctor asked if the patient had any medical problems, and the patient mentioned that they take high blood pressure medication and a cholesterol medication called Zocor. The patient also had a hernia repair 25 years ago when they were playing college sports. The doctor asked if the patient was allergic to any medications, smoked or drank alcohol. The patient responded that they were not allergic to any medications and that they only have a few beers while watching the Cowboys. The doctor reassured the patient that they need to get more information before making any conclusions about their health condition.
Again! Really impressive summary!
It's important to note here that ChatGPT is not specifically trained on medical conversations, so we should expect mistakes.
Step 3: Structuring Data for further automation and analysis
The final step in turning audio conversations into structured data is to organize the extracted information into a format that can be easily analysed and processed.
This can be done by asking the LLM to reshape the conversation again, but this time into an even more structured format, suitable for a database. The structured data can then be used for further analysis, or to train other machine learning models on it, for example to learn more about a disease.
This third step is the most difficult part of the process, and it will most likely need several sub-steps to first make the AI understand which database table the data should be posted to, because a real conversation might not cover just one single area like my simple example here.
To try this, I gave ChatGPT the following prompt:
Consider this conversation again, and create a SQL INSERT QUERY based on the structure of this table: patient_id, conversation_summary, patient_age, pain_type, primary_pain_description, pain_location, pain_event_date, pain_duration, secondary_pain_description, other_symptoms_text, prior_pain, prior_pain_context, prior_pain_time, heart_problems, lung_problem, medical_history, medication_history, allergy_history, smoking_history, alcohol_history
ChatGPT: Unfortunately, the conversation does not provide enough information to fill out all the fields in the table you have provided. However, here is an example SQL insert query based on the information that is available to me:
INSERT INTO patient_info (patient_id, conversation_summary, patient_age, pain_type, primary_pain_description, pain_location, pain_event_date, pain_duration, secondary_pain_description, other_symptoms_text, prior_pain, prior_pain_context, prior_pain_time, heart_problems, lung_problem, medical_history, medication_history, allergy_history, smoking_history, alcohol_history) VALUES (1, 'Patient had chest pain in the middle of the night, on the left side of the chest. Pain lasted for about 40 minutes and was accompanied by nausea and left arm pain. Patient has taken hydrochlorothiazide for high blood pressure and Zocor for cholesterol. No known allergies or heart problems. Had similar pain once 6 months ago while running in a 5k. Had hernia repair surgery 25 years ago while playing college sports. Drinks alcohol occasionally while watching football.', 50, 'Sharp', 'Stabbing', 'Left side of chest', '03/18/2023', '40 minutes', NULL, 'Nausea and left arm pain', 'Similar pain once 6 months ago while running in a 5k', 'During running', '6 months ago', 'No', NULL, 'Hernia repair surgery 25 years ago', 'Hydrochlorothiazide for high blood pressure, Zocor for cholesterol', 'None', 'Never', 'Occasionally while watching football');
So now if I had a database and a table as I requested, I could just run this query and the job would be done! I am impressed. It did a really good job, considering again that ChatGPT is not specifically trained on medical data.
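To make "running the query" concrete, here is a minimal sketch using a local SQLite database. The database file, the table creation, and the idea of reading the LLM-generated statement from a file are all assumptions for illustration; in practice the query would go through the review step described below before being executed:

```python
import sqlite3

conn = sqlite3.connect("patients.db")  # hypothetical local database

# Create the target table with the columns requested in the prompt above.
conn.execute("""
CREATE TABLE IF NOT EXISTS patient_info (
    patient_id INTEGER,
    conversation_summary TEXT,
    patient_age INTEGER,
    pain_type TEXT,
    primary_pain_description TEXT,
    pain_location TEXT,
    pain_event_date TEXT,
    pain_duration TEXT,
    secondary_pain_description TEXT,
    other_symptoms_text TEXT,
    prior_pain TEXT,
    prior_pain_context TEXT,
    prior_pain_time TEXT,
    heart_problems TEXT,
    lung_problem TEXT,
    medical_history TEXT,
    medication_history TEXT,
    allergy_history TEXT,
    smoking_history TEXT,
    alcohol_history TEXT
)
""")

# The INSERT statement returned by the LLM, saved to a file in an earlier step
# (hypothetical file name). It should only be executed after human review.
with open("llm_generated_insert.sql") as f:
    generated_sql = f.read()

conn.execute(generated_sql)
conn.commit()
conn.close()
```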
Admittedly, today's LLMs are perhaps not fully ready for this workflow to be smooth enough for general conversations. Perhaps the doctor in this case would need to follow a strict pattern of questions to make sure everything is mapped correctly. But the amount of time that could be saved by this workflow is mind-blowing.
Step 4: Review the data
Once the data has been posted to the database, along with the original audio recording and text transcription (for auditing purposes), the data should be reviewed and verified by all parties involved in the conversation to ensure its accuracy. If any errors or discrepancies are identified, a feedback loop can be established to improve the AI and reduce the likelihood of future errors. This feedback can be used to retrain the machine learning models used for transcription and information extraction. This continuous improvement process ensures that the accuracy and quality of the collected data improve over time.
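One way to support this review step, sketched here with SQLite, is to store the original audio reference, the transcript, and explicit approval flags next to the structured record. The table and column names below are my own invention for illustration, not part of the example above:

```python
import sqlite3

conn = sqlite3.connect("patients.db")  # hypothetical database from the previous step

# Keep the source material and sign-off status alongside the structured record,
# so every generated row can be audited and approved by both participants.
conn.execute("""
CREATE TABLE IF NOT EXISTS conversation_audit (
    patient_id INTEGER,
    audio_file_path TEXT,
    transcript TEXT,
    approved_by_patient INTEGER DEFAULT 0,
    approved_by_doctor INTEGER DEFAULT 0,
    correction_notes TEXT
)
""")
conn.commit()

# When a participant signs off (or flags an error), the record is updated;
# correction notes can later feed evaluation and retraining of the models.
conn.execute(
    "UPDATE conversation_audit SET approved_by_doctor = 1 WHERE patient_id = ?",
    (1,),
)
conn.commit()
conn.close()
```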
Process overview
This is a simplified process map that describes the concept. Blue rectangles represent individuals performing manual tasks, while green rhomboids represent Machine Learning algorithms automating certain tasks.
Conclusion
Using AI to turn audio conversations into structured data can save time, reduce errors and enable more insights. However, it's important to address AI bias and errors in each step of the process, from transcription to NLP to structuring data. By using diverse data, testing for fairness, and reviewing for errors, businesses, organizations, and researchers can gain valuable insights from conversations that are accurate and unbiased. Additionally, it is important to implement an approval process that allows everyone involved in the conversation to verify the results. This can ensure that the data is not only accurate and unbiased but also reflects the intended meaning of the conversation. By taking these steps, businesses, organizations, and researchers can leverage AI to unlock powerful insights from conversations that were previously untapped.
I want to clarify that the conversation presented in the example is entirely fictional. The conversation mentioned has never taken place. The audio from the dialogue was generated by another AI model that converted a conversation from a text file. I am grateful to David Carnahan for providing me with the conversation example and the idea to use ML to process it.