Healthcare Analysis Using SQL

Healthcare Analysis Using SQL

Published By: Madeeha Umar

Healthcare is the most important industry for any country in the world. As the world has reshaped after Covid-19 pandemic, healthcare has become a crucial part of patients well being. In today's ever-evolving healthcare industry, data analysis plays a necessary role in driving informed decisions and improving patient care.

With the vast amount of healthcare data available, SQL (Structured Query Language) is an undeniably critical tool in data analysis. This powerful programming langrage helps to gain valuable insights.

Recently I have been working as a healthcare data analyst at a hospital. This analysis aims to reach the depths of healthcare data, exploring various aspects such as patient demographics, medical procedures, and treatment outcomes. We can uncover patterns, trends, and correlations within the data that can inform healthcare providers, administrators, and policymakers in their efforts to enhance the quality and accessibility of healthcare services.

This analysis strive to transform data into actionable knowledge and ultimately contributing to the advancement of patient care.


Key Findings:

  • The length of stay is at least 1 day and at most 14 days. On average it is 4 days.
  • The most number of surgeries and procedures are performed in these departments that include Surgery-Thoracic, Surgery-Cardiovascular, Radiologist, Cardiology and Surgery-Vascular.
  • There?is a correlation?between patients staying longer in the hospital and having more lab procedures done.
  • There are patients who are being readmitted to the hospital within 30 days that are coming through the emergency room.
  • There is an increase?of lab procedures in African American and Hispanic patients, especially in men.


Let's start with the data:

This dataset represents 10 years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. It includes over 50 features representing patient and hospital outcomes. For this project we are using a dataset from Kaggle. You can find it here Dataset.

Information was extracted from the database for encounters that satisfied the following criteria.

  • It is an inpatient encounter, a hospital admission.
  • It is a diabetic encounter, that is, one during which any kind of diabetes was entered to the system as a diagnosis.
  • The length of stay was at least 1 day and at most 14 days.
  • Laboratory tests were performed during the encounter.
  • Medications were administered during the encounter.

The data contains attributes such as patient number, race, gender, age, admission type, time in hospital, medical specialty of admitting physician, number of lab tests performed, HbA1c test results, diagnosis, number of medications, diabetic medications, number of outpatient, inpatient, and emergency visits in the year before the hospitalization.


Exploration & Tools:

I am using MySQL to uncover the objectives of this analysis. Here are the SQL commands and statements used in this project,

  • DISTINCT
  • HAVING
  • Visualized Data with Histogram
  • CASE WHEN
  • Merging columns in tables with JOINS
  • Combining rows with UNION
  • SUBQUERIES | CTEs
  • Combining data with CONCAT.
  • GROUP BY | ORDER BY


Business Question:

  • What is the distribution of time spent in the hospital for patients, are most of them staying less than 7 days?
  • Which departments perform the most average number of procedures and surgeries?
  • Do longer hospital stays correlate to more labs procedures being performed?
  • Is there a racial bias when it comes to evaluate the amount of lab procedures performed for patients, and does gender of patients has any effect?
  • How many people belong to African American race who are UP for 'Metformin' ?
  • When patients are discharged, do they readmit to the hospital in under 30 days and are these returning patients coming through emergency room?
  • A summary of top 50 patients with highest number of lab procedures.


Analysis:

The Hospital Nursing Director wants to confirm that patients who are staying longer than 7 days are acute in conditions. The Hospital management wants to have more beds available for patients. Time is a big factor in hospital in terms of business side and the Director does not want to miss the chance of getting more patients. To see the distribution of time spent in the hospital for patients, are most of them staying less than 7 days?

I visualized data with SQL and created a Histogram. A histogram is an approximate representation of the distribution of numerical data. Although SQL is not a traditional data visualizations tool, but we have used innovative approach to get some initial exploratory data analysis. The basic text characters are representing HISTOGRAM bars.

I used ROUND, COUNT, Right PAD, GROUP BY and ORDER BY statements. I used?Alias?to rename the column to make it easy to read. From the query, the average stay, regardless of race, is around 3-4 days. We can see that number of most patients stayed for 3 days, is around 17756.


No alt text provided for this image
SQL Query
No alt text provided for this image
Query Results


Next business questions is to know about departments who are performing the most average number of procedures and surgeries, I wrote a SQL query using DISTINCT statement.


No alt text provided for this image
SQL Query
No alt text provided for this image
Query Results


But this Query doesn't return a round number and it gives cluttered result, so we want to narrow it down and make it look good for stakeholders. The nursing director is really looking for any specialty that has more than 2.5 average procedures, so I performed this SQL query below. This query returns a list of medical specialties performing greater than 2.5 average procedures. Using?ROUND?will help keep the decimal point small and be easier to read. HAVING filters the aggregated results of average number of procedures, by using the Operator AND we can filter the medical specialties with more than 50 patients and Average Procedures Greater than 2.5. We can see the top 5 medical specialties performing the most lab procedures and Surgery-Thoracic has the highest number of procedures.


No alt text provided for this image
SQL Query
No alt text provided for this image
Query Results


Next business questions that Nursing director wants to know if there is a correlation between lab procedures and longer stays in the hospital. In a separate query, we learn the minimum number of lab procedures is 1, the average number of lab procedures is 43.09 and the maximum number of lab procedures is 99. I used?CASE WHEN?statement to further break down the procedures and made groups of patients according to the number of lab procedures they had.

Few?= 0 and 25?????Average?=25 to 55???????Many?=Anything over 55


No alt text provided for this image
SQL Query
No alt text provided for this image
Query Results

While the?CASE WHEN?query gave me an idea of the procedure frequency, the Nursing Director wanted more precise information. From this new query which returns average time spent in hospital we can find there is a correlation?between patients staying longer in the hospital and having more lab procedures done. We can see the results that patients with many lab procedures spent average of 5 days in hospital. The time spent in hospital decreases as the number of lab procedures decrease.


No alt text provided for this image
SQL Query
No alt text provided for this image
Query Results


Our next business question is to find out that if there is a racial bias when it comes to evaluate the amount of lab procedures performed for patients, and does gender effect this percentage. While looking at the results, we find that average results are not that different from one another if we evaluate them according to race and gender.

There is still some information is missing in this dataset. For this SQL query I have used JOIN statement to join data from two tables health & demographics as both tables have a similar column name patient_nbr.

According to results highest number of African American men have the most average number of lab procedures as 44.58%, on second number are again men but their racial data is missing. Then we have African American women with average of 43.75% and Hispanic men with 43.60% According to this dataset more men have been found with most number of lab procedures.


No alt text provided for this image
SQL Query
No alt text provided for this image
Query Results


My next business question to answer is that research department of hospital wants to conduct a study on the patients UP for "metformin" to conduct a research on African American people. The Chief of Nursing Staff wants a list of patients number who fall under this criteria. For returning these patients numbers I have used UNION to join rows from two different tables health & demographics. Here you can find a list of patient's numbers who are UP for metformin and belong to African American race for this study.


No alt text provided for this image
SQL Query
No alt text provided for this image
Query Results


In this business question we will find out when patients are discharged, do they readmit to the hospital in under 30 days and are these returning patients coming through emergency room. To know about instances where patients were admitted due to emergencies but managed to have a stay shorter than the hospital’s average. This SQL query was returned by using Common Table Expressions CTEs. WITH generated a variable that was subsequently used to return average time spent by patients in hospital without necessitating a subquery.


No alt text provided for this image
SQL Query
No alt text provided for this image
Query Results


Lastly, the Hospital Administrator wants to get the summary of top 50 patients with highest number of lab procedures. For getting summary I used CONCAT Function and focused on the top 50 patients who had the longest hospital stays and also had the most lab procedures performed. This SQL query returns patients numbers, their race, how many medications they have and the number of lab procedures followed by them. We used an INNER JOIN in this query to join two tables of patients information that are demographics and health.


No alt text provided for this image
SQL Query
No alt text provided for this image


Insights and Recommendations:

With this healthcare data analysis we found out that patients stay at least 1 day and the longest stay is 14 days. On average patient's stay is 4 days.

As we can see that most number of surgeries are performed by Surgery-Thoracic that are cardio-thoracic surgery. Heart patients have the most chronic health conditions, I would recommend healthcare systems to closely look how they can reduce number of lab procedures for these acute patients, and if there are any duplicate tests have been performed based on the insights we gained from this analysis. The goal would be to focus on chronic patients and decreasing the number of unnecessary lab procedures.

We have evaluated number of lab procedures in patients specifically according to their race and gender, there is a slight increase in male patients having more procedures than females. Hospitals can analyze more that what kind of health insurance plans most of the female patients have, do they cover variety of lab procedures and we can ensure that both men and women from any race can have as equal chances of better health to avoid any racial bias.

My next recommendation would be to look at the list of patients who are being readmitted to the hospital under 30 days. This shows that patient care could have been rushed, leading to more time in the hospital potentially increasing the cost of treatments. I would recommend to see where we can improve our care of patients and making sure that nothing was missed that would cause a readmit. Hospitals can take care of special care given in emergencies to ensure patients get right treatments to decrease their expected returns. This causes issues because beds can be limited, further testing is expensive, and overall patient experience can suffer when they are being readmitted.

Conclusion:

Thank you for reading my project article of HealthCare Analysis using SQL. If you have any feedback or questions about my insights and analysis, feel free to contact me and comment below. You can reach out to me at [email protected]?You can connect with me on?LinkedIn. You can find more articles written by me in my?Portfolio. I would love to have your suggestions.

Kathy Mucher, M.Ed.

Academic Data Analyst | Assessments & School Accountability | Academic Impact | Pearson Virtual Schools

1 年

Fantastic work, Madeeha Umar! ?? I enjoy your writing style and your visuals are well done. I appreciate your labels on the visuals and your use of bold on significant text. And your moving photo is remarkable!

回复
Crystal Peralta Yamamoto

Accountant and Data Analyst using facts to bring awareness- SQL, Python, Excel, Data Visualization

1 年

Great Work Madeeha, I especially enjoyed reading your insights and recommendations. Your analysis helped detect bias and issues that can't be detected by staff at the forefront.

Mohammad Shabbir Taibani

I help Data professionals to enhance their Skills | Team Lead : Data Analyst WFM & In-house Excel Trainer at TOM TOM | MCT-Microsoft Certified Trainer | Forecasting & Planning | Excel | VBA | Python| Power BI

1 年

Madeeha Umar This is a great project Madeeha! Doing analysis for healthcare is an exciting process:)

Avtar Cheema

Business, Data & Sales Analyst: SQL - Excel - Power BI - Python - Salesforce - Hubspot - GenAI - ChatBOTs

1 年

Nice project Madeeha, well laid out, very informative - looking forward to the next one.

Lillian Welsh

Data Analyst at LexisNexis

1 年

Looks so good, Madeeha. I think you did a nice job on the Recommendations section too.

要查看或添加评论,请登录

Madeeha Umar的更多文章

社区洞察

其他会员也浏览了