Analyzing Hospital Data: What I Learned from SQL and Tableau
Michael Whaley
Data Analyst | Transforming Data into Insights with SQL, Tableau, and Visualization Expertise.
In this article, I will share insights from my analysis, particularly focusing on patient hospital stays, the correlation between procedures and length of stay, and whether treatment varies by race. I’ll also touch on some surprises I encountered along the way, which made the process all the more enlightening.
Why THIS Project?
I chose this project because I wanted to sharpen my data analysis skills while diving into a subject that affects everyone at some point in their lives—healthcare. With so many questions surrounding hospital practices, such as how long patients stay and whether they receive fair treatment, I felt this project was not only interesting but also meaningful. The data I worked with came from a decade of clinical care, allowing me to examine real-life implications of healthcare practices.
Key Takeaways
Dataset Details
The dataset I used can be found on Kaggle and consists of clinical care data collected between 1999 and 2008 from 130 U.S. hospitals. It includes over 101,000 health records of diabetes patients, featuring extensive health-related and demographic information. This breadth of data made it an ideal choice for analyzing patient experiences and outcomes.
Analysis Process
Once the data was loaded into MySQL, I quickly saw that patient_nbr was the key linking the tables, so extracting anything useful meant joining on it. With that settled, it was time to dig into the data and answer some questions for the stakeholders.
How long are patients staying?
To get a quick look at this information, I created a text-based histogram directly in MySQL.
The result in MySQL Workbench is a nice histogram showing the distribution of patients grouped by number of days. This is great for the analyst, but for the stakeholder, a line chart in Tableau paints the same picture in a more aesthetic way.
Both visuals make it immediately clear that most patients stay between one and five days, with two and three days being the most common. The average time in hospital is 4.4 days.
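For readers who want to try the idea, here is a minimal sketch of a text-based histogram built in SQL. It is not the exact query from this project: it runs against SQLite through Python so it is self-contained, the table and column names (health, time_in_hospital) are assumptions, and the rows are made-up toy values rather than the Kaggle data.

```python
import sqlite3

# Toy stand-in for the health table. These rows are invented for illustration
# and are NOT taken from the Kaggle dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE health (patient_nbr INTEGER, time_in_hospital INTEGER)")
stays = [1, 2, 2, 2, 3, 3, 3, 4, 5, 8]
conn.executemany(
    "INSERT INTO health VALUES (?, ?)",
    [(i, days) for i, days in enumerate(stays, start=1)],
)

# One row per length of stay, with a bar of asterisks sized by the count.
# substr() on a fixed string of asterisks works in SQLite; in MySQL,
# RPAD('', n, '*') is one way to build a similar bar.
histogram_sql = """
    SELECT time_in_hospital AS days,
           COUNT(*) AS patients,
           substr('**********', 1, COUNT(*)) AS bar
    FROM health
    GROUP BY time_in_hospital
    ORDER BY time_in_hospital
"""
histogram = conn.execute(histogram_sql).fetchall()
for days, patients, bar in histogram:
    print(f"{days:>2} | {bar} ({patients})")

# Average length of stay across the toy rows.
avg_days = conn.execute(
    "SELECT ROUND(AVG(time_in_hospital), 1) FROM health"
).fetchone()[0]
print(f"average stay: {avg_days} days")
```

On the real dataset the counts run into the thousands, so the bar length would need to be scaled down (for example, one asterisk per 100 patients).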
Are patients being treated differently by race?
Sticking with time in hospital for now, I wanted to see the length of stay grouped by race. After joining the demographics and health tables on patient_nbr, I noticed "?" listed as a race, so I checked how many patients of each race were in the dataset to be sure the result was not skewed by bad data. I added a percentage column to make the counts more readable, and it had an unexpected benefit: my initial query did not count distinct patient_nbr values, which became obvious when Caucasian came out to over 106% of patients! I fixed this by adding a count distinct clause to the query.
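The sketch below illustrates one plausible way this kind of over-count can happen, and how COUNT(DISTINCT ...) corrects it. It assumes a demographics table with one row per patient and a health table with one row per hospital visit; that layout, the column names, and the toy rows are all assumptions for illustration, not the project's actual schema or query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demographics (patient_nbr INTEGER, race TEXT)")
conn.execute("CREATE TABLE health (patient_nbr INTEGER, time_in_hospital INTEGER)")

# Made-up rows: five patients, some with repeat visits.
conn.executemany(
    "INSERT INTO demographics VALUES (?, ?)",
    [(1, "Caucasian"), (2, "Caucasian"), (3, "Caucasian"),
     (4, "AfricanAmerican"), (5, "?")],
)
conn.executemany(
    "INSERT INTO health VALUES (?, ?)",
    [(1, 2), (1, 3), (1, 1), (2, 4), (2, 2), (3, 3), (4, 5), (5, 2)],
)

# COUNT(*) counts joined rows (visits), while COUNT(DISTINCT ...) counts
# patients. Dividing visits by the number of patients can exceed 100%.
race_sql = """
    SELECT d.race,
           COUNT(*) AS naive_count,
           COUNT(DISTINCT d.patient_nbr) AS patients,
           ROUND(100.0 * COUNT(*)
                 / (SELECT COUNT(DISTINCT patient_nbr) FROM demographics), 1)
               AS naive_pct,
           ROUND(100.0 * COUNT(DISTINCT d.patient_nbr)
                 / (SELECT COUNT(DISTINCT patient_nbr) FROM demographics), 1)
               AS pct
    FROM demographics AS d
    JOIN health AS h ON h.patient_nbr = d.patient_nbr
    GROUP BY d.race
    ORDER BY patients DESC, d.race
"""
race_rows = conn.execute(race_sql).fetchall()
for race, naive_count, patients, naive_pct, pct in race_rows:
    print(f"{race:<16} naive={naive_pct:>6}%  distinct={pct:>5}%")
```

A percentage column is a cheap sanity check: the distinct percentages should add up to 100, so any total above that points to duplicated rows in the join.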
I dropped the resulting table into Tableau, which showed that race does not seem to be a factor in patients' length of stay.
Next, I looked at lab procedures by race, again joining the demographics and health tables on patient_nbr.
Once again, no racial bias was discovered in the treatment of patients.
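Both race comparisons follow the same pattern: join the two tables on patient_nbr, group by race, and average the metric of interest. Here is a minimal sketch of that pattern, assuming column names time_in_hospital and num_lab_procedures and using invented toy rows, so the numbers it prints say nothing about the real dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demographics (patient_nbr INTEGER, race TEXT)")
conn.execute(
    "CREATE TABLE health ("
    "patient_nbr INTEGER, time_in_hospital INTEGER, num_lab_procedures INTEGER)"
)

# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO demographics VALUES (?, ?)",
    [(1, "Caucasian"), (2, "Caucasian"),
     (3, "AfricanAmerican"), (4, "AfricanAmerican")],
)
conn.executemany(
    "INSERT INTO health VALUES (?, ?, ?)",
    [(1, 2, 40), (2, 4, 50), (3, 5, 42), (4, 2, 47)],
)

# Average stay and average lab procedures per race, joined on patient_nbr.
by_race_sql = """
    SELECT d.race,
           ROUND(AVG(h.time_in_hospital), 1) AS avg_days,
           ROUND(AVG(h.num_lab_procedures), 1) AS avg_lab_procedures
    FROM health AS h
    JOIN demographics AS d ON d.patient_nbr = h.patient_nbr
    GROUP BY d.race
    ORDER BY d.race
"""
averages = conn.execute(by_race_sql).fetchall()
for race, avg_days, avg_labs in averages:
    print(f"{race:<16} avg days={avg_days}  avg lab procedures={avg_labs}")
```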
Is there a relationship between number of lab procedures and length of stay?
Using a SQL CASE WHEN expression, I grouped the number of lab procedures into three buckets (few, average, and many) and compared the average days in hospital for each bucket.
The data suggests that the longer a patient stays in the hospital, the more lab procedures they will have.
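A minimal sketch of the bucketing approach is below. The cut-offs (under 25 for few, under 55 for average) are hypothetical values chosen for the example, not the thresholds used in this project, and the column names and rows are likewise assumed toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE health (num_lab_procedures INTEGER, time_in_hospital INTEGER)"
)

# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO health VALUES (?, ?)",
    [(10, 1), (20, 3), (30, 3), (50, 5), (60, 6), (80, 8)],
)

# CASE WHEN assigns each row to a bucket; the first matching branch wins,
# so the second branch only sees rows with 25 or more procedures.
bucket_sql = """
    SELECT CASE
               WHEN num_lab_procedures < 25 THEN 'few'
               WHEN num_lab_procedures < 55 THEN 'average'
               ELSE 'many'
           END AS procedure_frequency,
           ROUND(AVG(time_in_hospital), 1) AS avg_days
    FROM health
    GROUP BY procedure_frequency
    ORDER BY avg_days
"""
buckets = conn.execute(bucket_sql).fetchall()
for frequency, avg_days in buckets:
    print(f"{frequency:<8} {avg_days} days")
```

Because the thresholds drive how many patients land in each bucket, it is worth checking the distribution of lab procedure counts before settling on them.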
Main Takeaways
Conclusion and Personal Reflections
Throughout this project, I learned the importance of focusing on the questions stakeholders are asking and using only the data relevant to answering them.
This analysis has not only enhanced my technical skills but also my perspective on healthcare data's role in improving patient experiences. Moving forward, I hope to apply these insights in future projects to help drive meaningful change in healthcare practices.
I’d love to hear your thoughts on this project! Please leave a comment with your insights or questions. If you or someone you know is looking to hire a data analyst, let’s connect—I’m excited to explore new opportunities!