Health is Wealth!

Health is Wealth!

Do you remember your last stay at the hospital? Was it pleasant? Did you wait a long time to get attention? We all have good and bad experiences when it comes to getting medical help. While most visits are positives, not everyone has the same “luck”. But luck shouldn’t be an important factor when visiting the hospital.

In this study, we'll analyze real hospital data using SQL. The goal is to ask questions and find insights that could help patients and medical professionals.


The Datasets

For this case, we will use 2 datasets provided by the University of California – Irvine Machine Learning Repository. Our data represents ten years (1999-2008) of clinical care at 130 US hospitals and integrated delivery networks. Each row concerns hospital records of patients diagnosed with diabetes, who underwent laboratory, medications, and stayed up to 14 days.


Data overview:

We’ll be using two datasets:

health.csv: Consists of 46 columns and 101766 rows.

Some of the most important and useful values for our study are:

patient_nbr: Unique identifier of a patient.

admission_type_id: Integer identifier corresponding to 9 distinct values, for example, emergency, urgent, elective, newborn, and not available.

time_in_hospital: Integer number of days between admission and discharge.

medical_specialty: Integer identifier of a specialty of the admitting physician, corresponding to 84 distinct values, for example, cardiology, internal medicine, family/general practice, and surgeon.

num_procedures: Number of procedures (other than lab tests) performed during the encounter.

num_lab_procedures: Number of lab tests performed during the encounter.

num_medications: Number of distinct generic names administered during the encounter.


demographics: Consists of 7 columns and 71518 rows.

The values we will use from this dataset are the following:

patient_nbr: Unique identifier of a patient.

race: Values: Caucasian, Asian, African American, Hispanic, and other.

gender: Values: male, female, and unknown/invalid.

age: Grouped in 10-year intervals: [0, 10), [10, 20),..., [90, 100).

weight: Weight in pounds.


Questions:

?Once we’ve taken a look at the data set and it’s tables, we can ask different questions in order to explore and find helpful insights:

?1.??? What’s the distribution of time spent in the hospital?

2.??? What are the busiest medical specialties and their average number of procedures?

3.??? Is there a race bias in the number of lab procedures?

4.??? Is there a correlation between the number of lab procedures and the number of days in the hospital?

5.??? How many emergency patients spend less than the average time in the hospital?


1. Time spent in the hospital

In order to understand how long patients spend at the hospital, we will have to categorize and count the cases by days. For this, we can group the data by time in spent in the hospital. We’ll call this value “bucket”. Next, we’ll count how many cases exist in each bucket. Finally, we can illustrate this with the RPAD function, using asterisk and diving it by 100 for better visualization.

We can also create two categories: More than 7 Days and Less than 7 Days. For this, we will use a counter, a CASE function and group by our categories. This will allow us to see the exact number of patients that spent less than 7 days in the hospital and the ones who didn't.


2. Average number of procedures per medical specialties.

First, let’s take a look at the different medical specialties that live in this data set:

Now that we see the 73 different medical specialties, we can calculate the average number of procedures for each one. We will also create a count and filter out the specialties with less than 50 patients. This will help us have a better understanding of the busiest specialties.

3. Race bias check

In order to check if there's a race bias in the number of lab procedures, we need to take a look at both tables, health and demographics. The first one contains data on the number of procedures and the second one on the patient’s race. To do this, we’ll need to join both tables, calculate the average number of lab procedures and group them by race. Let’s join the tables with this query:


Now that we have all the data in one place, we can find what the average of lab procedures from each group of patients and group the results by race:


4. Number of lab procedures and days in the hospital correlation

Before we look for a correlation between those two values, we need to find the min, max and average number of lab procedures.

?With this data, we can create 3 categories: Few, Average and Many lab procedures.

Once the categories have been created, it’s easier to find the average time in the hospital per number of lab procedures. We will also group this data by category.


5. Time spent in the hospital by emergency patients

How many emergency patients spend less than the average time in the hospital? To answer that question, first we have to find what the number of emergency patients is. We can filter by the admission type id, which for emergency patients is 1.

Now, we can create a filter that will show us the number of patients that spent less than the average time in the hospital:


Findings:

1.??? Most patients spend less than 7 days in the hospital, with a total of 80617 vs 21149 that went over the 7 days mark.

2.??? The busiest medical specialties by average number of procedures are:

Surgery-Thoracic: 3.5

Surgery-Cardiovascular/Thoracic: 3.25

Radiologist: 3.24

3.??? By looking at the data, we can conclude that there is no race bias in the number of lab procedures. All the numbers are close enough to each other.

4.??? We’ve identify a correlation between the number of lab procedures and the time spent in the hospital. The longer a patient spends at the hospital, the more lab procedures they have.

5.??? From 53990 emergency patients, 33684 spent less than the average time in the hospital. That’s a 62.3% of patients that left the facilities earlier than the average time.

要查看或添加评论,请登录

Diego Manssur的更多文章

  • Take The Shot!

    Take The Shot!

    When we talk about sports, we usually think of classic activities like soccer, football, hockey or baseball. But in the…

    6 条评论
  • Hired or Fired?

    Hired or Fired?

    The job market has changed a lot through the years. While traditionally, employees would spend most of their careers in…

    1 条评论
  • Let's Get The Iron

    Let's Get The Iron

    Mining and manufacturing can be a very complicated process. Besides finding a proper location to dig, professionals…

  • Dribble Pass & Shoot!

    Dribble Pass & Shoot!

    Have you ever been nervous before your favorite team plays a game? Have you wondered what the chances of winning are? I…

    5 条评论
  • Welcome To Canada

    Welcome To Canada

    In the past 8 years, immigration laws and the education system have drastically changed in order to welcome a higher…

    2 条评论
  • Where is The Money?

    Where is The Money?

    Ever since I was a kid, I’ve always been conscious about money and its role in life. Growing up in Ecuador, there was a…

    3 条评论
  • Analyzing DoorDash Sales Throughout The Year

    Analyzing DoorDash Sales Throughout The Year

    Ever since we experienced a lockdown for the first time, food delivery services have appeared and increased rapidly…

    4 条评论
  • Soulmates

    Soulmates

    Hi!! I'm happy to show you my film called "Soulmates". I hope you like it!

社区洞察

其他会员也浏览了