Examining 2019 MA Education Data
Several weeks ago, I saw that many of the students in Avery Smith 's recent cohort for his Data Analytics Accelerator Program had delved into some data from Massachusetts' Department of Elementary and Secondary Education (DESE). Being a former teacher in Massachusetts, it inspired me to examine some of the publicly available data myself. However, I wanted to put my own spin on the project and really try to showcase my growth in my analytics journey.
This has been by far my most ambitious project to date. I'm going to divide up my this article into two parts to describe my learning. The first part will be on the analysis itself, while the second will be on my experience and what I learned.
THE ANALYSIS
The Data
The data itself came almost exclusively from the Statewide Reports page of the department's website. I opted to use data solely from the 2018-2019 school year since it was the most complete available data unaffected by COVID shutdowns.
I narrowed down my data to 16 different tables, which I downloaded and put into Excel. As I hunted for my data, I created a document that described each of the table, linked to its source, and included the date it was last updated, which you can see below:
After putting all the data into Excel, I spent significant time cleaning it. I created a change log to keep track of my alterations, which you can access HERE. Primarily, I ensured that the data source was correct for each column, and made adjustments to some of the numbers so it would upload to Tableau and MySQL correctly. For some of the tables, especially the one on Per Pupil Expenditure, some of the information was woefully incomplete, so I salvaged what I could, and eliminated the rest.
The data selected was meant to inform the following:
Questions Asked
I wanted to approach this project as if I was hired by DESE leadership and tasked with evaluating performance of Massachusetts public education. The majority of my analysis, therefore, would be aimed at the statewide level. Primarily, I wanted to examine the following:
The report created could be used to guide both the department at large and district leadership, such as superintendents, in making actionable change moving forward.
In addition to this larger report, I wanted to create a tool that superintendents themselves could use to compare their performance to similar districts. This way, if they were underperforming in certain metrics, they could collaborate with those districts to see how to improve.
The Analysis and Visualization
Most of my analysis was done in Tableau, but before I could do that, I put the tables in SQL to examine them a bit. For one, I wanted to see exactly how many districts were represented across the 16 tables. I created one of the longest-queries of my life to return all of the values represented in the District_Name and District_Code columns:
SELECT District_Code, District_Name
FROM advanced_courses
UNION
SELECT District_Code, District_Name
FROM ClassSize_GenPop
UNION
SELECT District_Code, District_Name
FROM ClassSize_RaceEth
UNION
SELECT District_Code, District_Name
FROM Dropouts
UNION
SELECT District_Code, District_Name
FROM enrollment_grade
UNION
SELECT District_Code, District_Name
FROM Enrollment_RaceGen
UNION
SELECT District_Code, District_Name
FROM Enrollment_SelectedPop
UNION
SELECT District_Code, District_Name
FROM Grades_offered
UNION
SELECT District_Code, District_Name
FROM Grad_rates
UNION
SELECT District_Code, District_Name
FROM mcas_10_elamath
UNION
SELECT District_Code, District_Name
FROM mcas_3_8
UNION
SELECT District_Code, District_Name
FROM mcas_hssci
UNION
SELECT District_Code, District_Name
FROM perpupil_expenditure
UNION
SELECT District_Code, District_Name
FROM Staff_RaceGender
UNION
SELECT District_Code, District_Name
FROM staff_retention
UNION
SELECT District_Code, District_Name
FROM teachers;
When I did this, I noticed that in many cases, the district names were inconsistent across the tables. I created a separate table simply called "All_Districts" that I cleaned up in OpenRefine, so that I would have one cohesive list of all 407 districts in Massachusetts. This was extremely helpful for when I worked in Tableau later.
In addition, I wanted to use SQL to create a query for superintendents to use in order to find comparable districts. This query was aimed to do the following:
This would yield a list of the three districts above and the three districts below yours in terms of student demographics:
I created additional similar queries to allow superintendents to find comparable districts in terms of student population size, grades offered, and various high needs populations.
After my initial exploration of the data in both Excel and SQL, I moved the data into Tableau. The previous table I created with OpenRefine, "All_Districts," was immensely helpful here, because it created a consistent connection between all the data sets.
领英推荐
I started with creating my dashboard for superintendents to examine their own data and compare to other districts. I quickly found that with the amount of information available, trying to cram it into a single dashboard would not be effective. As a result, I designed five different dashboards, linked together with navigation bars. You can see the end result, by clicking on the image below:
The dashboard allows users to select from the list of districts on the side, and compare that information to the state averages, and to other districts. The creation of this dashboard demonstrated the immense range of numbers between districts in terms of size and performance.
The next step was in addressing the questions I started with. When creating my report for DESE leadership, I wanted to really tell a story, so I opted to develop a Tableau story (which ends up being a combination of multiple dashboards again).
First, I started off by developing a summary of both student and staff data. I found it concerning that only 87% of teachers were teaching in-field (meaning they were teaching the subject they were certified for), and it was surprising to see how poorly staff racial and ethnic demographics matched those of Massachusetts' students.
I then looked at some performance measures, mainly graduation and dropout rates, and MCAS data. There was a significant range in both the graduation and dropout rates, with some very low and high outliers, respectively. For the MCAS scores, there seemed to better scores from Elementary school to Grade 10, but Science scores were generally the lowest (which was interesting, because according to the DESE site, the Science tests were legacy tests with lower standards).
After looking at these measures of success, I wanted to see what had the largest impact on them. I decided to look at three elements:
I had hypothesized that each would have some correlation with student performance, but interestedly, the data only revealed that a district's percentage of high needs students had a relationship to their measurables. Lower classes did not indicate higher performance, nor did spending more money per student.
You can see the full Tableau Story by clicking on the image below:
Finally, just to brush up on my regression analysis, I shifted back to SQL to show how you could find trend lines and evaluate the model's effectiveness. I did this by seeing if there was any correlation between the percentage of students taking advanced courses in 11th and 12th grade and a district's graduation rates.
Spoiler: There isn't really any.
But you can see how I used SQL to come to this conclusion, and all my SQL queries, by going to my GitHub (which also includes all the cleaned tables).
Conclusions
When considering my findings, I would suggest the following to the Massachusetts DESE:
MY EXPERIENCE
This project was really close to my heart as a former educator. It was also incredibly instructive for an aspiring data analyst.
Here are some of my biggest take-aways, in terms of learning experience:
FINAL THOUGHTS
As I mentioned at the start of this article, this was definitely my most ambitious project. However, once I set forth the questions that I wanted to answer, the easier it became. I'm hoping this project really showcases my passion for examining data, and the growth I've made as an analyst in the last several months.
If you have any additional questions about my methods or findings, feel free to reach out. I'll continue developing projects, and I hope you enjoy them as much as I do!
Data-Driven Controller | Data Analytics | Data Visualization | SQL | Tableau | Excel
2 年Amazing article! I love how detailed you were!
Data Analyst @ dentsu || Sharing data insights with Tableau, SQL, Excel
2 年Great job James! I love your spin on this project, very thorough.