Examining 2019 MA Education Data

Examining 2019 MA Education Data

Several weeks ago, I saw that many of the students in Avery Smith 's recent cohort for his Data Analytics Accelerator Program had delved into some data from Massachusetts' Department of Elementary and Secondary Education (DESE). Being a former teacher in Massachusetts, it inspired me to examine some of the publicly available data myself. However, I wanted to put my own spin on the project and really try to showcase my growth in my analytics journey.

This has been by far my most ambitious project to date. I'm going to divide up my this article into two parts to describe my learning. The first part will be on the analysis itself, while the second will be on my experience and what I learned.


THE ANALYSIS

The Data

The data itself came almost exclusively from the Statewide Reports page of the department's website. I opted to use data solely from the 2018-2019 school year since it was the most complete available data unaffected by COVID shutdowns.

I narrowed down my data to 16 different tables, which I downloaded and put into Excel. As I hunted for my data, I created a document that described each of the table, linked to its source, and included the date it was last updated, which you can see below:

After putting all the data into Excel, I spent significant time cleaning it. I created a change log to keep track of my alterations, which you can access HERE. Primarily, I ensured that the data source was correct for each column, and made adjustments to some of the numbers so it would upload to Tableau and MySQL correctly. For some of the tables, especially the one on Per Pupil Expenditure, some of the information was woefully incomplete, so I salvaged what I could, and eliminated the rest.

The data selected was meant to inform the following:

  • An overview of student populations and demographics
  • An overview of staff/teacher numbers, demographics, and qualifications
  • Measurements of student performance (such as state-wide standardized testing and graduation rates)

Questions Asked

I wanted to approach this project as if I was hired by DESE leadership and tasked with evaluating performance of Massachusetts public education. The majority of my analysis, therefore, would be aimed at the statewide level. Primarily, I wanted to examine the following:

  • What areas of concern exist in the state's MCAS scores (Massachusetts Comprehensive Assessment System, the state's standardized tests)?
  • How are the state's graduation and dropout rates?
  • What has the greatest impact on student performance in the above measures?

The report created could be used to guide both the department at large and district leadership, such as superintendents, in making actionable change moving forward.

In addition to this larger report, I wanted to create a tool that superintendents themselves could use to compare their performance to similar districts. This way, if they were underperforming in certain metrics, they could collaborate with those districts to see how to improve.

The Analysis and Visualization

Most of my analysis was done in Tableau, but before I could do that, I put the tables in SQL to examine them a bit. For one, I wanted to see exactly how many districts were represented across the 16 tables. I created one of the longest-queries of my life to return all of the values represented in the District_Name and District_Code columns:

SELECT District_Code, District_Name
FROM advanced_courses
UNION
SELECT District_Code, District_Name
FROM ClassSize_GenPop
UNION
SELECT District_Code, District_Name
FROM ClassSize_RaceEth
UNION
SELECT District_Code, District_Name
FROM Dropouts
UNION
SELECT District_Code, District_Name
FROM enrollment_grade
UNION
SELECT District_Code, District_Name
FROM Enrollment_RaceGen
UNION
SELECT District_Code, District_Name
FROM Enrollment_SelectedPop
UNION
SELECT District_Code, District_Name
FROM Grades_offered
UNION
SELECT District_Code, District_Name
FROM Grad_rates
UNION
SELECT District_Code, District_Name
FROM mcas_10_elamath
UNION
SELECT District_Code, District_Name
FROM mcas_3_8
UNION
SELECT District_Code, District_Name
FROM mcas_hssci
UNION
SELECT District_Code, District_Name
FROM perpupil_expenditure
UNION
SELECT District_Code, District_Name
FROM Staff_RaceGender
UNION
SELECT District_Code, District_Name
FROM staff_retention
UNION
SELECT District_Code, District_Name
FROM teachers;        

When I did this, I noticed that in many cases, the district names were inconsistent across the tables. I created a separate table simply called "All_Districts" that I cleaned up in OpenRefine, so that I would have one cohesive list of all 407 districts in Massachusetts. This was extremely helpful for when I worked in Tableau later.

In addition, I wanted to use SQL to create a query for superintendents to use in order to find comparable districts. This query was aimed to do the following:

  • Narrow down the data based on the size of the district, and selects districts with student populations ranging from 200 less to 200 more (or about a difference of 15-20 students per grade)
  • Then narrows down the list further to provide a list of districts with similar demographics based on this list.

No alt text provided for this image

This would yield a list of the three districts above and the three districts below yours in terms of student demographics:

No alt text provided for this image

I created additional similar queries to allow superintendents to find comparable districts in terms of student population size, grades offered, and various high needs populations.

After my initial exploration of the data in both Excel and SQL, I moved the data into Tableau. The previous table I created with OpenRefine, "All_Districts," was immensely helpful here, because it created a consistent connection between all the data sets.

I started with creating my dashboard for superintendents to examine their own data and compare to other districts. I quickly found that with the amount of information available, trying to cram it into a single dashboard would not be effective. As a result, I designed five different dashboards, linked together with navigation bars. You can see the end result, by clicking on the image below:

No alt text provided for this image

The dashboard allows users to select from the list of districts on the side, and compare that information to the state averages, and to other districts. The creation of this dashboard demonstrated the immense range of numbers between districts in terms of size and performance.

The next step was in addressing the questions I started with. When creating my report for DESE leadership, I wanted to really tell a story, so I opted to develop a Tableau story (which ends up being a combination of multiple dashboards again).

First, I started off by developing a summary of both student and staff data. I found it concerning that only 87% of teachers were teaching in-field (meaning they were teaching the subject they were certified for), and it was surprising to see how poorly staff racial and ethnic demographics matched those of Massachusetts' students.

I then looked at some performance measures, mainly graduation and dropout rates, and MCAS data. There was a significant range in both the graduation and dropout rates, with some very low and high outliers, respectively. For the MCAS scores, there seemed to better scores from Elementary school to Grade 10, but Science scores were generally the lowest (which was interesting, because according to the DESE site, the Science tests were legacy tests with lower standards).

After looking at these measures of success, I wanted to see what had the largest impact on them. I decided to look at three elements:

  • How a district's percentage of high-needs students (those who are English-Language Learners, have a first language not English, are economically disadvantaged, or disabilities) impact's their performance
  • How a district's PPE (Per Pupil Expenditure) impact's their performance
  • How a district's class size impact's their performance

I had hypothesized that each would have some correlation with student performance, but interestedly, the data only revealed that a district's percentage of high needs students had a relationship to their measurables. Lower classes did not indicate higher performance, nor did spending more money per student.

You can see the full Tableau Story by clicking on the image below:

No alt text provided for this image

Finally, just to brush up on my regression analysis, I shifted back to SQL to show how you could find trend lines and evaluate the model's effectiveness. I did this by seeing if there was any correlation between the percentage of students taking advanced courses in 11th and 12th grade and a district's graduation rates.

Spoiler: There isn't really any.

No alt text provided for this image

But you can see how I used SQL to come to this conclusion, and all my SQL queries, by going to my GitHub (which also includes all the cleaned tables).

Conclusions

When considering my findings, I would suggest the following to the Massachusetts DESE:

  1. A district's high-needs population seems to be the largest indicator of academic success. Finding additional means to support these students should be a priority. It would be beneficial to connect with the state legislature as well as other state departments like the Executive Offices of Health and Human Services and the Executive Office on Housing and Economic Development to address this.
  2. Since a higher Per Pupil Expenditure does not equal success, it might be more advantageous to conduct a deeper study of how successful districts utilize their funds. This information is not consistently available publicly for every district.
  3. It was concerning to see how district's staff demographics do not match the demographics of their student population. The state should consider looking into their diversity practices.


MY EXPERIENCE

This project was really close to my heart as a former educator. It was also incredibly instructive for an aspiring data analyst.

Here are some of my biggest take-aways, in terms of learning experience:

  • I learned to trouble-shoot quite a bit with MySQL. In the past, I had just used the Table Import Wizard, and sometimes I had to upload with JSON files instead of CSV files. I found out that's because my Excel for Mac messes with the encoding when converting to CSV files, so the files were never downloading correctly. Now, I know how to correctly convert them AND know that importing the files works much better if defining the columns and data types first.
  • I was able to really practice table functions and CTEs in SQL. I'm more at the point now where I can figure things out on my own without having to Google everything.
  • I learned more about relational data in Tableau. This data set was not IDEAL for that, however, since every piece of data was linked to the same District_Code. Without creating the master list of district codes and names, this project would not have been successful.
  • I was able to practice some of my statistical knowledge with this project, but it revealed that I should revisit that in the near future to learn more. I have a greater understanding of regression analysis now, though.
  • Tableau can get incredibly slow. Patience is key, and so is constantly saving your work.
  • The Tableau story feature is a great tool to have in my arsenal for data visualization and story-telling. The ability to write captions to explain your thinking and process is very powerful.


FINAL THOUGHTS

As I mentioned at the start of this article, this was definitely my most ambitious project. However, once I set forth the questions that I wanted to answer, the easier it became. I'm hoping this project really showcases my passion for examining data, and the growth I've made as an analyst in the last several months.

If you have any additional questions about my methods or findings, feel free to reach out. I'll continue developing projects, and I hope you enjoy them as much as I do!

Kim Gasgarth

Data-Driven Controller | Data Analytics | Data Visualization | SQL | Tableau | Excel

2 年

Amazing article! I love how detailed you were!

Ashley Zacharias

Data Analyst @ dentsu || Sharing data insights with Tableau, SQL, Excel

2 年

Great job James! I love your spin on this project, very thorough.

回复

要查看或添加评论,请登录

James Charest的更多文章

  • Creating an Excel Dashboard

    Creating an Excel Dashboard

    For this project, I was really hoping to accomplish two things. First, I wanted to re-familiarize myself with a Windows…

    6 条评论
  • Practicing Window Functions

    Practicing Window Functions

    Recently, I finished the "Introduction to SQL Window Functions" project course on Coursera. It was an enjoyable…

    3 条评论
  • Google Analytics Capstone

    Google Analytics Capstone

    I've FINALLY finished my capstone for my Google Analytics Certificate. This entire certification journey has been a…

    2 条评论
  • Delving into a Super-Messy Superstore Dataset!

    Delving into a Super-Messy Superstore Dataset!

    (Okay, the data set wasn't THAT messy, but I needed a unique title..

    5 条评论
  • What I Learned from the 21 Days to Data Challenge

    What I Learned from the 21 Days to Data Challenge

    Introduction I've been following Avery Smith on LinkedIn ever since I made the decision to transition out of education…

    3 条评论

社区洞察

其他会员也浏览了