Leveraging Regression Analysis to Understand Key Factors Influencing Student Exam Scores

In today's data-driven world, understanding the factors that affect student performance is crucial for educators, policymakers, and other stakeholders. Rahul Kumar and I, under the mentorship of Veena Bansal (Singhal), analysed a dataset of student attributes such as Hours Studied, Attendance, Parental Involvement, and other personal and school-related characteristics, with the aim of predicting Exam Scores.

Project Overview: We conducted a regression analysis in Python to predict student exam scores, with the primary goal of identifying the significant predictors of performance. The dataset includes 20 features, and the target variable is Exam Score (a continuous variable). We explored how predictors such as study habits, attendance, and parental involvement relate to exam scores.

Key Steps in the Analysis:

Data Preparation

  • The dataset contains 6,607 entries, with features like Hours Studied, Attendance, and categorical variables such as Parental Involvement and Access to Resources.
  • We handled missing values, converted categorical variables into numerical ones, and removed outliers from the target variable.
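The preparation steps above can be sketched as follows. This is a minimal illustration, not the code used in the project: the article does not say how missing values were filled or how outliers were defined, so mode imputation and the 1.5 × IQR rule are assumptions, and the small data frame is made up for the example.

```python
import pandas as pd

# Made-up sample standing in for the real dataset (values are illustrative only).
df = pd.DataFrame({
    "Hours_Studied": [23, 19, 24, 29, 19, 25, 21, 18],
    "Attendance": [84, 64, 98, 89, 92, 70, 81, 95],
    "Parental_Involvement": ["Low", "Medium", None, "High",
                             "Medium", "Low", "High", "Medium"],
    "Exam_Score": [67, 61, 74, 71, 70, 66, 101, 68],
})

# 1. Missing values: fill gaps in a categorical column with its most frequent level
#    (assumed strategy; the article does not name one).
mode_level = df["Parental_Involvement"].mode()[0]
df["Parental_Involvement"] = df["Parental_Involvement"].fillna(mode_level)

# 2. Outliers in the target: keep rows within 1.5 * IQR of the quartiles
#    (assumed rule; the article only says outliers were removed).
q1, q3 = df["Exam_Score"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = df["Exam_Score"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
df = df[in_range].reset_index(drop=True)

# 3. Categorical to numerical: one indicator (dummy) column per level.
encoded = pd.get_dummies(df, columns=["Parental_Involvement"], dtype=int)
print(encoded.head())
```

In this toy sample the score of 101 falls outside the IQR fence and is dropped, and the single missing Parental_Involvement value is filled with "Medium", the most frequent level.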

Bivariate Analysis

We focused on the relationship between Hours Studied and Exam Score. Using histograms and scatterplots, we found a positive correlation between the two, indicating that more study time is generally associated with higher scores.

Regression Modeling

Simple Linear Regression was performed with Hours Studied as the predictor and Exam Score as the target.


Figure: Regression line with Hours Studied as the predictor and Exam Score as the target variable

  • The regression equation derived was: Exam_Score = 0.286 × Hours_Studied + 61.55
  • The model had an R² of 0.247, meaning it explained about 24.7% of the variance in exam scores. This suggests that other factors also influence performance.
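A simple linear regression of this kind can be sketched with NumPy alone. The data below are made up for illustration, so the fitted slope and intercept will not match the reported ones; only `predict_reported` uses the coefficients from the article. The sketch also checks a standard property of simple linear regression: R² equals the squared Pearson correlation between predictor and target, so the reported R² of 0.247 corresponds to a correlation of roughly 0.5.

```python
import numpy as np

# Made-up illustrative data (not the project's dataset).
hours = np.array([5, 10, 15, 20, 25, 30, 35, 40], dtype=float)
scores = np.array([62, 65, 64, 68, 67, 71, 70, 74], dtype=float)

# Fit Exam_Score = slope * Hours_Studied + intercept by least squares.
slope, intercept = np.polyfit(hours, scores, deg=1)

# R^2 = 1 - (residual sum of squares) / (total sum of squares).
predicted = slope * hours + intercept
ss_res = np.sum((scores - predicted) ** 2)
ss_tot = np.sum((scores - scores.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

# With a single predictor, R^2 is the square of the Pearson correlation.
r = np.corrcoef(hours, scores)[0, 1]
print(f"R^2 = {r_squared:.3f}, r^2 = {r ** 2:.3f}")


def predict_reported(hours_studied):
    """Prediction from the equation reported in the article."""
    return 0.286 * hours_studied + 61.55


# 0.286 * 20 + 61.55 = 67.27
print(predict_reported(20))
```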

Multivariate Analysis

We expanded the model by adding Parental Involvement and Attendance as predictors.

  • The regression equation became: Exam_Score = 38.64 + 0.1956 × Attendance + 11.9872 × Parental_Involvement_Low + 12.8368 × Parental_Involvement_Medium + 13.8149 × Parental_Involvement_High
  • With this model, R² increased to 0.367, showing that 36.7% of the variation in exam scores could now be explained by the predictors.
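To make the equation concrete, the sketch below plugs values into the reported coefficients. The function name `predict_score` and the example inputs are hypothetical. Because each student belongs to exactly one Parental_Involvement level, exactly one of the three dummy terms is active in any prediction, and the informative quantity is the gap between level coefficients rather than any single one.

```python
# Coefficients as reported in the article for the Attendance +
# Parental_Involvement model.
INTERCEPT = 38.64
ATTENDANCE_COEF = 0.1956
INVOLVEMENT_COEF = {"Low": 11.9872, "Medium": 12.8368, "High": 13.8149}


def predict_score(attendance, involvement):
    """Predicted exam score for one student (hypothetical helper)."""
    return INTERCEPT + ATTENDANCE_COEF * attendance + INVOLVEMENT_COEF[involvement]


# Two students with the same attendance (80%) but different involvement levels.
high = predict_score(80, "High")  # 38.64 + 15.648 + 13.8149 = 68.1029
low = predict_score(80, "Low")    # 38.64 + 15.648 + 11.9872 = 66.2752
gap = high - low                  # 13.8149 - 11.9872 = 1.8277

print(f"High: {high:.2f}, Low: {low:.2f}, gap: {gap:.2f}")
```

Under this model, moving from low to high parental involvement is associated with a predicted difference of about 1.83 points at any fixed attendance level.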

We then expanded the model further, keeping Exam_Score as the target variable and using Attendance, Sleep_Hours, Parental_Involvement, and Access_to_Resources as predictors.


Figure: OLS regression output with two numerical and two categorical variables

The regression equation became

  • Exam_Score = 30.84 + 0.1963 × Attendance + 9.3547 × Parental_Involvement_Low + 10.246 × Parental_Involvement_Medium + 11.24 × Parental_Involvement_High - 0.0076 × Sleep_Hours + 11.28 × Access_to_Resources_High + 10.30 × Access_to_Resources_Medium + 9.25 × Access_to_Resources_Low
  • With this model, R² increased to 0.40, showing that 40% of the variation in exam scores could now be explained by the predictors.
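A model with two numerical and two categorical predictors can be fitted by ordinary least squares as sketched below. The article does not say which library produced its OLS output, so this sketch uses NumPy's least-squares solver on synthetic data. One deliberate difference from the reported equation is that "Low" is treated as the baseline level and gets no column of its own, a common way to avoid perfect collinearity with the intercept. The coefficients here are therefore differences from "Low" and are not directly comparable to the reported ones.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic predictors (illustrative only).
attendance = rng.uniform(60, 100, n)
sleep_hours = rng.uniform(4, 10, n)
levels = np.array(["Low", "Medium", "High"])
involvement = rng.choice(levels, n)
resources = rng.choice(levels, n)

# Synthetic target: depends on attendance and both categorical variables,
# but not on sleep hours.
score = (40 + 0.2 * attendance
         + 1.0 * (involvement == "Medium") + 2.0 * (involvement == "High")
         + 1.0 * (resources == "Medium") + 2.0 * (resources == "High")
         + rng.normal(0, 2, n))


def dummies(values):
    """Indicator columns for Medium and High; Low is the baseline."""
    return np.column_stack([(values == lvl).astype(float)
                            for lvl in ("Medium", "High")])


# Design matrix: intercept, numerical predictors, then dummy columns.
X = np.column_stack([np.ones(n), attendance, sleep_hours,
                     dummies(involvement), dummies(resources)])

# Ordinary least squares fit.
coef, *_ = np.linalg.lstsq(X, score, rcond=None)

fitted = X @ coef
r_squared = 1 - np.sum((score - fitted) ** 2) / np.sum((score - score.mean()) ** 2)

names = ["Intercept", "Attendance", "Sleep_Hours",
         "PI_Medium", "PI_High", "AR_Medium", "AR_High"]
for name, value in zip(names, coef):
    print(f"{name:12s} {value: .4f}")
print(f"R^2 = {r_squared:.3f}")
```

Since Sleep_Hours plays no role in generating the synthetic scores, its fitted coefficient should come out close to zero, mirroring the near-zero Sleep_Hours coefficient reported above.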

Key Findings

  1. The second multiple regression model explains more variance in exam scores (R² = 0.400, up from 0.367), mainly due to the inclusion of Access_to_Resources, which is highly significant. However, Sleep_Hours does not significantly contribute to the prediction.
  2. Both multiple regression models indicate that Attendance and Parental_Involvement are strong predictors of exam scores, but the Parental_Involvement coefficients decrease slightly when more variables are added in the second model.
  3. The models suggest that while study habits and attendance are key, external factors like parental involvement and resource availability play significant roles.

Correlation Matrix

We also examined the correlations between the variables to check whether we had chosen appropriate predictors for the analysis.


Figure: Correlation matrix heatmap

  • The heatmap shows that Attendance and Hours_Studied have the strongest positive correlations with Exam_Score. Other variables such as Tutoring_Sessions, Previous_Scores, and Physical_Activity have weaker but still positive correlations. Sleep duration (Sleep_Hours) shows no meaningful correlation with exam scores in this dataset.
  • This aligns with the findings from the regression models, where Attendance was a significant predictor of exam performance. It suggests that focusing on improving attendance and study habits may be the most effective way to enhance academic outcomes.
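A correlation matrix like the one described can be computed with pandas, as in the sketch below. The data are synthetic and only three predictors are included, so the numbers are illustrative and do not reproduce the heatmap. Ranking the Exam_Score column gives a quick view of which variables are most strongly correlated with the target.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300

# Synthetic data: scores depend on study hours and attendance, not on sleep.
df = pd.DataFrame({
    "Hours_Studied": rng.uniform(1, 40, n),
    "Attendance": rng.uniform(60, 100, n),
    "Sleep_Hours": rng.uniform(4, 10, n),
})
df["Exam_Score"] = (40 + 0.3 * df["Hours_Studied"]
                    + 0.2 * df["Attendance"]
                    + rng.normal(0, 3, n))

# Pairwise Pearson correlations between all numerical columns.
corr = df.corr()

# Correlation of each predictor with the target, strongest first.
ranked = corr["Exam_Score"].drop("Exam_Score").sort_values(ascending=False)
print(ranked)
```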
