LITERACY IN INDIA
Rajat Shandilya
IMI Bhubaneswar PGDM'25 | Summer Intern - Hindustan Times | Head of Marketing Club, IMIB
Abstract –
The relationship between government spending and literacy rates has been the subject of much discussion and academic study worldwide. This report examines how public education spending affects people's capacity to read, write, and understand basic knowledge. Our objective is to clarify this relationship by looking at past trends, regional differences, and the efficiency of different spending strategies. Ultimately, the study aims to identify how resources can best be allocated within national economies to accelerate progress toward universal literacy and realize its full potential as a force for social prosperity and human advancement.
Introduction –
Millions of people worldwide still have unmet aspirations related to literacy, which is essential for both societal advancement and personal empowerment. Despite notable progress in the last few years, differences in literacy rates persist, frequently reflecting glaring disparities in government spending on education. This report examines the relationship between public investment and literacy outcomes.
Using well-established economic models and pertinent scholarly insights, we first establish a thorough theoretical framework to guide our investigation. The empirical data will then be examined, and patterns in government spending on education and global trends in literacy rates will be compared. Both macro-level analyses of local and national data and micro-level research concentrating on particular educational interventions will be included in this investigation.
We will also discuss the subtleties of this relationship. Several factors will be taken into account, including the effectiveness of resource allocation, the standard of educational infrastructure, and the cultural context of learning. Recognizing the interaction between societal and economic factors, we will investigate how broader socioeconomic conditions, such as political stability, gender inequality, and poverty, affect literacy outcomes.
The objective of this report is to equip policymakers and stakeholders with the knowledge and evidence they need to develop effective strategies for achieving universal literacy. Through careful analysis and practical recommendations, we aim to help bring about a time when literacy is not a luxury enjoyed by a select few but an essential skill for all.
Research Objectives –
This report on the relationship between the global literacy rate and government expenditure in the economy aims to achieve the following research objectives:
By achieving these objectives, this report hopes to contribute meaningfully to the global effort towards achieving universal literacy. By providing a comprehensive understanding of the complex interplay between government expenditure and literacy outcomes, we can inform effective policy interventions and pave the way for a future where everyone has the opportunity to acquire the fundamental skills of reading and writing.
Research Methodology –
Regression analysis can be used to investigate the relationship between global literacy rates and government expenditure in the economy through the following steps:
1. Defining Variables:
2. Data Collection:
3. Model Selection:
4. Model Estimation:
5. Interpretation of Results:
6. Robustness Checks:
7. Policy Implications:
8. Limitations and Future Research:
Linear Regression is a statistical method for modeling the relationship between a dependent variable (the target you want to predict) and one or more independent variables (the predictors). The relationship is modeled as a linear equation, and the goal is to find the best-fitting line that minimizes the difference between the observed and predicted values of the dependent variable.
Linear Relationship:
Linear Expression:
Training and testing:
Interpretation:
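As a minimal illustration of the linear regression idea described above, the sketch below fits the line y = b0 + b1 * x to a small set of made-up numbers and checks the fitted coefficients against the closed-form least-squares formulas. The data and the variable names (x, y, fit) are hypothetical and are not taken from the census dataset.

```r
# Hypothetical toy data (not the census data): x is a predictor, y a response.
set.seed(1)
x <- c(10, 20, 30, 40, 50, 60, 70, 80)
y <- 5 + 0.3 * x + rnorm(length(x), sd = 2)

# Fit y = b0 + b1 * x by ordinary least squares.
fit <- lm(y ~ x)
b <- coef(fit)

# The same estimates from the closed-form formulas:
# slope = cov(x, y) / var(x), intercept = mean(y) - slope * mean(x).
b1_manual <- cov(x, y) / var(x)
b0_manual <- mean(y) - b1_manual * mean(x)

print(b)
print(c(intercept = b0_manual, slope = b1_manual))
```

The two printed pairs should agree, which shows that lm() is finding the line that minimizes the squared differences between observed and predicted values.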
Data
Data Collection
We analyzed a dataset from the 2011 population census, taken from data.gov.in. The dataset had 28 rows, one for each of the then 28 states, giving the literate population of each state at that time. It also included the spending of the various state governments between 2001 and 2011.
Data Processing:
We fitted a simple linear regression and used qqnorm() to produce a Q-Q plot of its residuals.
We split the data into training and test sets to build and evaluate the relationship between the two variables: "X", government spending, and "Y", literate population.
The fitted model gave a p-value below 0.05, so we rejected the null hypothesis of no linear relationship between the two variables.
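The p-value decision described above can be sketched as follows. This is only an illustration on simulated numbers: the variables spend and literate, their values, and the 0.05 threshold stored in alpha are assumptions made for the example, not the census data or the author's exact procedure.

```r
# Hypothetical data: 28 simulated "states" (not the census figures).
set.seed(42)
spend <- runif(28, min = 100, max = 1000)           # assumed spending values
literate <- 50 + 0.8 * spend + rnorm(28, sd = 40)   # assumed literate population

# Fit the simple linear regression and pull the slope's p-value
# from the coefficient table of the model summary.
model <- lm(literate ~ spend)
coef_table <- summary(model)$coefficients
p_value <- coef_table["spend", "Pr(>|t|)"]

# Reject the null hypothesis (slope = 0) when the p-value is below alpha.
alpha <- 0.05
reject_null <- p_value < alpha

print(p_value)
print(reject_null)
```

The null hypothesis here is that the slope on spending is zero; a p-value below the chosen threshold is taken as evidence against it.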
Interpretation
Code Block
library(tidyverse)
library(dplyr)
library(car)
if (!requireNamespace("Metrics", quietly = TRUE)) install.packages("Metrics")
library("Metrics")
library(caret)
library(lmtest)
#creating data frame by reading the CSV file
literacy = as.data.frame( read.csv("C:/Users/SURAJ/Desktop/data1.csv") )
head(literacy)
summary(literacy)
str(literacy)
#checking for missing data in the data frame (count of NA values per column); none were found
colSums(is.na(literacy))
#checking column names of data frame
colnames(literacy)
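#checking for multicollinearity between independent variables
#NOTE: student_enroll is not defined in this script, so the call is commented out;
#the model below has a single predictor, so no multicollinearity check is required
#cor(student_enroll$GovtSpendCrs,student_enroll$Schools,method="pearson")
#creating a simple linear regression model using the lm function, taking only government spending (Edu.Dep1) as the independent variable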
literacy_predict=lm(LT~ Edu.Dep1, data=literacy)
summary(literacy_predict)
#Check for heteroscedasticity: residuals against fitted values
plot(literacy_predict$fitted.values, literacy_predict$residuals)
plot(density(literacy_predict$residuals))
#Breusch-Pagan test on the fitted model
bptest(literacy_predict)
# Check for normality of residuals
qqnorm(literacy_predict$residuals)
qqline(literacy_predict$residuals)
#splitting the data for training and testing
set.seed(123)
train_index <- sample(1:nrow(literacy), size = nrow(literacy) * 0.8)
train_data <- literacy[train_index, ]
test_data <- literacy[-train_index, ]
head(train_data)
head(test_data)
#applying the lm model on the training data
trained_model= lm(LT~ Edu.Dep1,data=train_data)
#applying the trained model on test data
predictions = trained_model %>% predict(test_data)
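#Root mean squared error (Metrics functions take actual values first, then predictions)
RMSE = rmse(test_data$LT, predictions)
#Mean absolute error, computed against the same dependent variable LT
MAE = mae(test_data$LT, predictions)
print(test_data$LT)
print(predictions)
print(RMSE)
print(MAE)
#predicted values against actual values
plot(predictions,test_data$LT)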
Conclusion
The R code above performs a simple linear regression analysis on a dataset related to literacy. Here are some observations based on the code:
Data Loading and Exploration:
The code loads a dataset from a CSV file into a data frame named "literacy."
Basic exploration functions like head(), summary(), and str() are used to understand the structure and summary statistics of the dataset.
Handling Missing Data:
The code checks for missing data using is.na() and doesn't find any missing values.
Multicollinearity Check:
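The code attempts to check for multicollinearity between independent variables (GovtSpendCrs and Schools), but the call refers to a data frame, student_enroll, that the script never loads, and the final model uses only one predictor.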
Building a Simple Linear Regression Model:
A simple linear regression model is built using the lm() function with "LT" as the dependent variable and "Edu.Dep1" as the independent variable.
Checking for Heteroscedasticity:
Visual checks for heteroscedasticity are performed using residual plots (plot() and density()).
Breusch-Pagan Test:
There's an attempt to perform the Breusch-Pagan test for heteroscedasticity using the bptest() function.
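To show what the Breusch-Pagan test computes, here is a minimal base-R sketch on simulated data. It uses the studentized (Koenker) form of the statistic, which is, to my understanding, what lmtest's bptest() reports by default when it is given a fitted lm model; treat that equivalence as an assumption. The data and the names x, y, aux and lm_stat are hypothetical.

```r
# Hypothetical data whose noise grows with x, so the errors are heteroscedastic.
set.seed(7)
n <- 100
x <- runif(n, min = 1, max = 10)
y <- 2 + 3 * x + rnorm(n, sd = x)

# Step 1: fit the regression of interest.
model <- lm(y ~ x)

# Step 2: auxiliary regression of the squared residuals on the same predictor.
aux <- lm(residuals(model)^2 ~ x)

# Step 3: the test statistic is n * R^2 of the auxiliary regression,
# compared with a chi-squared distribution (df = number of predictors).
lm_stat <- n * summary(aux)$r.squared
p_value <- pchisq(lm_stat, df = 1, lower.tail = FALSE)

print(lm_stat)
print(p_value)
```

A small p-value suggests that the residual variance depends on the predictor, that is, heteroscedasticity is present.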
Checking Normality of Residuals:
The normality of residuals is checked using a quantile-quantile (QQ) plot (qqnorm()).
Data Splitting:
The dataset is split into training (80%) and testing (20%) sets using the sample() function, with set.seed(123) making the random split reproducible.
Building and Evaluating the Model:
A linear regression model is built using the training data, and predictions are made on the test data.
Performance metrics such as Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) are calculated to assess the model's accuracy.
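The two metrics can be written out directly in base R. The sketch below uses four made-up actual and predicted values (not output from the model above) so that the arithmetic is easy to follow by hand.

```r
# Hypothetical actual and predicted values, for illustration only.
actual    <- c(10, 12, 15, 20)
predicted <- c(11, 11, 17, 18)

# Prediction errors: -1, 1, -2, 2
errors <- actual - predicted

# RMSE: square the errors, average them, take the square root.
rmse_val <- sqrt(mean(errors^2))   # sqrt((1 + 1 + 4 + 4) / 4) = sqrt(2.5)

# MAE: average of the absolute errors.
mae_val <- mean(abs(errors))       # (1 + 1 + 2 + 2) / 4 = 1.5

print(rmse_val)
print(mae_val)
```

RMSE penalizes large errors more heavily than MAE because the errors are squared before averaging, so RMSE is never smaller than MAE on the same predictions.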
Visualizing Predictions:
The code plots the predicted values against the actual values for visualization using the plot() function.