Caste and Concrete: Deciphering India's Demographics through PMAY-G Data Analysis
Venugopal Adep
AI Leader | General Manager at Reliance Jio | LLM & GenAI Pioneer | AI Evangelist
As a Data Scientist, part of my role involves delving into data to unearth insights
With India's vast population of 1.4 billion, understanding the intricacies of its demographics
Link to my code:
import pandas as pd
import warnings
# Ignore all warnings
warnings.filterwarnings('ignore')
# Skip malformed rows; on_bad_lines replaces error_bad_lines, which was removed in pandas 2.0
data = pd.read_csv('/content/OGDSECCAwaasPlusData_18092023.csv', on_bad_lines='skip')
data.head()
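# Dropping rows with missing values in any of the social group columns
# (.copy() makes data_cleaned an independent DataFrame, so the column assignments below are safe)
data_cleaned = data.dropna(subset=['Minority', 'Others', 'SC', 'ST']).copy()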
# Convert social group columns to numeric, if not already
data_cleaned['SC'] = pd.to_numeric(data_cleaned['SC'], errors='coerce')
data_cleaned['ST'] = pd.to_numeric(data_cleaned['ST'], errors='coerce')
data_cleaned['Minority'] = pd.to_numeric(data_cleaned['Minority'], errors='coerce')
data_cleaned['Others'] = pd.to_numeric(data_cleaned['Others'], errors='coerce')
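Because errors='coerce' silently turns unparseable entries into NaN, it can help to count how many values were affected. The following is a minimal sketch on made-up data; the frame `raw` and its values are hypothetical and only stand in for the social group columns of the real file.

```python
import pandas as pd

# Hypothetical raw values; "NA" and "x" are strings that cannot be parsed as numbers
raw = pd.DataFrame({"SC": ["12", "7", "NA", "3"], "ST": ["0", "x", "5", "1"]})

# Convert every column, turning unparseable entries into NaN
converted = raw.apply(pd.to_numeric, errors="coerce")

# NaNs present after conversion minus NaNs present before = values lost to coercion
coerced_counts = converted.isna().sum() - raw.isna().sum()
print(coerced_counts)
```

A non-zero count suggests inspecting those rows before they feed into sums or means.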
Population Distribution by State
# Population Distribution by State
# We'll calculate the total population for each state
state_population_totals = data_cleaned.groupby('state_name')['Total'].sum().reset_index()
# Sorting states by total population
sorted_state_population = state_population_totals.sort_values(by='Total', ascending=False)
Most populous states
# Identifying the top 5 most populous states
top_5_populous_states = sorted_state_population.head(5)
top_5_populous_states
Least populous states
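The least populous states can be found the same way as the top five. Here is a minimal sketch on made-up data; the frame `toy` and its values are hypothetical, and only the column names `state_name` and `Total` are taken from the code above.

```python
import pandas as pd

# Hypothetical toy data standing in for the PMAY-G file; the numbers are made up
toy = pd.DataFrame({
    "state_name": ["A", "A", "B", "C", "C", "D", "E", "F"],
    "Total": [10, 20, 5, 7, 1, 50, 2, 40],
})

# Same aggregation as above: total per state
state_totals = toy.groupby("state_name")["Total"].sum().reset_index()

# nsmallest returns the rows with the lowest totals, sorted in ascending order
bottom_5 = state_totals.nsmallest(5, "Total")
print(bottom_5)
```

On the real data, `sorted_state_population.tail(5)` would give the same states, listed in descending order.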
My Insights:
My Recommendations:
Social Group Distribution Across States
# Social Group Distribution Across States
# We'll calculate the proportion of each social group (SC, ST, Minority, Others) in each state
# Convert social group columns to numeric, if not already
data_cleaned['SC'] = pd.to_numeric(data_cleaned['SC'], errors='coerce')
data_cleaned['ST'] = pd.to_numeric(data_cleaned['ST'], errors='coerce')
data_cleaned['Minority'] = pd.to_numeric(data_cleaned['Minority'], errors='coerce')
data_cleaned['Others'] = pd.to_numeric(data_cleaned['Others'], errors='coerce')
# Group by state and calculate total for each social group
state_social_group_totals = data_cleaned.groupby('state_name')[['SC', 'ST', 'Minority', 'Others']].sum()
# Calculating proportion of each social group in each state
state_social_group_totals['Total'] = state_social_group_totals.sum(axis=1)
state_social_group_totals['SC_proportion'] = (state_social_group_totals['SC'] / state_social_group_totals['Total']) * 100
state_social_group_totals['ST_proportion'] = (state_social_group_totals['ST'] / state_social_group_totals['Total']) * 100
state_social_group_totals['Minority_proportion'] = (state_social_group_totals['Minority'] / state_social_group_totals['Total']) * 100
state_social_group_totals['Others_proportion'] = (state_social_group_totals['Others'] / state_social_group_totals['Total']) * 100
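The four proportion columns can also be computed in one step. This is a small worked sketch on made-up totals (the states `STATE_X` and `STATE_Y` and their numbers are hypothetical); it uses DataFrame.div to divide every group column by the row sum.

```python
import pandas as pd

# Hypothetical per-state totals; the numbers are made up for illustration
totals = pd.DataFrame(
    {"SC": [30, 10], "ST": [10, 10], "Minority": [20, 0], "Others": [40, 20]},
    index=pd.Index(["STATE_X", "STATE_Y"], name="state_name"),
)
groups = ["SC", "ST", "Minority", "Others"]

# Row-wise sum of the four groups, then divide each column by it (axis=0 aligns on the index)
row_sum = totals[groups].sum(axis=1)
proportions = totals[groups].div(row_sum, axis=0).mul(100).add_suffix("_proportion")
print(proportions)
```

Each row of `proportions` sums to 100, which is a quick sanity check on the result.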
Top 5 states with highest SC population share
# Sorting states based on SC proportions
sorted_by_sc = state_social_group_totals.sort_values(by='SC_proportion', ascending=False).head(5)
sorted_by_sc[['SC_proportion']]
Top 5 states with highest ST population share
# Sorting states based on ST proportions
sorted_by_st = state_social_group_totals.sort_values(by='ST_proportion', ascending=False).head(5)
sorted_by_st[['ST_proportion']]
My Insights:
My Recommendations:
Demographic Composition at the Panchayat Level (Rural Demographics)
# Analyzing demographic composition at the Panchayat level
# Grouping data by state and panchayat and calculating the mean for social groups
panchayat_demographics = data_cleaned.groupby(['state_name', 'Panchayat_Code']).agg({
    'Minority': 'mean',
    'Others': 'mean',
    'SC': 'mean',
    'ST': 'mean',
    'Total': 'mean'
}).reset_index()
# Calculating state-wise average demographics at the panchayat level
# (selecting the demographic columns so that Panchayat_Code is not averaged)
state_avg_panchayat_demographics = panchayat_demographics.groupby('state_name')[['Minority', 'Others', 'SC', 'ST', 'Total']].mean()
# Displaying the demographics for a few states for brevity
state_avg_panchayat_demographics.head()
Insights:
Recommendations:
Geographical Distribution of Social Groups within Selected States
I analyzed the mean population of SC, ST, Minority, and Others groups by district in a few selected states: West Bengal, Bihar, Maharashtra, Tamil Nadu, and Uttar Pradesh.
# Geographical Distribution of Social Groups within Selected States
# For this analysis, we'll select a few states and look at the distribution of social groups by district
# Selecting a few states for analysis
selected_states = ['WEST BENGAL', 'BIHAR', 'MAHARASHTRA', 'TAMIL NADU', 'UTTAR PRADESH']
# Filtering data for the selected states
selected_states_data = data_cleaned[data_cleaned['state_name'].isin(selected_states)]
# Grouping by state and district and calculating the mean for social groups
district_social_group_distribution = selected_states_data.groupby(['state_name', 'district_name']).agg({
    'Minority': 'mean',
    'Others': 'mean',
    'SC': 'mean',
    'ST': 'mean'
}).reset_index()
# Displaying the social group distribution for districts in these states
district_social_group_distribution.head()
Insights:
Recommendations:
Comparison of Districts within Selected States
I've compared the districts within selected states (West Bengal, Bihar, Maharashtra, Tamil Nadu, and Uttar Pradesh) to identify those with notably high or low populations for each social group (Minority, Others, SC, ST). Here's a summary for a few districts in Bihar:
# Comparison of Districts within Selected States
social_groups = ['Minority', 'Others', 'SC', 'ST']
# Calculating the total population for each social group by district in each state
# (summing only the social group columns, so text columns are not aggregated)
district_totals = selected_states_data.groupby(['state_name', 'district_name'])[social_groups].sum().reset_index()
# We'll look for districts with notably high or low populations for each social group within each state
# Calculating the mean and standard deviation for each social group in each state
state_district_stats = district_totals.groupby('state_name')[social_groups].agg(['mean', 'std'])
# Flattening the two-level columns, e.g. ('SC', 'mean') becomes 'SC_mean', so they can be merged
state_district_stats.columns = [f'{group}_{stat}' for group, stat in state_district_stats.columns]
# Merging these stats back into the district totals
merged_data = district_totals.merge(state_district_stats.reset_index(), on='state_name')
# Calculating the z-score for each social group in each district
for group in social_groups:
    merged_data[f'{group}_z_score'] = (merged_data[group] - merged_data[f'{group}_mean']) / merged_data[f'{group}_std']
# Filtering districts with high or low populations (|z-score| > 2)
unusual_districts = merged_data[(merged_data['Minority_z_score'].abs() > 2) |
                                (merged_data['Others_z_score'].abs() > 2) |
                                (merged_data['SC_z_score'].abs() > 2) |
                                (merged_data['ST_z_score'].abs() > 2)]
# Selecting relevant columns for display
unusual_districts_display = unusual_districts[['state_name', 'district_name', 'Minority', 'Others', 'SC', 'ST',
                                               'Minority_z_score', 'Others_z_score', 'SC_z_score', 'ST_z_score']]
# Displaying a few entries for brevity
unusual_districts_display.head()
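The within-state z-score can also be computed without a merge, using groupby with transform. This is a minimal sketch on made-up district totals; the states `S1` and `S2`, the district names and the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical district totals for two states; the numbers are made up
district_totals = pd.DataFrame({
    "state_name": ["S1"] * 3 + ["S2"] * 3,
    "district_name": ["d1", "d2", "d3", "d4", "d5", "d6"],
    "SC": [10, 20, 30, 100, 100, 400],
})

# transform returns one value per row, aligned with the original frame,
# so each district is compared with the mean and std of its own state
grouped = district_totals.groupby("state_name")["SC"]
district_totals["SC_z_score"] = (
    district_totals["SC"] - grouped.transform("mean")
) / grouped.transform("std")
print(district_totals)
```

Note that pandas computes the sample standard deviation (ddof=1) by default, so a state with a single district gets NaN instead of a z-score.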
Insights:
Recommendations:
Block-Level Demographic Analysis in Bihar
I conducted a block-level demographic analysis in Bihar, focusing on the mean population of social groups (Minority, Others, SC, ST) in each block. Here are the findings for a few blocks in the district of Araria as an example:
# Block-Level Analysis
# Selecting a state for a more detailed block-level analysis
# For demonstration, let's choose 'Bihar'
bihar_data = data_cleaned[data_cleaned['state_name'] == 'BIHAR']
# Grouping by block and calculating the mean for social groups
block_level_demographics = bihar_data.groupby(['district_name', 'block_name']).agg({
    'Minority': 'mean',
    'Others': 'mean',
    'SC': 'mean',
    'ST': 'mean',
    'Total': 'mean'
}).reset_index()
# Displaying the demographics for a few blocks for brevity
block_level_demographics.head()
Insights:
Recommendations:
Analysis of Panchayat-Level Variance Across States
I analyzed the variance in population sizes and social group compositions at the panchayat level. This analysis helps us understand the diversity and disparities in demographic distributions within rural areas of different states. Here are the top states with the highest variance in total population at the panchayat level:
# Analysis of Panchayat-Level Variance
# We'll calculate the variance in population sizes and social group compositions at the panchayat level
# This will help us understand the diversity within rural areas in different states
# Grouping data by state and panchayat
panchayat_variance_analysis = data_cleaned.groupby(['state_name', 'Panchayat_Code']).agg({
    'Minority': 'var',
    'Others': 'var',
    'SC': 'var',
    'ST': 'var',
    'Total': 'var'
}).reset_index()
# Calculating state-wise average variance
# (selecting the variance columns so that Panchayat_Code is not averaged)
state_avg_panchayat_variance = panchayat_variance_analysis.groupby('state_name')[['Minority', 'Others', 'SC', 'ST', 'Total']].mean()
# Sorting states based on variance in total population
sorted_states_by_population_variance = state_avg_panchayat_variance.sort_values(by='Total', ascending=False)
# Displaying the top states with the highest variance in total population
sorted_states_by_population_variance.head()
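One detail worth checking in this kind of analysis: the sample variance of a group with a single row is undefined, so pandas returns NaN for it, and the state-level mean then skips those panchayats. The sketch below shows this on made-up data, assuming some panchayats appear in only one row; the frame `toy` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical rows: panchayat 101 has three records, 102 and 103 have one each
toy = pd.DataFrame({
    "state_name": ["S1", "S1", "S1", "S1", "S1"],
    "Panchayat_Code": [101, 101, 101, 102, 103],
    "Total": [2, 4, 6, 5, 9],
})

# Variance and row count per panchayat; single-row groups get NaN variance
per_panchayat = (
    toy.groupby(["state_name", "Panchayat_Code"])["Total"]
    .agg(["var", "count"])
    .reset_index()
)

# mean() skips NaN, so only panchayat 101 contributes to the state average
state_avg = per_panchayat.groupby("state_name")["var"].mean()
print(per_panchayat)
print(state_avg)
```

Reporting the `count` column alongside the variance makes it clear how many panchayats actually contribute to each state's average.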
Insights:
Recommendations: