Data Science with Python: Regression Modeling

In this article, I will walk you through essential data exploration, cleaning, and processing techniques using Python's pandas library. We'll also dive into regression modeling using the Scikit-learn library. Let's get started!


1. Data Exploration

We begin by loading the data and performing a quick exploration.

import pandas as pd

# Load the dataset
data = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")

# Basic exploration
print(data.shape)
print(data.info())
print(data.describe())
print(data.head())
print(data.isnull().sum())        

This allows us to understand the structure of the dataset, view the first few rows, and check for missing values.
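
A couple of optional follow-up checks can also be handy here. As a quick sketch (assuming the same Telco churn CSV, where TotalCharges is usually read in as text because of blank entries):

# Optional checks: column types and a categorical distribution
print(data.dtypes)
print(data["Churn"].value_counts())

# TotalCharges is typically an object column; converting it reveals blank entries as NaN
print(pd.to_numeric(data["TotalCharges"], errors="coerce").isnull().sum())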


2. Slicing Data for Analysis

We can slice data for specific columns and rows for a more granular view.

# Column slicing
print(data["gender"])
print(data[["gender", "Partner"]])

# Row slicing
print(data[5:10])

# Combine row and column slicing
print(data[5:10][["gender", "Partner"]])        

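The same rows and columns can also be selected with pandas' loc and iloc indexers, which make the intent more explicit. A brief sketch, assuming the default integer index:

# Label-based selection: rows with index labels 5-9 and two named columns
print(data.loc[5:9, ["gender", "Partner"]])

# Position-based selection: rows 5-9 and the first three columns by position
print(data.iloc[5:10, 0:3])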

3. Conditional Slicing

Sometimes we need to filter data based on conditions. Here’s how to slice the data by gender and churn status.

# Conditional slicing (single condition)
male_customers = data[data["gender"] == "Male"]
print(male_customers["customerID"])

# Conditional slicing (multiple conditions)
male_churn = data[(data["gender"] == "Male") & (data["Churn"] == "Yes")]
print(male_churn)        

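For readers who prefer expression-style filters, the same multi-condition filter can be written with DataFrame.query. A minimal equivalent sketch:

# Equivalent filter using query (column names are referenced inside the expression)
male_churn = data.query('gender == "Male" and Churn == "Yes"')
print(male_churn.shape)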

4. Data Processing

Now, we will handle missing values, drop irrelevant columns, and encode categorical data for model building.

# Fill missing values with mode or mean (assignment form avoids pandas' chained-assignment pitfalls)
data["gender"] = data["gender"].fillna(data["gender"].mode()[0])
data["tenure"] = data["tenure"].fillna(data["tenure"].mean())

# Drop duplicates and unnecessary columns
data.drop_duplicates(inplace=True)
data.drop(labels=["customerID"], axis=1, inplace=True)

# Label encoding for categorical columns
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
data["gender"] = encoder.fit_transform(data["gender"])
data["Partner"] = encoder.fit_transform(data["Partner"])

# Encode the remaining text (object) columns too, so every feature is numeric downstream
for col in data.select_dtypes(include="object").columns:
    data[col] = encoder.fit_transform(data[col])

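Label encoding imposes an arbitrary integer order on categories. For nominal columns, one-hot encoding with pd.get_dummies is a common alternative; here is a small sketch on a fresh copy of the raw file (InternetService and Contract are example columns, and this variant is not used in the rest of the walkthrough):

# One-hot encoding alternative: one 0/1 indicator column per category
raw = pd.read_csv("WA_Fn-UseC_-Telco-Customer-Churn.csv")
onehot = pd.get_dummies(raw[["InternetService", "Contract"]], drop_first=True)
print(onehot.head())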

5. Data Normalization

Next, we scale the feature columns to a common [0, 1] range with MinMaxScaler and separate out the target variable.

from sklearn.preprocessing import MinMaxScaler

# Normalize the data
scaler = MinMaxScaler()
x = data.iloc[:, :-1].drop(columns=["MonthlyCharges", "TotalCharges"])
x_scaled = scaler.fit_transform(x)

# Define target variable
y = data.iloc[:, -1]        

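As a quick sanity check, every scaled feature should now lie in the [0, 1] range:

# Sanity check: per-column minimum and maximum after scaling
print(x_scaled.min(axis=0))
print(x_scaled.max(axis=0))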

6. Splitting the Data for Training and Testing

Split the data into training and testing sets to evaluate the model's performance.

from sklearn.model_selection import train_test_split

xtrain, xtest, ytrain, ytest = train_test_split(x_scaled, y, train_size=0.8)
print(xtrain.shape, xtest.shape)        

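If you want the split to be reproducible across runs, pass a fixed random_state (42 here is an arbitrary choice):

# Reproducible split: the same rows land in train/test every time
xtrain, xtest, ytrain, ytest = train_test_split(
    x_scaled, y, train_size=0.8, random_state=42
)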

7. Building and Evaluating a Regression Model

We will build a linear regression model to predict the target variable and evaluate its performance using R-squared and mean squared error (MSE). Note that the target here is the encoded churn column, so a classifier would usually be the more natural choice; we fit linear regression purely to illustrate the regression workflow.

from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Train the model
model = LinearRegression()
model.fit(xtrain, ytrain)

# Predict and evaluate
ypred = model.predict(xtest)
r_squared = model.score(xtest, ytest)
mse = mean_squared_error(ytest, ypred)

print(f"R-Squared: {r_squared}")
print(f"Mean Squared Error: {mse}")        

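It can also help to look at the fitted parameters and to report the error in the same units as the target via RMSE. A short follow-up sketch:

# Inspect the learned parameters and report the root mean squared error
import numpy as np

print(model.intercept_)
print(model.coef_)
print(f"Root Mean Squared Error: {np.sqrt(mse)}")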

8. Visualizing the Results

Lastly, let's visualize the test data and predicted results.

# Visualize test data vs predicted
import matplotlib.pyplot as plt

plt.scatter(xtest[:, 0], ytest, color='blue', label='Actual')
plt.scatter(xtest[:, 0], ypred, color='red', label='Predicted')
plt.legend()
plt.show()        

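For a regression model, a predicted-vs-actual plot is often easier to read than plotting against a single scaled feature. A small sketch using the same test split:

# Predicted vs. actual: points close to the diagonal indicate a better fit
plt.scatter(ytest, ypred, alpha=0.3)
plt.xlabel("Actual")
plt.ylabel("Predicted")
plt.title("Predicted vs. Actual")
plt.show()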


Conclusion

This code demonstrates how to explore, clean, and preprocess data, followed by building and evaluating a regression model. These techniques are essential for building predictive models and deriving insights from data.
