Fake News Prediction Model

Dhruvrajsinh Vansia

Instrumentation and Control Engineer | Intern @PRL | Ex-President, PRAKALPA LDCE | IBM’24 Intern | Robotics Honours Degree | LDCE25 | OpenCV | MediaPipe | YOLOv8 | Matplot

Published: June 3, 2024

This model is trained on a labeled dataset consisting of several thousand news articles.

Each article carries one of two labels:

Fake News
Real News

We will use a Logistic Regression model. Logistic regression is a data analysis technique that uses mathematics to find the relationship between input features and an outcome, and then uses that relationship to predict the outcome for new inputs. The prediction has a finite number of possible values, such as yes or no.

This is a binary classification problem: the model has only two possible outputs, fake news or real news.
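To make the yes-or-no prediction concrete, here is a minimal sketch of how a logistic regression model could turn a weighted sum of features into a probability and then into a label. The weights, bias, and feature values are hypothetical numbers chosen only for illustration; a trained model learns them from the data. The 0 = real, 1 = fake encoding follows the prediction code later in this article.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in [0, 1]."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_label(features, weights, bias, threshold=0.5):
    """Return (probability of class 1, predicted label) for one example."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    p = sigmoid(z)
    return p, int(p >= threshold)

# Hypothetical weights and features, not taken from the trained model.
weights = [1.5, -2.0]
bias = 0.25
prob, label = predict_label([1.0, 0.5], weights, bias)
print(f"probability={prob:.3f}, label={label}")
```

The 0.5 threshold is the usual default; raising or lowering it trades false alarms against missed fake articles.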

WORKFLOW

News Data
Data Preprocessing
Train Test Split
Logistic Regression Model
Feed New Data to the Trained Logistic Regression Model
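The Train Test Split step in the workflow above uses a stratified split, which keeps the ratio of fake to real articles roughly the same in both sets. The sketch below shows the idea in plain Python. The function name `stratified_split` and the toy labels are assumptions made for illustration, and scikit-learn's `train_test_split` may allocate samples differently in detail.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) with similar class ratios in both sets."""
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test set.
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy labels: 10 real (0) and 5 fake (1); not taken from the Kaggle data.
labels = [0] * 10 + [1] * 5
train_idx, test_idx = stratified_split(labels)
print("test labels:", [labels[i] for i in test_idx])
```

Without stratification, a small test set could end up with very few examples of one class, which makes the accuracy figure less trustworthy.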

Dataset link: https://www.kaggle.com/competitions/fake-news/data?select=submit.csv

GitHub link: https://github.com/dvvansia1805/Fake-News-Prediction-Model/tree/main

import numpy as np
import pandas as pd
import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

import nltk
nltk.download('stopwords')

# Print the list of stopwords
print(stopwords.words('english'))

# Load the dataset using a properly formatted path
news_dataset = pd.read_csv(r'D:\College Work\AI\Fake News Prediction\train.csv')

# Print the shape of the dataset
print("Shape of the dataset:", news_dataset.shape)

# Display the first few rows of the dataset
print("First few rows of the dataset:")
print(news_dataset.head())

# Check for missing values
print("The number of missing values in the dataset:")
print(news_dataset.isnull().sum())

# Fill missing values in 'author' and 'title' columns with empty strings
# (assigning the result back avoids pandas chained-assignment warnings)
news_dataset['author'] = news_dataset['author'].fillna('')
news_dataset['title'] = news_dataset['title'].fillna('')

# Combine the author and title columns to create a single text column
news_dataset['content'] = news_dataset['author'] + ' ' + news_dataset['title']

# Display the first few rows of the new dataset with the content column
print("First few rows of the dataset with the new content column:")
print(news_dataset[['content']].head())

# Data preprocessing
# Initialize the PorterStemmer
port_stem = PorterStemmer()

# Build the stopword set once; calling stopwords.words() for every word is slow
stop_words = set(stopwords.words('english'))

# Function to preprocess the text
def preprocess_text(text):
    text = re.sub('[^a-zA-Z]', ' ', text)  # Replace every non-letter character with a space
    text = text.lower()  # Convert to lowercase
    words = text.split()  # Split into words
    words = [port_stem.stem(word) for word in words if word not in stop_words]  # Remove stopwords, then stem
    return ' '.join(words)  # Join the words back into a single string

# Apply the preprocess_text function to the content column
news_dataset['content'] = news_dataset['content'].apply(preprocess_text)

# Display the first few rows of the dataset after preprocessing
print("First few rows of the dataset after preprocessing:")
print(news_dataset[['content']].head())

# Separate the text (features) from the labels
X = news_dataset['content'].values
y = news_dataset['label'].values  # Assumes the dataset has a 'label' column

# Split the raw text into training and testing sets before vectorizing,
# so the TF-IDF vocabulary and weights are learned from the training data only
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=42)

# Convert the textual data to numerical TF-IDF features
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train)
X_test = vectorizer.transform(X_test)

# Initialize the Logistic Regression model
model = LogisticRegression()

# Train the model
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Calculate the accuracy score of test data
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

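def predict_news(model, X_new):
    # Predict the label for a single vectorized article
    prediction = model.predict(X_new)

    # Print the raw prediction
    print("Prediction:", prediction[0])

    # Interpret the prediction (0 = real, 1 = fake)
    if prediction[0] == 0:
        print('The news is Real')
    else:
        print('The news is Fake')

# Example usage with the first article of the test set
predict_news(model, X_test[0])

# Example usage with new text: apply the same preprocessing used for the
# training data, then vectorize with the already-fitted vectorizer
X_new = vectorizer.transform([preprocess_text("New news article to predict")])
predict_news(model, X_new)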
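The script above relies on `TfidfVectorizer` to turn text into numbers. As a rough illustration of what that step computes, the sketch below reimplements the weighting in plain Python: raw term counts multiplied by a smoothed inverse document frequency, followed by L2 normalization of each row, which matches scikit-learn's documented defaults. The function name `tfidf`, the whitespace tokenizer, and the toy documents are assumptions made for this example; the real vectorizer tokenizes differently and returns a sparse matrix.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocabulary, rows); rows[i][j] is the weight of term j in doc i."""
    tokenized = [doc.split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed inverse document frequency: rare terms get larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]
        # Scale each document vector to unit length (L2 norm).
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows

# Hypothetical, already-stemmed toy documents.
docs = ["senat vote bill", "senat pass bill", "alien land senat"]
vocab, rows = tfidf(docs)
for term, weight in zip(vocab, rows[0]):
    print(f"{term:>6}: {weight:.3f}")
```

In the first document, "senat" appears in every document and so receives the lowest non-zero weight, while "vote" appears only once in the corpus and receives the highest.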
