Email Spam Detection using Pre-Trained BERT Model: Part 2 - Model Fine Tuning

Recently I have been looking into Transformer-based machine learning models for natural language tasks. The field of NLP has changed tremendously in the last few years, and I have been fascinated by the new architectures and tools that keep emerging. Transformer models are one such architecture.

As the frameworks and tools for building transformer models keep evolving, documentation often becomes stale and blog posts can be confusing. For any one topic you may find multiple competing approaches, which makes things hard for beginners.

So as I learn these models, I plan to document the steps for a few essential tasks in the simplest way possible. This should help any beginner like me pick up transformer models.

In this two-part series, I discuss how to train a simple model for email spam classification using a pre-trained BERT transformer model. This is the second post in the series, where I discuss fine-tuning the model for spam detection. You can read all the posts in the series here.

Data Preparation and Tokenization

Please make sure you have gone through the first part of the series, where we discussed how to prepare our data using BERT tokenization. You can find it at the link below.

Email Spam Detection using Pre-Trained BERT Model: Part 1 - Introduction and Tokenization.

Model Fine Tuning

Once the tokenization is done, we are now ready to fine-tune the model.

A pre-trained model comes with a body and a head. In most use cases we retrain only the head of the model, which is why this is called fine-tuning rather than retraining. You can read more about the head and body of a transformer model at the link below.

https://huggingface.co/course/chapter1/4
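As a minimal sketch of what this looks like in code (assuming the Hugging Face `transformers` library and PyTorch; the checkpoint name `bert-base-uncased`, the two-label setup, and the `freeze_body` helper are illustrative, not part of the original post), we can load a pre-trained BERT with a fresh classification head and freeze the body so that only the head is updated during training:

```python
from torch import nn


def freeze_body(model: nn.Module, head_prefix: str = "classifier") -> None:
    """Freeze every parameter whose name does not start with the head
    prefix, so only the classification head is updated while training."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_prefix)


if __name__ == "__main__":
    # Illustrative: load BERT with a fresh 2-label head (ham vs. spam).
    # The body weights come from pre-training; the head is newly initialized.
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2
    )
    freeze_body(model)  # body frozen; only the classifier head trains
```

In BERT-style models from `transformers`, the classification head's parameters are named `classifier.weight` and `classifier.bias`, which is why the sketch filters on that prefix; for other architectures the head module may be named differently.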

