Gmail's new RETVec module enhances its text categorization algorithm to improve spam filtering


You can read the full article on my blog in either Hebrew or English.


Gmail’s spam filtering is now undergoing dramatic changes.

Fast, efficient, and accurate identification of harmful and abusive content, such as phishing attacks, fraud attempts, spam, or offensive comments and posts, is a cornerstone of protecting users. Google is now launching an efficient new module to improve the detection of such content and has implemented it in Gmail as well.

Sophisticated malicious senders use various techniques that make it very difficult for text models to classify their content accurately and efficiently. These include homoglyphs (replacing characters with look-alikes), invisible characters, and keyword stuffing, all meant to trick machine learning (ML)-based defense mechanisms.
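To make these tricks concrete, here is a minimal Python sketch of how a homoglyph and an invisible character defeat a naive keyword check. The look-alike table and the function names are hypothetical and purely illustrative. This is not how RETVec works: RETVec learns embeddings that are resilient to such manipulation, whereas this sketch only shows the problem and a simple hand-made cleanup.

```python
import unicodedata

# Hypothetical, tiny look-alike table for illustration only.
# Real homoglyph tables cover thousands of characters.
HOMOGLYPHS = {
    "\u0430": "a",  # Cyrillic small a
    "\u0435": "e",  # Cyrillic small ie
    "\u043e": "o",  # Cyrillic small o
    "\u0440": "p",  # Cyrillic small er
}


def strip_invisible(text):
    # Drop Unicode "format" characters (category Cf), such as the
    # zero-width space U+200B.
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cf")


def fold_homoglyphs(text):
    # Replace known look-alikes with their Latin counterparts.
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def naive_contains(text, keyword):
    # The kind of exact keyword check that obfuscation is designed to beat.
    return keyword in text.lower()


# "free prize" written with a Cyrillic "е", a zero-width space, and a Cyrillic "р".
obfuscated = "fr\u0435\u200be \u0440rize"
cleaned = fold_homoglyphs(strip_invisible(obfuscated))

print(naive_contains(obfuscated, "free"))  # False: the keyword check is fooled
print(naive_contains(cleaned, "free"))     # True: it matches after cleanup
```

Both strings look identical to a human reader, which is exactly why hand-maintained rules like these are hard to keep up to date.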

RETVec: A New Revolution in Text Categorization

The new model is called RETVec, which stands for Resilient & Efficient Text Vectorizer. According to Google, it is a major step forward in both accuracy and efficiency: the RETVec-based model needs 83% less processing power (measured as usage of TPUs, Tensor Processing Units), which shows up as energy savings, shorter processing time, and better memory management.

The RETVec model is lightweight compared to other models (about 200,000 parameters), multilingual, and, according to Google, more accurate than the models it has deployed so far.

The model supports all languages and has been released as open source. Because it is so efficient, it can run on-device, including on mobile devices, and it can be used in a variety of text analysis and classification applications.

According to Google's announcement of November 29, 2023, the company tested the RETVec model in Gmail in recent years, and it is now operational.

Gmail blocks about 15 billion unwanted emails every day, and Google says it detects about 99.9% of the phishing, spam, and malware sent to Gmail users and keeps it out of their inboxes.

A dramatic change in the effectiveness of Gmail spam filtering

The RETVec module improves the spam detection rate by 38% and, just as significantly, reduces misclassification: false positives drop by 19.4% and false negatives by 17.71%.

Source: Google announcement

The importance of free text in email

Over the years, lists of “spammy” words have become obsolete in B2C email filtering, yet Microsoft and other providers still rely on Bayes-based filtering, which weighs “bad” words against “good” words to score the “spamminess” of an email.
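For readers unfamiliar with Bayes-based filtering, the following is a minimal naive Bayes sketch in Python. The training messages, function names, and smoothing choice are assumptions made for illustration; it is not any provider's actual filter.

```python
import math
from collections import Counter


def train(messages):
    # Count how often each word appears in spam and in legitimate ("ham") mail.
    counts = {"spam": Counter(), "ham": Counter()}
    totals = Counter()
    for label, text in messages:
        counts[label].update(text.lower().split())
        totals[label] += 1
    return counts, totals


def spam_log_odds(text, counts, totals):
    # Positive result: the words look more like spam than ham.
    vocab = set(counts["spam"]) | set(counts["ham"])
    n_spam = sum(counts["spam"].values())
    n_ham = sum(counts["ham"].values())
    score = math.log(totals["spam"] / totals["ham"])
    for word in text.lower().split():
        # Add-one smoothing so unseen words do not zero out the estimate.
        p_spam = (counts["spam"][word] + 1) / (n_spam + len(vocab))
        p_ham = (counts["ham"][word] + 1) / (n_ham + len(vocab))
        score += math.log(p_spam / p_ham)
    return score


# Hypothetical toy training set.
training = [
    ("spam", "free prize click now"),
    ("spam", "win free money now"),
    ("ham", "meeting notes attached"),
    ("ham", "lunch tomorrow at noon"),
]
counts, totals = train(training)

print(spam_log_odds("free money", counts, totals) > 0)        # True
print(spam_log_odds("meeting tomorrow", counts, totals) < 0)  # True
```

Because the score depends on exact word matches, a word the filter has never seen (for example, "free" written with a look-alike character) contributes almost nothing to it. That weakness is what word-level filters share and what a resilient vectorizer is meant to address.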

Gmail's use of the new RETVec mechanism emphasizes the importance of live text in the email’s body. Image-only emails, popular as they are, may be easier to produce, but they have real disadvantages: they are inaccessible, they do not offer a separate CTA for each link (only a single click on the image itself), and their content cannot be found when searching the inbox.

Words that in the past were considered spammy, such as “free”, may even increase recipients’ engagement with the emails.

Gmail toughens the requirements for senders

Gmail is filling another gap and toughening its requirements for marketers starting in February 2024.

Gmail finally wants to stop marketers from sending through email platforms (ESPs) while using a private email address (one on a domain such as Gmail, Outlook, or Yahoo) as the sender address.

This requirement will benefit both mailers and Gmail customers, and it will require marketers to take responsibility and use their own domains wisely.

From now on, Gmail will enforce a DMARC policy of quarantine on its gmail.com and googlemail.com domains, which effectively means that private Gmail addresses can no longer be used as the sender address in mailing systems.
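Why does a quarantine policy have this effect? DMARC passes only if SPF or DKIM passes for a domain that is aligned with the domain in the From address. The Python sketch below illustrates that rule under a loud simplification: it treats the last two labels of a domain as the organizational domain, whereas real implementations consult the Public Suffix List. All domains and function names are hypothetical.

```python
def org_domain(domain):
    # Simplification (assumption): the last two labels stand in for the
    # organizational domain. Real DMARC checks use the Public Suffix List.
    return ".".join(domain.lower().rstrip(".").split(".")[-2:])


def aligned(from_domain, auth_domain):
    # Relaxed alignment: both domains share the same organizational domain.
    return org_domain(from_domain) == org_domain(auth_domain)


def dmarc_passes(from_domain, spf_pass, spf_domain, dkim_pass, dkim_domain):
    # DMARC needs at least one mechanism that both passes and is aligned.
    return (spf_pass and aligned(from_domain, spf_domain)) or (
        dkim_pass and aligned(from_domain, dkim_domain)
    )


# A gmail.com From address sent through a hypothetical ESP: SPF and DKIM
# both pass, but for the ESP's domains, so nothing aligns with gmail.com.
via_esp = dmarc_passes(
    "gmail.com", True, "bounces.esp.example", True, "esp.example"
)

# The same campaign sent from the marketer's own (hypothetical) domain.
own_domain = dmarc_passes(
    "news.brand.example", True, "bounces.brand.example", True, "brand.example"
)

print(via_esp)     # False: fails DMARC, so the quarantine policy applies
print(own_domain)  # True
```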

Gmail’s new requirements for marketers apply to those who send over 5,000 emails daily from all email platforms to the gmail.com domain.

These are Gmail’s new requirements for marketers:

Domain verification:

Email senders must verify their sending domain using SPF or DKIM.

SPF authentication alone meets this Gmail requirement. However, in a shared IP pool (the situation for many senders), the IP addresses that the SPF record authorizes cannot be associated with one specific domain.

Sometimes there are hundreds or thousands of such addresses. Therefore, it is essential to authenticate the domain with DKIM as well, because a domain's reputation is tied to its DKIM signature.

Easy unsubscription:

Allow subscribers to remove themselves from the list with an easy, one-click unsubscribe (implemented through headers that the email platform adds to the message). The mechanism is specified in RFC 8058.
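In practice, RFC 8058 one-click unsubscribe comes down to two message headers: List-Unsubscribe, carrying an HTTPS link, and List-Unsubscribe-Post. The sketch below builds such a message with Python's standard email library. The addresses, the URL, and the function name are hypothetical, and most ESPs add these headers for you.

```python
from email.message import EmailMessage


def build_message(sender, recipient, unsubscribe_url):
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Monthly newsletter"
    # The HTTPS link the mailbox provider will send a POST request to.
    msg["List-Unsubscribe"] = f"<{unsubscribe_url}>"
    # This exact value signals RFC 8058 one-click support.
    msg["List-Unsubscribe-Post"] = "List-Unsubscribe=One-Click"
    msg.set_content("Hello! Here is this month's update.")
    return msg


# Hypothetical values for illustration.
msg = build_message(
    "news@brand.example",
    "reader@example.com",
    "https://brand.example/unsubscribe?token=abc123",
)
print(msg["List-Unsubscribe-Post"])  # List-Unsubscribe=One-Click
```

RFC 8058 also expects the message to carry a valid DKIM signature that covers both headers, which is one more reason to set up DKIM on your own domain.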

Almost zero tolerance for spam

Email senders will be required to keep user-reported spam complaints at a very low level. Gmail's guidance is to stay below 0.1% and never reach 0.3%. The Gmail complaint rate is not visible in the email platform, only in Google Postmaster Tools.

Google Postmaster Tools - user-generated spam
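To put the 0.1% and 0.3% figures in perspective, here is a small Python sketch that turns complaint counts into a rate and compares it with those two thresholds. The function names and status labels are assumptions for illustration. Postmaster Tools calculates the real rate for you, and its exact formula may differ from this simple ratio.

```python
def spam_rate(complaints, delivered):
    # Share of delivered messages that recipients reported as spam.
    if delivered <= 0:
        raise ValueError("delivered must be positive")
    return complaints / delivered


def classify(rate):
    # Thresholds taken from the figures quoted in this article.
    if rate < 0.001:   # below 0.1%
        return "ok"
    if rate < 0.003:   # between 0.1% and 0.3%
        return "warning"
    return "over limit"


# 12 complaints on 10,000 delivered emails is 0.12%.
print(classify(spam_rate(12, 10_000)))  # warning
```

At these levels, a dozen complaints on a 10,000-recipient campaign is already enough to leave the safe zone.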

Publish a DMARC record:

Email senders must publish a DMARC record, even if the policy is p=none.
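A DMARC record is a DNS TXT record published at _dmarc.<your domain>, made up of semicolon-separated tags. The Python sketch below parses a sample record and checks the two tags that matter most. The record, the reporting address, and the function name are hypothetical examples.

```python
def parse_dmarc(record):
    # Split "tag=value; tag=value" pairs into a dictionary.
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


# Hypothetical record: monitoring only (p=none), with aggregate reports
# sent to the address in the rua tag.
record = "v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"
tags = parse_dmarc(record)

is_valid = tags.get("v") == "DMARC1" and tags.get("p") in {
    "none",
    "quarantine",
    "reject",
}
print(tags["p"])  # none
print(is_valid)   # True
```

Starting with p=none lets you collect reports and find every legitimate sending source before moving to quarantine or reject.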

I recommend setting up your DMARC policy with a dedicated external deployment and monitoring tool rather than relying on the settings provided by the various ESPs.
