Email Validation - A Bug's Unfortunate Fate -ep01
Every once in a while, software developers run into a bug that looks simple on the surface and should take under an hour to find and fix, but ends up taking far longer. In my case, it was email validation.
Backstory
One of our services uses Fastify as the server and Zod for schema validation. The system is set up so that inputs are validated and outputs are serialised against their Zod schemas. We also have a flag that obscures and redacts sensitive user information such as email, phone number and physical address.
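To make the redaction step concrete, here is a minimal sketch of how an email might be obscured before it is returned. The maskEmail helper, its visible parameter and the exact masking shape are assumptions made for illustration; they are not our service's actual code.

```typescript
// Hypothetical helper (illustration only): keeps the first `visible`
// characters of the local part and replaces the rest with three asterisks.
function maskEmail(email: string, visible = 3): string {
  const at = email.lastIndexOf('@');
  if (at <= 0) return email; // not email-shaped, leave it untouched

  const local = email.slice(0, at);
  const domain = email.slice(at); // includes the '@'
  return `${local.slice(0, visible)}***${domain}`;
}

console.log(maskEmail('example@example.com')); // exa***@example.com
```

The point is only that the masked value still looks like an email address, but now contains asterisks in the local part.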
The Issue
In this case, I am returning the user details, which contain an obscured email field; I use the asterisk (*) as the masking character.
// User object
const user = {
  // ...other fields
  email: 'exa***ple@example.com'
}
Based on RFC 3696, validating the user object above should pass:
// Zod schema
const UserResponse = z.object({
  email: z.string().email(),
});
const vResult = UserResponse.safeParse(user)
// vResult = { success: false, error: ZodError }
Well, that is what I thought.
I was surprised when the API returned an error saying "Response does not match the schema".
After a bunch of console.log calls, inspecting the returned data, and googling, I found out that "Zod does not treat email addresses like emails."
Diagnosis
Zod does not treat email addresses like emails
RFC 3696 provides guidelines on how to construct and validate an email address, including which characters are allowed and how long an address may be.
The asterisk (*) is a valid character under those guidelines. colinhacks, the author of Zod, has explained why Zod deliberately does not follow them to the letter.
The summary of that decision is:
Emails take different forms, from a simple user@example.com to "example"@127.0.0.1, and on to more complex forms such as IPv6 address literals and multi-level subdomains. There is no way to support all of these without complicating simple email validation (no single email regex can make everyone happy), and the majority of developers using Zod expect simple email addresses from their users.
With this reasoning in mind, a simple regex was used.
z.string().email() does not accept every RFC-valid email address; it accepts only simple addresses such as user@example.com.
Resolution
To resolve this, I had two options:
1. Validate all emails sent from the client as simple email addresses, and treat all emails returned to the client as plain strings (z.string()).
2. Use Zod's superRefine method with a custom regex to extend the email validation.
I picked Option 1 for the following reasons:
1. We do not expect our users to use complicated email addresses, and Zod's regex fits our use case well.
2. We can guarantee that every email address returned to the user is a valid email address.
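As a rough sketch of what Option 1 looks like in schema terms: the inbound schema keeps Zod's simple email check, while the outbound schema only requires a string, so a masked address passes serialisation. The UserRequest name and the sample addresses are assumptions for illustration, not our real schemas.

```typescript
import { z } from 'zod';

// Inbound (hypothetical name): emails sent by the client must be simple addresses.
const UserRequest = z.object({
  email: z.string().email(),
});

// Outbound: the email may already be masked, so it is only checked to be a string.
const UserResponse = z.object({
  email: z.string(),
});

const inbound = UserRequest.safeParse({ email: 'user@example.com' });
const outbound = UserResponse.safeParse({ email: 'exa***ple@example.com' });
// The same masked value would be rejected by the inbound schema.
const maskedInbound = UserRequest.safeParse({ email: 'exa***ple@example.com' });

console.log(inbound.success, outbound.success, maskedInbound.success);
// true true false
```

The trade-off is that the response schema no longer catches a malformed email on the way out, which is acceptable here only because the value was validated when it came in.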
If I had gone with Option 2, our email schema would have used this regex:
const emailRegex = /^([A-Z0-9_*+-]+\.?)*[A-Z0-9_*+-]@([A-Z0-9][A-Z0-9-]*\.)+[A-Z]{2,}$/i;
This regex only adds support for the asterisk (*) character in the local part of the email address.
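To see what the regex accepts and rejects, it can be exercised on its own, without Zod. The sample addresses below are made up for illustration.

```typescript
// Same regex as above, tested directly against a few made-up addresses.
const emailRegex = /^([A-Z0-9_*+-]+\.?)*[A-Z0-9_*+-]@([A-Z0-9][A-Z0-9-]*\.)+[A-Z]{2,}$/i;

const samples: string[] = [
  'user@example.com',       // plain address
  'exa***ple@example.com',  // masked local part
  'first.last@example.com', // single dot inside the local part
  'john..doe@example.com',  // consecutive dots
  '.user@example.com',      // leading dot
  'user@localhost',         // no top-level domain
];

const results: boolean[] = samples.map((s) => emailRegex.test(s));
console.log(results);
// [ true, true, true, false, false, false ]
```

One caveat: the nested quantifiers in the local-part group can backtrack heavily on long input that does not match, so it may be worth checking the input length before running the regex.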
import { z } from 'zod';

export const EmailSchema = z
  .string({
    required_error: 'Email address is required',
  })
  .toLowerCase()
  .trim()
  .min(5)
  .max(320) // see - https://github.com/colinhacks/zod/issues/3155
  .superRefine((data, ctx) => {
    // Same pattern as above: a simple email regex that also allows '*' in the local part
    const emailRegex = /^([A-Z0-9_*+-]+\.?)*[A-Z0-9_*+-]@([A-Z0-9][A-Z0-9-]*\.)+[A-Z]{2,}$/i;
    if (!emailRegex.test(data)) {
      // Report the failure in the same shape as Zod's built-in email check
      ctx.addIssue({
        code: z.ZodIssueCode.invalid_string,
        message: 'Invalid email address',
        validation: 'email',
      });
    }
  });
// TEST
EmailSchema.safeParse('user@example.com')
// Returns { success: true, data: 'user@example.com' }
EmailSchema.safeParse('ex***le@example.com')
// Returns { success: true, data: 'ex***le@example.com' }
Conclusion
Resolving this issue took over 4 hours in total, because I had to take a detour to fix how errors are logged and returned when they arise from validation and serialisation. Researching how Zod handles email validation also took some time. The fix itself took less than 10 minutes. In the end I implemented both Option 1 and Option 2 in our codebase; I added Option 2 because I wanted to support some of the characters that the RFC specification allows in the local part of an email address.