How to store passwords in the Database
Expectations from this article
Topics to be covered
Stories of losses due to mishandling of passwords
Before looking at how software systems actually handle passwords, it's important to understand how much value passwords hold.
I believe real-life events make us take the issue more seriously.
So let's look at a couple of such examples.
Hacker leaks millions of Hotmail, Gmail, and Yahoo Mail usernames and passwords
This happened 6-7 years ago, but it is still a good read; more details here
25 Million – Mathway, May 25, 2020
A popular website for helping students and children learn mathematics suffered a data breach that exposed more than 25 million records.
There are many more such incidents; a longer list can be found here, and if you are still looking for more, Google Baba is there.
How passwords are stored in the DB
It's obvious that we can't store passwords in plain text; we can all agree on that without any doubt.
We have to convert the clear text password into some secret form.
There are two approaches for that: hashing and encryption.
Passing the clear text password through either HASHING or ENCRYPTION produces a secret form. The difference lies in whether we can get the clear text password back from it.
Imagine we are using encryption and the stored passwords get leaked. If the attackers also get hold of the key that was used to encrypt them, they can easily decrypt every password and recover the clear text, which is a huge risk.
Now let's come to HASHING. It is one-way, so even if the hashes get leaked, attackers cannot convert them back into the clear text passwords. However, they can still guess passwords and compare the hashes of their guesses against the leaked ones. That is why hashes need the extra protections described later in this article: salting, peppering, and a work factor.
You can see this in real life too. If you call the customer care of your bank (or any institution you have an account with) and ask for your password, they can't tell you, because it isn't stored in clear text. The only option is to reset the password.
Only the USER/CUSTOMER knows the password.
Encryption is suited to general data that the system needs to decrypt again. For example, WhatsApp chats are stored in encrypted form and decrypted on the fly when they are rendered on the screen.
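An important question is why hashed passwords can't be reversed.
In short, a hash function maps inputs of any length to a fixed-size output and discards information along the way. Many different inputs therefore share the same output, and there is no way to map a hash back to a unique input. There is a more detailed explanation of the reason here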
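Even though a hash cannot be reversed, an attacker who obtains a table of hashes can still guess. They hash each candidate password and compare the result with the leaked value.

The sketch below illustrates this with a plain, unsalted SHA-256 and a tiny, hypothetical wordlist. It is only meant to show why a fast, unsalted hash is weak on its own; the class name `DictionaryAttackDemo` is made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Optional;

// Illustration only: a single, unsalted SHA-256 is NOT an acceptable way to store passwords.
final class DictionaryAttackDemo {

    // SHA-256 of the UTF-8 bytes of the input (every Java platform must support SHA-256).
    static byte[] sha256(String input) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // The attacker never reverses the hash: they hash each guess and compare it with the leaked value.
    static Optional<String> crack(byte[] leakedHash, List<String> wordlist) {
        for (String guess : wordlist) {
            if (MessageDigest.isEqual(sha256(guess), leakedHash)) {
                return Optional.of(guess);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        byte[] leaked = sha256("password123"); // a value an attacker might find in a leaked table
        List<String> wordlist = List.of("123456", "qwerty", "letmein", "password123");
        System.out.println(crack(leaked, wordlist).orElse("not found")); // prints "password123"
    }
}
```

Real attacks use very large wordlists and dedicated hardware. Slow, salted algorithms are designed to make exactly this guessing expensive.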
How the password validation happens
We have 2 ways:
Option#1: get the clear text password back from the stored hash and compare it with the password the user entered.
Option#2: hash the password the user entered in the same way (same algorithm, salt, and parameters) and compare the result with the stored hash.
As we have seen, getting the clear text password back from the hash is practically impossible, so option#1 is not feasible. Option#2 is what is used to validate passwords. We don't even have to do this manually: hashing libraries provide a verification method that takes the clear text password and the stored hash and returns whether they match.
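Option#2 can be sketched with the JDK alone. The example uses PBKDF2 (`PBKDF2WithHmacSHA256`, built into Java) instead of Argon2, so it needs no extra dependency.

The following are assumptions of this sketch:
- the class name `PasswordVerifier`;
- the `StoredPassword` record;
- the 600,000 iteration count, taken from the OWASP Password Storage Cheat Sheet guidance for PBKDF2-HMAC-SHA256.

Records need Java 16 or later.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Option#2 sketch: re-hash the entered password with the stored salt and parameters,
// then compare the two hashes. PBKDF2 (built into the JDK) stands in for Argon2 here.
final class PasswordVerifier {
    // Assumption: iteration count taken from OWASP guidance for PBKDF2-HMAC-SHA256.
    static final int ITERATIONS = 600_000;
    static final int HASH_BITS = 256;
    private static final SecureRandom RANDOM = new SecureRandom();

    // What would be stored in the DB for one user (hypothetical shape).
    record StoredPassword(byte[] salt, int iterations, byte[] hash) {}

    static byte[] pbkdf2(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, HASH_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    // Sign-up: hash with a fresh random salt and keep salt + parameters next to the hash.
    static StoredPassword register(char[] password) {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return new StoredPassword(salt, ITERATIONS, pbkdf2(password, salt, ITERATIONS));
    }

    // Login: hash the entered password the same way and compare in constant time.
    static boolean verify(char[] entered, StoredPassword stored) {
        byte[] candidate = pbkdf2(entered, stored.salt(), stored.iterations());
        return MessageDigest.isEqual(candidate, stored.hash());
    }
}
```

The comparison uses `MessageDigest.isEqual` rather than `Arrays.equals`, so that the check does not reveal through its timing how many leading bytes matched.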
Before looking at different secure hashing algorithms, let's understand a few terms related to the secure storage of passwords.
Salting
A salt is a unique, randomly generated string that is added to a password before its hash is generated.
Every password has its own unique salt, which is stored along with the hashed password.
Salting adds an extra layer of security. An attacker has to crack the hashes one at a time using the respective salt, rather than calculating the hash of a guess once and comparing it against every stored hash.
Earlier, we had to add the salt manually when generating the hash of a password. Modern hashing algorithms such as Argon2id, bcrypt, and PBKDF2 automatically salt the passwords, so no additional steps are required when using them.
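The effect of salting can be sketched with the JDK's PBKDF2. Its low-level API (`PBEKeySpec`) takes the salt as an argument, so this sketch generates the salt explicitly; higher-level libraries such as argon2-jvm do this for you and embed the salt in the hash string they return.

The `salt:hash` storage format and the class name `SaltingDemo` are hypothetical choices for this example. The sketch shows two users with the same password ending up with different stored values.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Same password + different random salts = different stored values.
final class SaltingDemo {
    static final int ITERATIONS = 600_000;
    private static final SecureRandom RANDOM = new SecureRandom();

    static byte[] pbkdf2(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }

    // Hypothetical storage format for this sketch: "base64(salt):base64(hash)".
    static String hashWithNewSalt(char[] password) {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt); // unique, random salt per password
        Base64.Encoder b64 = Base64.getEncoder();
        return b64.encodeToString(salt) + ":" + b64.encodeToString(pbkdf2(password, salt));
    }

    public static void main(String[] args) {
        // Two users who picked the same password
        String alice = hashWithNewSalt("Green Learner".toCharArray());
        String bob = hashWithNewSalt("Green Learner".toCharArray());
        System.out.println(alice);
        System.out.println(bob);
        System.out.println(alice.equals(bob)); // false: the salts (and therefore the hashes) differ
    }
}
```

Because the salt is stored next to the hash, verification simply decodes it and re-hashes the entered password with it. A hash precomputed for one salt is useless against another.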
Peppering
Peppering combines hashing with encryption. The hash is generated as usual, but before it is stored it is encrypted with a secret key. This extra encryption step is called peppering, and the secret key is referred to as the pepper.
Unlike the salt, which is different for each password, the pepper is the same for all passwords.
The pepper must be stored separately from the database, for example in a secrets vault.
The pepper can be rotated as and when needed.
Peppering adds one more layer of security, because an attacker now has to obtain both the password hashes from the database and the pepper.
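The peppering step described above can be sketched with the JDK's AES-GCM cipher. The choices of AES-256-GCM, the IV-prefixed storage layout, and the class name `PepperDemo` are assumptions of this sketch. In a real system the pepper would be loaded from a secrets vault rather than generated in code.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Peppering sketch: encrypt the already-computed password hash with a secret key
// (the pepper) before storing it. AES-GCM is an assumption made for this sketch.
final class PepperDemo {
    private static final int IV_BYTES = 12;
    private static final int TAG_BITS = 128;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in for loading the pepper from a vault: generates a random 256-bit AES key.
    static SecretKey newPepper() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns IV || ciphertext+tag so everything fits in a single DB column.
    static byte[] applyPepper(byte[] passwordHash, SecretKey pepper) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv); // fresh IV for every encryption
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, pepper, new GCMParameterSpec(TAG_BITS, iv));
            byte[] encrypted = cipher.doFinal(passwordHash);
            return ByteBuffer.allocate(IV_BYTES + encrypted.length).put(iv).put(encrypted).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Recovers the original hash, which is then checked with the normal hash verification.
    static byte[] removePepper(byte[] stored, SecretKey pepper) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, pepper, new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            // A wrong pepper or tampered data makes the GCM tag check fail
            throw new IllegalStateException("could not remove pepper", e);
        }
    }
}
```

Because GCM authenticates the data, a database dump decrypted with the wrong key fails outright instead of yielding a usable hash.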
You can see in the above diagram that one more layer (decryption with the pepper) has been added, which makes the attacker's life even harder.
Work factor
The work factor is the number of iterations the hashing algorithm performs, which deliberately makes each hash more expensive to compute. In simple terms, with a work factor of 10 the algorithm performs 10 iterations to calculate the hash, and with 20 it performs 20. (Some algorithms count differently: bcrypt's cost is logarithmic, so each increment doubles the work.) An attacker therefore has to spend more time and resources on every password guess.
A higher work factor also slows down the validation of legitimate users, so it must be chosen carefully to balance security and usability.
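The effect of the work factor can be observed with the JDK's PBKDF2, whose cost grows roughly linearly with the iteration count. The sketch times one hash at a few arbitrary iteration counts; the class name `WorkFactorDemo` is made up, and the timings will vary by machine.

```java
import java.security.NoSuchAlgorithmException;
import java.security.spec.InvalidKeySpecException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Shows that PBKDF2's cost grows with its iteration count (its work factor).
final class WorkFactorDemo {

    static byte[] pbkdf2(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }

    // Wall-clock time, in milliseconds, for one hash at the given iteration count.
    static long timeHashMillis(int iterations) {
        byte[] salt = new byte[16]; // fixed all-zero salt: acceptable for timing, never for storage
        long start = System.nanoTime();
        pbkdf2("Green Learner".toCharArray(), salt, iterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // The first measurement also includes JVM warm-up, so treat the numbers as rough.
        for (int iterations : new int[] {10_000, 100_000, 1_000_000}) {
            System.out.printf("%,d iterations -> %d ms%n", iterations, timeHashMillis(iterations));
        }
    }
}
```

The same multiplier applies to an attacker, but on every single guess. That is why a few hundred milliseconds per login can translate into a large cost for a brute-force attack.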
Hashing algorithms that can be used here (most recommended at the top; use them as per availability)
Argon2 won the Password Hashing Competition in July 2015. It is intentionally resource intensive (CPU, memory, etc.). In Argon2 we can configure several parameters to control the resources needed to hash a password: the length of the salt, the length of the generated hash, the number of iterations (time cost), the memory cost, and the parallelism (number of threads).
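There are three variants of Argon2:
Argon2d uses data-dependent memory access, which makes it resistant to GPU cracking attacks but vulnerable to side-channel attacks.
Argon2i uses data-independent memory access, which makes it resistant to side-channel attacks.
Argon2id is a hybrid of the two and is the variant recommended for password hashing.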
Code example for password hashing
Let's look at an example of Argon2id.
One Java library for Argon2 is argon2-jvm; always use the latest version.
If you are using Spring Security, Argon2PasswordEncoder can be used instead.
We use argon2-jvm for our demo.
import de.mkammerer.argon2.Argon2;
import de.mkammerer.argon2.Argon2Factory;
import java.time.Instant;
import java.time.temporal.ChronoUnit;
public class Argon2PasswordHashingDemo {
    public static void main(String[] args) {
        int saltLength = 16; // in bytes
        int hashLength = 32; // in bytes
        // Argon2Factory.create(saltLength, hashLength) alone gives Argon2i,
        // so request Argon2id explicitly
        Argon2 argon2 = Argon2Factory.create(Argon2Factory.Argon2Types.ARGON2id, saltLength, hashLength);
        char[] password = "Green Learner".toCharArray();
        int iterations = 20;  // number of iterations (time cost)
        int memory = 65536;   // memory cost in kibibytes (64 MiB)
        int parallelism = 1;  // number of threads
        Instant start = Instant.now();
        try {
            // The returned string encodes the variant, parameters, salt and hash
            String hash = argon2.hash(iterations, memory, parallelism, password);
            System.out.println(hash);
            System.out.println("hash generation took : " + ChronoUnit.MILLIS.between(start, Instant.now()) + "ms");
        } finally {
            // Overwrite the password in memory once we are done with it
            argon2.wipeArray(password);
        }
    }
}
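Running the demo twice gave the output below. These runs used argon2-jvm's default Argon2i variant (hence the $argon2i$ prefix); with Argon2id the encoded hash starts with $argon2id$ instead. Note that each run produces a different hash for the same password, because a new random salt is generated every time.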
First run -
$argon2i$v=19$m=65536,t=20,p=1$e+YmpON+cE3JSeq1ZcM4Gg$iz3yShNTo99JTPToNG4h7PBckloBvrFgpjwdFQJvwL
hash generation took : 1244ms
Second run -
$argon2i$v=19$m=65536,t=20,p=1$yNu/+CWUPvGhbHtYSuZTYA$tdYrX7f7Va74vWTg+cH8fRjSsfb4e28uefVZfM60ZH
hash generation took : 1560ms
To verify a password, use the verify method:
boolean isValid = argon2.verify(hashedPassword, userEnteredPassword); // hashedPassword: stored hash string, userEnteredPassword: char[]
If we want hashing to take a specified amount of time (maxMilliSecond in the code below) but are not sure how many iterations that needs, argon2-jvm's Argon2Helper utility can find the number of iterations for us:
import de.mkammerer.argon2.Argon2Helper;
long maxMilliSecond = 1000; // target time for one hash, in ms (example value)
int memory = 65536;         // memory cost in kibibytes (64 MiB)
int parallelism = 1;        // number of threads
int iterations = Argon2Helper.findIterations(argon2, maxMilliSecond, memory, parallelism);
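The calibration idea itself can be sketched with nothing but the JDK. The approach is to keep increasing the work factor while a single hash still fits within a time budget.

This is not a copy of argon2-jvm's Argon2Helper; it uses PBKDF2 instead of Argon2. The class name `IterationCalibrator` and its doubling strategy are hypothetical choices for this example.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.spec.InvalidKeySpecException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Calibration sketch: find an iteration count whose hashing time stays within a budget.
final class IterationCalibrator {

    static byte[] pbkdf2(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (NoSuchAlgorithmException | InvalidKeySpecException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }

    static long timeMillis(byte[] salt, int iterations) {
        long start = System.nanoTime();
        pbkdf2("calibration".toCharArray(), salt, iterations);
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Doubles the count while one hash still finishes within maxMillis and returns the last
    // count that did (or startIterations if even that was too slow).
    static int findIterations(long maxMillis, int startIterations) {
        if (startIterations < 1) {
            throw new IllegalArgumentException("startIterations must be positive");
        }
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        int best = startIterations;
        int iterations = startIterations;
        // The overflow guard keeps iterations * 2 within int range
        while (iterations <= Integer.MAX_VALUE / 2 && timeMillis(salt, iterations) <= maxMillis) {
            best = iterations;
            iterations *= 2;
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(findIterations(500, 10_000)); // machine-dependent result
    }
}
```

Calibration like this is best run on hardware comparable to production, since the result depends entirely on the machine it runs on.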
Finally, let's conclude the article with the best practices. If we are on the same page, I am sure you will agree with the points below.
Best Practices
What Next?
If you wish to see the practical integration of this hashing mechanism in microservices, do check out the playlist on the YouTube channel #greenlearner.
References
https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html
https://mkyong.com/java/java-password-hashing-with-argon2/
https://crypto.stackexchange.com/questions/45377/why-cant-we-reverse-hashes
https://snyk.io/learn/password-storage-best-practices/