How to store passwords in the Database
Image source - https://betanews.com/2016/05/04/hotmail-gmail-yahoo-password-leak/

What to expect from this article

Topics to be covered

  • Stories of losses due to mishandling of passwords
  • How passwords are stored in the DB (historical as well as current scenarios)
  • Hashing vs Encryption - differences/similarities
  • One code example with the most advanced algorithm to store/validate passwords
  • The best practices
  • What Next?

Stories of losses due to mishandling of passwords

Before looking at how software systems actually handle passwords, it's important to understand how much value passwords hold.

I believe real-life events make us take the issue more seriously, so let's look at a couple of such examples.

Hacker leaks millions of Hotmail, Gmail, and Yahoo Mail usernames and passwords

This happened 6-7 years ago, but it's still a good read; more details here.

25 Million – Mathway, May 25, 2020

A popular website that helps students and children learn mathematics suffered a data breach, exposing more than 25 million records.

There are many more such incidents. A longer list can be found here, and if you are still looking for more, Google Baba is there.

How the passwords are stored in the DB

So it's obvious that we can't store passwords in plain text. We all agree on that without any doubt.

We have to convert the clear-text password into some secret form.

There are two approaches for that -

  1. Hashing - It's one-way, meaning we can't get the clear-text password back from the secret.
  2. Encryption - It's bidirectional, meaning we can get the clear-text password back from the secret.

If we pass the clear-text password through HASHING / ENCRYPTION, both generate a secret. The difference lies in whether we can get the clear-text password back or not.
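To make the difference concrete, here is a minimal JDK-only sketch; the class and method names are hypothetical. SHA-256 stands in for "hashing" only because it is short to show. As discussed later, real password storage needs a slow, salted algorithm. AES-GCM stands in for "encryption". Whoever holds the key can call decrypt, but there is no corresponding call that turns the SHA-256 digest back into the password.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

final class HashVsEncryptionDemo {

    // One-way: SHA-256 maps any input to a fixed 32-byte digest; there is no "unhash" call.
    // (Illustration only: plain SHA-256 is far too fast for storing passwords.)
    static byte[] sha256(String text) {
        try {
            return MessageDigest.getInstance("SHA-256")
                    .digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Two-way: AES-GCM; anyone holding the key (and the IV) can reverse it.
    static byte[] encrypt(SecretKey key, byte[] iv, String text) {
        return aesGcm(Cipher.ENCRYPT_MODE, key, iv, text.getBytes(StandardCharsets.UTF_8));
    }

    static String decrypt(SecretKey key, byte[] iv, byte[] cipherText) {
        return new String(aesGcm(Cipher.DECRYPT_MODE, key, iv, cipherText), StandardCharsets.UTF_8);
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    private static byte[] aesGcm(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String password = "Green Learner";
        Base64.Encoder b64 = Base64.getEncoder();

        System.out.println("hash      : " + b64.encodeToString(sha256(password)));

        SecretKey key = newAesKey();
        byte[] iv = new byte[12]; // fresh random 96-bit IV for this single encryption
        new SecureRandom().nextBytes(iv);
        byte[] cipherText = encrypt(key, iv, password);
        System.out.println("encrypted : " + b64.encodeToString(cipherText));
        System.out.println("decrypted : " + decrypt(key, iv, cipherText));
    }
}
```

The hash line is identical on every run for the same input. The encrypted line changes because of the random IV, and with the key it always decrypts back to the original password.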

So imagine we are using encryption, and somehow the passwords get leaked and the hackers also get the key that was used to encrypt them. They can then easily decrypt every password and read it in clear text, which is a huge risk.

Now coming to HASHING: it's one-way, so even if the hashes get leaked, the hackers can't reverse them to get the clear-text passwords. They can still try guessing (hashing candidate passwords and comparing the results). That is why the salting and work factor covered below matter.

You can see this in real life too. If you call the customer care of your bank (or any institution you have an account with) and ask for your password, they can't tell you, because it's not stored in clear text. The only option is to reset the password.

Only the USER/CUSTOMER knows the password.

We can use encryption for general-purpose data that the systems need to decrypt. For example, WhatsApp chats are stored in encrypted form and decrypted on the fly when rendered on the screen.

The important question is why hashed passwords can't be restored -

  • There are complex mathematical computations in place at different layers.
  • Multiple passwords can result in the same hash, so a hash doesn't point back to one unique password.

There is a more detailed explanation here.

How password validation happens

There are 2 ways -

  1. Decrypt the stored password and compare it with the user-entered password.
  2. Hash the user-entered password and compare this hash with the one stored in the DB.

As we have seen, getting the clear-text password back from a hash is practically impossible, so option #1 is not feasible. Option #2 is used to validate the password. We don't have to do this manually: hashing libraries also provide a way to validate passwords. We pass in the clear-text and hashed passwords, and the library tells us whether they match.
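Option #2 can be sketched with only the JDK. argon2-jvm is not part of the JDK, so this sketch swaps in PBKDF2 (the PBKDF2WithHmacSHA256 algorithm of javax.crypto.SecretKeyFactory). The "iterations:salt:hash" storage format, the 600,000 work factor and the method names are illustrative assumptions. This low-level API also takes the salt as an input, so the sketch generates the salt itself.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class PasswordValidationSketch {

    private static final int ITERATIONS = 600_000; // work factor (example value)
    private static final int KEY_LENGTH_BITS = 256;
    private static final SecureRandom RANDOM = new SecureRandom();

    // PBKDF2-HMAC-SHA256 of the password with the given salt and iteration count.
    static byte[] pbkdf2(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, KEY_LENGTH_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    // Registration: hash once and return "iterations:salt:hash" for the DB column.
    static String hashForStorage(char[] password) {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        byte[] hash = pbkdf2(password, salt, ITERATIONS);
        Base64.Encoder b64 = Base64.getEncoder();
        return ITERATIONS + ":" + b64.encodeToString(salt) + ":" + b64.encodeToString(hash);
    }

    // Login: re-hash the entered password with the stored salt/iterations and compare.
    static boolean verify(String stored, char[] enteredPassword) {
        String[] parts = stored.split(":");
        int iterations = Integer.parseInt(parts[0]);
        byte[] salt = Base64.getDecoder().decode(parts[1]);
        byte[] expected = Base64.getDecoder().decode(parts[2]);
        byte[] actual = pbkdf2(enteredPassword, salt, iterations);
        // Comparison time depends only on the length, not on where the bytes differ.
        return MessageDigest.isEqual(expected, actual);
    }

    public static void main(String[] args) {
        String stored = hashForStorage("Green Learner".toCharArray());
        System.out.println("stored           : " + stored);
        System.out.println("correct password : " + verify(stored, "Green Learner".toCharArray()));
        System.out.println("wrong password   : " + verify(stored, "green learner".toCharArray()));
    }
}
```

main prints the stored string, then true for the correct password and false for the wrong one. The clear-text password is never stored or decrypted, only re-hashed and compared.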

Now, before looking at different secure hashing algorithms, let's understand a few terms related to secure password storage -

Salting

It's a unique, randomly generated string added to the password before generating the hash.

Every password has its own unique salt, which is stored along with the hashed password.

It provides an extra layer of security. The attacker has to crack hashes one at a time, using the respective salt, rather than calculating a hash once and comparing it against every stored hash. Earlier we needed to add the salt manually when generating the hash. Libraries for modern hashing algorithms such as Argon2id, bcrypt, and PBKDF2 salt the passwords automatically, so no additional steps are required when using them.
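Here is a minimal sketch of what salting does, assuming the legacy setup where the salt is added by hand. It uses plain SHA-256 only to stay short; on its own, SHA-256 is far too fast for passwords. The user names are hypothetical. Two users with the same password end up with different stored hashes, so one precomputed hash can't be matched against every row.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Base64;

final class SaltingDemo {

    private static final SecureRandom RANDOM = new SecureRandom();

    // A fresh random 16-byte salt for every password.
    static byte[] newSalt() {
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);
        return salt;
    }

    // hash = SHA-256(salt || password); the fast hash is for demonstration only.
    static byte[] saltedHash(byte[] salt, String password) {
        try {
            MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
            sha256.update(salt);
            return sha256.digest(password.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Base64.Encoder b64 = Base64.getEncoder();
        // Two users who happen to choose the same password.
        for (String user : new String[] {"alice", "bob"}) {
            byte[] salt = newSalt();
            byte[] hash = saltedHash(salt, "Green Learner");
            // Both the salt and the hash go into this user's row.
            System.out.println(user + " salt=" + b64.encodeToString(salt)
                    + " hash=" + b64.encodeToString(hash));
        }
    }
}
```

Each run prints different salts and hashes for alice and bob, even though both chose "Green Learner". At login, the stored salt is reused to recompute the same hash.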

Peppering

It's a combination of hashing and encryption. The hashes are generated as usual, but before being stored they are encrypted with a secret key. This extra encryption step is called peppering, and the key is referred to as the pepper.

  • The pepper is the same for all passwords, unlike the salt, which is different for each password.
  • The pepper must be stored in a secret vault, separate from the DB.
  • The pepper can be rotated as and when needed. Because it's encryption, stored values can be decrypted with the old pepper and re-encrypted with the new one.

Peppering adds one more layer of security, because now the hacker has to get both the password hashes from the database and the pepper.

With peppering, validation gets one more step: the stored value is decrypted with the pepper before the hashes are compared. This makes the hacker's life even harder.
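The peppering flow (hash as usual, encrypt the hash with the pepper, decrypt before comparing) can be sketched with only the JDK. Assumptions: PBKDF2 stands in for Argon2 because it ships with the JDK, and AES-GCM is the chosen cipher. The pepper key is generated in-process only for the demo; a real system would load it from a secret vault. Class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;

final class PepperSketch {

    private static final SecureRandom RANDOM = new SecureRandom();
    private static final int IV_LENGTH = 12; // 96-bit nonce, the usual size for GCM

    // Hypothetical stand-in for loading the pepper from a secret vault.
    static SecretKey newPepperKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 1: the usual salted hash (PBKDF2 with an example work factor).
    static byte[] hash(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, 600_000, 256);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }

    // Step 2 (peppering): encrypt the hash; the stored value is the IV followed by the ciphertext.
    static byte[] pepper(SecretKey pepperKey, byte[] hash) {
        byte[] iv = new byte[IV_LENGTH];
        RANDOM.nextBytes(iv);
        byte[] cipherText = aesGcm(Cipher.ENCRYPT_MODE, pepperKey, iv, hash);
        return ByteBuffer.allocate(IV_LENGTH + cipherText.length).put(iv).put(cipherText).array();
    }

    // Validation: decrypt with the pepper, re-hash the entered password, compare in constant time.
    static boolean verify(SecretKey pepperKey, byte[] salt, byte[] stored, char[] enteredPassword) {
        byte[] iv = Arrays.copyOfRange(stored, 0, IV_LENGTH);
        byte[] cipherText = Arrays.copyOfRange(stored, IV_LENGTH, stored.length);
        byte[] expected = aesGcm(Cipher.DECRYPT_MODE, pepperKey, iv, cipherText);
        return MessageDigest.isEqual(expected, hash(enteredPassword, salt));
    }

    private static byte[] aesGcm(int mode, SecretKey key, byte[] iv, byte[] input) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(mode, key, new GCMParameterSpec(128, iv));
            return cipher.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey pepperKey = newPepperKey();
        byte[] salt = new byte[16];
        RANDOM.nextBytes(salt);

        // Registration: the DB row keeps the salt and the peppered hash, never the pepper itself.
        byte[] stored = pepper(pepperKey, hash("Green Learner".toCharArray(), salt));

        System.out.println(verify(pepperKey, salt, stored, "Green Learner".toCharArray()));
        System.out.println(verify(pepperKey, salt, stored, "wrong guess".toCharArray()));
    }
}
```

main prints true and then false. A stolen DB row alone is not enough to start guessing, because without the pepper the attacker can't even recover the hash to compare against.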

Work factor

It's the number of iterations used to make calculating the hash more expensive. In simple terms, if the work factor is 10 it does 10 iterations to calculate the hash, and if it's 20 it does 20 iterations. For bcrypt, the work factor is an exponent: a cost of 10 means 2^10 rounds. Either way, the attacker has to spend more resources/time to compute each hash.

A higher work factor also makes normal user validation slower, so it must be chosen carefully to balance security and usability.
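The effect of the work factor is easy to measure. This sketch times PBKDF2-HMAC-SHA256 from the JDK, standing in for Argon2, at a few iteration counts. The counts are arbitrary examples, and the absolute timings depend entirely on the machine.

```java
import java.security.GeneralSecurityException;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

final class WorkFactorDemo {

    // Milliseconds taken by one PBKDF2-HMAC-SHA256 hash at the given iteration count.
    static long millisFor(int iterations) {
        char[] password = "Green Learner".toCharArray();
        byte[] salt = new byte[16]; // all-zero salt: fine for timing, never for real storage
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, 256);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            long start = System.nanoTime();
            factory.generateSecret(spec);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        } finally {
            spec.clearPassword();
        }
    }

    public static void main(String[] args) {
        millisFor(10_000); // warm-up so JIT compilation doesn't skew the first measurement
        for (int iterations : new int[] {10_000, 100_000, 1_000_000}) {
            System.out.println(iterations + " iterations -> " + millisFor(iterations) + " ms");
        }
    }
}
```

PBKDF2's cost grows in proportion to the iteration count. A 10x higher work factor therefore costs the attacker roughly 10x more per guess, and costs your login endpoint the same.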


Different hashing algorithms that can be used here (most recommended at the top; use as per availability).

Argon2, of which Argon2id is a variant, was the winner of the Password Hashing Competition in July 2015. It is intentionally resource-intensive (CPU, memory, etc.). In Argon2 we can configure the salt length, the hash length, the number of iterations, the memory cost and the degree of parallelism to control the resources needed to hash a password.

There are different versions of Argon2 -

  • Argon2d - suitable for cryptocurrencies
  • Argon2i - suitable for password hashing
  • Argon2id - a hybrid of the above two; if you are not sure about the use case, apply this one

Code example for password hashing

Let's see an example with Argon2.

A Java library for Argon2 is argon2-jvm; always use the latest version.

If you are using Spring Security, Argon2PasswordEncoder can be used instead.

We are using argon2-jvm for our demo.

import de.mkammerer.argon2.Argon2;
import de.mkammerer.argon2.Argon2Factory;

import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class Argon2PasswordHashingDemo {

    public static void main(String[] args) {

        int saltLength = 16;//in bytes
        int hashLength = 32;//in bytes

        //Utility to create the Argon2 instance.
        //By default it gives Argon2i (hence the $argon2i$ prefix in the output below).
        //For Argon2id use: Argon2Factory.create(Argon2Factory.Argon2Types.ARGON2id, saltLength, hashLength)
        Argon2 argon2 = Argon2Factory.create(saltLength, hashLength);

        char[] password = "Green Learner".toCharArray();

        Instant start = Instant.now();

        int iterations = 20;// Number of iterations
        int memory = 65536;//Sets memory usage to x kibibytes
        int parallelism = 1;// Number of threads

        try {
            //The returned string (algorithm, parameters, salt and hash) is what gets stored in the DB
            String hash = argon2.hash(iterations, memory, parallelism, password);

            System.out.println(hash);

            System.out.println("hash generation took : " + ChronoUnit.MILLIS.between(start, Instant.now()) + "ms");

        } finally {
            //Wipe the clear-text password from memory once it's no longer needed
            argon2.wipeArray(password);
        }
    }
}

When we executed the above code -

First run -

$argon2i$v=19$m=65536,t=20,p=1$e+YmpON+cE3JSeq1ZcM4Gg$iz3yShNTo99JTPToNG4h7PBckloBvrFgpjwdFQJvwL
hash generation took : 1244ms        

Second run -

$argon2i$v=19$m=65536,t=20,p=1$yNu/+CWUPvGhbHtYSuZTYA$tdYrX7f7Va74vWTg+cH8fRjSsfb4e28uefVZfM60ZH
hash generation took : 1560ms        

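Notice that the two hashes differ even though the password is the same, because a new random salt is generated on every call.

To verify the password, use the utility method -

boolean verify = argon2.verify(hashedPassword, userEnteredPassword);//hashedPassword is the string stored in the DB, userEnteredPassword is a char[]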
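argon2.verify needs only the stored string and the entered password because the printed string is self-describing. It carries the variant, the version, the memory (m, in KiB), the iterations (t), the parallelism (p), and the Base64 salt and hash, separated by "$". The small JDK-only sketch below pulls those fields apart. The class and field names are hypothetical, and it assumes exactly the layout seen in the output above.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

final class Argon2EncodedHash {

    // Splits "$argon2i$v=19$m=65536,t=20,p=1$<salt>$<hash>" into named fields.
    static Map<String, String> parse(String encoded) {
        String[] parts = encoded.split("\\$"); // parts[0] is the empty text before the leading '$'
        if (parts.length != 6) {
            throw new IllegalArgumentException("unexpected format: " + encoded);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("algorithm", parts[1]);
        fields.put("version", parts[2].substring("v=".length()));
        for (String param : parts[3].split(",")) {
            String[] keyValue = param.split("=", 2);
            fields.put(keyValue[0], keyValue[1]); // m = memory in KiB, t = iterations, p = parallelism
        }
        fields.put("salt", parts[4]);
        fields.put("hash", parts[5]);
        return fields;
    }

    // The salt is Base64 without padding, which java.util.Base64's basic decoder accepts.
    static int saltLengthBytes(String encoded) {
        return Base64.getDecoder().decode(parse(encoded).get("salt")).length;
    }

    public static void main(String[] args) {
        String encoded = "$argon2i$v=19$m=65536,t=20,p=1$e+YmpON+cE3JSeq1ZcM4Gg$iz3yShNTo99JTPToNG4h7PBckloBvrFgpjwdFQJvwL";
        parse(encoded).forEach((key, value) -> System.out.println(key + " = " + value));
        System.out.println("salt bytes = " + saltLengthBytes(encoded));
    }
}
```

For the first run's output, the parsed values match the demo's settings: m = 65536, t = 20, p = 1, and a 16-byte salt.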

If we want our hash function to take a specified time (maxMilliSecond in the code below) but are not sure about the number of iterations, Argon2Helper (also part of argon2-jvm) has a utility method for that -

long maxMilliSecond = 1000;//Target time for one hash, in ms (example value)
int memory = 65536;//Sets memory usage to x kibibytes
int parallelism = 1;// Number of threads

int iterations = Argon2Helper.findIterations(argon2, maxMilliSecond, memory, parallelism);

Finally, let's conclude the article with the best practices. If we are on the same page, I am sure you will agree with the points below.

Best Practices

  • Protect the databases where the secrets are stored
  • Hash all passwords and use the strongest hash function possible
  • Salt your passwords (if you are using a modern hashing algorithm it's done automatically, but if you are on a legacy one, make sure to add it)
  • Pepper the passwords
  • Check passwords against dictionary lists and common passwords that are easy to guess
  • Password length is more important than complexity - details here
  • Put validation/restrictions at every layer; allow only a specified number of attempts, then lock the user
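The last point (allow a limited number of attempts, then lock) can be sketched as a small in-memory tracker. The class name, the 3-attempt limit and the 15-minute lock are hypothetical choices for illustration. A real system would keep this state in shared storage so that every instance of the service sees it.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class LoginAttemptTracker {

    private final int maxFailures;
    private final Duration lockDuration;
    // In-memory state for a single node (assumption for this sketch).
    private final Map<String, Integer> failures = new ConcurrentHashMap<>();
    private final Map<String, Instant> lockedUntil = new ConcurrentHashMap<>();

    LoginAttemptTracker(int maxFailures, Duration lockDuration) {
        this.maxFailures = maxFailures;
        this.lockDuration = lockDuration;
    }

    // Check this before verifying the password at all.
    boolean isLocked(String username, Instant now) {
        Instant until = lockedUntil.get(username);
        return until != null && now.isBefore(until);
    }

    // Count a failed login; lock the account once the limit is reached.
    void recordFailure(String username, Instant now) {
        int count = failures.merge(username, 1, Integer::sum);
        if (count >= maxFailures) {
            lockedUntil.put(username, now.plus(lockDuration));
            failures.remove(username); // counting starts again after the lock expires
        }
    }

    // A successful login clears the history.
    void recordSuccess(String username) {
        failures.remove(username);
        lockedUntil.remove(username);
    }

    public static void main(String[] args) {
        LoginAttemptTracker tracker = new LoginAttemptTracker(3, Duration.ofMinutes(15));
        Instant now = Instant.now();
        for (int i = 1; i <= 3; i++) {
            tracker.recordFailure("alice", now);
            System.out.println("after failure " + i + ": locked = " + tracker.isLocked("alice", now));
        }
        System.out.println("16 minutes later: locked = "
                + tracker.isLocked("alice", now.plus(Duration.ofMinutes(16))));
    }
}
```

main prints locked = false after the first two failures, true after the third, and false again once the 15-minute lock has passed.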

What Next?

If you wish to see the practical integration of this hashing mechanism in microservices, do check out the playlist on the YouTube channel #greenlearner.

References

https://cheatsheetseries.owasp.org/cheatsheets/Password_Storage_Cheat_Sheet.html

https://mkyong.com/java/java-password-hashing-with-argon2/

https://crypto.stackexchange.com/questions/45377/why-cant-we-reverse-hashes

https://snyk.io/learn/password-storage-best-practices/
More articles by Arvind Kumar