Java Concepts: What everyone knows Vs What you should know? - Part 3A
One Java concept that everyone should know, because it is asked in almost every kind of technical interview, is HASHMAPS !!!
This blog is for folks who have heard of and used maps like HashMap, Hashtable, SynchronizedMap, ConcurrentHashMap, TreeMap etc. in their code but still struggle to explain how these work internally and how they are related.
Yes, you guessed it right. In this blog, I will talk about the full family of HashMaps: their origin, their variants and concurrency in HashMaps, in detail. I talked about HashSet and its variant TreeSet in my previous blog, so in case you haven't checked it out, you can read it here.
So before starting the actual topic, let's discuss a few questions that are really important for a deep dive into HashMaps.
What is the difference between iterators and enumerators?
What everyone knows:
- Both are interfaces which are used to iterate over a collection of objects.
- Iterators are still used today, whereas enumerators have largely fallen out of use: the classes that return them have become legacy classes, so Enumeration survives only for traversing the objects of those legacy classes.
What you should know
Well, the above information is correct but only partially complete.
- Enumeration is a legacy interface used for traversing Vector and Hashtable, both of which are now legacy classes. Iterator is not a legacy interface; it can be used for the traversal of HashMap, LinkedList, ArrayList, HashSet, TreeMap, TreeSet etc.
- The main difference, and what the interviewer would actually like to hear from you, is how you delete elements while traversing with each of them, and which one is fail-safe versus fail-fast. To understand that, let us take the example of the code below:
```java
import java.util.*;

public class IteratorSample {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 1);
        map.put(2, 2);
        map.put(3, 3);
        Iterator it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry item = (Map.Entry) it.next();
            System.out.println(" key : " + item.getKey() + " value : " + item.getValue());
            map.remove(item.getKey()); // modifying the map directly while iterating
        }
    }
}
```
Here, I have one hashmap with some key and values and I want to print all keys and values and delete the keys in the hashmap after printing.
But my code throws below error :
```
 key : 1 value : 1
Exception in thread "main" java.util.ConcurrentModificationException
	at java.util.HashMap$HashIterator.nextNode(HashMap.java:1445)
	at java.util.HashMap$EntryIterator.next(HashMap.java:1479)
	at java.util.HashMap$EntryIterator.next(HashMap.java:1477)
	at classes.IteratorSample.main(IteratorSample.java:14)
```
Now, in this case, like all developers do, I searched for a solution, made the following modification, and the code worked.
```java
import java.util.*;

public class IteratorSample {
    public static void main(String[] args) {
        Map<Integer, Integer> map = new HashMap<>();
        map.put(1, 1);
        map.put(2, 2);
        map.put(3, 3);
        Iterator it = map.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry item = (Map.Entry) it.next();
            System.out.println(" key : " + item.getKey() + " value : " + item.getValue());
            it.remove(); // removing through the iterator itself
        }
    }
}
```
Let's understand why this happened. Here is what the documentation of HashMap says:
The iterators returned [directly or indirectly from this class] are fail-fast: if the [collection] is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behaviour at an undetermined time in the future.
Now, you might be guessing that if Iterators are fail-fast, then Enumerators must be fail-safe. Well, how is that so? Interestingly, Enumeration provides only read-only access to the elements of a collection. Unlike Iterator, it has no remove() method, and one cannot make any modifications to the collection while traversing its elements. So what if you want to remove elements from a Hashtable? For that, an Iterator is used, as shown below:
```java
import java.util.*;

public class EnumeratorSample {
    public static synchronized void fun() {
        Hashtable<Integer, String> players = new Hashtable<>();
        players.put(1, "Sachin Tendulkar");
        players.put(2, "Rahul Dravid");
        players.put(3, "Virat Kohli");
        players.put(4, "Rohit Sharma");
        Iterator<Map.Entry<Integer, String>> iter = players.entrySet().iterator();
        while (iter.hasNext()) {
            Map.Entry<Integer, String> e = iter.next();
            if (e.getValue().startsWith("R")) {
                iter.remove();
            }
        }
        System.out.println(players);
    }

    public static void main(String[] args) {
        fun();
    }
}
```

Output:

```
{3=Virat Kohli, 1=Sachin Tendulkar}
```
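By contrast, here is a minimal sketch of traversing a Hashtable with the legacy Enumeration interface (the class name and sample data are my own illustration). Notice that the Enumeration API offers no remove() at all, which is exactly the read-only behaviour discussed above:

```java
import java.util.Enumeration;
import java.util.Hashtable;

public class EnumerationDemo {
    public static void main(String[] args) {
        Hashtable<Integer, String> players = new Hashtable<>();
        players.put(1, "Sachin Tendulkar");
        players.put(2, "Rahul Dravid");
        // keys() returns the legacy Enumeration: hasMoreElements()/nextElement() only
        Enumeration<Integer> keys = players.keys();
        while (keys.hasMoreElements()) {
            Integer k = keys.nextElement();
            System.out.println(k + " -> " + players.get(k));
        }
    }
}
```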
How do HashMaps work internally?
What Everyone knows:
- Hashmaps are used for storing key-value pairs and we can store, retrieve and delete elements with O(1) average time complexity and O(n) worst-case time complexity.
- The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, internal data structures are rebuilt) so that the hash table has approximately twice the number of buckets. The default value of load factor is 0.75.
Threshold = (Current Capacity) * (Load Factor) = 12 ( if capacity = 16, LF = 0.75)
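The threshold arithmetic above can be checked directly. The snippet below is just a sketch mirroring the computation HashMap performs internally with its default values:

```java
public class ThresholdDemo {
    public static void main(String[] args) {
        int capacity = 16;        // default number of buckets
        float loadFactor = 0.75f; // default load factor
        // HashMap doubles the number of buckets once size exceeds this value
        int threshold = (int) (capacity * loadFactor);
        System.out.println("threshold = " + threshold); // threshold = 12
    }
}
```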
- HashMap works on the principle of Hashing. There are three terms used in hashing: Hash Function, Hash Value and Bucket.
- In Java, the default number of buckets (the initial capacity) is 16 and the maximum capacity is 2^30, and we can verify this from the following code found in the HashMap class. In simple terms, a bucket is basically an element of the internal array to which a key is mapped.
```java
/**
 * The default initial capacity - MUST be a power of two.
 */
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16

/**
 * The maximum capacity, used if a higher value is implicitly specified
 * by either of the constructors with arguments.
 * MUST be a power of two <= 1<<30.
 */
static final int MAXIMUM_CAPACITY = 1 << 30;
```
- The hashCode() method acts as the hash function: it returns an integer X for an input key K (HashMap additionally spreads the high bits of this value with its internal hash() method). When we take the bitwise AND of X and N-1 (the last index of the buckets array), we get the bucket number B for that key. Further, we create a node corresponding to K in bucket B which has a hash (X), key (K), value (V) and next (a pointer to the next node).
```java
static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;
    final K key;
    V value;
    Node<K,V> next;
}
```
The following diagrams illustrate it better:
Here, K = "KING", V = 100, X = 2306996, N = 16 (default) and B = X & (N-1) = 4
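We can verify these numbers ourselves. One detail worth calling out: since Java 8, HashMap does not use the raw hashCode() directly but first spreads its high bits with hash(h) = h ^ (h >>> 16); that spread value is the X = 2306996 above, while the raw "KING".hashCode() is 2306967. A small sketch:

```java
public class BucketIndexDemo {
    public static void main(String[] args) {
        String key = "KING";
        int h = key.hashCode();  // raw hashCode: 2306967
        int x = h ^ (h >>> 16);  // HashMap's spread hash: 2306996
        int n = 16;              // default capacity
        int b = x & (n - 1);     // bucket index: 4
        System.out.println("X = " + x + ", B = " + b);
    }
}
```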
and output of put operation will look like below:
Now suppose another key arrives for which the value of B also comes out to be 4. This is called a collision, and as a solution, HashMap will create a linked list in that bucket as shown below:
When we do scores.get("KING"), the hash function again gives the same X and B, and the linked list at bucket 4 is traversed. If both the hash X and the key K match (as the hash can be the same for two different keys), then the value V is returned from that node.
What you should know
- Since Java 8, the linked-list structure gets replaced by a balanced BST, the Red-Black Tree, when a certain threshold is reached, and there are three more related data members in the HashMap class.
```java
/**
 * The bin count threshold for using a tree rather than list for a
 * bin. Bins are converted to trees when adding an element to a
 * bin with at least this many nodes. The value must be greater
 * than 2 and should be at least 8 to mesh with assumptions in
 * tree removal about conversion back to plain bins upon
 * shrinkage.
 */
static final int TREEIFY_THRESHOLD = 8;

/**
 * The bin count threshold for untreeifying a (split) bin during a
 * resize operation. Should be less than TREEIFY_THRESHOLD, and at
 * most 6 to mesh with shrinkage detection under removal.
 */
static final int UNTREEIFY_THRESHOLD = 6;

/**
 * The smallest table capacity for which bins may be treeified.
 * (Otherwise the table is resized if too many nodes in a bin.)
 * Should be at least 4 * TREEIFY_THRESHOLD to avoid conflicts
 * between resizing and treeification thresholds.
 */
static final int MIN_TREEIFY_CAPACITY = 64;
```
- So it means when a single bucket holds at least 8 entries (TREEIFY_THRESHOLD) and the total number of buckets is at least 64 (MIN_TREEIFY_CAPACITY), that bucket's linked list is transformed into a self-balancing red-black tree, so that the worst-case time complexity for that bucket is reduced to O(log N), as shown below.
Note that only that particular bucket was transformed into an RB tree; the HashMap can still contain a combination of red-black trees and linked lists.
- If there are two different keys with the same hashCode, how are they retrieved? We traverse the linked list, comparing the key in each entry using equals() until it returns true; then the value of the corresponding entry object is returned.
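This equals()-based disambiguation is easy to observe with a deliberately bad key class. The class below (my own illustration, not part of HashMap) forces every instance into the same bucket by returning a constant hashCode, yet get() still finds the right entry:

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Hypothetical key whose hashCode is deliberately constant,
    // so every instance lands in the same bucket.
    static class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; } // force collisions
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("a"), 1);
        map.put(new BadKey("b"), 2);
        // Both keys share a bucket; equals() disambiguates them on get()
        System.out.println(map.get(new BadKey("a")) + " " + map.get(new BadKey("b")));
    }
}
```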
- And what if we put a key that is already present, i.e. the hashCodes are the same and the keys are also the same (by reference or via equals())? Then the previous key-value pair is overridden with the current key-value pair.
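The override behaviour can be seen directly: put() on an existing key replaces the value and returns the old one. A quick sketch:

```java
import java.util.HashMap;
import java.util.Map;

public class OverwriteDemo {
    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("KING", 100);
        // Same key again: the old mapping is replaced, and put() returns the previous value
        Integer previous = scores.put("KING", 200);
        System.out.println(previous + " " + scores.get("KING")); // 100 200
    }
}
```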
- All the nodes in the linked list / red-black tree are instances of the Node class, which implements the Map.Entry interface. To iterate over the nodes, we generally use iterators, which are fail-fast. Thus, HashMaps can't be used when lots of reads and writes are performed concurrently by multiple threads.
- Thus, HashMaps are fast, but they are not synchronized and not thread-safe. On the other hand, Hashtables are synchronized and thread-safe.
- When to use HashMap? If your application does not involve any multi-threading, HashMap is the better choice; in other words, HashMap is better for single-threaded applications, while Hashtable should be used in multithreaded applications.
Now that you have understood everything about HashMaps, let's discuss Hashtables.
How are Hashtables synchronized?
Hashtable is a subclass of the Dictionary class, which has long been considered obsolete, so Hashtable is rarely used anymore. But we are discussing it here because it became the basis for the birth of ConcurrentHashMap. Hashtables are synchronized because:
- They traditionally use enumerators rather than iterators for traversal (now you understand why I discussed that at the beginning), and since Enumeration is read-only, the value of a key cannot be modified during traversal.
- While using Hashtable methods, you do not need an explicit synchronized block, since the methods of Hashtable are themselves synchronized. The thread that works on a Hashtable acquires a lock on it, making other threads wait till its work is completed. Thus, Hashtables are synchronized, and also slower than HashMaps.
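We can even verify the "synchronized methods" claim with standard reflection, by inspecting the modifiers of put() on both classes (the class name below is my own illustration):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Hashtable;

public class SyncCheckDemo {
    public static void main(String[] args) throws Exception {
        // After type erasure, both put methods take (Object, Object)
        Method htPut = Hashtable.class.getDeclaredMethod("put", Object.class, Object.class);
        Method hmPut = HashMap.class.getDeclaredMethod("put", Object.class, Object.class);
        System.out.println("Hashtable.put synchronized? " + Modifier.isSynchronized(htPut.getModifiers()));
        System.out.println("HashMap.put synchronized? " + Modifier.isSynchronized(hmPut.getModifiers()));
    }
}
```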
One more question that generally comes to the mind of developers:
Why HashTable doesn’t allow null and HashMap does?
To successfully store and retrieve objects from a Hashtable, the objects used as keys must implement the hashCode and equals methods. Since null is not an object, it can't implement these methods. HashMap is a later improvement over Hashtable and was created with the functionality to allow one null key (and any number of null values).
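This difference is easy to demonstrate: HashMap happily stores a null key, while Hashtable throws a NullPointerException the moment you try. A minimal sketch:

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> hm = new HashMap<>();
        hm.put(null, 1);                  // allowed: HashMap supports one null key
        System.out.println(hm.get(null)); // 1
        try {
            new Hashtable<String, Integer>().put(null, 1); // throws NullPointerException
        } catch (NullPointerException e) {
            System.out.println("Hashtable rejects null keys");
        }
    }
}
```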
So far, in Part 3A, we have discussed iterators, enumerators, HashMaps and Hashtables. I have covered the further topics, SynchronizedMap and ConcurrentHashMap, in Part 3B, to keep the length of this blog manageable and give my readers a chance to digest the above information.
In case you have questions or doubts, you are welcome to mention that in the comments section.
Stay tuned and keep reading my blogs :P