Java Concepts: What everyone knows Vs What you should know? - Part 3B
This is in continuation of my previous article Part 3A, in which I discussed iterators, enumerators, HashMaps and Hashtables. While discussing Hashtables, we learned that HashMaps are not synchronized and therefore are not suitable for multi-threaded applications. Hashtables, on the other hand, are synchronized, but they are a legacy class whose use is now discouraged.
HashMaps must be synchronized externally to avoid an inconsistent view of their contents. There are two ways to get a synchronized map:
- Use the Collections.synchronizedMap() method
- Use ConcurrentHashMap
```java
// HashMap
Map<String, String> normalMap = new HashMap<String, String>();

// synchronizedMap
Map<String, String> synchronizedHashMap =
        Collections.synchronizedMap(new HashMap<String, String>());

// ConcurrentHashMap
Map<String, String> concurrentHashMap = new ConcurrentHashMap<String, String>();
```
How does Collections.synchronizedMap() work?
- It works exactly like Hashtable, by acquiring a lock on the map object. In Hashtable, all the methods are synchronized, so only one thread can execute any method at a time.
- This effectively gives only one thread access to the entire map and blocks all other threads. This applies to both reads and writes.
- You get thread-safety, but it introduces a performance overhead because the entire map is blocked for other threads.
- Like HashMap, its iterator is fail-fast and throws ConcurrentModificationException on concurrent modification.
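To see the manual locking that Collections.synchronizedMap() requires, here is a minimal sketch (the class and the `snapshot` helper are my own names, not part of any API):

```java
import java.util.*;

public class SynchronizedMapIteration {

    // Individual put()/get() calls on a synchronizedMap are already safe,
    // but iteration is NOT: the Collections.synchronizedMap Javadoc
    // requires us to hold the map's monitor while iterating.
    static List<String> snapshot(Map<String, String> map) {
        List<String> entries = new ArrayList<>();
        synchronized (map) { // mandatory manual lock for iteration
            for (Map.Entry<String, String> e : map.entrySet()) {
                entries.add(e.getKey() + "=" + e.getValue());
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        Map<String, String> syncMap = Collections.synchronizedMap(new HashMap<>());
        syncMap.put("a", "1");
        syncMap.put("b", "2");
        System.out.println(snapshot(syncMap));
    }
}
```

Forgetting the `synchronized` block around the loop compiles fine but can throw ConcurrentModificationException if another thread modifies the map mid-iteration.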
What are ConcurrentHashMaps?
- ConcurrentHashMap is an enhancement of HashMap for threaded code: as we know, a plain HashMap is not a good choice for multi-threaded applications, and the fully synchronized alternatives are not up to the mark performance-wise.
- The underlying data structure behind ConcurrentHashMap is none other than the hash table, which died but left a child :P. That's also why it is fail-safe and doesn't throw the ConcurrentModificationException that we saw with HashMaps in the previous blog.
- ConcurrentHashMap implements the ConcurrentMap interface as well as the Serializable interface.
- It is thread-safe without synchronizing the whole map, because locking happens at a much finer granularity: the bucket (segment) level.
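The fail-safe behaviour from the second bullet is easy to demonstrate. This sketch (class and method names are my own) modifies a ConcurrentHashMap while iterating over it, which would throw ConcurrentModificationException on a plain HashMap:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WeaklyConsistentIterator {

    // Adds entries to the map while iterating over its key set.
    // ConcurrentHashMap's iterator is weakly consistent, so this
    // completes without ConcurrentModificationException.
    static int iterateAndModify() {
        Map<String, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 5; i++) map.put("k" + i, i);

        for (String key : map.keySet()) {
            // Guard so we only add one "extra-" entry per original key,
            // even if the iterator happens to see the new entries too.
            if (!key.startsWith("extra-")) {
                map.put("extra-" + key, 0); // would blow up a plain HashMap
            }
        }
        return map.size(); // 5 original + 5 added = 10
    }

    public static void main(String[] args) {
        System.out.println(iterateAndModify());
    }
}
```

Note the trade-off: "weakly consistent" means the iterator may or may not reflect entries added after it was created, which is why the guard above is needed to keep the result deterministic.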
Now, let's look at some code and compare the performance of Hashtable, synchronized HashMap and ConcurrentHashMap.
```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedVsConcurrentHashMaps {

    public static final int THREAD_POOL_SIZE = 5;

    public static void main(String[] args) throws InterruptedException {
        // Test with Hashtable
        performTest(new Hashtable<String, Integer>());

        // Test with synchronizedMap
        performTest(Collections.synchronizedMap(new HashMap<String, Integer>()));

        // Test with ConcurrentHashMap
        performTest(new ConcurrentHashMap<String, Integer>());
    }

    public static void performTest(final Map<String, Integer> map) throws InterruptedException {
        System.out.println("Test started for: " + map.getClass());
        long averageTime = 0;
        for (int i = 0; i < 5; i++) {
            long startTime = System.nanoTime();
            ExecutorService exServer = Executors.newFixedThreadPool(THREAD_POOL_SIZE);
            for (int j = 0; j < THREAD_POOL_SIZE; j++) {
                exServer.execute(new Runnable() {
                    @Override
                    public void run() {
                        for (int k = 0; k < 500000; k++) {
                            Integer randomNumber = (int) Math.ceil(Math.random() * 550000);
                            // Retrieve value. We are not using it anywhere.
                            Integer value = map.get(String.valueOf(randomNumber));
                            // Put value
                            map.put(String.valueOf(randomNumber), randomNumber);
                        }
                    }
                });
            }
            exServer.shutdown();
            // Blocks until all tasks have completed execution
            exServer.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
            long endTime = System.nanoTime();
            long totalTime = (endTime - startTime) / 1000000L;
            averageTime += totalTime;
            System.out.println("500K entries added/retrieved in " + totalTime + " ms");
        }
        System.out.println("For " + map.getClass() + " the average time is " + averageTime / 5 + " ms\n");
    }
}
```
And the output of the program is as below:
```
Test started for: class java.util.Hashtable
500K entries added/retrieved in 1926 ms
500K entries added/retrieved in 2210 ms
500K entries added/retrieved in 1596 ms
500K entries added/retrieved in 1548 ms
500K entries added/retrieved in 1594 ms
For class java.util.Hashtable the average time is 1774 ms

Test started for: class java.util.Collections$SynchronizedMap
500K entries added/retrieved in 1507 ms
500K entries added/retrieved in 1535 ms
500K entries added/retrieved in 1459 ms
500K entries added/retrieved in 1578 ms
500K entries added/retrieved in 1754 ms
For class java.util.Collections$SynchronizedMap the average time is 1566 ms

Test started for: class java.util.concurrent.ConcurrentHashMap
500K entries added/retrieved in 698 ms
500K entries added/retrieved in 687 ms
500K entries added/retrieved in 686 ms
500K entries added/retrieved in 678 ms
500K entries added/retrieved in 570 ms
For class java.util.concurrent.ConcurrentHashMap the average time is 663 ms
```
As we can see, ConcurrentHashMap is the fastest and is the best choice for multi-threaded applications.
How are ConcurrentHashMaps internally implemented?
Out of curiosity, I looked at the constructor of the ConcurrentHashMap class and found three parameters:
- Initial Capacity
- Load factor
- Concurrency Level
```java
/**
 * Creates a new, empty map with an initial table size based on
 * the given number of elements ({@code initialCapacity}), table
 * density ({@code loadFactor}), and number of concurrently
 * updating threads ({@code concurrencyLevel}).
 *
 * @param initialCapacity the initial capacity. The implementation
 * performs internal sizing to accommodate this many elements,
 * given the specified load factor.
 * @param loadFactor the load factor (table density) for
 * establishing the initial table size
 * @param concurrencyLevel the estimated number of concurrently
 * updating threads. The implementation may use this value as
 * a sizing hint.
 * @throws IllegalArgumentException if the initial capacity is
 * negative or the load factor or concurrencyLevel are
 * nonpositive
 */
public ConcurrentHashMap(int initialCapacity,
                         float loadFactor, int concurrencyLevel) {
    if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (initialCapacity < concurrencyLevel)   // Use at least as many bins
        initialCapacity = concurrencyLevel;   // as estimated threads
    long size = (long)(1.0 + (long)initialCapacity / loadFactor);
    int cap = (size >= (long)MAXIMUM_CAPACITY) ?
        MAXIMUM_CAPACITY : tableSizeFor((int)size);
    this.sizeCtl = cap;
}
```
The first two, initialCapacity and loadFactor, I already knew from HashMap's internal implementation. If they are new to you, please refer to the previous blog.
What is concurrency level?
As the documentation says, it is the estimated number of concurrently updating threads. The implementation performs internal sizing to try to accommodate this many threads.
Digging further, I found the variables below in the class, which say that by default there are 16 buckets and 16 threads can work concurrently.
```java
static final int DEFAULT_INITIAL_CAPACITY = 16;
static final int DEFAULT_CONCURRENCY_LEVEL = 16;
```
Instead of a map-wide lock, ConcurrentHashMap maintains a list of 16 locks by default, each of which locks a single segment of the map.
Yes, you got it right! This means that 16 threads can modify the map at the same time, provided each thread works on a different segment.
Don't get confused; the diagram below should make the concept clear:
ConcurrentHashMap (in its implementation up to Java 7; Java 8 moved to even finer per-bucket locking but still accepts concurrencyLevel as a sizing hint) creates an array called Segments on top of it, and each index of this array represents a HashMap.
If the concurrency level is 10, it means that at any given point of time the Segment array size will be 10 or greater, so that 10 threads are able to write to the map in parallel.
In general, the segment size is kept at a power of 2, so for concurrency level = 10:

2^x >= 10, so segment size = 16
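This rounding-up can be sketched in a few lines. The helper below is my own illustrative stand-in for the JDK's private `tableSizeFor()` method, not the actual JDK code:

```java
public class SegmentSizing {

    // Smallest power of two >= n (assumes n >= 1).
    // Integer.highestOneBit returns the highest set bit of n;
    // if n is not already a power of two, we double it.
    static int nextPowerOfTwo(int n) {
        int highest = Integer.highestOneBit(n);
        return (highest == n) ? n : highest << 1;
    }

    public static void main(String[] args) {
        System.out.println(nextPowerOfTwo(10)); // 16
        System.out.println(nextPowerOfTwo(16)); // 16
    }
}
```

So a requested concurrency level of 10 yields 16 segments, matching the 2^x >= 10 arithmetic above.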
How do reads and writes take place in ConcurrentHashMaps?
This can be better explained with the flow chart below:
What happens when a segment index is calculated for writing a new key-value pair?
So, let's suppose thread T1 comes in and calls map.put(1, 1), and the segment index for the key is calculated as 1. Since this is a write request, a lock is acquired on segment 1, and no other thread can write to that segment until T1 completes its task successfully.
At the same time, 15 other write threads can come and acquire locks on the other 15 segments.
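The practical payoff of this per-segment locking can be sketched with two writer threads working on disjoint key ranges (the class and method names here are my own):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ParallelWrites {

    // Two writers fill disjoint key ranges concurrently. Because
    // ConcurrentHashMap locks at segment/bucket granularity rather
    // than map-wide, neither writer blocks the other for long, and
    // no updates are lost.
    static int writeInParallel() throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 1000; i++) map.put(i, i);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 1000; i < 2000; i++) map.put(i, i);
        });
        t1.start(); t2.start();
        t1.join(); t2.join();
        return map.size(); // 2000: all writes from both threads survive
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(writeInParallel());
    }
}
```

Run the same code against a plain HashMap and you may see a wrong size, a corrupted table, or an infinite loop, which is exactly the inconsistency the external synchronization was guarding against.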
Now, a stream of questions is probably flooding your mind:
Can multiple read operations by multiple threads happen concurrently?
Yes. Read operations can be performed concurrently on the hashmaps of the same or different segments, and no lock is acquired for reads.
Can multiple write operations by multiple threads happen concurrently?
Yes. Write operations can be performed concurrently on the hashmaps of different segments, since a write acquires a lock only on its own segment.
Can multiple read and write operations by multiple threads happen concurrently in the same segment?
This question is actually tricky. Yes, multiple read operations can happen concurrently in the same segment, but only one thread at a time can perform a write operation in a given segment.
Does rehashing happen at the segment level as well?
Basically, rehashing is the process of re-calculating the hash codes of already stored entries (key-value pairs) in order to move them to a new, bigger table once the load-factor threshold is reached.
In ConcurrentHashMap, every segment is rehashed separately, so there is no collision between thread 1 writing to segment index 1 and thread 2 writing to segment index 4.
For example, if thread 1 is putting data at Segment[] array index 3 and finds that the HashEntry[] array needs to be rehashed because the load-factor threshold has been exceeded, it will rehash only the HashEntry[] array present at Segment[] index 3. The HashEntry[] arrays at the other segment indexes remain intact and unaffected, and continue to serve put and get requests in parallel.
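The threshold arithmetic behind this is simple enough to sketch. The numbers below are the familiar HashMap defaults (capacity 16, load factor 0.75); the `threshold` helper is my own illustration, not JDK code:

```java
public class RehashThreshold {

    // threshold = capacity * loadFactor: the entry count at which
    // the table grows and existing entries are rehashed into it
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    public static void main(String[] args) {
        // With capacity 16 and load factor 0.75, the threshold is 12:
        // the 13th insertion into that table triggers a resize to 32.
        System.out.println(threshold(16, 0.75f)); // 12
    }
}
```

In ConcurrentHashMap this calculation applies per segment, which is why one segment can resize while the others keep serving requests.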
I hope this helps you understand the implementation and internal workings of synchronized HashMaps and ConcurrentHashMap, now and in the future.
Don't forget to mention in the comments if you have any doubts or questions. If you liked the content, you can mention that too.