Java Concepts: What everyone knows Vs What you should know? - Part 3B

This is in continuation of my previous article Part 3A, in which I discussed iterators, enumerations, HashMaps and Hashtables. While discussing Hashtables, we learnt that HashMaps are not synchronized and hence are not suitable for multi-threaded applications. On the other hand, Hashtable is synchronized, but it is a legacy class that is generally avoided now.

HashMaps must be synchronized externally to avoid an inconsistent view of the contents. There are two common ways to get a thread-safe map:

  1. Wrap the HashMap with the Collections.synchronizedMap() method
  2. Use a ConcurrentHashMap instead
// HashMap (not thread-safe)
Map<String, String> normalMap = new HashMap<String, String>();

// synchronizedMap
Map<String, String> synchronizedHashMap =
        Collections.synchronizedMap(new HashMap<String, String>());

// ConcurrentHashMap
Map<String, String> concurrentHashMap =
        new ConcurrentHashMap<String, String>();


How does Collections.synchronizedMap() work?
  • It works much like Hashtable: a lock is acquired on the map object for every operation. In Hashtable, all the methods are synchronized, so only one thread can execute any method at a time.
  • This essentially gives only one thread access to the entire map and blocks all the other threads. This happens for both reads and writes.
  • You get thread safety, but it introduces a performance overhead because the entire map is blocked for the other threads.
  • Like HashMap, its iterator is fail-fast on concurrent modification, so you must manually synchronize on the map while iterating (see the sketch below).
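
For example, here is a minimal sketch (the class and variable names are mine, purely for illustration) of the manual synchronization that the Collections.synchronizedMap() documentation asks for while iterating; without the synchronized block, a concurrent writer could still cause a ConcurrentModificationException:

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class SynchronizedMapIterationDemo {

    public static void main(String[] args) {
        Map<String, String> map =
                Collections.synchronizedMap(new HashMap<String, String>());
        map.put("a", "1");
        map.put("b", "2");

        // Individual calls such as put()/get() are synchronized for us,
        // but iteration is not atomic: we must hold the map's lock ourselves.
        synchronized (map) {
            for (Map.Entry<String, String> entry : map.entrySet()) {
                System.out.println(entry.getKey() + " = " + entry.getValue());
            }
        }
    }
}
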
What are ConcurrentHashMaps?
  • ConcurrentHashMap is an enhancement of HashMap for multi-threaded use: as we know, a plain HashMap is not a good choice when multiple threads are involved, because it is not thread-safe at all.
  • The underlying data structure behind ConcurrentHashMap is none other than the hash table, which died but left a child :P. That's why its iterator is fail-safe and doesn't throw the ConcurrentModificationException that we saw for HashMap in the previous blog (see the sketch after this list).
  • ConcurrentHashMap implements the ConcurrentMap interface as well as Serializable.
  • It is thread-safe without synchronizing the whole map, as locking happens at a much finer granularity, at the level of a single bucket/segment.
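
As a quick illustration (a minimal sketch; the class name and values are mine), the iterator of a ConcurrentHashMap keeps working even if the map is modified mid-iteration, whereas the same loop over a plain HashMap would throw a ConcurrentModificationException:

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentIterationDemo {

    public static void main(String[] args) {
        Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);

        // Modifying the map while iterating over it: no exception is thrown.
        // The iterator is weakly consistent, so it may or may not see "d".
        for (String key : map.keySet()) {
            System.out.println(key + " = " + map.get(key));
            map.put("d", 4);
        }
    }
}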

Now, let's look at some code and compare the performance of Hashtable, a synchronized HashMap and ConcurrentHashMap.

import java.util.Collections;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SynchronizedVsConcurrentHashMaps {

    public static final int THREAD_POOL_SIZE = 5;

    public static Map<String, Integer> hashTableObject = null;
    public static Map<String, Integer> synchronizedMapObject = null;
    public static Map<String, Integer> concurrentHashMapObject = null;

    public static void main(String[] args) throws InterruptedException {

        // Test with Hashtable
        hashTableObject = new Hashtable<String, Integer>();
        performTest(hashTableObject);

        // Test with synchronizedMap
        synchronizedMapObject =
                Collections.synchronizedMap(new HashMap<String, Integer>());
        performTest(synchronizedMapObject);

        // Test with ConcurrentHashMap
        concurrentHashMapObject = new ConcurrentHashMap<String, Integer>();
        performTest(concurrentHashMapObject);
    }

    public static void performTest(final Map<String, Integer> map)
            throws InterruptedException {

        System.out.println("Test started for: " + map.getClass());
        long averageTime = 0;

        for (int i = 0; i < 5; i++) {

            long startTime = System.nanoTime();
            ExecutorService exServer =
                    Executors.newFixedThreadPool(THREAD_POOL_SIZE);

            for (int j = 0; j < THREAD_POOL_SIZE; j++) {
                exServer.execute(new Runnable() {
                    @Override
                    public void run() {
                        for (int k = 0; k < 500000; k++) {
                            Integer randomNumber =
                                    (int) Math.ceil(Math.random() * 550000);

                            // Retrieve a value (the result itself is not used)
                            map.get(String.valueOf(randomNumber));

                            // Put a value
                            map.put(String.valueOf(randomNumber), randomNumber);
                        }
                    }
                });
            }

            exServer.shutdown();

            // Block until all tasks have completed execution
            exServer.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);

            long endTime = System.nanoTime();
            long totalTime = (endTime - startTime) / 1000000L;
            averageTime += totalTime;

            System.out.println("500K entries added/retrieved in "
                    + totalTime + " ms");
        }

        System.out.println("For " + map.getClass()
                + " the average time is " + averageTime / 5 + " ms\n");
    }
}

And the output of the program is as below:

Test started for: class java.util.Hashtable
500K entries added/retrieved in 1926 ms
500K entries added/retrieved in 2210 ms
500K entries added/retrieved in 1596 ms
500K entries added/retrieved in 1548 ms
500K entries added/retrieved in 1594 ms
For class java.util.Hashtable the average time is 1774 ms


Test started for: class java.util.Collections$SynchronizedMap
500K entries added/retrieved in 1507 ms
500K entries added/retrieved in 1535 ms
500K entries added/retrieved in 1459 ms
500K entries added/retrieved in 1578 ms
500K entries added/retrieved in 1754 ms
For class java.util.Collections$SynchronizedMap the average time is 1566 ms


Test started for: class java.util.concurrent.ConcurrentHashMap
500K entries added/retrieved in 698 ms
500K entries added/retrieved in 687 ms
500K entries added/retrieved in 686 ms
500K entries added/retrieved in 678 ms
500K entries added/retrieved in 570 ms
For class java.util.concurrent.ConcurrentHashMap the average time is 663 ms

As we can see, ConcurrentHashMap is the fastest and is the best choice for multi-threaded applications.

How are ConcurrentHashMaps internally implemented?

Out of curiosity, I looked at one of the constructors of the ConcurrentHashMap class and found three parameters:

  1. Initial Capacity
  2. Load factor
  3. Concurrency Level
/**
 * Creates a new, empty map with an initial table size based on
 * the given number of elements ({@code initialCapacity}), table
 * density ({@code loadFactor}), and number of concurrently
 * updating threads ({@code concurrencyLevel}).
 *
 * @param initialCapacity the initial capacity. The implementation
 * performs internal sizing to accommodate this many elements,
 * given the specified load factor.
 *
 * @param loadFactor the load factor (table density) for
 * establishing the initial table size
 *
 * @param concurrencyLevel the estimated number of concurrently
 * updating threads. The implementation may use this value as
 * a sizing hint.
 *
 * @throws IllegalArgumentException if the initial capacity is
 * negative or the load factor or concurrencyLevel are
 * nonpositive
 */

public ConcurrentHashMap(int initialCapacity,
                         float loadFactor,
                         int concurrencyLevel) {
    if (!(loadFactor > 0.0f) || initialCapacity < 0 || concurrencyLevel <= 0)
        throw new IllegalArgumentException();
    if (initialCapacity < concurrencyLevel)   // Use at least as many bins
        initialCapacity = concurrencyLevel;   // as estimated threads
    long size = (long)(1.0 + (long)initialCapacity / loadFactor);
    int cap = (size >= (long)MAXIMUM_CAPACITY) ?
        MAXIMUM_CAPACITY : tableSizeFor((int)size);
    this.sizeCtl = cap;
}

The first two, initialCapacity and loadFactor, I already understood, as I knew them from HashMap's internal implementation. If you don't, please refer to the previous blog.

What is concurrency level?

As the documentation says, it is the estimated number of concurrently updating threads. The implementation performs internal sizing to try to accommodate this many threads.

Further, I found the variables below in the class, which say that by default there are 16 buckets and 16 threads can update the map concurrently.

static final int DEFAULT_INITIAL_CAPACITY = 16;
static final int DEFAULT_CONCURRENCY_LEVEL = 16;
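
So, if you already have an estimate of how many threads will be writing at the same time, you can pass it to the three-argument constructor we just looked at. A minimal sketch (the numbers and the class name are mine, just as an example, not a recommendation):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrencyLevelExample {

    public static void main(String[] args) {
        // initialCapacity = 64, loadFactor = 0.75f,
        // concurrencyLevel = 10 (estimated number of concurrently updating threads)
        Map<String, Integer> map =
                new ConcurrentHashMap<String, Integer>(64, 0.75f, 10);

        map.put("answer", 42);
        System.out.println(map);
    }
}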

Instead of a map-wide lock, ConcurrentHashMap maintains a list of 16 locks by default, each of which is used to lock a single segment of the map.

Yes, you got it right! This means that 16 threads can modify the map at the same time, provided each thread works on a different segment.


Don't get confused; the segment structure described below should make the concept clear.

ConcurrentHashMap creates an array of Segments on top of it, and each index of this array essentially represents a small HashMap of its own.

If the concurrency level is 10, it means that at any given point of time the Segment array size will be 10 or greater, so that 10 threads are able to write to the map in parallel.

In general, the segment count is kept at a power of 2, so for concurrency level = 10 we take the smallest x with 2^x >= 10, which gives a segment size of 16.
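
A tiny helper of my own (not the JDK's code, just to mirror this "round up to the next power of two" sizing):

public class NextPowerOfTwo {

    // Smallest power of two that is >= the requested concurrency level
    static int segmentSizeFor(int concurrencyLevel) {
        int size = 1;
        while (size < concurrencyLevel) {
            size <<= 1;   // double until we reach or pass the target
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(segmentSizeFor(10)); // 16
        System.out.println(segmentSizeFor(16)); // 16
        System.out.println(segmentSizeFor(17)); // 32
    }
}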

How do reads and writes take place in ConcurrentHashMaps?

In short: a read goes straight to the relevant segment without taking any lock, while a write first acquires the lock of the segment the key hashes to.
What happens when a segment index is calculated for writing a new key-value?

Let's suppose thread T1 comes along and calls map.put(1, 1), and the segment index for the key is calculated as 1. Since it is a write request, a lock is acquired on segment 1, and no other thread can write to that segment until T1 completes its task successfully.

At the same time, 15 other write threads can come and acquire locks on the other 15 segments.
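
If you want to see the idea in code, here is a heavily simplified lock-striping sketch of my own (it is not the actual JDK implementation): one small HashMap plus one lock per "segment", where a key's hash picks the segment, so writers to different segments never block each other.

import java.util.HashMap;
import java.util.Map;

// A toy "striped" map: a write locks only the segment its key hashes to.
public class StripedMap<K, V> {

    private static final int SEGMENTS = 16;

    private final Object[] locks = new Object[SEGMENTS];
    private final Map<K, V>[] tables;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        tables = new Map[SEGMENTS];
        for (int i = 0; i < SEGMENTS; i++) {
            locks[i] = new Object();
            tables[i] = new HashMap<K, V>();
        }
    }

    private int segmentFor(Object key) {
        return (key.hashCode() & 0x7fffffff) % SEGMENTS;
    }

    public V put(K key, V value) {
        int index = segmentFor(key);
        synchronized (locks[index]) {   // lock only this segment
            return tables[index].put(key, value);
        }
    }

    public V get(Object key) {
        int index = segmentFor(key);
        synchronized (locks[index]) {   // the real ConcurrentHashMap reads without locking
            return tables[index].get(key);
        }
    }
}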



Now, a stream of questions is probably flooding your mind:

Can multiple read operations by multiple threads happen concurrently?

Yes, read operations can be performed concurrently on the same or different segments, and no lock is acquired for read operations.

Can multiple write operations by multiple threads happen concurrently?

Yes, write operations can be performed concurrently on different segments, as the lock is acquired only on the segment being written to.

Can multiple read and write operations by multiple threads happen concurrently in the same segment?

This question is actually tricky. Yes, multiple threads can read from the same segment concurrently, but only one thread at a time can write to a given segment; other writers to that segment have to wait (the small demo below shows reads and writes running side by side).
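
A tiny demonstration of my own (the thread roles, key names and counts are arbitrary): one thread keeps writing to the map while another keeps reading the same keys, and both finish without any exception; each read simply sees either an old or a new value.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReadWriteDemo {

    public static void main(String[] args) throws InterruptedException {
        final Map<String, Integer> map = new ConcurrentHashMap<String, Integer>();

        // Writer: keeps updating a handful of keys.
        Thread writer = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    map.put("key" + (i % 10), i);
                }
            }
        });

        // Reader: reads the same keys concurrently, without any lock on our side.
        Thread reader = new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 100000; i++) {
                    map.get("key" + (i % 10));
                }
            }
        });

        writer.start();
        reader.start();
        writer.join();
        reader.join();

        System.out.println("Finished without any exception, size = " + map.size());
    }
}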

Does rehashing happen at the segment level as well?

Basically, rehashing is the process of redistributing the already stored entries (key-value pairs) into a new, bigger table once the load-factor threshold is reached.

In ConcurrentHashMap, every segment is rehashed separately, so there is no conflict between thread 1 writing to segment index 1 and thread 2 writing to segment index 4.

For example, if thread 1 is putting data into Segment[] array index 3 and finds that the HashEntry[] array there needs to be rehashed because the load-factor threshold has been exceeded, it will rehash the HashEntry[] array present at Segment[] index 3 only. The HashEntry[] arrays at the other segment indexes remain intact and unaffected, and continue to serve put and get requests in parallel.

I hope this helps in understanding the implementation and internal working of synchronized HashMaps and ConcurrentHashMap, both now and in the future.

Don't forget to mention in the comments if you have any doubts or questions. If you liked the content, you can also mention that.
