JAVA-TRICK-12: Best Practices for Working with Large Datasets in Java

Recently, I faced the task of calculating running balances for all accounts, each with over 10,000 transaction records. After extensive research and development, I discovered several best practices for handling large datasets effectively using Java Spring. This article shares those insights to help you optimize your Spring applications for similar challenges.

Introduction

In today’s data-driven world, CBS (core banking system) applications often need to process and analyze vast amounts of data. Managing large datasets requires careful consideration of performance, memory usage, and scalability. Java Spring offers robust solutions to address these challenges, allowing developers to build efficient applications that can handle significant data loads.

Tips for Managing Large Datasets in Java Spring

1. Optimize Loops

Iterative processing dominates the cost of working with large datasets, so it is paramount that loops are optimized; in most cases, hoisting invariant calls out of the loop condition or using the enhanced for loop will suffice.

Example:

// Inefficient loop
for (int i = 0; i < list.size(); i++) {
    process(list.get(i));
}

// Optimized loop
int size = list.size();
for (int i = 0; i < size; i++) {
    process(list.get(i));
}

Explanation: In the first example, list.size() is invoked on every iteration of the loop, which can be expensive. The second example solves this by reading the size once, outside the loop.
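
The enhanced for loop mentioned above achieves the same effect with no index bookkeeping at all. A minimal sketch, reusing the list and process method from the example (var requires Java 10+):

// Enhanced for loop: no explicit index, and size() is never re-invoked
for (var item : list) {
    process(item);
}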

2. Use Optional Carefully

  • Using Optional to handle nullable values is useful but avoid using it in collections or as fields in data classes, as it adds overhead.

// Avoid
private Optional<String> name;  // Creates unnecessary wrapper objects.

// Optimize
// Use null checks instead, or initialize with default values.
private String name;
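
Where Optional does fit is as a return type for lookups that may legitimately find nothing. A minimal sketch, assuming an employees list and an illustrative finder method:

// Good use: Optional as a return type makes "no result" explicit to callers
public Optional<Employee> findEmployeeByName(String name) {
    return employees.stream()
                    .filter(e -> name.equals(e.getName()))
                    .findFirst();  // Returns Optional.empty() when nothing matches.
}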

3. Use Batch Processing

  • If you’re dealing with large datasets (e.g., importing/exporting data), process them in batches to reduce memory consumption. You can add Spring Batch, or implement it very easily yourself, as shown below.

@PersistenceContext
private EntityManager entityManager;

@Transactional
public void importEmployees(List<Employee> employees) {
    int batchSize = 100;
    int size = employees.size();
    for (int i = 0; i < size; i += batchSize) {
        // A very simple hand-rolled batch: persist 100 records at a time.
        List<Employee> batch = employees.subList(i, Math.min(i + batchSize, size));
        employeeRepository.saveAll(batch);
        employeeRepository.flush();   // Push the pending inserts to the database.
        entityManager.clear();        // Clear the persistence context to free memory.
    }
}
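
If full JPA is not required for the import, the same batching idea can be sketched with Spring's JdbcTemplate, which bypasses the persistence context entirely. The table and column names below are assumptions, and jdbcTemplate is presumed to be an injected field:

public void importEmployees(List<Employee> employees) {
    // Assumed schema: employee(name, salary)
    String sql = "INSERT INTO employee (name, salary) VALUES (?, ?)";
    List<Object[]> args = new ArrayList<>();
    for (Employee e : employees) {
        args.add(new Object[] { e.getName(), e.getSalary() });
    }
    jdbcTemplate.batchUpdate(sql, args);  // Executes the inserts as JDBC batches.
}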

4. Stream API Best Practices

  • When using Java Streams, avoid creating large, intermediate collections. Use lazy evaluation and terminal operations wisely.

// Avoid
List<String> names = employees.stream()
                              .filter(e -> e.getAge() > 30)
                              .map(Employee::getName)
                              .collect(Collectors.toList());  // Collects intermediate results into a new List.

// Optimize
employees.stream()
         .filter(e -> e.getAge() > 30)
         .map(Employee::getName)
         .forEach(System.out::println);  // Consume elements directly to avoid the extra allocation.
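
When the pipeline ends in an aggregation rather than a side effect, the primitive stream variants avoid boxing as well. A small sketch over the same employees list:

// mapToInt yields an IntStream, so the sum runs without Integer boxing
int totalAge = employees.stream()
                        .filter(e -> e.getAge() > 30)
                        .mapToInt(Employee::getAge)
                        .sum();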

5. Use Efficient Data Structures

  • Choose the right data structure for your needs. For example:
  • Use ArrayList over LinkedList if you require random access, as ArrayList uses less memory per element.
  • Use HashMap with proper initial size to avoid resizing during runtime.
  • Use primitive data types instead of their wrapper classes when possible (e.g., int instead of Integer).

Example:

// Avoid
List<Integer> list = new ArrayList<>();
for (int i = 0; i < 10000; i++) {
    list.add(i);  // Autoboxing adds unnecessary overhead.
}

// Use primitive arrays
int[] arr = new int[10000];  // Avoids autoboxing
for (int i = 0; i < arr.length; i++) {
    arr[i] = i;
}

// Estimate the initial capacity based on the expected number of entries
int expectedEntries = 10000;
float loadFactor = 0.75f;
int initialCapacity = (int) (expectedEntries / loadFactor + 1);

// Create a HashMap with the calculated initial capacity
HashMap<String, Integer> accountBalances = new HashMap<>(initialCapacity);
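
As a usage example tied to the running-balance scenario from the introduction, Map.merge accumulates per-account totals without explicit containsKey checks (the account ID and amounts are illustrative):

// Insert-or-accumulate in one call; no containsKey check needed
accountBalances.merge("ACC-1001", 250, Integer::sum);   // balance is now 250
accountBalances.merge("ACC-1001", -75, Integer::sum);   // balance is now 175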

6. Avoid Unnecessary Object Creation

  • Reuse objects wherever possible instead of creating new instances repeatedly. For example, avoid using new inside loops.

// Avoid
for (int i = 0; i < 1000; i++) {
    String result = new String("Result");  // Creates a new String object in each iteration.
}
// Optimize
String result = "Result";  // Reuse the same object (string literals are interned in Java).
for (int i = 0; i < 1000; i++) {
    // Use 'result' without creating a new instance.
}        
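
The same principle applies to building strings inside loops; a StringBuilder reuses one internal buffer instead of allocating a fresh String per iteration:

// Avoid: csv += i + "," would allocate a new String on every iteration
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 1000; i++) {
    sb.append(i).append(',');
}
String csv = sb.toString();  // One final String at the end.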

7. Use Parallel Streams and Fork/Join Framework

For CPU-bound work, parallel streams and the Fork/Join framework can accelerate processing by making use of the multiple cores available.

Example:

// Sequential stream
list.stream().forEach(this::process);
// Parallel stream
list.parallelStream().forEach(this::process);        

Explanation: A parallel stream processes the data on multiple threads, which leads to faster execution on multi-core processors. The caution here is that parallel streams should not be overused: for small datasets or I/O-bound work, the thread-management overhead can outweigh the gains.
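
Since the section also names Fork/Join, here is a minimal RecursiveTask sketch that sums an int array by splitting the work across cores; the THRESHOLD value is an arbitrary assumption:

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000;  // Assumed cutoff for splitting.
    private final int[] data;
    private final int from, to;

    SumTask(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {         // Small enough: sum sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) sum += data[i];
            return sum;
        }
        int mid = (from + to) / 2;            // Otherwise: split, fork one half.
        SumTask left = new SumTask(data, from, mid);
        SumTask right = new SumTask(data, mid, to);
        left.fork();
        return right.compute() + left.join();
    }
}

// Usage: long total = ForkJoinPool.commonPool().invoke(new SumTask(arr, 0, arr.length));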

8. Tune the JVM Garbage Collector

Garbage collection in Java works well only up to a point; it needs to be tuned appropriately, otherwise performance suffers. Use JVM options such as -Xms and -Xmx to define the initial and maximum heap sizes, and select a garbage collector that suits the application, such as G1GC or ZGC.

Example:

# JVM options for GC tuning
java -Xms1g -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -jar myapp.jar        

Explanation: -Xms1g and -Xmx2g set the initial and maximum heap sizes, -XX:+UseG1GC selects the G1 collector, and -XX:MaxGCPauseMillis=200 asks G1 to keep pause times at or below 200 ms. When the collector and heap settings fit the application's allocation pattern, GC pauses shrink and memory usage stays predictable, which can be a significant benefit for the application at hand.

9. Enable Garbage Collection Logs

  • Enable GC logs in production to monitor memory usage and detect whether excessive memory is being consumed or freed too frequently.

# Java 8 and earlier
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps

# Java 9+ (unified logging replaces the flags above)
-Xlog:gc*

10. Avoid Memory Leaks in Long-Lived Objects

  • Be cautious with static fields or long-lived objects holding large references. These objects may not be garbage collected, leading to memory leaks.

// Avoid
public class Cache {
    private static List<Employee> employeeCache = new ArrayList<>();
}
// Optimize
public class Cache {
    private static WeakHashMap<String, Employee> employeeCache = new WeakHashMap<>();  // Using Weak References
}        
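
Note that WeakHashMap holds its keys weakly, so entries disappear once the key is no longer referenced elsewhere; if the goal is simply to cap memory, a size-bounded LRU cache built on LinkedHashMap is another common option. A minimal sketch, with the 1000-entry limit as an assumption:

// A size-bounded LRU cache: the eldest entry is evicted past MAX_ENTRIES
public class Cache {
    private static final int MAX_ENTRIES = 1000;
    private static final Map<String, Employee> employeeCache =
        new LinkedHashMap<String, Employee>(16, 0.75f, true) {  // accessOrder = true gives LRU order.
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Employee> eldest) {
                return size() > MAX_ENTRIES;
            }
        };
}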

11. Avoid Full Object Serialization

  • When serializing objects (e.g., for session persistence or caching), avoid serializing unnecessary fields by marking them as transient.

public class Employee implements Serializable {
    private String name;
    private transient int salary;  // 'salary' will not be serialized
}        
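
A quick round trip shows the effect: after deserialization, the transient field falls back to its default value (0 for an int). A minimal sketch, assuming an illustrative two-argument constructor and with exception handling omitted:

Employee original = new Employee("Alice", 50000);  // Assumed constructor.

ByteArrayOutputStream bos = new ByteArrayOutputStream();
try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
    out.writeObject(original);                     // Serialize.
}

Employee copy;
try (ObjectInputStream in =
         new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
    copy = (Employee) in.readObject();             // Deserialize.
}
// copy keeps the name, but salary is 0 because 'salary' was transient.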

Conclusion

Handling large datasets efficiently is crucial for building robust and scalable applications. By leveraging Java Spring’s capabilities, such as optimized queries, caching, and batch processing, you can significantly improve performance and manageability. Additionally, techniques like setting an appropriate initial size for data structures such as HashMap help avoid unnecessary overhead at runtime. Implementing these best practices ensures your applications are well-equipped to handle large volumes of data efficiently.
