Batch Import - Excel to Database
Importing data directly from an Excel file can be tricky, and most of the documentation isn't clear on a step-by-step implementation. I've figured it out, and I'm here to share it all with you. Before we start with the implementation, let's create a skeleton Spring Boot project using Spring Initializr. For this project, we need Spring Batch, Spring Data JPA, and MySQL Driver as dependencies, all of which you can add through Spring Initializr. We also need to add Apache POI and Apache POI OOXML as external dependencies.
Once done, create a new class in the base package named BatchConfiguration and annotate it with @Configuration and @EnableBatchProcessing. This is the class wherein we'll configure everything related to the batch import. You might want to do it differently and split the beans across different files, but for now, we'll keep everything in one place for the sake of simplicity.
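Before filling in the beans, the shell of the class looks like this (a minimal sketch; the reader, processor, writer, and step beans discussed below all go inside it):

```java
// Skeleton of the batch configuration class; every bean defined later
// in this article lives inside this class.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
    // reader, processor, writer, and step beans go here
}
```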
Let's define the path to the source file as a constant. I've put the Excel source file in the resources directory; you might want to place it elsewhere:
private static final String PROPERTY_EXCEL_SOURCE_FILE_PATH = "path/to/sourceFile.xlsx";
As you go through the snapshot of the sample Excel file above, note that the column headers are in camel case so that they are identical to the properties of the entity. You might also notice that each column has a different datatype: we have String, Integer, enum, MySQL Timestamp, object reference, and Bit(1) to deal with. This is to mimic real-life scenarios.
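For readers who can't see the snapshot, a hypothetical first few rows matching that description might look like this (the values are illustrative, not from the original file):

```
jobId    | jobNo | currency | source | createdAt           | enabled
JOB-1001 | 1     | USD      | 0      | 2021-01-15 10:30:00 | 1
JOB-1002 | 2     | EUR      | 1      | 2021-01-16 09:05:00 | 0
```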
Now let's create the entity into which we'll be importing the data:
@Getter
@Setter
@Entity
@NoArgsConstructor
@Accessors(chain = true)
@Table(name = "jobs")
public class JobEntity {

    @Id
    @NotBlank
    @Column(name = "job_id")
    private String jobId;

    @Column(name = "job_no")
    private int jobNo;

    @NotNull
    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "currency")
    private Currency currency;

    @NotNull
    @Column(name = "created_at")
    private Date createdAt;

    @NotNull
    @Column(name = "is_enabled")
    private boolean enabled = true;

    @NotNull
    @Enumerated(EnumType.ORDINAL)
    private Source source;
}
The data in the Excel file is mostly stored as String, Integer, or Boolean, but here we have other datatypes to map to. For this, Spring Boot provides the ApplicationConversionService, which lets you register custom converters and can be used as follows:
@Bean
FieldSetMapper<JobEntity> fieldSetMapper() {
    BeanWrapperFieldSetMapper<JobEntity> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(JobEntity.class);
    fieldSetMapper.setConversionService(ApplicationConversionService.getSharedInstance());
    return fieldSetMapper;
}
However, since we'll be using a RowMapper from the Spring Batch Excel extension instead of a FieldSetMapper, I have a different solution. First, we'll create a model POJO that maps to the columns in the Excel file:
@Getter
@Setter
@NoArgsConstructor
public class JobModel {

    private String currency;
    private String createdAt;
    private Boolean enabled;
    private String jobId;
    private Integer jobNo;
    private Integer source;
}
Then we'll need a custom JobItemProcessor:
/**
 * Processor class where the data to be imported is validated and
 * converted from the Excel-facing model to the entity.
 */
@Service
class JobItemProcessor implements ItemProcessor<JobModel, JobEntity> {

    @Autowired
    private CurrencyService currencyService;

    @Override
    public JobEntity process(final JobModel jobModel) {
        if (jobModel.getSource() == null || jobModel.getCurrency() == null || jobModel.getCreatedAt() == null
                || jobModel.getEnabled() == null || jobModel.getJobId() == null || jobModel.getJobNo() == null) {
            throw new IllegalArgumentException("Parameters cannot be null");
        }

        JobEntity jobEntity = new JobEntity();
        jobEntity.setJobId(jobModel.getJobId())
                .setJobNo(jobModel.getJobNo())
                .setCurrency(currencyService.get(jobModel.getCurrency()))
                .setSource(Source.getSource(jobModel.getSource()))
                .setCreatedAt(new Date(Timestamp.valueOf(jobModel.getCreatedAt()).getTime()))
                .setEnabled(jobModel.getEnabled());
        return jobEntity;
    }
}
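The processor calls Source.getSource(...) to turn the Integer read from the sheet into the enum stored with @Enumerated(EnumType.ORDINAL). The article doesn't show that enum, so here's a minimal sketch; the constant names (WEB, MOBILE, API) are assumptions, only the ordinal-lookup shape matters:

```java
// Hypothetical Source enum assumed by the processor above.
// getSource resolves the Integer from the spreadsheet to an enum
// constant by ordinal, rejecting out-of-range values early so the
// step's skip policy can handle the bad row.
enum Source {
    WEB, MOBILE, API;

    static Source getSource(Integer ordinal) {
        if (ordinal == null || ordinal < 0 || ordinal >= values().length) {
            throw new IllegalArgumentException("Unknown source ordinal: " + ordinal);
        }
        return values()[ordinal];
    }
}
```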
You might want to cache the get method from currencyService to save on database calls. Read Caching in Spring Boot for more information.
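If you don't want to pull in Spring's cache abstraction, even a simple memoizing map illustrates the idea; this is a sketch, not the article's CurrencyService, and the delegate function stands in for the database lookup:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A minimal memoizing wrapper: the first lookup for a key hits the
// delegate (e.g. a database-backed currency lookup), and every later
// lookup for the same key is served from the in-memory map.
class MemoizingLookup<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> delegate;

    MemoizingLookup(Function<K, V> delegate) {
        this.delegate = delegate;
    }

    V get(K key) {
        return cache.computeIfAbsent(key, delegate);
    }
}
```

With a few distinct currencies across thousands of rows, this turns thousands of lookups into a handful.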
To map each row from the Excel file, we'll use the BeanWrapperRowMapper implementation of the RowMapper interface, which performs the actual work of mapping each row to a JobModel:
import org.springframework.batch.extensions.excel.RowMapper;
import org.springframework.batch.extensions.excel.mapping.BeanWrapperRowMapper;

private RowMapper<JobModel> excelRowMapper() {
    BeanWrapperRowMapper<JobModel> rowMapper = new BeanWrapperRowMapper<>();
    rowMapper.setTargetType(JobModel.class);
    return rowMapper;
}
Now, for reading data from the Excel file, we'll use the Spring Batch Excel extension, which contains an ItemReader implementation based on Apache POI. It supports reading both XLS and XLSX files:
import org.springframework.batch.extensions.excel.poi.PoiItemReader;
import org.springframework.batch.item.ItemReader;

@Bean
ItemReader<JobModel> excelJobReader() {
    PoiItemReader<JobModel> reader = new PoiItemReader<>();
    reader.setLinesToSkip(1); // skip the header row
    reader.setResource(new ClassPathResource(PROPERTY_EXCEL_SOURCE_FILE_PATH));
    reader.setRowMapper(excelRowMapper());
    return reader;
}
The PoiItemReader has the most features but is also the most memory intensive and might lead to memory issues with large XLS(X) sheets.
To reduce the memory footprint, the StreamingXlsxItemReader can be used instead; it keeps only the current row in memory and discards it afterwards. Note that not everything is supported while streaming an XLSX file: formulas may not be evaluated, or may lead to errors.
The ItemReader classes are not thread-safe: neither the Apache POI API itself nor the AbstractItemCountingItemStreamItemReader used as their base class is thread-safe. Reading from multiple threads is therefore not supported. Using a multi-threaded processor/writer should work, as long as you use a single thread for reading.
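If memory does become a concern with large files, swapping in the streaming reader is a small change. A sketch, assuming the streaming package of the same spring-batch-excel extension:

```java
import org.springframework.batch.extensions.excel.streaming.StreamingXlsxItemReader;
import org.springframework.batch.item.ItemReader;

// Streaming alternative to the PoiItemReader bean above: only the
// current row is kept in memory. XLSX only; formulas may not evaluate.
@Bean
ItemReader<JobModel> streamingExcelJobReader() {
    StreamingXlsxItemReader<JobModel> reader = new StreamingXlsxItemReader<>();
    reader.setLinesToSkip(1); // skip the header row
    reader.setResource(new ClassPathResource(PROPERTY_EXCEL_SOURCE_FILE_PATH));
    reader.setRowMapper(excelRowMapper());
    return reader;
}
```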
We also need a batch writer that persists the data to the database in bulk. We'll use a general JdbcBatchItemWriter as follows:
@Bean
public JdbcBatchItemWriter<JobEntity> excelJobWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<JobEntity>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO jobs (currency, created_at, is_enabled, job_id, job_no, source) "
                    + "VALUES (:currency, :createdAt, :enabled, :jobId, :jobNo, :source)")
            .dataSource(dataSource)
            .build();
}
The only thing remaining now is to create a Step and a Batch Job:
@Bean
Step excelFileToDatabaseStep(ItemReader<JobModel> excelJobReader, ItemWriter<JobEntity> excelJobWriter,
        StepBuilderFactory stepBuilderFactory, JobItemProcessor jobItemProcessor) {
    return stepBuilderFactory.get("excelFileToDatabaseStep")
            .<JobModel, JobEntity>chunk(10000)
            .reader(excelJobReader)
            .processor(jobItemProcessor)
            .writer(excelJobWriter)
            .faultTolerant()
            .skip(IllegalArgumentException.class)
            .skipLimit(100)
            .build();
}
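To actually run the step, it needs to be wrapped in a Job. A minimal sketch, assuming the pre-Spring-Batch-5 JobBuilderFactory (matching the StepBuilderFactory style used for the step) and an illustrative job name:

```java
// Wires the step above into a runnable batch Job. The RunIdIncrementer
// gives each launch a fresh run id so the same job can be re-run.
@Bean
Job excelFileToDatabaseJob(JobBuilderFactory jobBuilderFactory, Step excelFileToDatabaseStep) {
    return jobBuilderFactory.get("excelFileToDatabaseJob")
            .incrementer(new RunIdIncrementer())
            .start(excelFileToDatabaseStep)
            .build();
}
```

With spring.batch.job.enabled left at its default, Spring Boot will launch this job automatically on startup.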
That's it. You can now use this to directly import data from an Excel file into the database. Here are some properties that might come in handy:
# General
server.port=4000

# Data-Source configurations for Test purposes
spring.datasource.url=jdbc:mysql://host:3306/schema?useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC
spring.datasource.username=
spring.datasource.password=
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.jpa.database-platform=org.hibernate.dialect.MySQL5Dialect
spring.batch.initialize-schema=always
spring.jpa.open-in-view=false

# DDL generation
spring.jpa.generate-ddl=true
spring.jpa.hibernate.ddl-auto=update

# Turn Statistics on
spring.jpa.properties.hibernate.generate_statistics=true
logging.level.org.hibernate.stat=debug

# Show all queries
spring.jpa.show-sql=true
spring.data.jpa.repositories.bootstrap-mode=default
spring.jpa.properties.hibernate.format_sql=true
logging.level.org.hibernate.type=trace

# Naming Strategy | Use What We Mention
spring.jpa.hibernate.naming.physical-strategy=org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl

# Each fetch of a lazy entity will open a temporary session
spring.jpa.properties.hibernate.enable_lazy_load_no_trans=true
A big thank you to every single person who reads and enjoys these articles, it really means a huge amount to me, personally. Your subscription would be hugely appreciated, it helps more than you could know. And thanks a ton for reading, I'll see you around.