Batch Import - Excel to Database

Importing directly from an Excel file can be tricky, and most of the documentation isn't clear on step-by-step implementation. I have figured it out, and I'm here to share it all with you. Before we start with the implementation, let's create a skeleton Spring Boot project using Spring Initializr. For this project we need Spring Batch, Spring Data JPA and the MySQL Driver as dependencies, which you can add through Spring Initializr. We also need to add the Spring Batch Excel extension, Apache POI and Apache POI OOXML externally.
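If you're on Maven, the external dependencies might look something like this (the version numbers here are purely illustrative; check for the latest releases):

<dependency>
  <groupId>org.springframework.batch.extensions</groupId>
  <artifactId>spring-batch-excel</artifactId>
  <version>0.1.0</version> <!-- illustrative version -->
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>5.2.3</version> <!-- illustrative version -->
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-ooxml</artifactId>
  <version>5.2.3</version> <!-- illustrative version -->
</dependency>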

Once done, create a new class in the base package named BatchConfiguration and annotate it with @Configuration and @EnableBatchProcessing. This is the class wherein we'll configure everything related to the batch import. You might want to do it differently and split the beans across different files, but for now we'll go ahead this way for the sake of simplicity.
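The empty shell of that class is nothing more than:

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
  // reader, mapper, processor, writer, step and job beans go here
}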

Let's define a path to the source file and make it a constant. I've put the Excel source file in the resources directory; you might want to place it elsewhere:

private static final String PROPERTY_EXCEL_SOURCE_FILE_PATH = "path/to/sourceFile.xlsx";
(Snapshot of the sample Excel source file.)

As you go through the snapshot of the sample Excel file above, note that the column headers are in camel case so that they are identical to the properties of the entity. You might also notice that each column has a different datatype: we have String, Integer, enum, MySQL Timestamp, object reference and Bit(1) to deal with. This is to mimic real-life scenarios.
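For reference, the header row and one sample data row could look like this (the data values are made up purely for illustration; the headers match the model properties introduced below):

currency | createdAt           | enabled | jobId   | jobNo | source
USD      | 2022-01-15 10:30:00 | 1       | JOB-001 | 7     | 1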

Now let's create the entity into which we'll be importing the data:

@Getter
@Setter
@Entity
@NoArgsConstructor
@Accessors(chain = true)
@Table(name = "jobs")
public class JobEntity {

  @NotNull
  @ManyToOne(fetch = FetchType.EAGER)
  @JoinColumn(name = "currency")
  private Currency currency;

  @NotNull
  @Column(name = "created_at")
  private Date createdAt;

  @NotNull
  @Column(name = "is_enabled")
  private boolean enabled = true;

  @Id
  @NotBlank
  @Column(name = "job_id")
  private String jobId;

  @Column(name = "job_no")
  private int jobNo;

  @NotNull
  @Enumerated(EnumType.ORDINAL)
  private Source source;
}
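The Source enum isn't shown above; since the processor later calls Source.getSource(Integer), a minimal sketch could look like this (the constant names are placeholders for your own values):

public enum Source {
  WEB, MOBILE, IMPORT; // placeholder constants

  // Maps the ordinal stored in the Excel column back to an enum constant.
  public static Source getSource(Integer ordinal) {
    return Source.values()[ordinal];
  }
}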

The data in the Excel file is mostly stored as String, Integer or Boolean, but here we have other datatypes to map to. To achieve this, Spring Boot provides the ApplicationConversionService, which lets you register custom converters and can be used as follows:

@Bean
FieldSetMapper<JobEntity> fieldSetMapper() {
    BeanWrapperFieldSetMapper<JobEntity> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(JobEntity.class);
    fieldSetMapper.setConversionService(ApplicationConversionService.getSharedInstance());
    return fieldSetMapper;
}

However, as we'll be using a RowMapper from the Spring Batch Excel extension instead of a FieldSetMapper, I have a different solution. First, we'll create a model POJO that will be mapped to the columns in the Excel file:

@Getter
@Setter
@NoArgsConstructor
public class JobModel {
  private String currency;
  private String createdAt;
  private Boolean enabled;
  private String jobId;
  private Integer jobNo;
  private Integer source;
}

Then we'll need a custom JobItemProcessor:

/**
 * This is the processor class where you can process and validate the data that has to be imported.
 */
@Service
class JobItemProcessor implements ItemProcessor<JobModel, JobEntity> {

  @Autowired
  private CurrencyService currencyService;

  @Override
  public JobEntity process(final JobModel jobModel) {

    // Fail fast on incomplete rows; the step's skip policy below will count these.
    if (jobModel.getSource() == null || jobModel.getCurrency() == null || jobModel.getCreatedAt() == null
            || jobModel.getEnabled() == null || jobModel.getJobId() == null || jobModel.getJobNo() == null)
      throw new IllegalArgumentException("Parameters cannot be null");

    JobEntity jobEntity = new JobEntity();
    jobEntity.setJobId(jobModel.getJobId()).setJobNo(jobModel.getJobNo())
            .setCurrency(currencyService.get(jobModel.getCurrency()))
            .setSource(Source.getSource(jobModel.getSource()))
            .setCreatedAt(new Date(Timestamp.valueOf(jobModel.getCreatedAt()).getTime()))
            .setEnabled(jobModel.getEnabled()); // enabled is already a Boolean

    return jobEntity;
  }

}

You might want to cache the get method from currencyService to save on database calls. Read Caching in Spring Boot for more information.
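As a rough sketch, a cached lookup could be as simple as this, assuming @EnableCaching is configured and CurrencyRepository is a hypothetical Spring Data repository:

@Service
public class CurrencyService {

  @Autowired
  private CurrencyRepository currencyRepository; // hypothetical repository

  // Each distinct currency code hits the database only once per cache lifetime.
  @Cacheable("currencies")
  public Currency get(String code) {
    return currencyRepository.findById(code)
            .orElseThrow(() -> new IllegalArgumentException("Unknown currency: " + code));
  }
}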

So, to map each row from the Excel file, we'll use the BeanWrapperRowMapper implementation of the RowMapper interface to perform the actual work of mapping each row to the JobModel:

import org.springframework.batch.extensions.excel.RowMapper;
import org.springframework.batch.extensions.excel.mapping.BeanWrapperRowMapper;

private RowMapper<JobModel> excelRowMapper() {
  BeanWrapperRowMapper<JobModel> rowMapper = new BeanWrapperRowMapper<>();
  rowMapper.setTargetType(JobModel.class);
  return rowMapper;
}

Now, for reading data from the Excel file, we'll use the Spring Batch Excel extension, which contains an ItemReader implementation based on Apache POI. It supports reading both XLS and XLSX files:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.extensions.excel.poi.PoiItemReader;


@Bean
ItemReader<JobModel> excelJobReader() {
  PoiItemReader<JobModel> reader = new PoiItemReader<>();
  reader.setLinesToSkip(1); // skip the header row
  reader.setResource(new ClassPathResource(PROPERTY_EXCEL_SOURCE_FILE_PATH));
  reader.setRowMapper(excelRowMapper());
  return reader;
}

The PoiItemReader has the most features but is also the most memory-intensive and might lead to memory issues with large XLS(X) sheets.

To reduce the memory footprint, the StreamingXlsxItemReader can be used. It keeps only the current row in memory and discards it afterwards. Note that not everything is supported while streaming the XLSX file: formulas may not get evaluated or may lead to an error.
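Switching to the streaming reader is a drop-in change; a sketch, assuming the same row mapper and source file as above:

import org.springframework.batch.extensions.excel.streaming.StreamingXlsxItemReader;

@Bean
ItemReader<JobModel> streamingExcelJobReader() {
  StreamingXlsxItemReader<JobModel> reader = new StreamingXlsxItemReader<>();
  reader.setLinesToSkip(1); // skip the header row
  reader.setResource(new ClassPathResource(PROPERTY_EXCEL_SOURCE_FILE_PATH));
  reader.setRowMapper(excelRowMapper());
  return reader;
}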

The ItemReader classes are not thread-safe. The API from Apache POI itself isn't thread-safe, and neither is the AbstractItemCountingItemStreamItemReader used as a base class for the ItemReader classes. Reading from multiple threads is therefore not supported. Using a multi-threaded processor/writer should work, as long as you use a single thread for reading.

We also require a batch writer that will persist the data to the database in bulk. We'll be using a general JdbcBatchItemWriter as follows:

@Bean
public JdbcBatchItemWriter<JobEntity> excelJobWriter(DataSource dataSource) {
  return new JdbcBatchItemWriterBuilder<JobEntity>()
          .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
          .sql("INSERT INTO MyTable ( currency, created_at, is_enabled, job_id, job_no, source ) VALUES ( :currency, :createdAt, :enabled, :jobId, :jobNo, :source )")
          .dataSource(dataSource).build();
}

The only thing remaining now is to create a Step and a Batch Job:

@Bean
Step excelFileToDatabaseStep(ItemReader<JobModel> excelJobReader, ItemWriter<JobEntity> excelJobWriter,
                             StepBuilderFactory stepBuilderFactory, JobItemProcessor jobItemProcessor) {
  return stepBuilderFactory.get("excelFileToDatabaseStep")
          .<JobModel, JobEntity>chunk(10000)
          .reader(excelJobReader)
          .processor(jobItemProcessor)
          .writer(excelJobWriter)
          .faultTolerant()
          .skip(IllegalArgumentException.class)
          .skipLimit(100)
          .build();
}
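And the Job that wires the step in; a minimal version with a RunIdIncrementer so repeated runs get fresh job instances:

@Bean
Job excelFileToDatabaseJob(JobBuilderFactory jobBuilderFactory, Step excelFileToDatabaseStep) {
  return jobBuilderFactory.get("excelFileToDatabaseJob")
          .incrementer(new RunIdIncrementer())
          .flow(excelFileToDatabaseStep)
          .end()
          .build();
}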

That's it. You can now use this to directly import data from an Excel file to the database. Here are some properties that might come in handy.

# General
server.port=4000

# Data-Source configurations for Test purposes
spring.datasource.url=jdbc:mysql://host:3306/schema?useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC
spring.datasource.username=
spring.datasource.password=
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.jpa.database-platform=org.hibernate.dialect.MySQL5Dialect

spring.batch.initialize-schema=always
spring.jpa.open-in-view=false

# DDL generation
spring.jpa.generate-ddl=true
spring.jpa.hibernate.ddl-auto=update

# Turn Statistics on
spring.jpa.properties.hibernate.generate_statistics=true
logging.level.org.hibernate.stat=debug
# Show all queries
spring.jpa.show-sql=true

spring.data.jpa.repositories.bootstrap-mode=default
spring.jpa.properties.hibernate.format_sql=true
logging.level.org.hibernate.type=trace

# Naming Strategy | Use What We Mention
spring.jpa.hibernate.naming.physical-strategy=org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl

# Each fetch of a lazy entity will open a temporary session
spring.jpa.properties.hibernate.enable_lazy_load_no_trans=true
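By default, Spring Boot launches every configured job at startup. If you'd rather trigger the import on demand, set spring.batch.job.enabled=false and run it yourself through the JobLauncher; a rough sketch:

@Autowired
private JobLauncher jobLauncher;

@Autowired
private Job excelFileToDatabaseJob;

public void runImport() throws Exception {
  // A unique parameter gives each run its own JobInstance.
  JobParameters parameters = new JobParametersBuilder()
          .addLong("startedAt", System.currentTimeMillis())
          .toJobParameters();
  jobLauncher.run(excelFileToDatabaseJob, parameters);
}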

A big thank you to every single person who reads and enjoys these articles; it really means a huge amount to me, personally. Your subscription would be hugely appreciated; it helps more than you could know. And thanks a ton for reading, I'll see you around.

Nathan Lively

Java + Spring Backend Developer · Founder of Tracebook & SubAligner · 789K podcast downloads · 1.6M YouTube views

1 year ago

Hi Keshavram, this article helped me out a lot. Thanks! My spreadsheet has two pages. The second page had four columns with 22,000 rows that should eventually be stored as arrays. I'm wondering how this is handled. Does this batch process the way you have developed it import all sheets and all rows?

Harish Rohokale

Leading Apptware with AI-first approach and design-driven philosophy.

3 years ago

Glad to see you learning something new in Apptware!

Harshal Shinde

Senior Software Developer at Trux Inc

3 years ago

Keshavram Kuduwa great to see you implement Spring Batch!

Meeta Sabnis

Lead - HR Strategy & Culture

3 years ago

Awesome! Waiting for more such informative articles. Keep them flowing.
