Batch Import - Excel to Database
Importing data directly from an Excel file can be tricky, and most of the documentation isn't clear on a step-by-step implementation. I've figured it out, and I'm here to share it all with you. Before we start with the implementation, let's create a skeleton Spring Boot project using Spring Initializr. For this project, we need Spring Batch, Spring Data JPA, and MySQL Driver as dependencies, all of which you can add through Spring Initializr. We also need to add Apache POI and Apache POI OOXML as external dependencies.
Once done, create a new class in the base package named BatchConfiguration and annotate it with @Configuration and @EnableBatchProcessing. This is the class wherein we'll configure everything related to the batch import. You might want to do it differently and split the beans across different files, but for now, we'll keep everything in one place for the sake of simplicity.
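Before filling in the beans, the shell of the class looks like this (a minimal sketch; the reader, processor, writer, and step beans discussed below all go inside it):

```java
// Skeleton of the batch configuration class; every bean defined later
// in this article lives inside this class.
@Configuration
@EnableBatchProcessing
public class BatchConfiguration {
    // reader, processor, writer, and step beans go here
}
```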
Let's define the path to the source file as a constant. I've put the Excel source file in the resources directory; you might want to place it elsewhere:
private static final String PROPERTY_EXCEL_SOURCE_FILE_PATH = "path/to/sourceFile.xlsx";
As you go through the snapshot of the sample Excel file above, note that the column headers are in camel case so that they are identical to the properties of the entity. You might also notice that each column has a different datatype: we have String, Integer, enum, MySQL Timestamp, object reference, and Bit(1) to deal with. This is to mimic real-life scenarios.
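For readers who can't see the snapshot, a hypothetical first few rows matching that description might look like this (the values are illustrative, not from the original file):

```
jobId    | jobNo | currency | source | createdAt           | enabled
JOB-1001 | 1     | USD      | 0      | 2021-01-15 10:30:00 | 1
JOB-1002 | 2     | EUR      | 1      | 2021-01-16 09:05:00 | 0
```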
Now let's create the entity into which we'll be importing the data:
@Getter
@Setter
@Entity
@NoArgsConstructor
@Accessors(chain = true)
@Table(name = "jobs")
public class JobEntity {

    @Id
    @NotBlank
    @Column(name = "job_id")
    private String jobId;

    @Column(name = "job_no")
    private int jobNo;

    @NotNull
    @ManyToOne(fetch = FetchType.EAGER)
    @JoinColumn(name = "currency")
    private Currency currency;

    @NotNull
    @Column(name = "created_at")
    private Date createdAt;

    @NotNull
    @Column(name = "is_enabled")
    private boolean enabled = true;

    @NotNull
    @Enumerated(EnumType.ORDINAL)
    private Source source;
}
The data in the Excel file is mostly stored as String, Integer, or Boolean, but here we have other datatypes to map to. For this, Spring Boot provides the ApplicationConversionService, which lets you register custom converters and can be used as follows:
@Bean
FieldSetMapper<JobEntity> fieldSetMapper() {
    BeanWrapperFieldSetMapper<JobEntity> fieldSetMapper = new BeanWrapperFieldSetMapper<>();
    fieldSetMapper.setTargetType(JobEntity.class);
    fieldSetMapper.setConversionService(ApplicationConversionService.getSharedInstance());
    return fieldSetMapper;
}
However, since we'll be using a RowMapper from the Spring Batch Excel extension instead of a FieldSetMapper, I have a different solution. First, we'll create a model POJO that maps to the columns in the Excel file:
@Getter
@Setter
@NoArgsConstructor
public class JobModel {

    private String currency;
    private String createdAt;
    private Boolean enabled;
    private String jobId;
    private Integer jobNo;
    private Integer source;
}
Then we'll need a custom JobItemProcessor:
/**
 * Processor class where the data to be imported is validated and
 * converted from the Excel-facing model to the entity.
 */
@Service
class JobItemProcessor implements ItemProcessor<JobModel, JobEntity> {

    @Autowired
    private CurrencyService currencyService;

    @Override
    public JobEntity process(final JobModel jobModel) {
        if (jobModel.getSource() == null || jobModel.getCurrency() == null || jobModel.getCreatedAt() == null
                || jobModel.getEnabled() == null || jobModel.getJobId() == null || jobModel.getJobNo() == null) {
            throw new IllegalArgumentException("Parameters cannot be null");
        }

        JobEntity jobEntity = new JobEntity();
        jobEntity.setJobId(jobModel.getJobId())
                .setJobNo(jobModel.getJobNo())
                .setCurrency(currencyService.get(jobModel.getCurrency()))
                .setSource(Source.getSource(jobModel.getSource()))
                .setCreatedAt(new Date(Timestamp.valueOf(jobModel.getCreatedAt()).getTime()))
                .setEnabled(jobModel.getEnabled());
        return jobEntity;
    }
}
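The processor calls Source.getSource(...) to turn the Integer read from the sheet into the enum stored with @Enumerated(EnumType.ORDINAL). The article doesn't show that enum, so here's a minimal sketch; the constant names (WEB, MOBILE, API) are assumptions, only the ordinal-lookup shape matters:

```java
// Hypothetical Source enum assumed by the processor above.
// getSource resolves the Integer from the spreadsheet to an enum
// constant by ordinal, rejecting out-of-range values early so the
// step's skip policy can handle the bad row.
enum Source {
    WEB, MOBILE, API;

    static Source getSource(Integer ordinal) {
        if (ordinal == null || ordinal < 0 || ordinal >= values().length) {
            throw new IllegalArgumentException("Unknown source ordinal: " + ordinal);
        }
        return values()[ordinal];
    }
}
```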
You might want to cache the get method from currencyService to save on database calls. Read Caching in Spring Boot for more information.
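If you don't want to pull in Spring's cache abstraction, even a simple memoizing map illustrates the idea; this is a sketch, not the article's CurrencyService, and the delegate function stands in for the database lookup:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// A minimal memoizing wrapper: the first lookup for a key hits the
// delegate (e.g. a database-backed currency lookup), and every later
// lookup for the same key is served from the in-memory map.
class MemoizingLookup<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> delegate;

    MemoizingLookup(Function<K, V> delegate) {
        this.delegate = delegate;
    }

    V get(K key) {
        return cache.computeIfAbsent(key, delegate);
    }
}
```

With a few distinct currencies across thousands of rows, this turns thousands of lookups into a handful.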
To map each row from the Excel file, we'll use the BeanWrapperRowMapper implementation of the RowMapper interface, which performs the actual work of mapping each row to a JobModel:
import org.springframework.batch.extensions.excel.RowMapper;
import org.springframework.batch.extensions.excel.mapping.BeanWrapperRowMapper;

private RowMapper<JobModel> excelRowMapper() {
    BeanWrapperRowMapper<JobModel> rowMapper = new BeanWrapperRowMapper<>();
    rowMapper.setTargetType(JobModel.class);
    return rowMapper;
}
Now, for reading data from the Excel file, we'll use the Spring Batch Excel extension, which contains an ItemReader implementation based on Apache POI. It supports reading both XLS and XLSX files:
import org.springframework.batch.extensions.excel.poi.PoiItemReader;
import org.springframework.batch.item.ItemReader;

@Bean
ItemReader<JobModel> excelJobReader() {
    PoiItemReader<JobModel> reader = new PoiItemReader<>();
    reader.setLinesToSkip(1); // skip the header row
    reader.setResource(new ClassPathResource(PROPERTY_EXCEL_SOURCE_FILE_PATH));
    reader.setRowMapper(excelRowMapper());
    return reader;
}
The PoiItemReader has the most features but is also the most memory intensive and might lead to memory issues with large XLS(X) sheets.
To reduce the memory footprint, the StreamingXlsxItemReader can be used instead; it keeps only the current row in memory and discards it afterwards. Note that not everything is supported while streaming an XLSX file: formulas may not be evaluated, or may lead to errors.
The ItemReader classes are not thread-safe: neither the Apache POI API itself nor the AbstractItemCountingItemStreamItemReader used as their base class is thread-safe. Reading from multiple threads is therefore not supported. Using a multi-threaded processor/writer should work, as long as you use a single thread for reading.
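If memory does become a concern with large files, swapping in the streaming reader is a small change. A sketch, assuming the streaming package of the same spring-batch-excel extension:

```java
import org.springframework.batch.extensions.excel.streaming.StreamingXlsxItemReader;
import org.springframework.batch.item.ItemReader;

// Streaming alternative to the PoiItemReader bean above: only the
// current row is kept in memory. XLSX only; formulas may not evaluate.
@Bean
ItemReader<JobModel> streamingExcelJobReader() {
    StreamingXlsxItemReader<JobModel> reader = new StreamingXlsxItemReader<>();
    reader.setLinesToSkip(1); // skip the header row
    reader.setResource(new ClassPathResource(PROPERTY_EXCEL_SOURCE_FILE_PATH));
    reader.setRowMapper(excelRowMapper());
    return reader;
}
```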
We also need a batch writer that persists the data to the database in bulk. We'll use a general JdbcBatchItemWriter as follows:
@Bean
public JdbcBatchItemWriter<JobEntity> excelJobWriter(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<JobEntity>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO jobs (currency, created_at, is_enabled, job_id, job_no, source) "
                    + "VALUES (:currency, :createdAt, :enabled, :jobId, :jobNo, :source)")
            .dataSource(dataSource)
            .build();
}
The only thing remaining now is to create a Step and a Batch Job:
@Bean
Step excelFileToDatabaseStep(ItemReader<JobModel> excelJobReader, ItemWriter<JobEntity> excelJobWriter,
        StepBuilderFactory stepBuilderFactory, JobItemProcessor jobItemProcessor) {
    return stepBuilderFactory.get("excelFileToDatabaseStep")
            .<JobModel, JobEntity>chunk(10000)
            .reader(excelJobReader)
            .processor(jobItemProcessor)
            .writer(excelJobWriter)
            .faultTolerant()
            .skip(IllegalArgumentException.class)
            .skipLimit(100)
            .build();
}
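To actually run the step, it needs to be wrapped in a Job. A minimal sketch, assuming the pre-Spring-Batch-5 JobBuilderFactory (matching the StepBuilderFactory style used for the step) and an illustrative job name:

```java
// Wires the step above into a runnable batch Job. The RunIdIncrementer
// gives each launch a fresh run id so the same job can be re-run.
@Bean
Job excelFileToDatabaseJob(JobBuilderFactory jobBuilderFactory, Step excelFileToDatabaseStep) {
    return jobBuilderFactory.get("excelFileToDatabaseJob")
            .incrementer(new RunIdIncrementer())
            .start(excelFileToDatabaseStep)
            .build();
}
```

With spring.batch.job.enabled left at its default, Spring Boot will launch this job automatically on startup.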
That's it. You can now use this to directly import data from an Excel file into the database. Here are some properties that might come in handy:
# General
server.port=4000

# Data-Source configurations for Test purposes
spring.datasource.url=jdbc:mysql://host:3306/schema?useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=UTC
spring.datasource.username=
spring.datasource.password=
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.jpa.database-platform=org.hibernate.dialect.MySQL5Dialect
spring.batch.initialize-schema=always
spring.jpa.open-in-view=false

# DDL generation
spring.jpa.generate-ddl=true
spring.jpa.hibernate.ddl-auto=update

# Turn Statistics on
spring.jpa.properties.hibernate.generate_statistics=true
logging.level.org.hibernate.stat=debug

# Show all queries
spring.jpa.show-sql=true
spring.data.jpa.repositories.bootstrap-mode=default
spring.jpa.properties.hibernate.format_sql=true
logging.level.org.hibernate.type=trace

# Naming Strategy | Use What We Mention
spring.jpa.hibernate.naming.physical-strategy=org.hibernate.boot.model.naming.PhysicalNamingStrategyStandardImpl

# Each fetch of a lazy entity will open a temporary session
spring.jpa.properties.hibernate.enable_lazy_load_no_trans=true
A big thank you to every single person who reads and enjoys these articles, it really means a huge amount to me, personally. Your subscription would be hugely appreciated, it helps more than you could know. And thanks a ton for reading, I'll see you around.