Reading .csv files with and without headers using different delimiters in Mule 4
Pranav Davar
Flat files, especially .csv files, are very common for data loading and data transfer. We often come across scenarios where we need to read a .csv file that may or may not contain a header row. There is also a chance the .csv file does not use "," (comma) as the delimiter, but "|" (pipe) or some other character instead. This article discusses various ways to read such files in Mule 4 and produce a JSON output from them.
Mule Flow
This will be a simple flow: we read a file using the File connector, convert the data to JSON using a Transform Message component, and finally log the JSON payload to the console using a Logger. We are not doing any complex transformation here; it is essentially a MIME type conversion from application/csv to application/json, which helps in understanding the structure of the input data.
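The flow described above can be sketched in Mule XML roughly as follows. This is a minimal, illustrative configuration only: the Scheduler trigger, the file path, and the File_Config global element name are assumptions, not taken from the original project.

```xml
<flow name="csv-readFlow">
    <!-- Trigger the flow periodically (any source would do; a Scheduler is assumed here) -->
    <scheduler>
        <scheduling-strategy>
            <fixed-frequency frequency="60000"/>
        </scheduling-strategy>
    </scheduler>
    <!-- Read the .csv file; outputMimeType tells Mule how to parse the content -->
    <file:read config-ref="File_Config" path="employeeData.csv"
               outputMimeType="application/csv"/>
    <!-- Convert the parsed rows to JSON -->
    <ee:transform>
        <ee:message>
            <ee:set-payload><![CDATA[%dw 2.0
output application/json
---
payload]]></ee:set-payload>
        </ee:message>
    </ee:transform>
    <!-- Log the JSON payload to the console -->
    <logger level="INFO" message="#[payload]"/>
</flow>
```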
Steps:
.csv file having header row in the file content
Let us take the below employeeData.csv file as an example. For simplicity, we are taking dummy data for only 5 employees.
id,first name,last name,email,gender,designation
1,Dom,Lissett,[email protected],Male,Recruiter
2,Susan,Moogan,[email protected],Female,Executive Secretary
3,Carl,Cotesford,[email protected],Male,Tech Architect
4,Debor,Sprakes,[email protected],Female,Account Coordinator
5,Priscella,Genny,[email protected],Female,Environmental Tech
In this example, the data in the first row signifies the headers, separated by "," as the delimiter. All the other rows represent employee data. On running the application, the console output looks like this:
INFO  2021-07-27 23:08:55,916 [[MuleRuntime].uber.03: [csv-read].csv-readFlow.CPU_INTENSIVE @1e42a374] [processor: csv-readFlow/processors/1; event: 895d1200-efb1-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "id": "1",
    "first name": "Dom",
    "last name": "Lissett",
    "email": "[email protected]",
    "gender": "Male",
    "designation": "Recruiter"
  },
  {
    "id": "2",
    "first name": "Susan",
    "last name": "Moogan",
    "email": "[email protected]",
    "gender": "Female",
    "designation": "Executive Secretary"
  },
  {
    "id": "3",
    "first name": "Carl",
    "last name": "Cotesford",
    "email": "[email protected]",
    "gender": "Male",
    "designation": "Tech Architect"
  },
  {
    "id": "4",
    "first name": "Debor",
    "last name": "Sprakes",
    "email": "[email protected]",
    "gender": "Female",
    "designation": "Account Coordinator"
  },
  {
    "id": "5",
    "first name": "Priscella",
    "last name": "Genny",
    "email": "[email protected]",
    "gender": "Female",
    "designation": "Environmental Tech"
  }
]
Data in the first row became the keys for the JSON output.
.csv file without header row in the file content
Let us assume there is no header row in the content of employeeData.csv:
1,Dom,Lissett,[email protected],Male,Recruiter
2,Susan,Moogan,[email protected],Female,Executive Secretary
3,Carl,Cotesford,[email protected],Male,Tech Architect
4,Debor,Sprakes,[email protected],Female,Account Coordinator
5,Priscella,Genny,[email protected],Female,Environmental Tech
If we run the application with the same configuration, it will take the first row as the keys, and the console output will look like this:
INFO  2021-07-27 23:21:15,609 [[MuleRuntime].uber.03: [csv-read].csv-readFlow.CPU_INTENSIVE @1e42a374] [processor: csv-readFlow/processors/1; event: 4254cbd0-efb3-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "1": "2",
    "Dom": "Susan",
    "Lissett": "Moogan",
    "[email protected]": "[email protected]",
    "Male": "Female",
    "Recruiter": "Executive Secretary"
  },
  {
    "1": "3",
    "Dom": "Carl",
    "Lissett": "Cotesford",
    "[email protected]": "[email protected]",
    "Male": "Male",
    "Recruiter": "Tech Architect"
  },
  {
    "1": "4",
    "Dom": "Debor",
    "Lissett": "Sprakes",
    "[email protected]": "[email protected]",
    "Male": "Female",
    "Recruiter": "Account Coordinator"
  },
  {
    "1": "5",
    "Dom": "Priscella",
    "Lissett": "Genny",
    "[email protected]": "[email protected]",
    "Male": "Female",
    "Recruiter": "Environmental Tech"
  }
]
By default, for the "application/csv" MIME type, Mule considers the first row to be the header row. In cases like this we don't want the first row treated as a header, so we set the "header" parameter to false in the parameters under the MIME type of the Read operation.
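In XML, this corresponds to passing `header=false` as a parameter of the output MIME type on the Read operation. A sketch (file path and config name are illustrative):

```xml
<!-- Tell the CSV reader there is no header row -->
<file:read config-ref="File_Config" path="employeeData.csv"
           outputMimeType="application/csv; header=false"/>
```

In Anypoint Studio, the same effect is achieved on the Read operation's MIME Type tab by setting the MIME type to application/csv and adding a parameter with key `header` and value `false`.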
Now, using the same file, we are able to read the data properly. In this case, however, autogenerated keys such as column_0, column_1, etc. are populated. Below is a sample log snippet.
INFO  2021-07-27 23:35:57,158 [[MuleRuntime].uber.05: [csv-read].csv-readFlow.CPU_INTENSIVE @6fbaa2fe] [processor: csv-readFlow/processors/1; event: 4fbf0590-efb5-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "column_0": "1",
    "column_1": "Dom",
    "column_2": "Lissett",
    "column_3": "[email protected]",
    "column_4": "Male",
    "column_5": "Recruiter"
  },
  {
    "column_0": "2",
    "column_1": "Susan",
    "column_2": "Moogan",
    "column_3": "[email protected]",
    "column_4": "Female",
    "column_5": "Executive Secretary"
  },
  {
    "column_0": "3",
    "column_1": "Carl",
    "column_2": "Cotesford",
    "column_3": "[email protected]",
    "column_4": "Male",
    "column_5": "Tech Architect"
  },
  {
    "column_0": "4",
    "column_1": "Debor",
    "column_2": "Sprakes",
    "column_3": "[email protected]",
    "column_4": "Female",
    "column_5": "Account Coordinator"
  },
  {
    "column_0": "5",
    "column_1": "Priscella",
    "column_2": "Genny",
    "column_3": "[email protected]",
    "column_4": "Female",
    "column_5": "Environmental Tech"
  }
]
.csv file without header row and with "|" as the delimiter
If we replace the "," delimiter with "|" in employeeData.csv:
1|Dom|Lissett|[email protected]|Male|Recruiter
2|Susan|Moogan|[email protected]|Female|Executive Secretary
3|Carl|Cotesford|[email protected]|Male|Tech Architect
4|Debor|Sprakes|[email protected]|Female|Account Coordinator
5|Priscella|Genny|[email protected]|Female|Environmental Tech
On running the same Mule application with the above data, the output for employeeData.csv will be:
INFO  2021-07-27 23:42:06,938 [[MuleRuntime].uber.05: [csv-read].csv-readFlow.CPU_INTENSIVE @6fbaa2fe] [processor: csv-readFlow/processors/1; event: 2c2f0f20-efb6-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "column_0": "1|Dom|Lissett|[email protected]|Male|Recruiter"
  },
  {
    "column_0": "2|Susan|Moogan|[email protected]|Female|Executive Secretary"
  },
  {
    "column_0": "3|Carl|Cotesford|[email protected]|Male|Tech Architect"
  },
  {
    "column_0": "4|Debor|Sprakes|[email protected]|Female|Account Coordinator"
  },
  {
    "column_0": "5|Priscella|Genny|[email protected]|Female|Environmental Tech"
  }
]
Now we are getting only one key-value pair for each row. By default, Mule considers "," as the delimiter for "application/csv" MIME type data. We need to explicitly define "|" as the delimiter using the "separator" parameter under the MIME type.
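In XML, both parameters can be combined on the Read operation's output MIME type. A sketch (file path and config name are illustrative):

```xml
<!-- No header row, and '|' instead of ',' as the delimiter -->
<file:read config-ref="File_Config" path="employeeData.csv"
           outputMimeType="application/csv; header=false; separator=|"/>
```

In Anypoint Studio, this corresponds to two entries in the MIME Type parameters table: `header` → `false` and `separator` → `|`.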
Now, using the same file, we are able to read the data properly, as shown below.
INFO  2021-07-27 23:51:57,557 [[MuleRuntime].uber.06: [csv-read].csv-readFlow.CPU_INTENSIVE @6efd4175] [processor: csv-readFlow/processors/1; event: 8c31a710-efb7-11eb-8134-d4d25289cb8d] org.mule.runtime.core.internal.processor.LoggerMessageProcessor: [
  {
    "column_0": "1",
    "column_1": "Dom",
    "column_2": "Lissett",
    "column_3": "[email protected]",
    "column_4": "Male",
    "column_5": "Recruiter"
  },
  {
    "column_0": "2",
    "column_1": "Susan",
    "column_2": "Moogan",
    "column_3": "[email protected]",
    "column_4": "Female",
    "column_5": "Executive Secretary"
  },
  {
    "column_0": "3",
    "column_1": "Carl",
    "column_2": "Cotesford",
    "column_3": "[email protected]",
    "column_4": "Male",
    "column_5": "Tech Architect"
  },
  {
    "column_0": "4",
    "column_1": "Debor",
    "column_2": "Sprakes",
    "column_3": "[email protected]",
    "column_4": "Female",
    "column_5": "Account Coordinator"
  },
  {
    "column_0": "5",
    "column_1": "Priscella",
    "column_2": "Genny",
    "column_3": "[email protected]",
    "column_4": "Female",
    "column_5": "Environmental Tech"
  }
]
Please share your valuable feedback!