Handling Large CSV Files (Over 12MB, Even Over 36MB) in Apex with AWS Lambda Integration!
At first glance, it seems impossible to handle a CSV file larger than 12MB in Apex: even asynchronous Apex caps the heap at 12MB, so you simply cannot load the whole file into memory. Hang tight! With the right approach, even CSV files larger than 36MB are manageable. Join me in exploring how Salesforce and AWS Lambda can be a game-changing combination.
Before diving into the details, note that AWS Lambda has a free tier: "400000.0 seconds are always free per month as part of AWS Free Usage Tier (Global-Lambda-GB-Second)". That is 400,000 GB-seconds per month, so, for example, a function configured with 512MB of memory can run for roughly 800,000 seconds each month at no cost.
Our journey began with an integration between Google Ad Manager and Salesforce, where Google provided data via CSV files. Initially, these files were manageable, under 10MB. However, as our operations expanded, so did the file sizes, ballooning to over 40MB. This posed a significant challenge, as we needed to process and store each row in the Salesforce database.
The Solution: Instead of downloading the large file in Apex, we send only the file's URL to an AWS Lambda function. The Lambda downloads the CSV, splits it into chunks small enough for Apex to process, and posts each chunk back to a custom Apex REST endpoint. Here's how we tackled this challenge:
Below is a simplified version of the Apex code used to send the CSV file link to the AWS Lambda function. Note: in your implementation, prefer Named Credentials for enhanced security and simplicity; a sketch of that variant follows the snippet.
// Send the CSV file's URL to the Lambda function
Http h = new Http();
HttpRequest req = new HttpRequest();
req.setEndpoint('Lambda Function Endpoint'); // placeholder: your Lambda Function URL
req.setMethod('POST');
req.setBody('{"url": "CSV Link"}'); // placeholder: the link to the CSV file
req.setHeader('Content-Type', 'application/json');
HttpResponse res = h.send(req);
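As a minimal sketch of the Named Credentials approach, assuming you have created a Named Credential (hypothetically named Lambda_Function here) that points at your Lambda endpoint, the callout becomes:

// Hypothetical Named Credential "Lambda_Function" pointing at the Lambda endpoint
Http h = new Http();
HttpRequest req = new HttpRequest();
req.setEndpoint('callout:Lambda_Function');
req.setMethod('POST');
req.setBody('{"url": "CSV Link"}'); // placeholder: the link to the CSV file
req.setHeader('Content-Type', 'application/json');
HttpResponse res = h.send(req);

This keeps the endpoint URL out of your code and makes it configurable per org.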
Handling the Smaller CSV Chunks in Apex: The next step involves creating an Apex REST class to receive and process the small chunks that Lambda sends back.
@RestResource(urlMapping='/CSVHandler/*')
global with sharing class CSVHandler {
    @HttpPost
    global static String processCSV() {
        // Each chunk arrives as raw CSV text in the request body
        String csvBody = RestContext.request.requestBody.toString();
        // [Code to process CSV and handle exceptions]
        return 'Success';
    }
}
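For illustration only, here is a minimal sketch of what the processing inside processCSV might look like. It assumes a simple CSV with no quoted commas or embedded newlines, and a hypothetical custom object Ad_Report_Row__c; your parsing and field mapping will differ:

// Minimal sketch: naive CSV parsing into a hypothetical Ad_Report_Row__c object
String csvBody = RestContext.request.requestBody.toString();
List<String> lines = csvBody.split('\n');
List<Ad_Report_Row__c> rows = new List<Ad_Report_Row__c>();
// Start at index 1: the Lambda prepends the header row to every chunk
for (Integer i = 1; i < lines.size(); i++) {
    if (String.isBlank(lines[i])) continue;
    List<String> cells = lines[i].split(',');
    rows.add(new Ad_Report_Row__c(Name = cells[0]));
}
insert rows;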
Setting Up the Lambda Function: The final piece of the puzzle is configuring the Lambda Function with an endpoint URL. For a detailed guide, check out AWS Lambda URLs Tutorial. In summary, create a Python 3.9 function and enable the 'Function URL' under advanced settings.
Here's a snippet of the Lambda Function code for handling and splitting the CSV file:
import csv
import io
import json

import requests

def lambda_handler(event, context):
    # Placeholders: replace with your Salesforce connected app credentials
    client_id = "YOUR_CLIENT_ID"
    client_secret = "YOUR_CLIENT_SECRET"
    username = "YOUR_EMAIL"
    password = "YOUR_PASSWORD_WITH_SECURITY_TOKEN"
    try:
        # Salesforce OAuth2 username-password flow
        params = {
            "grant_type": "password",
            "client_id": client_id,
            "client_secret": client_secret,
            "username": username,
            "password": password
        }
        auth_response = requests.post("https://login.salesforce.com/services/oauth2/token", data=params)
        auth_response.raise_for_status()
        access_token = "Bearer " + auth_response.json().get("access_token")
        instance_url = auth_response.json().get("instance_url")
        # Avoid printing the access token: it would end up in CloudWatch logs
        print("Instance URL:", instance_url)

        # Get the CSV URL from the Function URL request body ('{"url": "..."}')
        body = json.loads(event["body"])
        url = body["url"]
        print("CSV URL:", url)

        # Download the CSV file
        response = requests.get(url)
        print("Download status:", response.status_code)
        print("CSV size (chars):", len(response.text))
        csv_file = io.StringIO(response.text)

        # Split the CSV into chunks small enough for Apex to process
        max_chunk_size = 2.5 * 1024 * 1024  # 2.5 MB
        reader = csv.reader(csv_file)
        header = next(reader)
        current_chunk = []
        current_chunk_size = 0
        chunks = []
        for row in reader:
            current_chunk.append(row)
            current_chunk_size += len(",".join(row)) + 1  # +1 for the newline
            if current_chunk_size >= max_chunk_size:
                print("Chunk size:", current_chunk_size)
                chunks.append(current_chunk)
                current_chunk = []
                current_chunk_size = 0
        if len(current_chunk) > 0:
            chunks.append(current_chunk)

        # Send each chunk to the Apex REST endpoint
        for chunk in chunks:
            # Prepend the header row so every chunk is a valid, self-contained CSV
            chunk.insert(0, header)
            csv_file = io.StringIO()
            writer = csv.writer(csv_file)
            writer.writerows(chunk)
            csv_file.seek(0)
            req_body = csv_file.read()
            csv_file.close()
            headers = {"Authorization": access_token, "Content-Type": "text/csv; charset=utf-8"}
            sf_response = requests.post(instance_url + "/services/apexrest/CSVHandler",
                                        data=req_body.encode("utf-8"), headers=headers)
            print("Apex response:", sf_response.text)

        return {
            "statusCode": 200,
            "body": "Success"
        }
    except Exception as e:
        print("Error:", e)
        raise
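For reference, a Lambda Function URL wraps the incoming HTTP request in an event whose body field holds the raw POST payload; that is why the handler reads event["body"] and parses it as JSON. A simplified sketch of the event produced by our Apex callout (most fields omitted):

{
  "requestContext": { "http": { "method": "POST" } },
  "body": "{\"url\": \"CSV Link\"}",
  "isBase64Encoded": false
}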
Conclusion: There you have it, a seamless process for managing large CSV files in Apex using AWS Lambda! The integration not only solves the size-limit issue but also streamlines the data handling process.
Hope this guide helps you navigate similar challenges with ease.