Handle Large CSV Files (>12MB, Even Over 36MB) in Apex with AWS Lambda Integration!

At first glance, handling a CSV file larger than 12MB in Apex seems impossible: 12MB is the Apex heap size limit for asynchronous transactions (synchronous transactions are capped at 6MB). Hang tight! With the right approach, even CSV files of more than 36MB can be handled. Join me in exploring how Salesforce and AWS Lambda can be a game-changing combination.

Before diving into the details: AWS Lambda has a FREE tier! Per AWS, "400,000 GB-seconds of compute time are always free per month as part of the AWS Free Usage Tier (Global-Lambda-GB-Second)." For example, a function configured with 512MB of memory that runs for 10 seconds per invocation uses 5 GB-seconds, so the free tier covers roughly 80,000 such invocations a month.

Our journey began with an integration between Google Ad Manager and Salesforce, where Google provided data via CSV files. Initially, these files were manageable, under 10MB. However, as our operations expanded, so did the file sizes, ballooning to over 40MB. This posed a significant challenge as we needed to process and store each row into the Salesforce database.

The Solution: Here's how we tackled this challenge:

  1. Retrieve the CSV File Link: Using Apex, we fetched the CSV file's download link through an API.
  2. Utilize AWS Lambda: That link was sent to an AWS Lambda Function, which downloaded the file.
  3. File Splitting: The Lambda Function then split the file into manageable chunks of roughly 3MB each.
  4. Processing Each Chunk: These smaller files were sent back to an Apex API Service for processing.

Below is a simplified version of the Apex code used to send the CSV file link to the AWS Lambda Function. Note: in your own implementation, prefer a Named Credential for the endpoint (see the variant after the snippet) for better security and simplicity.

// Send the CSV file link to the Lambda Function as JSON
Http h = new Http();
HttpRequest req = new HttpRequest();
req.setEndpoint('Lambda Function Endpoint'); // the Lambda Function URL
req.setMethod('POST');
req.setHeader('Content-Type', 'application/json');
req.setBody('{"url": "CSV Link"}'); // the CSV link fetched from the Google Ad Manager API
HttpResponse res = h.send(req);
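
With a Named Credential in place (say one named Lambda_CSV_Splitter, a hypothetical name pointing at the Function URL), the endpoint line becomes the following, and no environment-specific URL needs to live in the code:

req.setEndpoint('callout:Lambda_CSV_Splitter');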

Handling Smaller CSV Files in Apex: The next step involves creating an Apex class to manage these smaller CSV files.

@RestResource(urlMapping='/CSVHandler/*')
global with sharing class CSVHandler {
    @HttpPost
    global static String processCSV() {
        // [Code to process CSV and handle exceptions]
    }
}        
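
The processing body is omitted above. As a minimal sketch, assuming each chunk arrives as plain CSV text with the header row that the Lambda prepends, and that rows map to a hypothetical custom object Report_Row__c with a Value__c field, it could look something like this:

@RestResource(urlMapping='/CSVHandler/*')
global with sharing class CSVHandler {
    @HttpPost
    global static String processCSV() {
        try {
            // Raw CSV chunk sent by the Lambda Function
            String csvBody = RestContext.request.requestBody.toString();
            List<String> lines = csvBody.split('\n');

            List<Report_Row__c> rows = new List<Report_Row__c>();
            // Start at 1 to skip the header row prepended to every chunk
            for (Integer i = 1; i < lines.size(); i++) {
                if (String.isBlank(lines[i])) { continue; }
                // Naive split; real data with quoted commas needs a proper CSV parser
                List<String> cells = lines[i].split(',');
                rows.add(new Report_Row__c(Name = cells[0], Value__c = cells[1]));
            }
            insert rows;
            return 'Processed ' + rows.size() + ' rows';
        } catch (Exception e) {
            return 'Error: ' + e.getMessage();
        }
    }
}

A 2.5-3MB chunk keeps the request body well under the 6MB limit for synchronous Apex REST calls; if a single chunk could contain more than 10,000 rows, the inserts would need to be deferred to an asynchronous job.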

Setting Up the Lambda Function: The final piece of the puzzle is configuring the Lambda Function with an endpoint URL. For a detailed guide, check out AWS Lambda URLs Tutorial. In summary, create a Python 3.9 function and enable the 'Function URL' under advanced settings.

Here's a snippet of the Lambda Function code that downloads, splits, and forwards the CSV file. One note: the requests library is not bundled with the Lambda Python runtime, so add it as a Lambda layer or package it with your deployment:

import csv
import io
import json

import requests

def lambda_handler(event, context):
    # Replace these placeholders with your connected app and user credentials
    # (better: read them from environment variables or Secrets Manager)
    client_id = "YOUR_CLIENT_ID"
    client_secret = "YOUR_CLIENT_SECRET"
    username = "YOUR_EMAIL"
    password = "YOUR_PASSWORD_WITH_SECURITY_TOKEN"

    try:
        # Salesforce OAuth2 username-password authentication
        params = {
            "grant_type": "password",
            "client_id": client_id,
            "client_secret": client_secret,
            "username": username,
            "password": password
        }
        auth_response = requests.post("https://login.salesforce.com/services/oauth2/token", data=params)
        auth_response.raise_for_status()
        access_token = "Bearer " + auth_response.json().get("access_token")
        instance_url = auth_response.json().get("instance_url")
        print("Instance URL:", instance_url)

        # Get the CSV url from the request. With a Lambda Function URL, the JSON
        # body sent from Apex ({"url": "..."}) arrives as a string in event['body'].
        # (If you trigger the function from SQS instead, the payload would be in
        # event['Records'][0]['body'].)
        body = json.loads(event["body"])
        url = body["url"]
        print("CSV URL:", url)

        # Downloading CSV file
        response = requests.get(url)
        print("CSV download status:", response.status_code)
        print("CSV size (characters):", len(response.text))
        csv_file = io.StringIO(response.text)

        # Splitting CSV into chunks small enough for a single Apex REST call
        max_chunk_size = 2.5 * 1024 * 1024  # ~2.5 MB per chunk
        reader = csv.reader(csv_file)
        header = next(reader)
        current_chunk = []
        current_chunk_size = 0
        chunks = []

        for row in reader:
            current_chunk.append(row)
            current_chunk_size += len(",".join(row)) + 1  # rough size incl. newline
            if current_chunk_size >= max_chunk_size:
                print("Chunk size:", current_chunk_size)
                chunks.append(current_chunk)
                current_chunk = []
                current_chunk_size = 0

        # Don't forget the last, partially filled chunk
        if len(current_chunk) > 0:
            chunks.append(current_chunk)

        # Sending each chunk to the Salesforce Apex REST service
        for chunk in chunks:
            # Add the header row to each chunk so every piece is a valid CSV
            chunk.insert(0, header)

            # Serialize the chunk back into CSV text
            chunk_file = io.StringIO()
            writer = csv.writer(chunk_file)
            writer.writerows(chunk)
            chunk_body = chunk_file.getvalue()
            chunk_file.close()

            headers = {
                "Authorization": access_token,
                "Content-Type": "text/csv; charset=utf-8"
            }
            sf_response = requests.post(
                instance_url + "/services/apexrest/CSVHandler",
                data=chunk_body.encode("utf-8"),
                headers=headers
            )
            print("Salesforce response:", sf_response.text)

        return {
            'statusCode': 200,
            'body': "Success"
        }

    except Exception as e:
        print("Error:", e)
        raise
        
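Before pointing Apex at the Lambda, you can sanity-check the function by calling its Function URL directly. A quick test script (the Function URL and CSV link below are placeholders; substitute your own):

import requests

# Placeholder values; use your own Function URL and a reachable CSV link
function_url = "https://abc123.lambda-url.us-east-1.on.aws/"
payload = {"url": "https://example.com/report.csv"}

resp = requests.post(function_url, json=payload)
print(resp.status_code, resp.text)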

Conclusion: There you have it - a seamless process to manage large CSV files in Apex using AWS Lambda! The integration not only solves the size limitation issue but also streamlines the data handling process.

Hope this guide helps you navigate similar challenges with ease.




