Design Approach: On-Premise Database Processing Using AWS Glue
Prosenjit Das
Senior Enterprise Architect @ Capgemini Cloud-Native, Integration Practice | Ex-EY, IBM
The objective of the solution is to find a way to establish connectivity between an on-premise database and AWS Glue over HTTP/S only.
Inbound Process
1. An application is written to capture change data (CDC) from the on-premise database.
2. A built-in translator program in the application reads the CDC records and converts them to JSON/CSV.
3. The JSON/CSV files (raw data) are uploaded to an S3 bucket over HTTP/HTTPS.
4. An AWS Glue crawler creates a table for each stage of the data, based on a job trigger or a predefined schedule. In this case, an AWS Lambda function triggers the ETL process every time a new file is added to the raw-data S3 bucket, and the ETL process transforms the data into a Glue-specific format (if any transformation is required for analysis).
5. The crawler also updates the Glue Data Catalog on a schedule by reading the data structure from S3.
6. The transformed data is registered in the Glue Data Catalog (the catalog stores table metadata such as the schema and S3 location; the data itself stays in S3).
7. A PySpark job, a native Java library, or native Python/SQL code can be written to read and write data through the Glue catalog tables for processing.
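Steps 2 to 4 of the inbound process can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the author's implementation: the CDC record shape (`op`, `table`, `key`, `data`), the `raw/<table>/<timestamp>.jsonl` key layout and the Glue job name `raw-to-curated` are all hypothetical. The S3 and Glue clients are passed in (in practice `boto3.client("s3")` and `boto3.client("glue")`), and the Lambda function is assumed to be subscribed to the raw bucket's `s3:ObjectCreated:*` event notifications.

```python
import json
import urllib.parse
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Iterable


@dataclass
class ChangeRecord:
    """Hypothetical shape of one captured change (the article does not define it)."""
    op: str               # "I" insert, "U" update, "D" delete
    table: str
    key: dict[str, Any]   # primary-key column(s)
    data: dict[str, Any]  # full row image after the change (empty for deletes)


def to_json_lines(changes: Iterable[ChangeRecord]) -> str:
    """Step 2: serialise CDC records as JSON Lines, one change per line."""
    return "\n".join(json.dumps(asdict(c), sort_keys=True, default=str) for c in changes)


def raw_object_key(table: str, now: datetime) -> str:
    """Hypothetical key layout for the raw bucket: raw/<table>/<UTC timestamp>.jsonl"""
    return f"raw/{table}/{now.strftime('%Y%m%dT%H%M%S%fZ')}.jsonl"


def upload_changes(s3_client, bucket: str, changes: list[ChangeRecord]) -> str:
    """Step 3: PUT one batch (assumed to be for a single table) to the raw bucket.

    boto3 calls the S3 endpoint over HTTPS by default, matching the HTTP/S-only constraint.
    """
    if not changes:
        raise ValueError("no changes to upload")
    key = raw_object_key(changes[0].table, datetime.now(timezone.utc))
    s3_client.put_object(Bucket=bucket, Key=key, Body=to_json_lines(changes).encode("utf-8"))
    return key


GLUE_JOB_NAME = "raw-to-curated"  # hypothetical Glue job name


def lambda_handler(event: dict, context: Any = None, glue_client=None) -> list[str]:
    """Step 4: start one Glue job run per object in an S3 ObjectCreated notification."""
    if glue_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        glue_client = boto3.client("glue")
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=GLUE_JOB_NAME,
            # Job parameters need a "--" prefix; the job reads them with getResolvedOptions.
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])
    return run_ids
```

JSON Lines is used here because Spark's JSON reader (and therefore a PySpark job in Glue) expects one JSON object per line by default; CSV would work the same way with a fixed column order.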
Outbound Process
1. A PySpark job can be written in Glue to apply any changes or processing to the data.
2. Glue jobs write the changes, using the catalog tables, as output file(s) to S3.
3. Another scheduler in the on-premise application should read the changes from S3 and update the records in the local database.
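The on-premise scheduler in step 3 could look roughly like the following sketch. It assumes, hypothetically, that the Glue job writes JSON Lines change files in the same `op`/`table`/`key`/`data` shape under an `out/` prefix whose keys sort chronologically (for example one date-based prefix per run), that each change carries the full row image, and it uses SQLite as a stand-in for the local database. The scheduler itself (cron, Windows Task Scheduler, etc.) would call the hypothetical `sync_once` function periodically and persist the returned watermark between runs.

```python
import json
import re
import sqlite3
from typing import Any

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(name: str) -> str:
    # Table/column names come from the change file, so only plain identifiers are allowed.
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def apply_change(conn: sqlite3.Connection, change: dict[str, Any]) -> None:
    """Apply one {op, table, key, data} change to the local database."""
    table = _check(change["table"])
    key = change["key"]
    if change["op"] == "D":
        where = " AND ".join(f"{_check(k)} = ?" for k in key)
        conn.execute(f"DELETE FROM {table} WHERE {where}", list(key.values()))
        return
    row = {**change["data"], **key}
    cols = ", ".join(_check(c) for c in row)
    marks = ", ".join("?" for _ in row)
    # INSERT OR REPLACE needs a PRIMARY KEY/UNIQUE constraint on the key columns and
    # replaces the whole row, hence the full-row-image assumption.
    conn.execute(f"INSERT OR REPLACE INTO {table} ({cols}) VALUES ({marks})", list(row.values()))


def list_new_keys(s3_client, bucket: str, prefix: str, after: str) -> list[str]:
    """List output objects whose keys sort after the last processed key (the watermark)."""
    kwargs: dict[str, Any] = {"Bucket": bucket, "Prefix": prefix}
    if after:
        kwargs["StartAfter"] = after
    keys: list[str] = []
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


def sync_once(s3_client, conn: sqlite3.Connection, bucket: str, prefix: str, watermark: str) -> str:
    """One scheduler tick: apply every new change file in key order, return the new watermark."""
    for key in sorted(list_new_keys(s3_client, bucket, prefix, watermark)):
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        with conn:  # one transaction per file: commit on success, roll back on error
            for line in body.splitlines():
                if line.strip():
                    apply_change(conn, json.loads(line))
        watermark = key
    return watermark
```

Tracking the last processed key as a watermark keeps each poll idempotent while the application only makes outbound HTTPS calls (`ListObjectsV2` and `GetObject`) to S3, so no inbound connection into the on-premise network is needed. A target database such as Oracle or SQL Server would typically use its own `MERGE` statement instead of SQLite's `INSERT OR REPLACE`.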