Log Federation in AWS
Alfred David
Tech Innovation Alchemist | AI-to-Blockchain Strategist | Building World-Class Engineering Teams | Future-First Leader
This is a classic Big Data workflow: large volumes of data need to be moved from an enterprise to a cloud-based PaaS, where a data repository is created either for master data management or as a data warehouse. I'm using it to illustrate how logs can be managed on AWS.
This data flow architecture has two components: EMR and Data Pipeline. EMR splits the large file into equal-sized chunks in a massively parallel processing (MPP) job, using the full power of EMR with the HDFS filesystem in the background.
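To make the "equal-sized chunks" idea concrete, here is a minimal single-machine sketch of the partitioning arithmetic. It is not the EMR job itself, which would do this in a distributed fashion; the function name `split_into_chunks` and its parameters are hypothetical and chosen only for illustration.

```python
def split_into_chunks(records, n_chunks):
    """Split a list of records into n_chunks contiguous, near-equal parts.

    Chunk sizes differ by at most one record; any remainder is spread
    over the first chunks.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be a positive integer")

    base, extra = divmod(len(records), n_chunks)
    chunks = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks each take one additional record.
        size = base + (1 if i < extra else 0)
        chunks.append(records[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    for part in split_into_chunks(list(range(10)), 3):
        print(len(part), part)
```

Keeping the chunks close to the same size matters because a parallel load is only as fast as its largest chunk.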
Data Pipeline copies the chunks of data from the various S3 buckets into the respective tables of the Redshift star or snowflake schema. It also defines the Redshift 'unload' command, through which Redshift exception logs are copied to an S3 bucket for further action. The activity of both EMR and Data Pipeline is logged using CloudTrail, which delivers its logs to S3 buckets; each application and compute instance can be configured to log to a separate S3 bucket.
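As a rough illustration of the copy and unload steps, the sketch below only builds the SQL text that such a pipeline might submit to Redshift. The table name, bucket paths, IAM role ARN and helper names are all hypothetical. Reading load errors from the `STL_LOAD_ERRORS` system table is my assumption about what "exception logs" could mean here, not something the pipeline definition above confirms.

```python
def build_copy_sql(table, s3_prefix, iam_role_arn):
    """Build a Redshift COPY statement that loads CSV chunks from S3."""
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS CSV;"
    )


def build_unload_sql(query, s3_prefix, iam_role_arn):
    """Build a Redshift UNLOAD statement that writes query results to S3."""
    # UNLOAD takes the query as a quoted string, so single quotes inside
    # the query are doubled.
    escaped = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}';"
    )


if __name__ == "__main__":
    role = "arn:aws:iam::123456789012:role/ExampleRedshiftRole"  # placeholder
    print(build_copy_sql("sales_fact", "s3://example-chunks/sales/part-", role))
    print(build_unload_sql(
        "SELECT * FROM stl_load_errors",  # assumed source of exception logs
        "s3://example-exceptions/redshift/errors_",
        role,
    ))
```

In a real pipeline these statements would be executed by the Data Pipeline activity or a SQL client; only the string construction is shown here.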
An SNS component listens to the S3 buckets and sends notifications as email messages to administrators or product owners. The event message can then be sent to a Lambda component, where aggregation or business logic can be applied to derive collective log intelligence and actionable data.
The Lambda component can run Node.js-based JavaScript code, a Java/Scala class, or a script in a language such as Python to implement the necessary logic.
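A minimal Python sketch of such a Lambda function follows. It assumes the function is subscribed to the SNS topic and that each SNS message body carries a standard S3 event notification as a JSON string. The aggregation, counting new log objects per bucket, is just a stand-in for whatever business logic is actually needed, and the return shape is hypothetical.

```python
import json
from collections import Counter
from urllib.parse import unquote_plus


def handler(event, context):
    """Count newly reported S3 log objects per bucket from an SNS event."""
    per_bucket = Counter()
    keys = []

    for sns_record in event.get("Records", []):
        # The SNS message body is a JSON string holding the S3 notification.
        message = json.loads(sns_record["Sns"]["Message"])
        for s3_record in message.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 event notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            per_bucket[bucket] += 1
            keys.append(f"s3://{bucket}/{key}")

    return {"objects_per_bucket": dict(per_bucket), "keys": keys}
```

From here the function could read each object, apply its rules, and forward a summary; that part depends on the log format and is left out.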
This can be customised further by sending log events to third-party log management tools such as Splunk or ELK for long-term, detailed log analysis and report generation.
I’ve highlighted, at a high level, how we would do this using AWS cloud components and how to manage the logs they generate, so that anomalies in the pipeline are captured, there is a mechanism for deeper analysis and retrospection, and key logging events can be monitored.
CloudTrail allows us to configure separate buckets for each of the app services and compute instance types. It adds another dimension to the monitoring capabilities already offered by AWS; it does not change or replace logging features you might already be using, such as those for Amazon S3 or Amazon CloudFront subscriptions. Amazon CloudWatch focuses on performance monitoring and system health; CloudTrail focuses on API activity. While CloudTrail does not report on system performance or health, you can use it in conjunction with CloudWatch Logs alarms to notify you about activities that you might be interested in. I’ve depicted this below.