Why does creating a single or limited set of global AWS S3 buckets in an organization make sense?
Vinod Kumar Nair
Cloud Architect (AWS) | CNCF Kubestronaut | SaaS | OpenSource Contributor | Blogger | DoKC Ambassador
When it comes to architecting and creating AWS S3 (Amazon Simple Storage Service) buckets in any organization, especially in large corporations, we usually start with a single S3 bucket, or with a limited number of S3 buckets, based on the various Business Units (BUs) or verticals in the company, say Finance, Manufacturing, and so on.
It is generally a good practice to limit the number of S3 buckets and keep the data centralized, for several good reasons.
For a growing company such as a startup in particular, restricting the number of buckets right from the beginning makes the data far easier to manage over the long term, and it eventually becomes one of the best practices that employees within the startup can follow.
Just think of a scenario where different teams within a startup have started creating their own S3 buckets for various project-related work, and a point comes later where moving to a single or limited-bucket architecture has become too difficult. This leads to both higher operational and maintenance costs.
As a good practice, a company can create a single, globally unique S3 bucket per environment, say sandbox, development, QA, pre-production, and production. Within that bucket, the various engineering teams get access to create, read, and write data only under their own restricted prefixes (or sub-folders, in layman's terms). This gives an organization more control over its data, as the data is kept centralized and secured. As a further good practice, we can enable a separate, centralized logging bucket to keep track of every activity performed on the central data bucket, and we can enable properties on the data bucket such as versioning and data replication to another AWS region as a backup for disaster recovery.
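Versioning and cross-region replication are both properties of the AWS::S3::Bucket resource, so they can be switched on in the same CloudFormation template that creates the bucket (shown later in this article). The snippet below is only a minimal sketch of what could be added under the bucket's Properties section; the ReplicationRole and the destination bucket ARN are assumptions and would need to exist already, with versioning enabled on the destination bucket as well:

      VersioningConfiguration:
        Status: Enabled
      ReplicationConfiguration:
        Role: !GetAtt ReplicationRole.Arn  # hypothetical IAM role that S3 assumes to replicate objects
        Rules:
          - Id: dr-replication
            Status: Enabled
            Prefix: ''  # replicate every object in the bucket
            Destination:
              Bucket: 'arn:aws:s3:::global-bucket-dr-sbx-ap-southeast-2-088853283839'  # hypothetical DR bucket in another region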
A classic example would be a bucket named in the format <bucket-name>-<environment>-<region>-<AWS account id>, for instance:
global-bucket-sbx-ap-southeast-1-088853283839
As you will see in the CloudFormation template later in this article, the export name value is global-bucket, which can then be imported into any other CloudFormation template using !ImportValue global-bucket.
Note: Both MyS3Bucket and LoggingBucket in that template are just logical names. The actual bucket names get resolved based on your environment and account id, in this case to global-bucket-sbx-ap-southeast-1-088853283839 and global-loggings-sbx-ap-southeast-1-088853283839.
Keeping a single bucket has the following advantages:
- Lower operational cost
- Ease of maintenance
- Ease of restrictions
Let's discuss these points one by one.
- Lower operational cost - S3 needs no capacity provisioning and scales automatically when it comes to storing large volumes of data; however, there is an operational cost for the storage itself, for requests, and for data transferred out over the network, along with other key parameters. Keeping the data scattered across many buckets makes these operational costs (for example, network transfer-out usage) hard to track.
- Ease of maintenance - There is no need to worry about maintaining multiple buckets, as your DevOps team has to manage a single bucket only. As the company grows, the DevOps team can give a development or product team access to a specific prefix within the bucket to read or write data, rather than granting bucket-level access. You can write a single Infrastructure as Code (IaC) template file (JSON or YAML) that creates the centralized bucket and exports its name via the Outputs section, and any development team can later refer to that export value in its own IaC code (AWS CloudFormation or AWS Serverless Application Model). One can also implement lifecycle management on this bucket to save cost (a sketch of a lifecycle rule is shown after the main template below).
- Ease of restrictions - Access is given not at the bucket (root or top) level but only at the prefix or sub-folder level, so each team can work freely within its own prefix. A sketch of such a prefix-scoped policy follows this list.
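The following is a minimal sketch of such a prefix-scoped policy, written as a CloudFormation resource. It assumes the same Environment parameter and bucket naming convention used in the templates later in this article, and team-a is just a hypothetical prefix; the policy would still need to be attached to that team's IAM roles, groups, or users:

  TeamAPrefixPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: Allows team-a to work only under its own prefix of the central bucket
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          # List only the keys under the team's own prefix
          - Effect: Allow
            Action:
              - s3:ListBucket
            Resource: !Sub 'arn:aws:s3:::global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}'
            Condition:
              StringLike:
                s3:prefix:
                  - 'team-a/*'
          # Read, write and delete objects only under that prefix
          - Effect: Allow
            Action:
              - s3:GetObject
              - s3:PutObject
              - s3:DeleteObject
            Resource: !Sub 'arn:aws:s3:::global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}/team-a/*'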
A typical AWS CloudFormation template to create a global, centralized, unique AWS S3 bucket for different environments is shown below:
AWSTemplateFormatVersion: '2010-09-09'
Metadata:
  License: Unlicensed
Description: >
  This template creates a globally unique S3 bucket in a specific region.
  The bucket name is formed from the environment, account id and region
Parameters:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html
  Environment:
    Description: This parameter will accept the environment details from the user
    Type: String
    Default: sbx
    AllowedValues:
      - sbx
      - dev
      - qa
      - e2e
      - prod
    ConstraintDescription: Invalid environment. Please select one of the given environments only
Resources:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket.html
  MyS3Bucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/pseudo-parameter-reference.html
      BucketName: !Sub 'global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}'
      AccessControl: Private
      LoggingConfiguration:
        DestinationBucketName: !Ref 'LoggingBucket'
        LogFilePrefix: 'access-logs'
      Tags:
        - Key: name
          Value: globalbucket
        - Key: department
          Value: engineering
  LoggingBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      BucketName: !Sub 'global-loggings-${Environment}-${AWS::Region}-${AWS::AccountId}'
      AccessControl: LogDeliveryWrite
Outputs:
  MyS3Bucket:
    Description: A private S3 bucket with deletion policy as retain and logging configuration
    Value: !Ref MyS3Bucket
    Export:
      Name: global-bucket
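The lifecycle management mentioned under "Ease of maintenance" can be layered onto MyS3Bucket in the same way, as one more property. The rule below is only a sketch; the storage classes and day counts are placeholder values to adapt to your own retention needs:

      # additional property under MyS3Bucket -> Properties
      LifecycleConfiguration:
        Rules:
          - Id: archive-old-objects
            Status: Enabled
            Transitions:
              - StorageClass: STANDARD_IA  # move to Infrequent Access after 30 days
                TransitionInDays: 30
              - StorageClass: GLACIER  # archive after 90 days
                TransitionInDays: 90
            ExpirationInDays: 365  # delete objects after one year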
Then import the value of the bucket in any other CloudFormation template. For instance, below we import it into a Lambda function as an environment variable:
AWSTemplateFormatVersion: '2010-09-09'
Metadata:
  License: Unlicensed
Description: >
  This template creates a lambda function which gets triggered by any event
  occurring in the S3 global bucket
Parameters:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html
  Environment:
    Description: This parameter will accept the environment details from the user
    Type: String
    Default: sbx
    AllowedValues:
      - sbx
      - dev
      - qa
      - e2e
      - prod
    ConstraintDescription: Invalid environment. Please select one of the given environments only
Resources:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-function.html
  HelloLambda:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          var aws = require('aws-sdk')
          var response = require('cfn-response')
          exports.handler = function(event, context) {
            console.log("REQUEST RECEIVED:\n" + JSON.stringify(event))
            // For Delete requests, immediately send a SUCCESS response.
            if (event.RequestType == "Delete") {
              response.send(event, context, "SUCCESS")
              return
            }
            var responseStatus = "FAILED"
            var responseData = {}
            var functionName = event.ResourceProperties.FunctionName
            var lambda = new aws.Lambda()
            lambda.invoke({ FunctionName: functionName }, function(err, invokeResult) {
              if (err) {
                responseData = {Error: "Invoke call failed"}
                console.log(responseData.Error + ":\n", err)
              } else responseStatus = "SUCCESS"
              response.send(event, context, responseStatus, responseData)
            })
          }
      Description: >
        This is just a sample hello world lambda that uses a prefix of an existing S3 bucket
      Environment:
        Variables:
          BUCKET_NAME: !ImportValue global-bucket
      FunctionName: !Sub 'hellolambda-${Environment}-${AWS::Region}-${AWS::AccountId}'
      Handler: index.handler
      MemorySize: 128
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs12.x
      Tags:
        - Key: name
          Value: testlambda
      Timeout: 10
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:*
                Resource: arn:aws:logs:*:*:*
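One thing worth noting: the template above imports the bucket name, but neither template actually wires the S3 bucket to the Lambda function. A minimal sketch of the missing pieces is shown below; the permission belongs in the Lambda stack, while the notification configuration (shown as a comment) would have to be added to MyS3Bucket in the bucket stack once the function exists, because S3 checks for the invoke permission before it accepts the notification:

  # In the Lambda stack: allow the S3 service to invoke the function
  S3InvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref HelloLambda
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      SourceArn: !Sub 'arn:aws:s3:::global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}'

  # In the bucket stack, under MyS3Bucket -> Properties:
  #   NotificationConfiguration:
  #     LambdaConfigurations:
  #       - Event: 's3:ObjectCreated:*'
  #         Function: <ARN of the hellolambda function>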
I hope this article has given you some insight into the importance of keeping a single-bucket S3 architecture in a company.
Do share your feedback and comments on this :)
Cheers