Why does creating a single or limited set of global AWS S3 buckets in an organization make sense?
Vinod Kumar Nair
Cloud Architect (AWS) | CNCF Kubestronaut | SaaS | OpenSource Contributor | Blogger | DoKC Ambassador
When it comes to architecting and creating AWS S3 (Amazon Simple Storage Service) buckets in any organization, especially in large corporations, we usually start with a single S3 bucket, or with a limited number of S3 buckets, based on the various Business Units (BUs) or verticals in the company, say Finance, Manufacturing, and so on.
It is generally a good practice to limit the number of S3 buckets and keep the data centralized, for several good reasons.
For a growing company such as a startup in particular, restricting the number of buckets right from the beginning makes the data far easier to manage over the long term, and it eventually becomes one of the best practices that employees within the startup can follow.
Just think of a scenario where different teams within a startup have started creating their own S3 buckets for various project-related work, and a point comes later where moving to a single or limited-bucket architecture has become too difficult. This leads to both higher operational and maintenance costs.
As a good practice, a company can create a single, globally unique S3 bucket per environment, say sandbox, development, QA, pre-production, and production. Within that bucket, the various engineering teams get access to create, read, and write data only under their own restricted prefixes (or sub-folders, in layman's terms). This gives an organization more control over its data, as the data is kept centralized and secured. As a further good practice, we can enable a separate, centralized logging bucket to keep track of every activity performed on the central data bucket, and we can enable properties on the data bucket such as versioning and data replication to another AWS region as a backup for disaster recovery.
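Versioning and cross-region replication are both properties of the AWS::S3::Bucket resource, so they can be switched on in the same CloudFormation template that creates the bucket (shown later in this article). The snippet below is only a minimal sketch of what could be added under the bucket's Properties section; the ReplicationRole and the destination bucket ARN are assumptions and would need to exist already, with versioning enabled on the destination bucket as well:

      VersioningConfiguration:
        Status: Enabled
      ReplicationConfiguration:
        Role: !GetAtt ReplicationRole.Arn  # hypothetical IAM role that S3 assumes to replicate objects
        Rules:
          - Id: dr-replication
            Status: Enabled
            Prefix: ''  # replicate every object in the bucket
            Destination:
              Bucket: 'arn:aws:s3:::global-bucket-dr-sbx-ap-southeast-2-088853283839'  # hypothetical DR bucket in another region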
A classic example would be a bucket named in the format <bucket-name>-<environment>-<region>-<AWS account id>, for instance:
global-bucket-sbx-ap-southeast-1-088853283839
As you will see in the CloudFormation template later in this article, the export name value is global-bucket, which can then be imported into any other CloudFormation template using !ImportValue global-bucket.
Note: Both MyS3Bucket and LoggingBucket in that template are just logical names. The actual bucket names get resolved based on your environment and account id, in this case to global-bucket-sbx-ap-southeast-1-088853283839 and global-loggings-sbx-ap-southeast-1-088853283839.
Keeping a single bucket has the following advantages:
- Lower operational cost
- Ease of maintenance
- Ease of restrictions
Let's discuss these points one by one.
- Lower operational cost - S3 needs no capacity provisioning and scales automatically when it comes to storing large volumes of data; however, there is an operational cost for the storage itself, for requests, and for data transferred out over the network, along with other key parameters. Keeping the data scattered across many buckets makes these operational costs (for example, network transfer-out usage) hard to track.
- Ease of maintenance - There is no need to worry about maintaining multiple buckets, as your DevOps team has to manage a single bucket only. As the company grows, the DevOps team can give a development or product team access to a specific prefix within the bucket to read or write data, rather than granting bucket-level access. You can write a single Infrastructure as Code (IaC) template file (JSON or YAML) that creates the centralized bucket and exports its name via the Outputs section, and any development team can later refer to that export value in its own IaC code (AWS CloudFormation or AWS Serverless Application Model). One can also implement lifecycle management on this bucket to save cost (a sketch of a lifecycle rule is shown after the main template below).
- Ease of restrictions - Access is given not at the bucket (root or top) level but only at the prefix or sub-folder level, so each team can work freely within its own prefix. A sketch of such a prefix-scoped policy follows this list.
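The following is a minimal sketch of such a prefix-scoped policy, written as a CloudFormation resource. It assumes the same Environment parameter and bucket naming convention used in the templates later in this article, and team-a is just a hypothetical prefix; the policy would still need to be attached to that team's IAM roles, groups, or users:

  TeamAPrefixPolicy:
    Type: AWS::IAM::ManagedPolicy
    Properties:
      Description: Allows team-a to work only under its own prefix of the central bucket
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          # List only the keys under the team's own prefix
          - Effect: Allow
            Action:
              - s3:ListBucket
            Resource: !Sub 'arn:aws:s3:::global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}'
            Condition:
              StringLike:
                s3:prefix:
                  - 'team-a/*'
          # Read, write and delete objects only under that prefix
          - Effect: Allow
            Action:
              - s3:GetObject
              - s3:PutObject
              - s3:DeleteObject
            Resource: !Sub 'arn:aws:s3:::global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}/team-a/*'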
A typical AWS CloudFormation template to create a global, centralized, unique AWS S3 bucket for different environments is shown below:
AWSTemplateFormatVersion: '2010-09-09'
Metadata:
  License: Unlicensed
Description: >
  This template creates a globally unique S3 bucket in a specific region.
  The bucket name is formed from the environment, account id and region
Parameters:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html
  Environment:
    Description: This parameter will accept the environment details from the user
    Type: String
    Default: sbx
    AllowedValues:
      - sbx
      - dev
      - qa
      - e2e
      - prod
    ConstraintDescription: Invalid environment. Please select one of the given environments only
Resources:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-s3-bucket.html
  MyS3Bucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/pseudo-parameter-reference.html
      BucketName: !Sub 'global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}'
      AccessControl: Private
      LoggingConfiguration:
        DestinationBucketName: !Ref 'LoggingBucket'
        LogFilePrefix: 'access-logs'
      Tags:
        - Key: name
          Value: globalbucket
        - Key: department
          Value: engineering
  LoggingBucket:
    Type: AWS::S3::Bucket
    DeletionPolicy: Retain
    Properties:
      BucketName: !Sub 'global-loggings-${Environment}-${AWS::Region}-${AWS::AccountId}'
      AccessControl: LogDeliveryWrite
Outputs:
  MyS3Bucket:
    Description: A private S3 bucket with deletion policy as retain and logging configuration
    Value: !Ref MyS3Bucket
    Export:
      Name: global-bucket
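The lifecycle management mentioned under "Ease of maintenance" can be layered onto MyS3Bucket in the same way, as one more property. The rule below is only a sketch; the storage classes and day counts are placeholder values to adapt to your own retention needs:

      # additional property under MyS3Bucket -> Properties
      LifecycleConfiguration:
        Rules:
          - Id: archive-old-objects
            Status: Enabled
            Transitions:
              - StorageClass: STANDARD_IA  # move to Infrequent Access after 30 days
                TransitionInDays: 30
              - StorageClass: GLACIER  # archive after 90 days
                TransitionInDays: 90
            ExpirationInDays: 365  # delete objects after one year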
Then import the value of the bucket in any other CloudFormation template. For instance, below we import it into a Lambda function as an environment variable:
AWSTemplateFormatVersion: '2010-09-09'
Metadata:
  License: Unlicensed
Description: >
  This template creates a lambda function which gets triggered by any event
  occurring in the S3 global bucket
Parameters:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/parameters-section-structure.html
  Environment:
    Description: This parameter will accept the environment details from the user
    Type: String
    Default: sbx
    AllowedValues:
      - sbx
      - dev
      - qa
      - e2e
      - prod
    ConstraintDescription: Invalid environment. Please select one of the given environments only
Resources:
  # https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-lambda-function.html
  HelloLambda:
    Type: AWS::Lambda::Function
    Properties:
      Code:
        ZipFile: |
          var aws = require('aws-sdk')
          var response = require('cfn-response')
          exports.handler = function(event, context) {
            console.log("REQUEST RECEIVED:\n" + JSON.stringify(event))
            // For Delete requests, immediately send a SUCCESS response.
            if (event.RequestType == "Delete") {
              response.send(event, context, "SUCCESS")
              return
            }
            var responseStatus = "FAILED"
            var responseData = {}
            var functionName = event.ResourceProperties.FunctionName
            var lambda = new aws.Lambda()
            lambda.invoke({ FunctionName: functionName }, function(err, invokeResult) {
              if (err) {
                responseData = {Error: "Invoke call failed"}
                console.log(responseData.Error + ":\n", err)
              } else responseStatus = "SUCCESS"
              response.send(event, context, responseStatus, responseData)
            })
          }
      Description: >
        This is just a sample hello world lambda that uses a prefix of an existing S3 bucket
      Environment:
        Variables:
          BUCKET_NAME: !ImportValue global-bucket
      FunctionName: !Sub 'hellolambda-${Environment}-${AWS::Region}-${AWS::AccountId}'
      Handler: index.handler
      MemorySize: 128
      Role: !GetAtt LambdaExecutionRole.Arn
      Runtime: nodejs12.x
      Tags:
        - Key: name
          Value: testlambda
      Timeout: 10
  LambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: "/"
      Policies:
        - PolicyName: root
          PolicyDocument:
            Version: '2012-10-17'
            Statement:
              - Effect: Allow
                Action:
                  - logs:*
                Resource: arn:aws:logs:*:*:*
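One thing worth noting: the template above imports the bucket name, but neither template actually wires the S3 bucket to the Lambda function. A minimal sketch of the missing pieces is shown below; the permission belongs in the Lambda stack, while the notification configuration (shown as a comment) would have to be added to MyS3Bucket in the bucket stack once the function exists, because S3 checks for the invoke permission before it accepts the notification:

  # In the Lambda stack: allow the S3 service to invoke the function
  S3InvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      Action: lambda:InvokeFunction
      FunctionName: !Ref HelloLambda
      Principal: s3.amazonaws.com
      SourceAccount: !Ref AWS::AccountId
      SourceArn: !Sub 'arn:aws:s3:::global-bucket-${Environment}-${AWS::Region}-${AWS::AccountId}'

  # In the bucket stack, under MyS3Bucket -> Properties:
  #   NotificationConfiguration:
  #     LambdaConfigurations:
  #       - Event: 's3:ObjectCreated:*'
  #         Function: <ARN of the hellolambda function>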
I hope this article has given you some insight into the importance of keeping a single-bucket S3 architecture in a company.
Do share your feedback and comments on this :)
Cheers