
Day - 07 | Databases & Analytics | AWS Cloud Practitioner Certification CLF-C02

Anshul Agarwal

SDET + DevOps | Selenium/Appium (Java & Python) | API testing (Postman + RestAssured) | Cypress | WebdriverIO | Playwright | Robot Framework | CI/CD | Python | AWS | Docker | Linux | Terraform | Ansible | Jenkins

Published: July 29, 2024


Databases & Analytics

Databases Intro
Relational Databases
NoSQL Databases
- NoSQL data example: JSON
Databases & Shared Responsibility on AWS
AWS RDS Overview
- Advantage of using RDS versus deploying a DB on EC2
- RDS Deployments: Read Replicas, Multi-AZ
- RDS Deployments: Multi-Region
Amazon Aurora
Amazon ElastiCache Overview
DynamoDB
- DynamoDB Accelerator - DAX
- DynamoDB - Global Tables
Redshift Overview
Amazon EMR
Amazon Athena
Amazon QuickSight
DocumentDB
Amazon Neptune
Amazon QLDB
Amazon Managed Blockchain
AWS Glue
DMS - Database Migration Service
Databases & Analytics Summary

Databases Intro

AWS offers a comprehensive set of database and analytics services to store, manage, and analyze data efficiently. These services cater to various needs, from relational databases to NoSQL, in-memory caching, and data warehousing.

- Storing data on disk (EFS, EBS, EC2 Instance Store, S3) can have its limits
- Sometimes, you want to store data in a database…
- You can structure the data
- You build indexes to efficiently query / search through the data
- You define relationships between your datasets
- Databases are optimized for a purpose and come with different features, shapes and constraints

Relational Databases

Relational databases store data in tables with predefined schemas and use SQL (Structured Query Language) for querying and managing data. They are ideal for applications requiring complex queries and transactions.

- Looks just like Excel spreadsheets, with links between them!
- Can use the SQL language to perform queries / lookups
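
To make the idea of linked tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite only stands in for a relational engine here; the students and enrollments tables and their contents are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for any relational engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE enrollments (
        student_id INTEGER REFERENCES students(id),
        course TEXT NOT NULL
    );
    INSERT INTO students VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO enrollments VALUES (1, 'AWS'), (1, 'Linux'), (2, 'AWS');
""")

# The JOIN follows the link (student_id -> id) between the two tables.
rows = conn.execute("""
    SELECT s.name, e.course
    FROM students AS s
    JOIN enrollments AS e ON e.student_id = s.id
    WHERE e.course = 'AWS'
    ORDER BY s.name
""").fetchall()

print(rows)  # [('Alice', 'AWS'), ('Bob', 'AWS')]
```

The same SQL would look very similar against MySQL or PostgreSQL on RDS; only the connection code would differ.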

NoSQL Databases

NoSQL = non-SQL = non-relational databases
NoSQL databases are purpose-built for specific data models and have flexible schemas for building modern applications.
Benefits:

- Flexibility: easy to evolve the data model
- Scalability: designed to scale out by using distributed clusters
- High-performance: optimized for a specific data model
- Highly functional: types optimized for the data model

Examples: Key-value, document, graph, in-memory, search databases

NoSQL data example: JSON

JSON = JavaScript Object Notation
JSON is a common form of data that fits into a NoSQL model
Data can be nested
Fields can change over time
Support for new types: arrays, etc.
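
A small illustration of these points, using Python's built-in json module. The two user records are hypothetical; the second one carries a nested object and an array that the first one lacks, which a flexible schema allows.

```python
import json

# Two hypothetical records in the same collection. The second has extra
# fields (a nested address and an array of tags) that the first lacks.
raw = """
[
  {"name": "John", "age": 30},
  {"name": "Jane", "age": 28,
   "address": {"city": "Pune", "country": "IN"},
   "tags": ["admin", "beta"]}
]
"""

users = json.loads(raw)
for user in users:
    # Missing fields are handled at read time instead of by a fixed schema.
    city = user.get("address", {}).get("city", "unknown")
    print(user["name"], city, user.get("tags", []))
```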

Databases & Shared Responsibility on AWS

AWS manages the infrastructure, software patching, backups, scaling, operations, upgrades, monitoring, and alerting while customers are responsible for data security, database schema, and access management.

NOTE - Many database technologies could be run on EC2, but you must then handle resiliency, backup, patching, high availability, fault tolerance, and scaling yourself.

AWS RDS Overview

Amazon RDS (Relational Database Service) is a managed service that makes it easy to set up, operate, and scale a relational database using SQL as a query language in the cloud.

It supports multiple database engines, including Amazon Aurora, MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server.

Advantage of using RDS versus deploying a DB on EC2

RDS is a managed service:

- Automated provisioning, OS patching
- Continuous backups and restore to specific timestamp (Point in Time Restore)!
- Monitoring dashboards
- Read replicas for improved read performance
- Multi AZ setup for DR (Disaster Recovery)
- Maintenance windows for upgrades
- Scaling capability (vertical and horizontal)
- Storage backed by EBS (gp2 or io1)

BUT you can’t SSH into your instances

RDS Deployments: Read Replicas, Multi-AZ

RDS Deployments: Multi-Region

Multi-Region deployments replicate data across different AWS Regions for global availability and disaster recovery. They improve local read performance for global users, but cross-Region replication has a cost.

Amazon Aurora

Aurora is a proprietary technology from AWS (not open sourced)
Aurora supports both PostgreSQL and MySQL as database engines
Aurora is “AWS cloud optimized” and claims a 5x performance improvement over MySQL on RDS, and over 3x the performance of Postgres on RDS
Aurora storage automatically grows in increments of 10 GB, up to 128 TB.
Aurora costs more than RDS (20% more) – but is more efficient
Not in the free tier

Amazon ElastiCache Overview

The same way RDS is to get managed Relational Databases…
ElastiCache is to get managed Redis or Memcached
Caches are in-memory databases with high performance, low latency
Helps reduce the load on databases for read-intensive workloads
AWS takes care of OS maintenance / patching, optimizations, setup, configuration, monitoring, failure recovery and backup
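
How a cache takes read load off a database can be sketched with the common cache-aside pattern. This is only an illustration under assumptions: a plain Python dict stands in for Redis or Memcached, and DATABASE, read_from_database and get are hypothetical names, not ElastiCache APIs.

```python
import time

DATABASE = {"user:1": "Alice", "user:2": "Bob"}  # hypothetical backing store
cache = {}                                       # stands in for Redis / Memcached
stats = {"db_reads": 0}

def read_from_database(key):
    """Simulate a slower database read and count how often it happens."""
    stats["db_reads"] += 1
    time.sleep(0.01)
    return DATABASE.get(key)

def get(key):
    """Cache-aside: try the cache first, fall back to the database on a miss."""
    if key in cache:
        return cache[key]
    value = read_from_database(key)
    if value is not None:
        cache[key] = value  # populate the cache for later reads
    return value

first = get("user:1")   # cache miss: goes to the database
second = get("user:1")  # cache hit: served from memory
print(first, second, stats["db_reads"])  # Alice Alice 1
```

A real setup would also need an expiry or invalidation strategy so the cache does not serve stale data; that part is left out here.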

DynamoDB

Fully managed, highly available with replication across 3 AZ
NoSQL database - not a relational database
Scales to massive workloads, distributed “serverless” database
Millions of requests per second, trillions of rows, 100s of TB of storage
Fast and consistent in performance
Single-digit millisecond latency – low latency retrieval
Integrated with IAM for security, authorization and administration
Low cost and auto scaling capabilities
Standard & Infrequent Access (IA) Table Class

DynamoDB Accelerator - DAX

Fully Managed in-memory cache for DynamoDB
10x performance improvement: from single-digit millisecond latency to microsecond latency when accessing your DynamoDB tables
Secure, highly scalable & highly available
Difference with ElastiCache at the CCP level: DAX is only used for and is integrated with DynamoDB, while ElastiCache can be used for other databases

DynamoDB - Global Tables

Make a DynamoDB table accessible with low latency in multiple Regions
Active-Active replication (read/write to any AWS Region)

Redshift Overview

Redshift is based on PostgreSQL, but it’s not used for OLTP (Online Transactional Processing)
It’s OLAP – online analytical processing (analytics and data warehousing)
Load data once every hour, not every second
10x better performance than other data warehouses, scale to PBs of data
Columnar storage of data (instead of row based)
Massively Parallel Query Execution (MPP), highly available
Pay as you go based on the instances provisioned
Has a SQL interface for performing the queries
BI tools such as Amazon QuickSight or Tableau integrate with it
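
Why columnar storage suits analytics can be shown with a toy example. This is a sketch of the general idea, not of Redshift's internal storage format; the order records are made up.

```python
# Row-based layout: each record is stored together (typical for OLTP).
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]

# Columnar layout: one list per column (typical for OLAP).
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one field has to walk every full row in the row layout...
total_row_based = sum(row["amount"] for row in rows)

# ...but only needs to read a single column in the columnar layout.
total_columnar = sum(columns["amount"])

print(total_row_based, total_columnar)  # 250.0 250.0
```

Both layouts give the same answer; the difference is how much data has to be read to get it, which matters when a table has many columns and billions of rows.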

Amazon EMR

EMR stands for “Elastic MapReduce”
EMR helps create Hadoop clusters (Big Data) to analyze and process vast amounts of data
The clusters can be made of hundreds of EC2 instances
Also supports Apache Spark, HBase, Presto, Flink
EMR takes care of all the provisioning and configuration
Auto-scaling and integrated with Spot instances
Use cases: data processing, machine learning, web indexing, big data

Amazon Athena

Serverless query service to analyze data stored in Amazon S3
Uses standard SQL language to query the files
Supports CSV, JSON, ORC, Avro, and Parquet (built on Presto)
Pricing: $5.00 per TB of data scanned
Use compressed or columnar data for cost-savings (less scan)
Use cases: Business intelligence / analytics / reporting, analyzing and querying VPC Flow Logs, ELB Logs, CloudTrail trails, etc.
Exam tip: to analyze data in S3 using serverless SQL, use Athena
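
The pricing note above lends itself to a quick back-of-the-envelope calculation. The sketch below uses the $5.00 per TB figure quoted in this article; the data sizes are hypothetical, and any per-query minimums or regional price differences are ignored.

```python
PRICE_PER_TB_USD = 5.00  # figure quoted above; check current pricing for your Region

def query_cost(tb_scanned):
    """Estimated cost in USD of one query that scans `tb_scanned` terabytes."""
    return tb_scanned * PRICE_PER_TB_USD

# Hypothetical dataset: 2 TB of raw CSV, where every query scans everything.
cost_csv = query_cost(2.0)

# Hypothetical: after converting to a compressed columnar format, the same
# query only needs to scan a tenth of that volume.
cost_columnar = query_cost(0.2)

print(f"CSV: ${cost_csv:.2f}, columnar: ${cost_columnar:.2f}")
# CSV: $10.00, columnar: $1.00
```

The tenfold reduction is an assumed number chosen for the example; the actual saving depends on the data and on which columns a query touches.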

Amazon QuickSight

Serverless machine learning-powered business intelligence service to create interactive dashboards
Fast, automatically scalable, embeddable, with per-session pricing
Use cases:
- Business analytics
- Building visualizations
- Performing ad-hoc analysis
- Getting business insights using data
Integrated with RDS, Aurora, Athena, Redshift, S3…

DocumentDB

Aurora is an “AWS-implementation” of PostgreSQL / MySQL …
DocumentDB is the same for MongoDB (which is a NoSQL database)
MongoDB is used to store, query, and index JSON data
Similar “deployment concepts” as Aurora
Fully managed, highly available with replication across 3 AZ
DocumentDB storage automatically grows in increments of 10 GB, up to 128 TB.
Automatically scales to workloads with millions of requests per second

Amazon Neptune

Fully managed graph database
A popular graph dataset would be a social network:
- Users have friends
- Posts have comments
- Comments have likes from users
- Users share and like posts…
Highly available across 3 AZ, with up to 15 read replicas
Build and run applications working with highly connected datasets – optimized for these complex and hard queries
Can store up to billions of relations and query the graph with milliseconds latency
Highly available with replications across multiple AZs
Great for knowledge graphs (Wikipedia), fraud detection, recommendation engines, social networking
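
The kind of query a graph database is built for can be sketched in plain Python. The friends graph below is hypothetical, and the breadth-first search is only an illustration of a "friends of friends" traversal; Neptune itself is queried with graph query languages rather than code like this.

```python
from collections import deque

# Hypothetical social graph: each user maps to the set of their friends.
friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol", "erin"},
    "erin": {"dave"},
}

def within_hops(graph, start, max_hops):
    """Breadth-first search: map each reachable person to their hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph[person]:
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    del seen[start]
    return seen

print(sorted(within_hops(friends, "alice", 2).items()))
# [('bob', 1), ('carol', 1), ('dave', 2)]
```

In a relational database the same question needs one self-join per hop, which is why highly connected data is a natural fit for a graph model.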

Amazon QLDB

QLDB stands for “Quantum Ledger Database”
A ledger is a book recording financial transactions
Fully managed, serverless, highly available, with replication across 3 AZ
Used to review history of all the changes made to your application data over time
Immutable system: no entry can be removed or modified, cryptographically verifiable
2-3x better performance than common ledger blockchain frameworks, manipulate data using SQL
Difference with Amazon Managed Blockchain: no decentralization component, in accordance with financial regulation rules
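
What "immutable and cryptographically verifiable" can mean in practice is sketched below with a simple hash chain: each entry stores the hash of the previous one, so changing any past entry breaks verification. This is a generic illustration under that assumption, not QLDB's actual journal implementation; all function names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash used before the first entry

def entry_hash(prev_hash, data):
    """Hash an entry's data together with the hash of the previous entry."""
    payload = json.dumps({"prev": prev_hash, "data": data}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(journal, data):
    """Add a new entry that is chained to the current tail of the journal."""
    prev_hash = journal[-1]["hash"] if journal else GENESIS
    journal.append({"data": data, "prev": prev_hash,
                    "hash": entry_hash(prev_hash, data)})

def verify(journal):
    """Recompute every hash; any edit to a past entry makes this fail."""
    prev_hash = GENESIS
    for entry in journal:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["data"]):
            return False
        prev_hash = entry["hash"]
    return True

journal = []
append(journal, {"account": "A", "amount": 100})
append(journal, {"account": "A", "amount": -30})
ok_before = verify(journal)

journal[0]["data"]["amount"] = 1000  # tamper with history
ok_after = verify(journal)

print(ok_before, ok_after)  # True False
```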

Amazon Managed Blockchain

Blockchain makes it possible to build applications where multiple parties can execute transactions without the need for a trusted, central authority.
Amazon Managed Blockchain is a managed service to:
- Join public blockchain networks
- Or create your own scalable private network
Compatible with the frameworks Hyperledger Fabric & Ethereum

AWS Glue

Managed extract, transform, and load (ETL) service
Useful to prepare and transform data for analytics
Fully serverless service
Glue Data Catalog: catalog of datasets that can be used by Athena, Redshift, EMR
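
The three ETL steps can be sketched in a few lines of standard-library Python. This only illustrates what extract, transform and load mean; it does not use the Glue API, and the CSV content and function names are made up.

```python
import csv
import io
import json

# Hypothetical raw input: stray spaces, a missing amount, lowercase currencies.
raw_csv = "order_id,amount,currency\n1, 120.50 ,usd\n2,,usd\n3,80,eur\n"

def extract(text):
    """Extract: parse the CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(records):
    """Transform: drop incomplete rows, fix types, normalize currency codes."""
    cleaned = []
    for record in records:
        amount = record["amount"].strip()
        if not amount:
            continue  # skip rows with a missing amount
        cleaned.append({
            "order_id": int(record["order_id"]),
            "amount": float(amount),
            "currency": record["currency"].upper(),
        })
    return cleaned

def load(records):
    """Load: serialize as JSON Lines, ready to be written to a target store."""
    return "\n".join(json.dumps(record) for record in records)

output = load(transform(extract(raw_csv)))
print(output)
```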

DMS - Database Migration Service

Quickly and securely migrate databases to AWS, resilient, self healing
The source database remains available during the migration
Supports:
- Homogeneous migrations: e.g. Oracle to Oracle
- Heterogeneous migrations: e.g. Microsoft SQL Server to Aurora

Databases & Analytics Summary

Relational Databases - OLTP: RDS & Aurora (SQL)
Differences between Multi-AZ, Read Replicas, Multi-Region
In-memory Database: ElastiCache
Key/Value Database: DynamoDB (serverless) & DAX (cache for DynamoDB)
Warehouse - OLAP: Redshift (SQL)
Hadoop Cluster: EMR
Athena: query data on Amazon S3 (serverless & SQL)
QuickSight: dashboards on your data (serverless)
DocumentDB: “Aurora for MongoDB” (JSON – NoSQL database)
Amazon QLDB: Financial Transactions Ledger (immutable journal, cryptographically verifiable)
Amazon Managed Blockchain: managed Hyperledger Fabric & Ethereum blockchains
Glue: Managed ETL (Extract Transform Load) and Data Catalog service
Database Migration: DMS
Neptune: graph database

Happy Learning !
