Preparing for your AWS Solutions Architect - Associate Exam
"One who speaks deferentially but increases his preparations will advance. One who speaks belligerently and advances hastily will retreat." - Sun Tzu, "The Art of War"
Having recently passed the AWS SAA-C02 exam, I've consolidated some pointers and review notes that helped me a lot in the examination. Feel free to use them, and good luck!
IAM
- universal (not regional)
- root account is for creating other users; secure it with multi-factor authentication (MFA)
- new user has no access allowed by default
- Access Key ID and Secret Access Key are used for the API, CLI and SDK, not for the console.
- New users' passwords for console access are generated by the creator.
- SSO to any domain-joined EC2 instance
- AWS Managed Microsoft AD - Active Directory support
- AD Trust - extends existing AD inside AWS
- Simple AD (no AD trust) - standalone directory that provides a directory-style hierarchy for users.
- AD Connector - proxies directory requests to your existing on-premises AD (it does not add a trust to Simple AD)
- Amazon Resource Names (ARN)
- IAM Policy - Effect (Allow or Deny) / Action (AWS API) / Resource
- A policy does nothing until it is ATTACHED to either an identity or a resource.
- A policy is a list of statements
- An explicit Deny always overrides an Allow
- Identity vs Resource Policy
- Permission Boundaries - limits maximum permission
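The evaluation rules above (nothing is allowed by default, and an explicit Deny beats any Allow) can be sketched in a few lines of Python. This is a simplified model and not the real IAM engine: it ignores principals, conditions, permission boundaries and SCPs, and the `is_allowed` helper, the bucket name and the sample policy are made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical sample policy in the Effect / Action / Resource shape.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*",
         "Resource": "arn:aws:s3:::my-bucket/*"},
        {"Effect": "Deny", "Action": "s3:DeleteObject",
         "Resource": "arn:aws:s3:::my-bucket/*"},
    ],
}


def is_allowed(policy, action, resource):
    """Simplified evaluation: explicit Deny > Allow > implicit (default) deny."""
    allowed = False
    for stmt in policy["Statement"]:
        matches = (fnmatchcase(action, stmt["Action"])
                   and fnmatchcase(resource, stmt["Resource"]))
        if not matches:
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit Deny always wins
        allowed = True
    return allowed  # still False when no statement matched


print(is_allowed(policy, "s3:GetObject", "arn:aws:s3:::my-bucket/a.txt"))
print(is_allowed(policy, "s3:DeleteObject", "arn:aws:s3:::my-bucket/a.txt"))
print(is_allowed(policy, "ec2:RunInstances",
                 "arn:aws:ec2:us-east-1:123456789012:instance/*"))
```

The three calls cover the three outcomes worth remembering for the exam: allowed, explicitly denied, and denied by default because nothing matched.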
Resource Access Manager (RAM)
- share (clonable) resources with ANOTHER account: App Mesh, Aurora, CodeBuild, EC2, EC2 Image Builder, License Manager, Resource Groups, Route53
Single Sign-On (SSO)
- SAML 2.0 grants
S3
- S3 (Simple Storage Service) is object-based storage with a universal (not regional) namespace
- each object can be from 0 bytes to 5 TB; total storage is unlimited
- private by default, and can be made public via bucket policies (bucket wide) and access control list (individual file)
- S3 objects have the following details: key, value, version ID, metadata, and subresources (access control list, torrent)
- PUTs of new objects have "read-after-write consistency"
- overwrite PUTs and DELETEs have "eventual consistency" (since December 2020, S3 is strongly consistent for these operations as well)
- The more prefixes, the more throughput (3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix)
- S3 Standard
- S3 IA - infrequent but fast
- S3 Intelligent Tiering - machine learning driven
- S3 One Zone-IA - cheaper IA, but stored in a single AZ, so it does not tolerate losing that AZ
- S3 Glacier - mainly for data archive
- S3 Glacier Deep Archive - retrieval within 12 hours, cheaper than Glacier
Encryption in S3
A. In Transit
- SSL/TLS
B. At rest
- S3 Managed Keys - SSE-S3
- AWS KMS Managed Keys - SSE-KMS
- uploading/downloading counts in kms quota
- region-specific quota (5,500, 10,000 or 30,000)
- CANNOT request to increase quota
- Server Side Encryption With Customer Provided Keys - SSE-C
- Client Side Encryption
Locks in S3
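S3 Object Lock - write once, read many (WORM)
- governance mode - objects can only be changed or deleted with special permissions
- compliance mode - nobody can change or delete the object, not even the root user
S3 Glacier Vault Lock - WORM-style lock policy for a Glacier vault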
Handling of large files in S3
- UPLOAD - multipart upload for faster upload of big files (recommended above 100 MB, required above 5 GB)
- DOWNLOAD - S3 byte-range fetches, for faster download
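Both tricks above boil down to cutting one big object into byte ranges that can be transferred in parallel. Here is a minimal sketch of that arithmetic; the `byte_ranges` and `range_header` helpers are my own illustration (not an AWS API), and the 250 MB object with 100 MB parts is an assumed example.

```python
MB = 1024 * 1024


def byte_ranges(total_size, part_size):
    """Split total_size bytes into inclusive (start, end) ranges of part_size."""
    if total_size < 0 or part_size <= 0:
        raise ValueError("total_size must be >= 0 and part_size must be > 0")
    return [
        (start, min(start + part_size, total_size) - 1)
        for start in range(0, total_size, part_size)
    ]


def range_header(start, end):
    """Value for the HTTP Range header used by a byte-range fetch."""
    return f"bytes={start}-{end}"


# Assumed example: a 250 MB object fetched in 100 MB pieces.
for start, end in byte_ranges(250 * MB, 100 * MB):
    print(range_header(start, end))
```

Each range can then be downloaded (or uploaded as one part) by a separate worker, and a failed piece can be retried without redoing the whole file.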
Sharing in S3
- Bucket Policies and IAM (entire bucket, programmatic access only)
- Bucket ACLs and IAM (individual objects, programmatic access only)
- Cross-Account IAM Roles (console and programmatic access)
Cross Region Replication
- versioning must be enabled for both buckets across regions
- S3 Lifecycle Policies - automating moving of objects across S3 tiers (cheaper in Glacier)
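To make the lifecycle idea concrete, here is a rough sketch of a lifecycle rule written as a Python dict in the shape that boto3's `put_bucket_lifecycle_configuration` accepts as `LifecycleConfiguration`. The rule ID, the `logs/` prefix and the day counts are assumptions picked for the example, and `storage_class_on_day` is a hypothetical helper that only simulates what the rule would do.

```python
# Assumed example rule: move logs to cheaper tiers as they age, then expire.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",          # hypothetical rule name
            "Filter": {"Prefix": "logs/"},     # hypothetical key prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_on_day(rule, age_days):
    """Simulate where an object sits after age_days (None means expired)."""
    if age_days >= rule["Expiration"]["Days"]:
        return None
    current = "STANDARD"
    for transition in sorted(rule["Transitions"], key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 45, 120, 400):
    print(age, storage_class_on_day(rule, age))

# With real credentials the dict could be applied roughly like this:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle_configuration)
```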
Other S3 Features
- S3 Transfer Acceleration - uploads go to the nearest edge location first, hence faster (CloudFront edge locations CAN BE WRITTEN TO, TOO!)
- S3 Select - uses SQL syntax to retrieve only a subset of an object's data
CloudFront
- Uses Edge Location for faster serving of web contents
- Origin can be s3 or ec2 or elb or route53
- Distribution is either web or rtmp
- Time-to-live (TTL) is how long an object stays cached; you can invalidate cached objects before the TTL expires, but invalidations are charged
- Secure content via signed URL (per file) or signed cookie (multiple files)
- EC2 with cloudfront should use signed url
AWS DataSync - On-Premises to AWS
- install the DataSync agent on a VM on-premises
- can be used to replicate EFS to EFS
AWS Snowball
- large physical device for moving large amounts of data
Storage Gateway (on site to s3)
- File Gateway - per file on s3
- Volume Gateway -> Stored Volumes and Cached Volumes
- Gateway Virtual Tape Library (VTL)
Athena (similar to S3 SELECT!!!)
- serverless service which uses SQL queries to analyze data directly in S3
- commonly used to analyze logs in s3
Macie
- AI to detect PII (Personally Identifiable Information)
AWS ORGANIZATIONS
- multi-factor authentication (MFA)
- paying account for billing only
- Service Control Policy (SCP) to enable or disable services on an OU or an individual account
EC2
- On-Demand Instances
- Reserved Instances
- Spot Instances
- bid on the spot market
- you are not charged for the partial hour if AWS terminates the instance, but you are if you terminate it yourself.
Dedicated Hosts
- can be either a dedicated instance or dedicated hosts
Elastic Block Store (EBS)
- think of a hard disk drive attached to EC2
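- termination protection is off by default, hence the instance can be deleted!
- the root volume gets deleted when the EC2 instance is terminated, unlike additional EBS volumes
- can be encrypted
- SSD: gp2 (most workloads), io1 (databases)
- HDD: st1 (data warehouse), sc1 (file server), Standard (infrequent access)
- EBS volumes are always in the same AZ as the EC2 instance
- Snapshots are stored in S3
- incremental
- unencrypted snapshots can be shared; encrypted ones can only be shared if they use a customer managed key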
EBS vs Instance Store
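- Instance Store-backed instances cannot be stopped; if the host fails or the instance is terminated, you lose the data
- EBS-backed instances can be stopped. Upon termination of the EC2 instance, the root EBS volume can be kept if you choose.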
Security Group (SG) in EC2
- All inbound is blocked
- All outbound are allowed
- Multiple ec2 to sg, as well as multiple sg to ec2
- STATEFUL!
- no IP Blocking (this is done via NACL which is stateless)
CloudWatch vs Cloudtrail
- cloudwatch is for performance monitoring (5 minutes by default)
- cloudtrail is for auditing the aws service usage
Roles
- more secure than access key and secret access key in EC2
- universal
Metadata and User data
- provides instance details such as the IP > curl http://169.254.169.254/latest/meta-data/
- provides the bootstrap script > curl http://169.254.169.254/latest/user-data/
Elastic File System EFS
- supports NFS
- multi-AZ
- Read After Write Consistency
- Scales up to petabytes
Placement Groups
- Cluster -> same AZ, packed close together, hence faster
- Spread -> each instance on its own rack; can span separate AZs
- Partitioned -> multiple EC2 instances per partition, each partition on its own rack(s); think of Cassandra
Relational Database Service (RDS)
- Online transaction processing (OLTP)
1. SQL Server
2. MySQL
3. PostgreSQL
4. MariaDB
5. Oracle
6. Aurora (also has a serverless option, Aurora Serverless)
- the Multi-AZ standby cannot be accessed directly through its own endpoint
Serverless
DynamoDB - serverless NoSQL
Redshift - online analytical processing (OLAP)
DB Cache
ElastiCache: Memcached (scale out) and Redis (multi-AZ and backups)
ReadReplicas
- can be multi-AZ
- faster performance
- ALL RDS except SQL
- can be promoted to master (breaks read replica)
Multi AZ
- for DR, force failover by rebooting RDS instance
DynamoDB
- SSD
- spread across 3 geographically distinct data centres
- Eventually Consistent Reads (default) OR Strongly Consistent Reads (added cost)
Redshift
- 1 AZ
- backup 1 day - 35 days
- at least 3 copies
Aurora
- minimum of 6 copies: 2 copies per AZ for at least 3 AZ.
- use Aurora Serverless -- cheap, infrequent intermittent or unpredictable workload.
Route53
A - IP Address
ALIAS RECORD - like a CNAME, but works at the zone apex (naked domain) and points to AWS resources
CNAME - points one domain name to another domain name; cannot be used at the zone apex
TTL - seconds (for DNS to refresh)
1. Simple Routing Policy - 1 record with multiple IP addresses in random
2. Weighted Routing Policy - add weights
3. Latency Routing Policy - based on latency, regional assignment
4. Failover Routing Policy - set 2 record, primary fails it goes to failover IP record
5. Geolocation Routing Policy - based on the geographic location of the user, not on latency
6. geoproximity policy - must use route53 traffic flow (gui flow for complex policy)
7. multivalue answer policy - same as simple but with health check
- 50 domains MAX limit
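The weighted routing policy in item 2 is easy to reason about with a little arithmetic: each record receives its weight divided by the sum of all weights. Below is a rough sketch; `traffic_share` and `pick_record` are made-up helpers, the record names and weights are assumed, and the real Route 53 handles the all-zero case differently (it then routes to all records equally), which this sketch simply rejects.

```python
import random


def traffic_share(weights):
    """Expected fraction of queries per record: weight / sum of weights."""
    total = sum(weights.values())
    if total == 0:
        # Simplification: Route 53 treats all-zero weights specially.
        raise ValueError("at least one weight must be non-zero")
    return {name: weight / total for name, weight in weights.items()}


def pick_record(weights, rng=random):
    """Answer one simulated DNS query according to the weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Assumed example: send 20% of traffic to a new deployment.
weights = {"blue": 80, "green": 20}
print(traffic_share(weights))
print(pick_record(weights))
```

This is the usual way to do a gradual (blue/green or canary style) cut-over with DNS alone: start the new record with a small weight and raise it over time.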
VPC
- contains IGW (or a Virtual Private Gateway), Route Table, Security Group, NACL, Subnets
- 1 subnet = 1 AZ
- 5 IP addresses are reserved by AWS in every subnet
- 1 IGW per VPC
- AZ are randomized per account even with same name (e.g. us-east-2a may vary)
- Security Group are Stateful while NACL are Stateless
- no transitive peering
- 5 VPCs per region (default limit)
1. NAT Instance -> disable source destination check
- must be in a public subnet
- route out of private to NAT
- behind a SG
2. NAT Gateway -> redundant within its AZ. 1 NAT Gateway lives in 1 AZ
- starts at 5 Gbps and scales up to 45 Gbps
- just update routeTable
- create a NAT Gateway per AZ for HA
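A quick way to internalize the 5 reserved addresses per subnet from the VPC list above is to compute the usable count for a given CIDR. This sketch uses Python's standard `ipaddress` module; `usable_ips` and `reserved_ips` are my own helper names, and the CIDR blocks are assumed examples. AWS reserves the first four addresses and the last one in each subnet.

```python
import ipaddress

AWS_RESERVED_PER_SUBNET = 5  # first four addresses plus the last one


def usable_ips(cidr):
    """Number of addresses you can actually assign in an AWS subnet."""
    network = ipaddress.ip_network(cidr)
    return network.num_addresses - AWS_RESERVED_PER_SUBNET


def reserved_ips(cidr):
    """The five addresses AWS keeps for itself in the subnet."""
    network = ipaddress.ip_network(cidr)
    return [network[i] for i in range(4)] + [network[-1]]


print(usable_ips("10.0.1.0/24"))
print(usable_ips("10.0.0.0/28"))
print([str(ip) for ip in reserved_ips("10.0.1.0/24")])
```

So a /24 gives 251 usable addresses rather than 256, and the smallest allowed subnet (/28) gives only 11.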
NACL
- you can associate 1 NACL with multiple subnets; however a subnet can only be associated to a single NACL
- Rules in a NACL are evaluated in ascending rule-number order (100, 101, 200, 201 and so on); the first matching rule applies
- stateless (should declare inbound as well as outbound) -- in contrast to SG
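The rule-order point above is the one that trips people up, so here is a small simulation of it. It is only a sketch: the `evaluate` helper and the rule tuples are made up, only the source IP and a single port are matched, and the addresses are documentation ranges chosen for the example.

```python
import ipaddress

# Assumed inbound rules: (rule_number, source_cidr, port, action)
inbound_rules = [
    (200, "0.0.0.0/0", 22, "ALLOW"),
    (100, "203.0.113.0/24", 22, "DENY"),
]


def evaluate(rules, source_ip, port):
    """Walk rules in ascending rule number; the first match decides."""
    ip = ipaddress.ip_address(source_ip)
    for _number, cidr, rule_port, action in sorted(rules):
        if port == rule_port and ip in ipaddress.ip_network(cidr):
            return action
    return "DENY"  # the implicit final rule denies everything else


print(evaluate(inbound_rules, "203.0.113.7", 22))   # hits rule 100 first
print(evaluate(inbound_rules, "198.51.100.9", 22))  # falls through to rule 200
print(evaluate(inbound_rules, "198.51.100.9", 80))  # no rule matches
```

Because a NACL is stateless, a real setup would need a matching outbound rule for the return traffic as well; a security group would not.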
LOAD BALANCERS
- Needs at least 2 public subnets and should have IG for internet access
VPC Flow Logs
- configured only once: the configuration cannot be changed after the flow log is created
- can be tagged
- only for your VPC
- Traffic that is NOT logged:
-- traffic calling amazon dns server
-- windows instance for amazon windows license activation
-- 169.254.169.254 (instance metadata)
-- DHCP
-- reserved IPs
Bastion
-- jump host for SSH and RDP into your instances in a private subnet
Direct Connect
-- connects data center to AWS
-- for high throughput workloads or for stable and reliable connection
How to Set Up Direct Connect
- create a public virtual interface in the Direct Connect console
- create a customer gateway in the VPC console > VPN Connections.
- create a virtual private gateway
- attach the virtual private gateway to vpc
- create a vpn connection
- select the virtual private gateway > customer gateway
- once vpn is up, setup on the customer gateway or firewall
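The VPC-side steps above (customer gateway, virtual private gateway, attach, VPN connection) map onto four EC2 API calls. The sketch below takes a client object so it can be tried without an AWS account: with boto3 you would pass `boto3.client("ec2")`, while `FakeEC2` is a hypothetical stand-in I wrote that only mimics the response shapes. The VPC ID, public IP and BGP ASN are assumed values, and the Direct Connect virtual interface and the on-premises firewall setup are not covered.

```python
def create_site_to_site_vpn(ec2, vpc_id, on_prem_public_ip, bgp_asn=65000):
    """Create the customer gateway, virtual private gateway and VPN connection."""
    cgw_id = ec2.create_customer_gateway(
        BgpAsn=bgp_asn, PublicIp=on_prem_public_ip, Type="ipsec.1"
    )["CustomerGateway"]["CustomerGatewayId"]
    vgw_id = ec2.create_vpn_gateway(Type="ipsec.1")["VpnGateway"]["VpnGatewayId"]
    ec2.attach_vpn_gateway(VpcId=vpc_id, VpnGatewayId=vgw_id)
    vpn_id = ec2.create_vpn_connection(
        CustomerGatewayId=cgw_id, VpnGatewayId=vgw_id, Type="ipsec.1"
    )["VpnConnection"]["VpnConnectionId"]
    return {"customer_gateway": cgw_id, "vpn_gateway": vgw_id,
            "vpn_connection": vpn_id}


class FakeEC2:
    """Hypothetical stand-in for an EC2 client; records calls, returns fake IDs."""

    def __init__(self):
        self.calls = []

    def create_customer_gateway(self, **kwargs):
        self.calls.append("create_customer_gateway")
        return {"CustomerGateway": {"CustomerGatewayId": "cgw-fake"}}

    def create_vpn_gateway(self, **kwargs):
        self.calls.append("create_vpn_gateway")
        return {"VpnGateway": {"VpnGatewayId": "vgw-fake"}}

    def attach_vpn_gateway(self, **kwargs):
        self.calls.append("attach_vpn_gateway")
        return {"VpcAttachment": {"State": "attaching", "VpcId": kwargs["VpcId"]}}

    def create_vpn_connection(self, **kwargs):
        self.calls.append("create_vpn_connection")
        return {"VpnConnection": {"VpnConnectionId": "vpn-fake"}}


fake = FakeEC2()
print(create_site_to_site_vpn(fake, "vpc-12345678", "198.51.100.10"))
print(fake.calls)
```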
Global Accelerator
- provides quicker access to content by routing traffic over the AWS global network instead of the public internet
- provides 2 static IP addresses
- control traffic via endpoint groups
VPC Endpoint
- traffic goes over the AWS network and NOT the internet
- utilizes PrivateLink
A. Interface Endpoints
- ENI with a private IP that serves as an entry point to a supported service
B. Gateway Endpoints - a route table target, used like a NAT gateway
- available for S3 and DynamoDB
VPC PrivateLink prerequisite
- Network Load Balancer in front of your service and an ENI in the client VPC
AWS Transit Gateway
- a single point where all VPC connections can connect to instead of VPC peering, vpn, directConnect etc.
- hub and spoke model
- support IP MULTICAST (only one who does this)
AWS VPN CloudHub
- connecting multiple sites with vpn
- hub and spoke model
- over the internet, but encrypted
VPC Costing
- free incoming
- free public to private subnets within same AZ
- charges when subnet to another different AZ
- charges if traversing the internet (twice as expensive)
- inter-region traffic is charged more than inter-AZ traffic (duh!)
HIGH AVAILABILITY
Load Balancers: application, network and classic
Error 504 (gateway timeout) means the application behind the load balancer is not responding in time; fix or scale the app
X-Forwarded-For --> IPv4 address of the client instead of the IP of the LB
- instance status: InService OR OutOfService
- DNS name BUT NO IP address for application and classic LBs. Network LB has a static IP.
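Here is a small sketch of how an app behind a load balancer would recover the client address from the forwarded header described above. The `client_ip` helper and the plain-dict headers are assumptions for illustration (real web frameworks expose headers case-insensitively), and the addresses are made-up examples.

```python
def client_ip(headers, peer_ip):
    """Return the original client IP, falling back to the direct peer.

    The header holds a comma-separated chain: client first, then each
    proxy or load balancer that forwarded the request.
    """
    forwarded = headers.get("X-Forwarded-For", "")
    first_hop = forwarded.split(",")[0].strip()
    return first_hop or peer_ip


# Behind the LB the TCP peer is the load balancer, not the user.
print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.12"}, "10.0.0.12"))
# No header present (direct connection): use the peer address.
print(client_ip({}, "198.51.100.9"))
```

Without this, access logs and IP-based rules in the app would only ever see the load balancer's private address.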
Sticky Session
- enable for instance that needs writing to local disk; disable so load is balanced
Enable Cross Zone Balancing
- load balances across AZ
Path Pattern
- path pattern based on URL
Autoscaling
- maintain the current number of instances at all times
- manually
- schedule
- demand
- predictive
--> create launch config (AMI, script etc.)
--> create Auto-scaling Group (ASG) based on the config
If you delete ASG it will delete the instances as well
MQ Services
1. SQS - used to DECOUPLE applications so they run independently
- messages up to 256 KB of text (larger payloads are saved in S3 and not in SQS)
- STANDARD queues - messages may be delivered OUT OF ORDER / TWICE
- solved by FIFO (First-In-First-Out) - order preserved and processed EXACTLY ONCE
- retention of 1 minute to 14 days; default is 4 days
- IF a message appears twice, the VISIBILITY TIMEOUT is not long enough (max 12 hrs)
- to save cost, use LONG POLLING
2. SWF (Simple Workflow Service) - use when a HUMAN TASK is involved in the workflow (e.g. warehouse); workflows can last up to 1 year. Tasks are assigned ONCE and never duplicated, unlike SQS
-- swf actors: workflow starters (application that start SWF) -> Deciders -> Activity Workers
- "domain" collection of related workflows
3. SNS - Simple Notification Service - push notification
- grouping by "topic" which creates an Amazon Resource Name
- SES Simple Email Service
4. ELASTIC TRANSCODER - media transcoder - you pay by the minutes transcoded and the resolution
5. API GATEWAY
- low cost and scale automatically
- throttle requests to prevent attack (!)
- API Caching (TTL in seconds)
- CORS - lets a page from one origin call the API on another origin (relaxes the same-origin policy that guards against XSS attacks)
6. Kinesis - data streams
- A. Kinesis Streams - persistently stores stream data for 24 hrs up to 7 days
- data is stored in SHARDS within streams
- B. Kinesis Firehose - no data persistence; data needs to be processed as it arrives
- C. Kinesis Analytics - works with Streams and Firehose to analyze data and store results in S3, Redshift or an Elasticsearch cluster.
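Going back to SQS in item 1: the visibility timeout is easier to remember once you see why a slow consumer causes a duplicate. The toy queue below is my own in-memory model and not the SQS API; time is passed in explicitly as `now` (in seconds), and the 30 second timeout is an assumed value.

```python
class TinyQueue:
    """Toy model of a queue with a visibility timeout (not the SQS API)."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}  # message id -> [body, invisible_until]
        self._next_id = 0

    def send(self, body):
        self._messages[self._next_id] = [body, 0]
        self._next_id += 1

    def receive(self, now):
        """Hand out the first visible message and hide it for the timeout."""
        for msg_id, entry in self._messages.items():
            if entry[1] <= now:
                entry[1] = now + self.visibility_timeout
                return msg_id, entry[0]
        return None

    def delete(self, msg_id):
        """Consumers must delete a message once processing has finished."""
        self._messages.pop(msg_id, None)


queue = TinyQueue(visibility_timeout=30)
queue.send("order-1")
print(queue.receive(now=0))    # consumer A gets the message
print(queue.receive(now=10))   # hidden while A is still working
print(queue.receive(now=31))   # A was too slow: delivered a second time
queue.delete(0)
print(queue.receive(now=100))  # gone after the delete
```

The fix in the real service is the same as in the model: make the visibility timeout longer than the processing time, and delete the message as soon as the work is done.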
FEDERATION ALLOWS A WEB IDENTITY PROVIDER
- Cognito -- login FB, GC
- User Pool - refers to the User login (username and pw) via JWT
- Identity Pool - provides an IAM role to grant access to AWS resources such as S3, Lambda, DynamoDB etc.
SECURITY
1. WAF --> attach in front of CloudFront or an ALB!
- IP blocking and firewall
- works on layer 7
- blocks SQL injection or cross-site scripting attacks
2. NACL - (ineffective with CloudFront due to CloudFront IPs)
- blocks IP ranges
- layer 4
3. host-based firewall - ufw, firewalld in the VM (ineffective with ALB)
Encryption
1. KMS
- manages customer master keys (CMK)
- FIPS 140-2 Level 2 (while CloudHSM is Level 3)
- encrypts up to 4 KB of data directly
A. AWS Managed CMK - used by default
B. Customer Managed CMK -
C. AWS Owned CMK
2. CloudHSM - encryption keys subject to corporate / regulatory requirements
- single tenant
- manage your own key, no API
3. Systems Manager Parameter Store (SSM) - like Keyvault
- serverless storage of secrets
- hierarchical
- used in CloudFormation to give secrets
SERVERLESS
- AWS X-Ray helps with debugging Lambda
Serverless Application Model (SAM)
-- deployment of serverless app via cloud formation
Elastic Container Service (ECS)
-- orchestration of containers (like k8s)
- task (vs pods in EKS)
Fargate (vs EC2)
- serverless container engine (as opposed to EC2); works with ECS and EKS (Elastic Kubernetes Service)
- no GPU support
Elastic Container Registry (ECR) - Similar to Docker Hub
SECURITY IN ECS
- A. Instance Role (ALL tasks will have the roles. Not secure and not least privilege)
- B. Task Role - roles per tasks
===========
PRACTICE EXAM NOTES AND POINTERS:
-----
- Global Accelerator vs CloudFront
- FSx for Windows File Server vs FSx for Lustre
- Amazon Workspace
- target tracking scaling policy vs step scaling vs simple
- origin access identity (cloudfront to s3 private)
- rtmp -> signed URL end date, time and IP address range for access (END DATE!)
Until 2018 there was a hard limit on S3 puts of 100 PUTs per second. To achieve this care needed to be taken with the structure of the name Key to ensure parallel processing. As of July 2018 the limit was raised to 3500 and the need for the Key design was basically eliminated. Disk IOPS is not the issue with the problem. The account limit is not the issue with the problem.
- S3 – OneZone-IA cheaper for infrequent access. Glacier is cheaper but has long retrieval time (99.50 % availability)
- Eventual consistency - Overwrite PUTS and DELETES
- Read after write consistency - New Objects
Virtual-hosted style puts your bucket name first, s3 second, and the region third. Path style puts s3 first, then the region, with your bucket as the first segment of the path. The legacy global endpoint has no region. S3 static website hosting can use your own domain, or your bucket name first, s3-website second, followed by the region. AWS is in the process of phasing out path style, and support for the legacy global endpoint format is limited and discouraged. However, it is still useful to be able to recognize them should they show up in logs. https://docs.aws.amazon.com/AmazonS3/latest/dev/VirtualHosting.html
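To make the endpoint styles above easier to recognize in logs, here is a small sketch that builds each form. The function names, bucket, region and key are placeholders of my own; the static website endpoint is left out because its exact form varies by region.

```python
def virtual_hosted_url(bucket, region, key):
    """Bucket name first, then s3, then the region."""
    return f"https://{bucket}.s3.{region}.amazonaws.com/{key}"


def path_style_url(bucket, region, key):
    """s3 and the region in the host; the bucket is the first path segment."""
    return f"https://s3.{region}.amazonaws.com/{bucket}/{key}"


def legacy_global_url(bucket, key):
    """Legacy global endpoint: no region in the host name."""
    return f"https://{bucket}.s3.amazonaws.com/{key}"


# Placeholder values for illustration only.
print(virtual_hosted_url("my-bucket", "eu-west-1", "photos/cat.jpg"))
print(path_style_url("my-bucket", "eu-west-1", "photos/cat.jpg"))
print(legacy_global_url("my-bucket", "photos/cat.jpg"))
```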
- EBS vs Instance Store - instance store is ephemeral storage (the instance cannot be stopped, only rebooted; data can be lost if the status check fails)
- ENI (Elastic Network Interface) - virtual network card
- ENA (Elastic Network Adapter) - enhanced networking, faster network 10 Gbps - 100 Gbps
- EFA (Elastic Fabric Adapter) - High Performance Computing (HPC) / Machine Learning, or OS bypass (Linux only)
- How to encrypt the EBS root volume of an unencrypted EC2 instance -> take a snapshot, copy the snapshot with encryption enabled, create an AMI from the encrypted copy, launch an EC2 instance from that AMI
- Automated Backups (1-35 days, log + backup) and DB Snapshot (not automatically deleted after db deletion)
- Disaster Recovery use RDS Multi-AZ
- For High Availability (Read Replica) turn on backup
- Redshift (data warehouse, for analytics 1 AZ, 1day backup , 35 days max too)
- Aurora Serverless - infrequent, intermittent or unpredictable workloads
- Memcached's advantage over Redis is that it is multi-threaded; the rest of the features favor Redis
- AWS Database Migration Service (DMS) needs the Schema Conversion Tool (SCT) for heterogeneous migrations (source and target engines differ); it is not needed for homogeneous migrations
- Caching Services: ElastiCache, DynamoDB (DAX), CloudFront (using edge locations) API Gateway
- User -> CloudFront -> API Gateway -> Lambda/EC2 -> DynamoDB or ElastiCache over RDS
- EMR Map Reduce (master node -> core node WITHIN Cluster)
- only setup once at start, backup logs of master node (mnt/var/log) to s3
- set of RDS: Oracle, SQL Server, MySQL,PostGresql
- RDS reserved Instance available for Multi-AZ
- 16TB RDS Provisioned IOPS, max size RDS
- RDS DB Security group NO NEED to expose port
- Amazon Resource Name format: arn:partition:service:region:account-id:resource
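Since ARNs show up in nearly every policy question, here is a minimal sketch that splits one into its fields. `parse_arn` and the `Arn` tuple are hypothetical helpers, the account ID and resource names are made up, and a sixth field (the resource) is included because every ARN ends with one.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str      # empty for global services such as IAM
    account_id: str  # empty for some resources such as S3 buckets
    resource: str


def parse_arn(arn):
    """Split arn:partition:service:region:account-id:resource into fields."""
    parts = arn.split(":", 5)  # the resource itself may contain colons
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


print(parse_arn("arn:aws:iam::123456789012:user/alice"))
print(parse_arn("arn:aws:s3:::my-bucket/photos/cat.jpg"))
print(parse_arn("arn:aws:lambda:us-east-1:123456789012:function:my-fn"))
```

Note how the IAM ARN has an empty region (IAM is universal) and the S3 ARN has neither region nor account ID.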
For more content and better preparation, I recommend taking the online classes from AWS Academy and from Udemy (A Cloud Guru). I'm not affiliated with either group, but the paid virtual classes helped me a lot.
Plus, the online courses come with practice exams at the end. Take them repeatedly until you've mastered the topics.
Whew! That's it. Be sure to read the AWS FAQ pages as well as the exam notes. Good luck!