Amazon Web Services is always a great learning experience (2020)
AMIT KUMAR
Experienced IT Professional | Cloud Architect | IT Manager | Global IT Governance | Multi-Cloud Expert
Practicing Amazon Web Services is always a great learning experience; it brought out the best in me. I'd love to share a part of my journey with you here.
A little bit of background:
The AWS Specialty Certification Program was first announced at re:Invent 2016, and at that moment I set myself the target of acquiring the AWS Specialty certifications. I was inspired by the spirit Sanjay sir showed in attempting all three specialty exams (Big Data, Advanced Networking & Security) on the very first day; for me, that is the "Day 1" attitude. Holding all three AWS Associate certifications, I was pretty confident about what I was getting into. I attempted all three beta exams in Jan 2017, just to understand the level of difficulty and to get first-hand experience of the AWS Specialty exams. That experience gave me the confidence that, if well prepared, I could beat them.
I started my preparation based on the beta exam experience. The Big Data Specialty exam weighs heavily on Redshift, EMR, and Kinesis; asks a fair number of questions on DynamoDB, KMS, Machine Learning, and Data Pipeline; and includes a few scenario- and fact-based questions on IoT, ML, and QuickSight. I prioritized spending more time on the Redshift, EMR, and Kinesis services and planned my study around them.
Must-know key concepts:
Redshift: When I started working on the Redshift service, Sanjay (from acloud.guru) was also coming out with an online course. The detailed service-level outline he drafted helped a lot in structuring my preparation.
Key concepts on Redshift are table design (distribution and sort keys), column encoding, loading data into Redshift from various AWS services and the methods for doing so, Workload Management (WLM), Redshift encryption at rest and in transit, node types, and cross-region replication of Redshift backups. A small table-design sketch follows below.
Don't miss the 'Redshift Advanced Table Design Playbook' on the AWS Big Data Blog (read all five parts of the series); it is one of the best Redshift design documents out there.
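To make the distribution-key, sort-key, and encoding ideas concrete, here is a minimal sketch in Python using psycopg2. This is my own illustration, not anything prescribed by the exam: the cluster endpoint, credentials, table, IAM role, and S3 bucket are all hypothetical placeholders.

```python
import psycopg2

# Hypothetical cluster endpoint and credentials.
conn = psycopg2.connect(
    host="my-cluster.example.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...",
)
cur = conn.cursor()

# DISTKEY co-locates rows sharing a customer_id on the same node slice
# (good for joins on that column); SORTKEY speeds range-restricted scans
# on event_time; ENCODE selects a per-column compression codec.
cur.execute("""
    CREATE TABLE IF NOT EXISTS clicks (
        customer_id BIGINT ENCODE az64,
        event_time  TIMESTAMP SORTKEY,
        page_url    VARCHAR(2048) ENCODE lzo
    )
    DISTSTYLE KEY
    DISTKEY (customer_id);
""")

# COPY is the recommended bulk-load path into Redshift: it loads from S3
# in parallel across the cluster's slices.
cur.execute("""
    COPY clicks
    FROM 's3://my-bucket/clicks/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV;
""")
conn.commit()
```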
Elastic MapReduce: This topic is big and heavily weighted for the Big Data Specialty credential. Key concepts: understand the Hadoop architecture; launching an EMR cluster (sketched below) and the applications associated with the Hadoop/Spark framework; EMRFS; transient vs. permanent cluster use cases; monitoring Hadoop cluster/HDFS performance; Spark on EMR; Spark Streaming and its interaction with Kinesis Streams; Hive on EMR; Presto use cases and applications; and Zeppelin and Jupyter notebooks for big data visualization.
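As a hedged sketch of what launching a transient Spark cluster looks like with boto3 (the cluster name, instance types and counts, log bucket, and IAM roles below are my own assumptions for illustration):

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="practice-spark-cluster",          # hypothetical name
    ReleaseLabel="emr-5.20.0",              # any current release label works
    Applications=[{"Name": "Spark"}, {"Name": "Hive"}],
    LogUri="s3://my-bucket/emr-logs/",      # hypothetical log bucket
    Instances={
        "MasterInstanceType": "m4.large",
        "SlaveInstanceType": "m4.large",
        "InstanceCount": 3,
        # False makes this a transient cluster: it terminates itself
        # once the submitted steps finish, which keeps practice costs low.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",      # default EC2 instance profile
    ServiceRole="EMR_DefaultRole",          # default EMR service role
)
print(response["JobFlowId"])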
Kinesis: Within the Kinesis family, I gave more importance to Kinesis Streams and Kinesis Firehose: their application use cases and best practices when integrating with other AWS services. Key concepts include the Kinesis Producer Library (KPL), shard limits and how to use shards effectively for big data streaming, and how the Kinesis Client Library (KCL) and Kinesis connectors interact with other AWS services. A minimal producer sketch follows.
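To illustrate how the partition key relates to shards, here is a minimal producer sketch with boto3 (the stream name and payload are hypothetical). The partition key is hashed to select a shard, so a high-cardinality key spreads load evenly; each shard accepts up to 1 MB/s or 1,000 records per second on the write side.

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

event = {"user_id": "u-123", "action": "click", "page": "/home"}
kinesis.put_record(
    StreamName="clickstream",               # hypothetical stream
    Data=json.dumps(event).encode("utf-8"),
    PartitionKey=event["user_id"],          # hash of this key picks the shard
)
```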
DynamoDB: Having become well equipped on this NoSQL database service while doing the Developer and Solutions Architect Associate exams, I didn't spend much time on it. But it is a must to have a firm understanding of partition concepts, Read Capacity Units (RCU), Write Capacity Units (WCU), Local Secondary Indexes (LSI), Global Secondary Indexes (GSI), and DynamoDB Streams and their use cases. The capacity-unit arithmetic is worked through below.
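As a worked example of the RCU/WCU arithmetic (my own illustration): one RCU covers one strongly consistent read per second of an item up to 4 KB, or two eventually consistent reads; one WCU covers one write per second of an item up to 1 KB.

```python
import math

def required_rcu(reads_per_sec, item_size_kb, strongly_consistent=True):
    # Each read consumes ceil(size / 4 KB) units; eventual consistency
    # halves the requirement.
    units = reads_per_sec * math.ceil(item_size_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

def required_wcu(writes_per_sec, item_size_kb):
    # Each write consumes ceil(size / 1 KB) units.
    return writes_per_sec * math.ceil(item_size_kb / 1)

# 80 eventually consistent reads/s of 6 KB items -> 80 * 2 / 2 = 80 RCU
print(required_rcu(80, 6, strongly_consistent=False))  # 80
# 10 writes/s of 3 KB items -> 10 * 3 = 30 WCU
print(required_wcu(10, 3))                             # 30
```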
Data Pipeline: It is good to know the core concepts of Data Pipeline and its use cases. These days most Data Pipeline actions can be performed using the AWS serverless service Lambda, but it is still worth learning the key data nodes and activities of Data Pipeline, along with its precondition and scheduling concepts.
IoT: Specific to the AWS IoT (Internet of Things) architecture, understand how each component plays an important role in receiving and processing messages from a fleet of Internet-connected things. Understand the AWS IoT components: the Device Gateway, the message transmission protocols (HTTP, WebSocket, or MQTT), TLS and authentication certificates, and the Rules Engine and its actions. Have a good understanding of AWS IoT-specific key features, i.e., the device registry and device shadows. A small publish sketch follows.
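For a feel of the message path, here is a hedged sketch of publishing to an MQTT topic through the AWS IoT data endpoint with boto3 (the topic and payload are hypothetical; a real device would instead use a device SDK speaking MQTT over TLS with client certificates).

```python
import json
import boto3

# The "iot-data" client talks to the AWS IoT message broker over HTTPS.
iot_data = boto3.client("iot-data", region_name="us-east-1")

iot_data.publish(
    topic="sensors/room1/temperature",  # hypothetical topic; Rules Engine SQL can match it
    qos=1,                              # MQTT quality of service: at-least-once delivery
    payload=json.dumps({"celsius": 22.4}),
)
```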
Machine Learning: Know the high-level concepts of the Amazon Machine Learning service (supervised learning), get to know the binary classification, multiclass classification, and regression model types of Amazon ML, and learn how to read the prediction outcomes.
Elasticsearch: Know the components of Elasticsearch and how well it interacts with Kibana, Logstash, and Kinesis Streams for big data clickstream analysis.
Preparation Methods & My Experience with Them:
The official AWS Big Data Blog and the AWS Summit/re:Invent videos on the AWS YouTube channel help a lot in grasping the scenario-based questions.
Here is the YouTube playlist I referenced for a better understanding of the services. If you want to watch only one session related to AWS Big Data, it should be this one: 'Big Data Architectural Patterns and Best Practices', presented by Siva Raghupathy, AWS Big Data Architect at Amazon Web Services. This session helped me a lot in answering the scenario-based questions about choosing the service that best fits a given scenario.
I have listed the AWS Big Data blog posts I referenced for the Specialty exam (categorized as Must Read, Good to Know, and Low Priority). There is no need to go deep at the code level for every post (there are almost 200 technical posts specific to big data services). I worked through a selected few posts hands-on with a free-tier AWS account and walked through the rest quickly (about 15 minutes per post on average), noting how each solution was architected for the use case outlined and the reasoning behind the AWS service selection (to collect, store, process, analyze, and visualize the data) that fit best.
Don't constrain your AWS blog reading to certification alone. AWS is evolving at such a pace that if we miss three months of updates, we are already outdated. For me, the "morning newspaper" is Jeff Barr's blog; I can't miss it if I want to keep myself informed about what is happening in AWS and its service features.
Nothing beats learning by getting your hands dirty with AWS services. I made good use of the one-year AWS free-tier account and Qwiklabs credits, which are the best way to play around with services. Don't forget to set CloudWatch alarms with billing limits (a sketch follows below). After practicing, delete the cluster and clean up the services to avoid charges for unused resources, and as a best practice, don't store access keys on GitHub or on an EC2 root volume with a security group open to 0.0.0.0/0.
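Here is a hedged sketch of that billing alarm with boto3. Billing metrics live only in us-east-1 and require "Receive Billing Alerts" to be enabled in the account; the threshold and SNS topic ARN below are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="free-tier-spend-guard",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,                      # billing metric updates only a few times a day
    EvaluationPeriods=1,
    Threshold=10.0,                    # alert once the month-to-date bill passes $10
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:billing-alerts"],  # hypothetical topic
)
```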
Finally, endless evening hours and sacrificed leisure weekends: working my way up from the bottom made it all really worth it.
If you ask me to name the secret sauce that led to the successful completion of the AWS Big Data Specialty certification, it is Mr. Sanjay and his A Cloud Guru course; thanks to him for coming up with such a wonderful platform to learn AWS in every possible way.
Happy Learning!!!
Cheers,
Amit Kumar
My Email id - [email protected]