AWS AI Practitioner - Preparation / Last Minute Revision Sheet
Hardik Joshi
Principal Consultant / Senior Engineering Manager - SaaS, Cloud | Technical Program Manager | AWS SA Professional | Azure Solutions Architect Expert | Cloud Evangelist
Are you ready to give the AWS AI Practitioner Certification Exam?
Below is the list of all the material and prep notes that helped me pass the exam.
Hope it will be helpful to you.
KEY NOTES: PLEASE READ FIRST

Domain-level revision notes from the course on AWS Skill Builder
Domain 1: Fundamentals of AI and ML
Known Data -> Features -> Algorithm -> Output
Adjustments
Inference
ML models can be trained on various types of data.
Structured data in Amazon RDS, Amazon S3 or Amazon Redshift
Amazon S3 is the primary source of training data
Semi-structured data = Amazon DynamoDB and Amazon DocumentDB
Unstructured data - tokenization
Time-series data - sequential data
Model Training - Algorithm
Inference - 2 options
  - Real time
      Low latency
      High throughput
      Persistent endpoint
  - Batch Transform
      Offline
      Large datasets
      Infrequent use

ML Types
  - Supervised Learning
      Amazon SageMaker Ground Truth -> Amazon Mechanical Turk
  - Unsupervised Learning
  - Reinforcement Learning
      Reward - AWS DeepRacer

Overfitting
  - Model performs well on training data but poorly on data outside it
Underfitting
  - Model cannot capture meaningful patterns. It gives inaccurate results for both training data and new inputs
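One way to tell the two failure modes apart is to compare the error on training data with the error on held-out validation data. A minimal sketch, assuming error rates between 0 and 1; the function name `diagnose_fit` and both thresholds are hypothetical values chosen only for illustration:

```python
def diagnose_fit(train_error, validation_error, high_error=0.3, max_gap=0.1):
    """Classify a model's fit from its training and validation error rates.

    high_error and max_gap are illustrative thresholds, not standard values.
    """
    # High error even on the training data: the model has not learned the pattern.
    if train_error >= high_error:
        return "underfitting"
    # Low training error but much higher validation error: poor generalization.
    if validation_error - train_error > max_gap:
        return "overfitting"
    return "good fit"


print(diagnose_fit(0.02, 0.25))
print(diagnose_fit(0.40, 0.42))
print(diagnose_fit(0.05, 0.08))
```

In practice the thresholds depend on the problem and the metric being used.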
Bias and fairness
  - Diversity of training data
  - Feature importance
  - Fairness constraints
Deep Learning
  - Neural Networks
  - Input Layer -> Hidden Layers -> Output Layer
Machine Learning vs Deep Learning
Consider alternatives when
  - Costs outweigh the benefits
  - Models cannot meet the interpretability requirements
  - Systems must be deterministic rather than probabilistic
ML models are probabilistic

Supervised learning
  - Classification
      Binary - Diabetic or not diabetic
      Multiclass
  - Regression
      Simple linear regression
      Multiple linear regression
      Logistic regression (despite its name, it predicts a class probability and is used for classification)
Unsupervised Learning
  - Clustering
      Define features
      Similarity function
      Number of clusters
  - Anomaly detection
      Data points that diverge from the norm
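Anomaly detection flags data points that diverge from the rest. A minimal sketch using z-scores; the function name, the sample readings and the threshold of 2 standard deviations are assumptions for illustration and not an AWS API:

```python
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Return the values that lie more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All values are identical, so nothing diverges.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


readings = [10, 11, 9, 10, 12, 10, 11, 50]
print(find_anomalies(readings))  # [50]
```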

Amazon Rekognition
  - Facial comparison and analysis
  - Text detection
  - Object detection and labelling
  - Content moderation
  - Can detect explicit content in images and videos

Amazon Textract
  - Extracts text from scanned documents

Amazon Comprehend
  - Extracts key phrases, entities and sentiment
  - Commonly used to detect PII data

Amazon Lex
  - Conversational voice and text

Amazon Transcribe
  - Converts speech to text

Amazon Polly
  - Converts text to speech

Amazon Kendra
  - Intelligent document search

Amazon Personalize
  - Personalized product recommendations

Amazon Translate
  - Translates between 75 languages

Amazon Forecast
  - Predicts future points in time-series data

Amazon Fraud Detector
  - Detects fraud and fraudulent activities

Amazon Bedrock

Amazon SageMaker

ML Pipeline
Identify Business Goal -> Frame ML Problem -> Collect Data -> Pre-process Data -> Engineer Features -> Train, Tune, Evaluate -> Deploy -> Monitor

Collect Data
  - AWS Glue
      Cloud-optimized ETL service
      Contains its own data catalog
      Built-in transformations
  - AWS Glue DataBrew
      Point-and-click data transformation
      200+ transformations
  - Amazon SageMaker Ground Truth
      Uses ML to label your training data
      Can automatically label
Amazon SageMaker Canvas
  - Import, prepare, transform, visualize and analyze
Amazon SageMaker Feature Store
  - Processes raw data into features by using a processing workflow
Amazon SageMaker Experiments
  - Visual interface
Amazon SageMaker automatic model tuning

Deploy
  - Batch inference
  - Real-time inference
  - Self-managed
  - Hosted

Amazon SageMaker inference
  - Batch Transform
      Offline inference
      Large datasets
  - Asynchronous
      Long processing times
      Large payloads
  - Serverless
      Intermittent traffic
      Periods of no traffic
  - Real-time
      Live predictions
      Sustained traffic
      Low latency
      Consistent
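The four SageMaker inference options above can be read as a simple decision order. A rough sketch of that reasoning; `choose_inference_option` and its parameter names are hypothetical helpers for revision purposes, not part of any AWS SDK, and real workloads need a closer look at payload limits and cost:

```python
def choose_inference_option(offline, large_payload_or_long_processing, intermittent_traffic):
    """Map workload traits from the notes to a SageMaker inference option."""
    if offline:
        # No live endpoint needed; score a large dataset in one job.
        return "Batch Transform"
    if large_payload_or_long_processing:
        # Requests are queued and processed without blocking the caller.
        return "Asynchronous"
    if intermittent_traffic:
        # Capacity scales down during periods of no traffic.
        return "Serverless"
    # Sustained traffic with low, consistent latency.
    return "Real-time"


print(choose_inference_option(False, False, True))
```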

Monitor the model
  - Configure alerts to notify and initiate actions if any drift is detected
  - Data drift / concept drift

Amazon SageMaker Model Monitor

MLOps
  - Amazon SageMaker Model Building Pipelines
  - Repository options
      AWS CodeCommit
      Amazon SageMaker Feature Store
      Amazon SageMaker Model Registry
      Third-party repository
  - Orchestration options
      Amazon SageMaker Pipelines
      Amazon Managed Workflows for Apache Airflow
      AWS Step Functions

Accuracy = (True Positives + True Negatives) / Total
Precision = True Positives / (True Positives + False Positives)
Recall = True Positives / (True Positives + False Negatives)
F1 = (2 * Precision * Recall) / (Precision + Recall)
False Positive Rate (FPR) = False Positives / (True Negatives + False Positives)
True Negative Rate (TNR) = True Negatives / (True Negatives + False Positives)
Area Under the Curve (AUC)
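The classification formulas above can be checked with a small worked example. A minimal sketch; the function name and the confusion-matrix counts are made up for illustration:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics above from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (tn + fp),
        "tnr": tn / (tn + fp),
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives (100 predictions in total).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that FPR and TNR always add up to 1, because they share the same denominator.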
Regression Model Errors
  - Mean squared error (MSE)
  - Root mean squared error (RMSE)
  - Mean absolute error (MAE)
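The three regression error measures differ only in how they aggregate the per-sample errors. A minimal sketch with made-up actual and predicted values:

```python
import math


def regression_errors(actual, predicted):
    """Return MSE, RMSE and MAE for two equal-length lists of numbers."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Illustrative data only.
result = regression_errors(actual=[3, 5, 2, 7], predicted=[2, 5, 4, 8])
print(result)
```

Because MSE squares each error, a single large miss raises it more than it raises MAE.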

Domain 2: Fundamentals of Generative AI

AI - ML - DL - Gen AI
Model
In-context learning
Prompts, prompt tuning, prompt engineering
Every NLP model has a tokenizer, which converts text into token IDs.
Vector - ordered list of numbers.
Can encode relationships and associations between tokens
Embeddings
Numerical vector representations of tokens that capture their semantic meaning
Self-attention
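To make the tokenizer step concrete, here is a toy word-level tokenizer that maps text to token IDs. This is a deliberately simplified sketch: production tokenizers usually split text into subword units, and the function names and the `unknown_id` convention here are assumptions for illustration:

```python
def build_vocab(corpus):
    """Assign an integer ID to each distinct lowercase word, in order of first appearance."""
    vocab = {}
    for text in corpus:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def encode(text, vocab, unknown_id=-1):
    """Convert text into a list of token IDs; unseen words map to unknown_id."""
    return [vocab.get(word, unknown_id) for word in text.lower().split()]


vocab = build_vocab(["the cat sat", "the dog sat"])
print(encode("the dog sat", vocab))  # [0, 3, 2]
print(encode("the bird", vocab))     # [0, -1]
```

An embedding layer would then look up a numeric vector for each of these IDs.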

LLMs
Deep learning foundation models
Transformers
Unimodal or multimodal
Multimodal use cases
Multimodal tasks
Diffusion Models
Forward Diffusion
Reverse Diffusion
Stable Diffusion
Does not use the pixel space of the image; uses a reduced-definition latent space

SageMaker + Amazon Q Developer
Amazon Nimble Studio and Amazon Sumerian

Gen AI Architectures
Generative Adversarial Networks (GANs)
Variational Autoencoders (VAEs)
Transformers

AI Project lifecycle
Identify use case
Experiment and select
Adapt, align and augment
Evaluate
Deploy and integrate
Monitor

Interpretability
Intrinsic analysis
Post hoc analysis

Traditional ML outputs are largely deterministic (the same input gives the same output)
Gen AI outputs are non-deterministic

Gen AI Performance metrics
Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
Bilingual Evaluation Understudy (BLEU)

Transfer learning

SageMaker JumpStart

Domain 3: Applications of Foundation Models

Considerations
Architecture
Complexity
Availability
Compatibility
Explainability
Interpretability

Inference
It is the process of generating an output from an input that you provide to the model.
Input = Prompt and inference parameters
Randomness and Diversity
Temperature (lower value = favors higher-probability outputs; higher value = allows more lower-probability outputs)
Top K (lower value = smaller pool of candidate tokens)
Top P (lower value = smaller pool, limited to the most likely tokens whose combined probability reaches P)
Length
Response Length
Penalties
Stop sequences
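The effect of Temperature, Top K and Top P is easier to remember with numbers. The sketch below applies each one to a made-up list of token scores (logits). It illustrates the general idea only; the function names are hypothetical, and individual foundation models may implement these parameters differently:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens the distribution."""
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep only the k most likely token indexes and renormalize."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


def top_p_filter(probs, p):
    """Keep the most likely tokens until their combined probability reaches p."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in ranked:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    total = sum(probs[i] for i in keep)
    return {i: probs[i] / total for i in keep}


logits = [2.0, 1.0, 0.1]  # illustrative scores for three candidate tokens
print(softmax_with_temperature(logits, temperature=0.5))
print(softmax_with_temperature(logits, temperature=2.0))
print(top_k_filter(softmax_with_temperature(logits), k=2))
print(top_p_filter(softmax_with_temperature(logits), p=0.5))
```

With the lower temperature the top token takes a larger share of the probability, which is why low values give more predictable output.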
Prompt
A specific set of inputs to guide LLMs to generate an appropriate output or completion
RAG - Retrieval Augmented Generation
Prompt enrichment by appending external data to your prompt
Vector Database
Collection of data stored as mathematical representations
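RAG and vector databases fit together as follows: find the stored text whose vector is closest to the question's vector, then append that text to the prompt. A toy sketch with hand-written 3-number vectors; in a real system the vectors come from an embedding model and the store is a vector database, and every name and value below is an assumption for illustration:

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors based on the angle between them (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Stand-in for a vector database: text chunks with made-up embeddings.
documents = {
    "Refunds are processed within 5 business days.": [0.9, 0.1, 0.0],
    "Our office is closed on public holidays.": [0.1, 0.8, 0.2],
}


def retrieve(query_vector, store, top_n=1):
    """Return the top_n chunks most similar to the query vector."""
    ranked = sorted(store, key=lambda text: cosine_similarity(query_vector, store[text]), reverse=True)
    return ranked[:top_n]


def build_rag_prompt(question, context_chunks):
    """Enrich the prompt by appending the retrieved context."""
    context = "\n".join(context_chunks)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {question}"


question = "How long do refunds take?"
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of the question
print(build_rag_prompt(question, retrieve(query_vector, documents)))
```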

AWS Services for Vector search databases
Amazon OpenSearch Service
Amazon OpenSearch Serverless
Amazon Aurora PostgreSQL
Amazon RDS for PostgreSQL
Amazon Aurora
Amazon Neptune
Amazon DocumentDB (with MongoDB compatibility)

Amazon Bedrock Agents
Orchestrate prompt completion workflows

Prompt
Zero shot prompting
Few shot prompting
Prompt Template
Chain-of-thought prompting
Prompt tuning
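The difference between zero-shot and few-shot prompting comes down to whether worked examples are placed in the prompt. A small sketch of a prompt template; the `Review:` / `Sentiment:` layout and the function name are illustrative assumptions, not a required format:

```python
def build_few_shot_prompt(instruction, examples, query):
    """Fill a simple template; passing an empty examples list gives a zero-shot prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)


examples = [
    ("The battery lasts all day.", "positive"),
    ("It stopped working after a week.", "negative"),
]
print(build_few_shot_prompt("Classify the sentiment of each review.", examples, "Setup was quick and easy."))
```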

Latent space
The encoded knowledge of language in LLMs: the stored patterns of data that capture relationships and reconstruct language from those patterns when prompted
Statistical database

Prompt Engineering risks and limitations
Exposure
Prompt Injection
Jailbreaking
Hijacking
Poisoning

Training process for foundation models
Pretraining - Self-supervised learning
Fine-tuning - Supervised learning (risk: catastrophic forgetting)
Continuous pre-training

Fine-tuning techniques
Parameter-efficient fine-tuning (PEFT)
Low-Rank Adaptation (LoRA)
Representation fine-tuning (ReFT)
Multitask fine-tuning
Domain adaptation fine-tuning
Reinforcement learning from human feedback (RLHF)
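LoRA is "parameter-efficient" because it leaves the original weight matrix frozen and trains two much smaller low-rank matrices instead. A back-of-the-envelope sketch of the parameter counts; the layer size and rank are example values, not taken from any particular model:

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for LoRA: a (d_out x rank) matrix plus a (rank x d_in) matrix."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 4096, 4096, 8  # illustrative sizes
full = full_finetune_params(d_out, d_in)
lora = lora_params(d_out, d_in, rank)
print(f"full fine-tuning: {full:,} parameters")
print(f"LoRA (rank {rank}): {lora:,} parameters ({lora / full:.2%} of full)")
```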

Data preparation for fine-tuning
Prepare your training data
Select prompts
Calculate loss
Update weights
Define evaluation steps

Data preparation AWS Services
Amazon SageMaker Canvas
Open-source frameworks
Amazon SageMaker Studio - integration with Amazon EMR, can use JupyterLab
AWS Glue
Amazon SageMaker Feature Store
Amazon SageMaker Clarify - detects bias in your data
Amazon SageMaker Ground Truth - manages data labelling

Model performance
One option to reduce inference latency is to decrease the size of the LLM, but this might reduce its performance

Gen AI Performance Metrics
Recall-Oriented Understudy for Gisting Evaluation (ROUGE)
Automatic summarization tasks
Machine translation software
Bilingual Evaluation Understudy (BLEU)
Used for translation tasks
General Language Understanding Evaluation (GLUE)
Compare against benchmarks set by the experts
Assess model generalization across multiple tasks
Holistic Evaluation of Language Models (HELM)
Help improve model transparency
Massive Multitask Language Understanding (MMLU)
Evaluates knowledge and problem-solving capabilities of the model
Tested against history, mathematics, laws, computer science and more
Beyond the Imitation Game Benchmark (BIG-bench)
Focuses on tasks that are beyond the capabilities of the current language models
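ROUGE is recall-oriented (how much of the reference appears in the output), while BLEU is precision-oriented (how much of the output appears in the reference). The sketch below shows only the single-word (unigram) core of each idea. It is a simplification: real ROUGE has several variants, and real BLEU combines n-grams of several lengths with a brevity penalty. The function names are hypothetical:

```python
from collections import Counter


def unigram_overlap(candidate, reference):
    """Count words shared by both texts, respecting how often each word occurs."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    return sum((cand & ref).values())


def rouge1_recall(candidate, reference):
    """Share of the reference's words that the candidate recovered."""
    return unigram_overlap(candidate, reference) / len(reference.lower().split())


def unigram_precision(candidate, reference):
    """Share of the candidate's words that appear in the reference (BLEU-style idea)."""
    return unigram_overlap(candidate, reference) / len(candidate.lower().split())


reference = "the cat sat on the mat"
candidate = "the cat sat"
print(rouge1_recall(candidate, reference))      # 0.5
print(unigram_precision(candidate, reference))  # 1.0
```

The short candidate scores perfectly on precision but only 0.5 on recall, which is why summarization is usually judged with a recall-oriented metric.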

AWS Services for model evaluation
Amazon SageMaker JumpStart
Amazon SageMaker Clarify

Domain 4: Guidelines for Responsible AI

Responsible AI
Fairness
Explainability
Robustness
Privacy and security
Governance
Transparency

Effects of bias and variance
Demographic disparities
Inaccuracy
Overfitting
Underfitting
User Trust

Responsible datasets
Inclusivity
Diversity
Balanced datasets
Privacy protection
Consent and transparency
Regular audits

Responsible practices
Environmental considerations
Sustainability
Transparency
Accountability
Stakeholder engagement

AWS service for this
Amazon SageMaker Clarify
Detect bias
Explainability
SageMaker Processing jobs

SageMaker Clarify bias analysis (pre-training and post-training metrics)
Class imbalance
Label imbalance
Demographic disparity
Difference in positive proportions
Specificity difference
Recall difference
Accuracy difference
Treatment equality
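Two of the simpler bias metrics can be worked out by hand. The sketch below follows the definitions as I understand them from the SageMaker Clarify documentation: class imbalance compares group sizes, and difference in proportions of labels compares the rate of positive labels between groups. The function names and the counts are illustrative assumptions, so check the formulas against the official documentation before relying on them:

```python
def class_imbalance(n_a, n_d):
    """(n_a - n_d) / (n_a + n_d): 0 means equal group sizes, values near 1 or -1 mean strong imbalance."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(positives_a, n_a, positives_d, n_d):
    """Positive-label rate of group a minus the positive-label rate of group d."""
    return positives_a / n_a - positives_d / n_d


# Hypothetical dataset: 800 records in group a (400 positive labels),
# 200 records in group d (50 positive labels).
print(class_imbalance(800, 200))
print(difference_in_proportions_of_labels(400, 800, 50, 200))
```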

Gen AI Risks
Hallucinations
Intellectual Property
Bias
Toxicity
Data privacy

Guardrails for Amazon Bedrock
Hate
Insults
Sexual
Violence
+ Denied topics

Model transparency
Interpretability - Deep analysis
Explainability - black box analysis

AWS AI Service Cards
Amazon SageMaker Model Cards
SageMaker provides
Feature attributions - SHAP values
Partial dependence plots
Amazon Augmented AI (A2I) - send data to human reviewers to review random predictions.
Use your own reviewers or use Amazon Mechanical Turk

Domain 5: Security, Compliance, and Governance for AI Solutions
IAM Identity Center
Workforce users, Workforce identities
Logging with CloudTrail
Captures API calls and related events
Integrated with SageMaker
Amazon SageMaker Role Manager
Preconfigured permissions for 12 activities

Encryption at rest
Amazon SageMaker
Data is encrypted by default on ML storage volumes
Notebook instances, SageMaker jobs, and endpoints

AWS Key Management Service (AWS KMS)
Amazon Macie
Identifies and alerts you to sensitive data
Remove PII during ingestion

AI System Vulnerabilities
Training Data
Input Data
Output Data
Models
Inversion
Theft
LLMs
Prompt Injection

Amazon SageMaker Model Monitor
Capture data
Create a baseline
Define data quality monitoring jobs
Evaluate statistics
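The Model Monitor steps above (capture data, create a baseline, evaluate statistics) follow a pattern that can be shown on a single numeric feature. This is a conceptual sketch only: it does not use the SageMaker API, and the function names, sample values and the drift threshold are assumptions for illustration:

```python
from statistics import mean, pstdev


def create_baseline(training_values):
    """Summarize the training data for one feature."""
    return {"mean": mean(training_values), "std": pstdev(training_values)}


def has_drifted(baseline, captured_values, max_shift=2.0):
    """Flag drift when the captured mean moves more than max_shift baseline standard deviations."""
    if baseline["std"] == 0:
        return mean(captured_values) != baseline["mean"]
    shift = abs(mean(captured_values) - baseline["mean"]) / baseline["std"]
    return shift > max_shift


baseline = create_baseline([20, 22, 21, 19, 23, 20, 21])
print(has_drifted(baseline, [21, 20, 22]))  # False
print(has_drifted(baseline, [30, 31, 29]))  # True
```

In a managed setup, a result like the second one is what would trigger an alert or a retraining workflow.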

Amazon SageMaker Model Registry
Amazon SageMaker Model Cards
Amazon SageMaker ML Lineage Tracking
Amazon SageMaker Feature Store
Amazon SageMaker Model Dashboard

Emerging AI compliance standards
ISO 42001 and ISO 23894
EU Artificial Intelligence Act
NIST AI Risk Management Framework (RMF)

AI Risk Management
Probability of occurrence
Severity of occurrence
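Risk is commonly estimated by combining the two factors above, for example by multiplying probability by severity. A minimal sketch, assuming both are rated on a 1 to 5 scale; the scale, the level thresholds and the function names are hypothetical choices and not part of any named standard:

```python
def risk_score(probability, severity):
    """Multiply probability and severity ratings (each assumed to be 1-5)."""
    return probability * severity


def risk_level(score):
    """Bucket a score into a level using illustrative thresholds."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


print(risk_level(risk_score(probability=4, severity=5)))
print(risk_level(risk_score(probability=2, severity=3)))
print(risk_level(risk_score(probability=1, severity=2)))
```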

Algorithmic Accountability Act
Transparency and explainability
Monitor for Bias

AWS Audit Manager
Audits AWS usage to assess compliance
Choose a framework
Gen AI
Customer frameworks
Collect evidence and add to audit report

Guardrails for Amazon Bedrock
Apply guardrails to any foundation model and agents for Amazon Bedrock
Configure harmful content filtering
Define and disallow denied topics
PII data

AWS Config
Continuously monitors and records configurations
AWS Config rules
Conformance packs
Operational best practices for AI and ML
Security best practices for Amazon SageMaker

Amazon Inspector
Works at application level
Performs automated security assessments on your applications

AWS Trusted Advisor
Provides guidance to help you
Reduce cost
Increase performance
Improve security

Data Governance
Curation
Discovery and understanding
Protection
Define roles
Data steward
Data owner
IT Roles

AWS Glue DataBrew for data governance
Data profiling
Data Lineage
AWS Glue Data Catalog
AWS Glue Data Quality

Curation
Data Quality Management
Data Integration
Data Management
Protection
Data Security
Data Compliance
Data Lifecycle management

GENERAL LINKS - For Revision
What are Transformers in Artificial Intelligence? -> aws.amazon.com/what-is/transformers-in-artificial-intelligence/
What are Foundation Models? -> aws.amazon.com/what-is/foundation-models/
What is Artificial Intelligence (AI)? -> aws.amazon.com/what-is/artificial-intelligence/
What is Machine Learning? -> aws.amazon.com/what-is/machine-learning/
What is Deep Learning? -> aws.amazon.com/what-is/deep-learning/
What is Generative AI? -> aws.amazon.com/what-is/generative-ai/
What’s the Difference Between Supervised and Unsupervised Learning? -> aws.amazon.com/compare/the-difference-between-machine-learning-supervised-and-unsupervised/
Machine Learning Concepts -> docs.aws.amazon.com/machine-learning/latest/dg/machine-learning-concepts.html
AWS AI Use Case Explorer -> aws.amazon.com/machine-learning/ai-use-cases/?use-cases
What is Amazon SageMaker? -> docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
AWS Services - Machine Learning (ML) and Artificial Intelligence (AI) -> docs.aws.amazon.com/whitepapers/latest/aws-overview/machine-learning.html
AWS Deploy Serverless ML -> aws.amazon.com/blogs/machine-learning/deploy-a-serverless-ml-inference-endpoint-of-large-language-models-using-fastapi-aws-lambda-and-aws-cdk/
Amazon SageMaker - Amazon API Gateway - AWS Lambda -> aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/
Inference parameters -> docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html
Amazon Bedrock or Amazon SageMaker? -> docs.aws.amazon.com/decision-guides/latest/bedrock-or-sagemaker/bedrock-or-sagemaker.html
Choosing a generative AI service -> docs.aws.amazon.com/decision-guides/latest/generative-ai-on-aws-how-to-choose/guide.html
Amazon Bedrock Agents -> aws.amazon.com/bedrock/agents/