GenAI SAB - Secure Application Build Powered by AWS
Kathirvelan Ganesan
Product Head, GenAI, AI-ML, Conversational Intelligence, AI.Cloud, TCS
Opening thoughts
Generative AI (GenAI) is a rapidly evolving technology that has led to a significant transformation in organizations that have adopted it. As GenAI models become more sophisticated and mainstream, they also bring in concerns about various societal, ethical, technical and security considerations that needs to be addressed when a GenAI application is implemented in production. Organizations need to be aware of and mitigate risks that can arise through the implementation of Generative AI applications. This would entail the organization to have a well-defined strategy that covers all the layers of Generative AI Application architecture.
GenAI Application - Security Considerations
About 39% of enterprise AI decision makers report that data privacy and security concerns are among the greatest barriers to adoption of GenAI. GenAI drives need for AI governance & security and is one of the top concerns for the CXO in the organization. GenAI applications bring in LLM trained and developed by 3rd parties, implementation of these LLM can potentially open new surfaces that are prone to attacks like Prompt Injection, Training Data Poisoning, Model DDoS, Sensitive Information Disclosure and Model Theft. Since, enterprise and proprietary data are needed to customize and fine-tune LLMs to enable them to provide better results based on the organizations context and domain, Organizations need to have controls in place to ensure that the data used by LLM is protected and not exposed to model providers or other external entities. Organizations are also concerned about the quality of the responses produced by GenAI applications, they want to ensure that the responses are explainable, transparent and free of biases. The GenAI application also needs to comply with the legal regulations laid down by the regulators in the region they operate.
These considerations need to be addressed to enable the GenAI application scale up from pilot to full-scale production implementation. In this blog post, I share my point of view on Secure Application Build across the application layers, significance and challenges in each layer, solutions to address the concerns in each layer, key metrics used to monitor the solutions and leveraging AWS Services augmented by open-source frameworks/tools.
GenAI Secure Application Build
The secure application build encompasses Infrastructure, Network, Application, Data, Model, Guardrails and Responsible AI layers of the cloud environment in which the GenAI application is deployed.
Infrastructure Protection
Significance:
GenAI applications need security policies to ensure that they are protected against unintended and unauthorized access and potential vulnerabilities. Protecting infrastructure from unintended and unauthorized access and potential vulnerabilities will help elevate security posture in the cloud. The most common issue with infrastructure is security misconfiguration, measures need to be in place for Effective threat detection will allow for response to threats faster and learn from security events.
Solution:
To implement infrastructure protection, regularly scan for vulnerabilities to help protect against new threats. Employ vulnerability scanners and endpoint agents to associate systems with known vulnerabilities. Isolate of environments by creating separate development, testing, and production environments to limit exposure in case of an incident. Set up strong identity and access control policies, utilize multi-factor authentication (MFA) for all critical components. Use hardened operating system images and regular patching of OS components. Key metrics for tracking Infrastructure protection are Mean time to detect, mean time to resolve, and Preparedness level.
Key Services:
AWS provides threat detection and investigation through Amazon GuardDuty using machine learning, automated scanning in Amazon Inspector, data classification in Amazon Macie, centralized management in AWS Security Hub, automated response with Amazon EventBridge, and log centralization in AWS Security Lake.
Network Protection
Significance:
GenAI applications require robust network protection through a multi-layered security approach, including strong API authentication, network segmentation, and encrypted communication channels. Implementation of rate limiting, input validation, and continuous monitoring are essential to guard against prompt injection attacks, model extraction attempts, and potential data breaches. The potential challenges are in understanding assets, security risks and develop, maintain, and effectively communicate security roles, responsibilities, accountabilities, policies, processes, and procedures.
Solution:
Network protection can be implemented using security groups, network access control lists, and network firewalls to control traffic. Apply Zero Trust to systems and data in accordance with their value. Leverage virtual private cloud (VPC) endpoints for private connection to cloud resources. Inspect and filter traffic at each layer. Secure internal communication through a VPN and ensure data in transit is encrypted using protocols like TLS (Transport Layer Security) to protect sensitive data from interception. Use throttling, rate limiting, and authentication mechanisms to mitigate Distributed Denial of Service (DDoS) attacks and prevent abuse. Key metrics for tracking network protection are System Availability percentage, Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
Key Services:
AWS Control Tower provides centralized and automated compliance, security, and governance capabilities across AWS environments through integration with services like AWS Organizations, AWS Config, Amazon Security Hub, AWS CloudTrail, and Amazon GuardDuty while AWS Audit Manager further optimizes evidence collection and reporting for audits. Capabilities of audit manager like just-in-time temporary elevated access, chaos engineering, shift left testing, and leveraging AI/ML can further improve security, resiliency and compliance processes; and new services like AWS Resilience Hub, and CloudTrail Lake centralize assessments, unify security layers, and ingest external audit data respectively to enhance security and compliance postures. Use Amazon VPC for network Segmentation.? NAT gateway for instances in a private subnet to connect to services outside VPC.
Data Protection
Significance:
Data protection is crucial in preventing misuse, maintaining privacy, and to ensure the responsible development and deployment of GenAI applications. The major challenges lie in ensuring that the enterprise proprietary data is not exposed to third party and model providers, Understanding and adhering to data privacy regulations, such as GDPR, HIPAA, CCPA depending on the nature of the data.
Solution:
To implement data protection, Collect and retain only the minimum amount of data necessary for GenAI application's purpose. Avoid collecting sensitive or PII unless required. Before using sensitive data in training, employ techniques like data masking or anonymization to remove personal identifiers. Embedded data stored in the vector database is encrypted at rest and in transit. Protect data pipelines with industry-standard protocols and algorithms. Implement data governance through proper access controls, encryption, and auditing for data repositories. Document controls into a comprehensive control framework and establish demonstrable security and privacy controls that meet those objectives. Follow data minimization principles, ensuring that only necessary data is collected and retained. This reduces exposure and the risk of leaks. Implement mechanisms to monitor, evaluate, manage, and improve the effectiveness of the security and privacy programs. Application of strict role-based access control (RBAC) to data storage systems, ensuring that only authorized users and services can access or modify data. Key metrics to be monitored are PII detection and masking effectiveness, data retention compliance, Access logging coverage and Data Anonymization levels.
Key Services:
AWS Audit Manager is used to map compliance requirements to AWS usage data with prebuilt and custom frameworks and automated evidence collection. AWS Config for continuous assessments, audits, and evaluation of configurations and relationships of resources on AWS. AWS Artifact is a central resource for compliance-related information. It provides on-demand access to security and compliance reports from AWS. AWS KMS keys can be used to encrypt data at rest. AWS IAM for defining roles and policies for data access, data classification in Amazon Macie, AWS Glue for PII Data redaction, Amazon API Gateway and Amazon Cognito for Authentication and authorization. Amazon DataZone to manage and govern access to data using fine-grained controls. AWS?Lake Formation?centralizes permissions management of data and makes it easier to share across the organization. AWS Clean Rooms to securely analyze and collaborate on collective datasets, without sharing or copying underlying data.
Application Protection
Significance:
GenAI application protection will have similar protection requirements to existing enterprise applications hence existing Application protection controls can be extended to GenAI application. The challenge here is to provide correct level of access to users of the application, ensure the application code is free of security bugs, and ensure proper logging and monitoring is in place for the GenAI application.
Solution:
For Application protection, implement robust authentication mechanisms, Use strong password policies and multi-factor authentication (MFA). Secure APIs and Endpoints by implementing rate limiting, input validation, and access controls. Security is an ongoing process. Implement mechanisms to assess and improve the security posture of GenAI application through static and dynamic code analysis tools. Mechanisms to validate and sanitize all inputs to the application to prevent injection attacks, such as prompt injection. Perform security assessments, including penetration testing and vulnerability scanning, to identify and address security weaknesses. Implement distributed tracing to track a request's journey by creating a connected trail of spans with unique trace IDs, enabling performance monitoring and debugging across the entire pipeline. Key metrics for tracking Application protection are CPU/GPU usage patterns, memory consumption trends, Alert response time, Latency, Hallucination detection rate.
Key Services:
Usage of GenAI application integrations with AWS IAM Identity Center for centralized access management, IAM Access Analyzer features like policy validation and access previews to restrict permissions, Amazon API Gateway with built-in throttling, rate limiting, and authentication mechanisms to mitigate Distributed Denial of Service (DDoS) attacks. AWS Directory Service for managing Active Directory, Amazon Verified Permissions for fine-grained authorization. AWS WAF enables creation of security rules that control traffic and block common attack patterns such as SQL injection or cross-site scripting (XSS). Amazon Q developer enables the developers to automate coding as well as check code for security vulnerabilities and remediate them during the development phase. Securing the entire AI development and deployment pipeline is essential to protect both data and models, build secure pipelines using AWS tools like AWS CodePipeline, AWS CodeBuild, and Amazon SageMaker. Implement CI/CD best practices and securely manage model versions and deployments to ensure end-to-end protection. Use Amazon CloudWatch for centralized logging with dashboards and automated alerting. Amazon CloudTrail for application audit trail. Amazon SQS for creation of dead letter queues and Amazon SNS for notification for pre-configured application events.
Model Protection
Significance:
Model Protection is the new facet of GenAI application development process. Model protection involves securing the model and data during the model training and fine-tuning process. Model protection in GenAI faces two primary challenges: preventing unauthorized access and model theft through techniques like model extraction attacks where adversaries can reconstruct model weights through repeated querying and defending against prompt injection attacks where carefully crafted inputs can bypass safety guardrails or extract confidential training data. Additionally, there's the complex balance of implementing robust security measures while maintaining model performance and usability, including protecting against membership inference attacks.
Solution:
To implement model protection, ensure the model artifacts and the model versions are stored and accessed securely. Model inference endpoints are secured. Ensure only authorized users have access to inference results. Implement measures to evaluate and monitor model results to ensure the results are not poisoned. Use model versioning to keep track of model performance metrics, changes, and deployments to ensure auditability. Apply watermarking techniques in generated outputs to trace misuse, model fingerprinting to protect intellectual property from model theft. Train the model with adversarial examples to improve its robustness against potential adversarial attacks. Key metrics for tracking for model protection are prompt injection attempt rate, Data Anonymization success rate, PII Data detection rate in input data set, Precision, Recall, and Model Version Control metrics.
Key Services:
AWS provides SageMaker Clarify that can help with pre-training and post-training bias detection, as well as explanations for LLM results. Amazon Bedrock Model Protection is a set of tools and services that help protect models from unauthorized access and misuse. Amazon Bedrock Model Evaluation provides a simple and intuitive way to assess the performance of models. The models can be trained and fine-tuned in a secure environment by configuring Amazon Virtual Private Network. Automate security checks within the pipeline, ensure model integrity, and securely transition models from development to production environments. Amazon SageMaker Pipelines is a serverless workflow orchestration service purpose-built for MLOps and LLMOps automation, it can be used to automate security checks within the pipeline, ensure model integrity, and securely transition models from development to production environments. It offers scalability and flexibility, enabling the training and deployment of GenAI models on a variety of configurations and cloud resources.
?
Guardrails
Significance:
GenAI applications bring in a new set of risks, as the enterprises are using a pre-trained LLM that has been trained by third party model providers and customized with enterprise data. Guardrails are technical measures and safety mechanisms implemented to ensure that an application behaves within predefined limits. They are rules or constraints designed to prevent undesirable outcomes or restrict GenAI capabilities to ensure safe operation. Guardrails ensure that the model does not hallucinate, provide results that have foul language, and perpetuate biases. They are short-term safety measures that prevent GenAI from making dangerous errors in specific scenarios.
Solution:
Implement guardrails with AWS Services or open source libraries to check the input prompts with the aim to detect prompt injections, choice of words in prompts, identify and redact PII data. Implement measures to analyze the model response to detect hallucinations, measures to guard against model specific vulnerabilities like bias, factcheck and verify results produced through RAG, filter out harmful content. Incorporate human-in-the-loop mechanisms for high-risk interactions. In this way, human oversight is available to intervene in scenarios where the AI might generate sensitive or harmful content. Some of the key metrics monitored are Precision, Recall, Faithfulness, Topic adherence, Semantic Similarity.
Key Services/Frameworks:
Amazon Bedrock Guardrails can be configured to filter out harmful content, Topic Filtering, profanity filters, Sensitive information filters and Grounding Checks. Additional guardrails can be implemented using third party providers. Guardrails.AI provides a framework for creating reusable validators to check LLM outputs. NeMo Guardrails is an open-source toolkit that allows adding programmable guardrails, RAGAS framework can be used for checking Factual Correctness of responses. For high-risk outputs develop a workflow using AWS Step Functions to route high-risk outputs to a human reviewer for approval.
Responsible AI
Significance:
GenAI Applications while revolutionizing various industries and offer unique and differentiating capabilities, it also raises concerns about ethical considerations, data privacy, and potential misuse. Responsible AI is crucial for ensuring that GenAI applications are developed and deployed in a manner that prioritizes safety, transparency, accountability, and alignment with human values. Responsible AI involves policies, principles, frameworks, and best practices that guide the entire lifecycle of an GenAI system, from development to deployment. By embracing responsible AI principles, developers and organizations can mitigate risks associated with bias, discrimination, and unintended consequences. ?
Solution:
The key components of Responsible AI framework are Controllability, Safety, Fairness, Explainability, Veracity & Robustness, Transparency, Privacy & Security and Governance. ?Controllability is a crucial aspect of responsible AI development, as it enables the oversight, monitoring, and adjustment of GenAI applications to prevent unintended or harmful outcomes. Prioritizing safety, transparency, and ethical considerations is paramount for the responsible development and deployment of GenAI systems. It involves implementing robust governance frameworks, adhering to ethical guidelines, and fostering collaboration among stakeholders to address complex societal implications. Ensure fairness by auditing the training data for bias and implementing bias mitigation techniques during model training. Responsible AI also emphasizes the importance of explainability, enabling users to understand the decision-making processes of GenAI systems and ensuring they operate within acceptable boundaries. Make the model’s behavior explainable to both developers and end-users. Key Metrics to track are Representation Ratio, Feature Value distribution, Model Interoperability, Context Precision, Context Recall, semantic similarity, accuracy, robustness, and toxicity.
Key Services/Frameworks:
Amazon Bedrock Model Evaluation provides a simple and intuitive way to assess the performance of models, Amazon Comprehend can be used for prompt safety, Amazon Bedrock Guardrails can be configured to filter out harmful content, Topic Filtering, profanity filters, Sensitive information filters and Grounding Checks. Amazon SageMaker Clarify to help with pre-training and post-training bias detection, as well as explanations for LLM results. AWS IAM for defining roles and policies for data access, data classification in Amazon Macie. fmeval is an open source library to evaluate large language models (LLMs) in a range of tasks. It helps evaluate model for task performance and along multiple responsible AI dimensions. Llama Guard is another framework that acts as a gatekeeper, screening both user prompts and LLM outputs for any unsavory content. HuggingFace Evaluator can be used for evaluating model responses for fairness, explainability and robustness.
?
Closing Insights
Generative AI with LLMs has gained a lot of traction with most of the organizations experimenting with them and many of these organizations are now on the path to scale and implement the GenAI Application to production. By focusing on security, privacy, and responsible AI principles, you can ensure that your Generative AI solutions deliver impactful results while maintaining the trust of users and stakeholders. By leveraging AWS's robust infrastructure, security tools, and managed services, the development of secure AI systems can be streamlined, enabling you to focus on innovation without being weighed down by security complexities.
Sr. Engineering Manager @ GlobalLogic |Gen AI adoption | Digital Transformation |
6 天前Kathirvelan Ganesan: This is a very informative article. Are there any comprehensive open-source or commercial security tools you would recommend. Thanks
Cloud Center of Excellence at Pacific Gas & Electric / PG&E
1 周Kathirvelan Ganesan, I'm curious if there's a comprehensive product that combines all these technologies and adapts to specific product requirements.