How to Implement Authentication Across AWS Services and EMR Kerberos

Authentication — verifying the identity of a human or machine user — is the foundation of any successful data governance program. Often, other components of governance, like authorization and access management, get more attention, but neither is effective without proper user authentication first. Does it matter how fine-grained your access controls are if you don’t know who the user is? Not really. In reality, you have to have authentication in place before authorization.

It’s also important not to conflate authentication with authorization, a common mistake. While authentication is the process of verifying identity, authorization involves granting access to data and services based on that identity, associated roles, and other factors.

In the real world, authenticating your identity is sometimes as simple as flashing your driver’s license. Things get more complicated when users need access to cloud-based data and analytics services. Amazon Web Services, in particular, poses several authentication challenges: it offers different authentication methods depending on which of its services you are using. Because enterprises commonly run several AWS services at once, applying consistent authentication across them is non-trivial.

So how do you go about establishing effective user authentication on AWS? First, let’s take a quick look at the key concepts you need to understand: Identity and Access Management (IAM) users, IAM roles, and the Security Token Service (STS). Most of these work with AWS federation, which relies on SAML, and with Kerberos (for Spark, Hive, and other applications on EMR). We’ll keep these brief, as AWS has much more detailed information on each should you want to dig deeper.

IAM User

An IAM user represents a human or application user that you create. It consists of a name and credentials. The credentials can be (1) a username and password for the AWS Management Console or (2) an access key ID and secret access key for programmatic access through the API, the AWS CLI, or the AWS Tools for PowerShell.

IAM Role

An IAM role is an identity to which you assign permissions and which users or services can then assume. It is especially helpful for letting users reach one AWS service from another. Imagine that you want to enable users to access S3 from EC2. Since you can’t attach a policy directly to a service, you create a role whose permissions include access to S3, then attach that role to the EC2 instance. Take a look at this example to understand exactly how it works.
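As a rough sketch of the S3-from-EC2 scenario, a role is defined by two JSON policy documents: a trust policy saying who may assume the role, and a permission policy saying what the role may do. The bucket name and the choice of read-only actions below are assumptions made for illustration, not details from this article.

```python
import json

# Trust policy: allows the EC2 service to assume the role on behalf of an instance.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def s3_read_policy(bucket: str) -> dict:
    """Build a permission policy granting read-only access to one bucket.

    The bucket name is supplied by the caller; any name used here is hypothetical.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket}",      # the bucket itself (for ListBucket)
                    f"arn:aws:s3:::{bucket}/*",    # objects in the bucket (for GetObject)
                ],
            }
        ],
    }


def as_json(document: dict) -> str:
    """Serialize a policy document to the JSON string that IAM expects."""
    return json.dumps(document, indent=2)
```

Calling `as_json(TRUST_POLICY)` and `as_json(s3_read_policy("example-analytics-bucket"))` yields the two documents you would supply when creating the role and attaching its policy, whether through the console, the CLI, or an SDK.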

Security Token Service

The AWS Security Token Service (STS) is a web service that lets you request temporary, limited-privilege credentials for AWS Identity and Access Management (IAM) users or for users that you authenticate through identity federation (see below). STS is most useful for identity federation, delegation, cross-account access, and IAM roles.
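To make the idea of temporary credentials concrete, here is a minimal sketch that calls the STS `AssumeRole` operation through a client object passed in by the caller (for example `boto3.client("sts")`, assuming boto3 is installed). The function name, the default duration, and the role ARN in the usage note are hypothetical.

```python
def temporary_credentials(sts_client, role_arn: str, session_name: str,
                          duration_seconds: int = 3600) -> dict:
    """Request short-lived credentials for a role and return them as a plain dict.

    `sts_client` is expected to expose `assume_role` the way a boto3 STS client does.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
        "expires_at": creds["Expiration"],  # after this time the credentials stop working
    }
```

A call might look like `temporary_credentials(boto3.client("sts"), "arn:aws:iam::123456789012:role/ExampleRole", "demo-session")`. Passing the client in, rather than creating it inside the function, keeps the sketch easy to exercise with a stub.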

Federated Services

Federation lets you centrally manage access to AWS resources using a single sign-on tool or your enterprise directory. Identity and security information is exchanged between the application and the identity provider using standards like SAML. You can dig into the architecture details and a guide to implementing AWS Federated Authentication with Active Directory Federation Services (AD FS) here.

SAML

Security Assertion Markup Language 2.0 (SAML) is a standard for exchanging user identity information for authentication and authorization. Imagine logging into one system using your username and password, which then authenticates you across various other applications and services. With SAML, you type your password or provide login credentials only once but gain access to multiple services. More on SAML from AWS here.

Kerberos

Kerberos is a network authentication protocol built around the concept of a principal, which is a unique identity within the Kerberos protocol. Kerberos provides enhanced security because no credentials are sent over the network in unencrypted form. Principals belong to a realm, and a Key Distribution Center (KDC) provides the means for them to authenticate. In Amazon EMR, Kerberos plays a key role in authenticating users logged into the EC2 instances and provides security when users submit remote jobs to YARN or access services like HiveServer2 remotely.
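Principal names conventionally take the form `primary/instance@REALM`, where the instance part is optional. The small parser below is only an illustration of how the pieces relate; the example names in the usage note are hypothetical, and the parser ignores escaping rules that a real Kerberos library would handle.

```python
from typing import NamedTuple, Optional


class Principal(NamedTuple):
    primary: str             # user or service name, e.g. "hive"
    instance: Optional[str]  # optional qualifier, often a host name
    realm: str               # administrative domain, conventionally upper case


def parse_principal(name: str) -> Principal:
    """Split 'primary/instance@REALM' into its parts (simplified, no escaping)."""
    local, sep, realm = name.rpartition("@")
    if not sep or not local or not realm:
        raise ValueError(f"expected 'primary[/instance]@REALM', got {name!r}")
    primary, _, instance = local.partition("/")
    if not primary:
        raise ValueError(f"missing primary component in {name!r}")
    return Principal(primary, instance or None, realm)
```

For instance, `parse_principal("hive/master.example.internal@EXAMPLE.COM")` separates a service principal into its service name, host, and realm, while `parse_principal("alice@EXAMPLE.COM")` yields a user principal with no instance.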

Below is a handy chart identifying various authentication methods and concepts as applied to a number of popular AWS data services. If you’re looking for a service that is not listed, reach out to me for more details.


How to Implement Kerberos in Amazon EMR

Now that we’ve laid out the important authentication concepts in AWS, let’s walk through an actual use case. Consider a hypothetical financial services company that needs to analyze customer and transaction data to develop more personalized services for its customers and to detect fraud. It decides to use Amazon EMR and needs to support a number of different types of users, from data scientists to regular business users, across departments. Here’s how it would implement Kerberos in Amazon EMR.

(1) Log in to AWS, search for and click EMR, and create a new cluster.

(2) Create a security configuration named ns-emr-kdc.
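For readers who prefer scripting to the console, step (2) can be sketched as follows. This assumes a cluster-dedicated KDC and a 24-hour ticket lifetime, neither of which is stated in the walkthrough above, and it expects the caller to pass in an EMR client such as `boto3.client("emr")`.

```python
import json


def kerberos_security_configuration(ticket_lifetime_hours: int = 24) -> str:
    """Return the JSON for an EMR security configuration that enables Kerberos.

    Assumption: the cluster runs its own dedicated KDC on the master node.
    """
    config = {
        "AuthenticationConfiguration": {
            "KerberosConfiguration": {
                "Provider": "ClusterDedicatedKdc",
                "ClusterDedicatedKdcConfiguration": {
                    "TicketLifetimeInHours": ticket_lifetime_hours
                },
            }
        }
    }
    return json.dumps(config)


def create_security_configuration(emr_client, name: str = "ns-emr-kdc"):
    """Register the configuration under the name used in step (2)."""
    return emr_client.create_security_configuration(
        Name=name,
        SecurityConfiguration=kerberos_security_configuration(),
    )
```

When the cluster is later launched with this security configuration, it also needs Kerberos attributes such as the realm name and a KDC admin password; choose those values to suit your environment.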

Complete demo:
