Sensitive Data Detection

Classifying a document based on the sensitive data it contains

Intro:

A document can be classified as sensitive by using Amazon Macie (an AWS service) together with an S3 bucket. Amazon Macie is a data security service that uses machine learning (ML) and pattern matching to discover and help protect your sensitive data.

Amazon Macie

The document chosen for analysis is uploaded to the S3 bucket, and with Amazon Macie we can create jobs that check and analyze the documents in that bucket. After the analysis, Macie reports the severity of the sensitive data it found. The severity is classified into three levels:

  • Low
  • Medium
  • High

Based on the severity, we need to take action accordingly. The output also contains the category of data present in the document (for example: phone number, number plate, email ID, login credentials, etc.). A document with no sensitive data will not be displayed in the findings.
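The severity-based handling described above can be sketched as a small dispatcher. The severity labels ("Low", "Medium", "High") come from Macie's findings; the response actions below are hypothetical placeholders I've made up for illustration, not Macie features.

```python
def action_for(severity: str) -> str:
    """Map a Macie severity label to a (made-up) response action."""
    actions = {
        "Low": "log-and-review-weekly",
        "Medium": "notify-data-owner",
        "High": "quarantine-object-and-alert",
    }
    if severity not in actions:
        raise ValueError(f"unknown severity: {severity!r}")
    return actions[severity]

print(action_for("Medium"))  # → notify-data-owner
```

In practice, the right action per level depends on your organization's policies; the point is simply to branch on the severity Macie reports rather than treat all findings the same.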

Why Amazon Macie?

Amazon Macie provides automated, specialized data security with pre-built policies and machine-learning models. It integrates seamlessly with other AWS services to identify and protect sensitive data efficiently, which is especially valuable for compliance in regulated industries.

Demo:

Step 1:

Create an S3 bucket or use an existing bucket to hold the document you need to analyze.

I've created an S3 bucket here!

Step 2:

Upload the file to the S3 bucket.

A text file with random (fake) sensitive data

The text document above is the sample I've taken, and I've also uploaded a cat picture into the S3 bucket.
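For reference, Steps 1-2 can also be scripted with boto3. A minimal sketch, assuming AWS credentials are configured and reusing the bucket name from this walkthrough; `upload_samples` is only defined here, to be called against your own account.

```python
import os

def object_key_for(path: str) -> str:
    """Derive the S3 object key from a local file path (just the file name)."""
    return os.path.basename(path)

def upload_samples(bucket: str = "sensitive-store") -> None:
    """Upload the walkthrough's sample files to the bucket."""
    import boto3  # imported here so the helper above works without the SDK installed

    s3 = boto3.client("s3")
    for local_file in ("sensitive.txt", "cat.jpg"):
        s3.upload_file(local_file, bucket, object_key_for(local_file))
```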

Step 3:

Search for Amazon Macie in the AWS Management Console and click "Enable Amazon Macie".

Enabled Macie!
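Enabling Macie can also be done programmatically. A sketch using the boto3 `macie2` client; the frequency values mirror what the API accepts, and `enable_macie` is defined but not invoked, to be run with your own credentials.

```python
VALID_FREQUENCIES = {"FIFTEEN_MINUTES", "ONE_HOUR", "SIX_HOURS"}

def check_frequency(freq: str) -> str:
    """Validate a findingPublishingFrequency value before calling the API."""
    if freq not in VALID_FREQUENCIES:
        raise ValueError(f"unsupported frequency: {freq}")
    return freq

def enable_macie(freq: str = "FIFTEEN_MINUTES") -> None:
    """Enable Macie for the account; run this in your own account."""
    import boto3

    boto3.client("macie2").enable_macie(
        status="ENABLED",
        findingPublishingFrequency=check_frequency(freq),
    )
```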

Step 4:

Select "S3 buckets" from the list on the left, which shows the S3 buckets in your account. Select your preferred bucket for analysis and click "Create job".

I have chosen the sensitive-store bucket!

Step 5:

Review the S3 bucket and click Next, where we need to refine the scope of the job.

Depending on your preference, you can choose to analyze the data in the S3 bucket on a schedule or only once!
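Steps 4-5 map onto a single `create_classification_job` call in boto3. A sketch with a placeholder account ID and a job name I've invented; `create_one_time_job` is defined but not called here.

```python
def job_definition(account_id: str, bucket: str) -> dict:
    """Build the S3 scope passed as s3JobDefinition."""
    return {"bucketDefinitions": [{"accountId": account_id, "buckets": [bucket]}]}

def create_one_time_job(account_id: str, bucket: str = "sensitive-store") -> None:
    """Create the classification job; run this in your own account."""
    import boto3

    boto3.client("macie2").create_classification_job(
        jobType="ONE_TIME",  # use "SCHEDULED" plus scheduleFrequency for recurring scans
        name=f"{bucket}-scan",
        s3JobDefinition=job_definition(account_id, bucket),
    )
```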

Step 6:

Now you have to select the managed data identifiers, i.e. the built-in identifiers used to detect sensitive data.

There are two options:

  • Recommended
  • Custom

Recommended gives you the standard set of identifiers required for common sensitivity checks.

Custom gives you access to the 155 built-in data identifiers, from which you can choose.

In addition, you can create your own custom data identifier in the next step.

Step 7:

The next step is to select custom data identifiers, where we can define our own sensitive-data patterns. If you don't have one, you can create it.

As shown in the picture above, you need a regular expression that defines the pattern to match. Reference patterns for common formats can be found with a quick web search.

Now we have created a Custom Data Identifier!
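The same custom identifier can be registered with boto3. The pattern below is a made-up example for this sketch (a hypothetical internal employee ID like "EMP-12345"); Macie applies the same regex server-side once registered, and `register_identifier` is defined but not invoked.

```python
import re

# Hypothetical pattern for this sketch: an internal employee ID like "EMP-12345".
EMPLOYEE_ID_REGEX = r"EMP-\d{5}"

def matches(text: str) -> bool:
    """Check locally whether text contains the pattern Macie would flag."""
    return re.search(EMPLOYEE_ID_REGEX, text) is not None

def register_identifier() -> None:
    """Register the pattern as a Macie custom data identifier; run in your account."""
    import boto3

    boto3.client("macie2").create_custom_data_identifier(
        name="employee-id",
        regex=EMPLOYEE_ID_REGEX,
    )
```

Testing the regex locally before registering it (as `matches` does) is a cheap way to catch a pattern that is too broad or too narrow.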

Step 8:

Review the remaining details and click Next until the end, and the job will be created successfully!

The status should now show "Active (Running)"; the job's findings are available once the status turns to "Complete", which takes roughly 20-30 minutes!

The job is complete now!

Step 9:

Select "Findings" from the sidebar and it will show you the output.

Here the severity of the document is "Medium", and the finding says that the document contains personally identifiable information (PII).

Note: Macie didn't consider the cat image sensitive, so it is not mentioned in the findings. Macie found the document sensitive.txt to contain sensitive content, so it displayed it.

Step 10:

By clicking "SensitiveData:S3Object/Personal" you can view the finding in detail.
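The findings from Steps 9-10 can also be pulled with boto3. A sketch that filters client-side for the Medium/High severities discussed earlier; the API also supports server-side `findingCriteria`, and `print_findings` is defined but not run here.

```python
def at_least_medium(findings: list) -> list:
    """Keep findings whose severity description is Medium or High."""
    return [f for f in findings
            if f.get("severity", {}).get("description") in ("Medium", "High")]

def print_findings() -> None:
    """Fetch and print current findings; run with your own credentials."""
    import boto3

    macie = boto3.client("macie2")
    ids = macie.list_findings()["findingIds"]
    findings = macie.get_findings(findingIds=ids)["findings"]
    for f in at_least_medium(findings):
        print(f["type"], f["severity"]["description"])
```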

Architecture Diagram:

Further Improvements:

We can further improve the project by attaching Amazon EventBridge and Simple Notification Service (SNS) to the existing architecture. That way, we get notified by email whenever a sensitive document is uploaded to the S3 bucket, adding another layer of security.
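The EventBridge + SNS improvement can be sketched as below. The topic name, rule name, and email are placeholders; the event pattern uses the source and detail-type that Macie publishes to EventBridge, and `wire_up_alerts` is defined but not invoked here.

```python
import json

def macie_event_pattern() -> dict:
    """EventBridge pattern matching findings that Macie publishes."""
    return {"source": ["aws.macie"], "detail-type": ["Macie Finding"]}

def wire_up_alerts(email: str) -> None:
    """Create the SNS topic, subscription, rule, and target; run in your account."""
    import boto3

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(Name="macie-findings")["TopicArn"]
    sns.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=email)

    events = boto3.client("events")
    events.put_rule(Name="macie-findings-rule",
                    EventPattern=json.dumps(macie_event_pattern()))
    events.put_targets(Rule="macie-findings-rule",
                       Targets=[{"Id": "sns-target", "Arn": topic_arn}])
    # Note: the SNS topic's access policy must also allow events.amazonaws.com
    # to publish, which this sketch does not set up.
```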


Thank you, Happy Learning!
