Best Practices for Data & Analytics Architecture on AWS


"Best practice is a procedure that has been shown by research and experience to produce optimal results and that is established or proposed as a standard suitable for widespread adoption" - Merriam-Webster Dictionary


Data, Analytics, Web & Mobile on AWS

Here is what your architecture would look like on AWS if you needed to implement most of the common data, analytics, web and mobile use cases.


Yes, the full architecture looks very busy and complex. The good news is that most organizations only need to implement part of it for their specific use cases.

So let's get right into it and cover some of the popular use cases and architecture best practices. As you go through each use case, notice how they all share the same data lake foundation. In other words, regardless of what kind of data you're ingesting, your data lake structure stays the same. This provides a consistent approach for storing, organizing, securing and governing your data, and allows you to transform and analyze data from different sources and of different types using common technologies and even the same codebase.
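For illustration, the data lake layout could follow a convention like the sketch below. The bucket name, zone names and prefix patterns are assumptions, not a prescribed standard; the point is that the same raw/curated/analytics zones back every use case that follows.

# Hypothetical S3 data lake layout shared by all of the use cases below.
# Bucket, zone and prefix names are illustrative only.
DATA_LAKE_BUCKET = "my-company-data-lake"

ZONES = {
    "raw": "raw/{source}/{table}/ingest_date={ingest_date}/",  # data as ingested
    "curated": "curated/{source}/{table}/",                    # cleaned, Parquet, partitioned
    "analytics": "analytics/{domain}/{dataset}/",              # aggregated, query-optimized
}

def s3_path(zone: str, **parts: str) -> str:
    """Build a full S3 path for a zone, e.g.
    s3_path("raw", source="sales", table="orders", ingest_date="2024-01-31")."""
    return f"s3://{DATA_LAKE_BUCKET}/" + ZONES[zone].format(**parts)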


Ingest, process and organize CSV files in near real-time on AWS

This is a straightforward and very popular use case for organizations that have many departments or lines of business with heavy use of spreadsheets. At some point, organizations realize that spending days or weeks creating, combing through and aggregating data from 20, 50 or 100 spreadsheets just to create end-of-month reports is very inefficient. This architecture ingests and organizes the various spreadsheets into the AWS data lake, transforms and aggregates the data in near real time using Glue jobs, and lets organizations explore the data with Athena/SQL queries.
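Here is a minimal sketch of what the Glue job in this architecture could look like, assuming CSVs land in a raw S3 prefix and the cleaned output goes to a curated prefix as Parquet (the bucket, paths and column names are hypothetical):

import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read newly landed CSV files from the raw zone (job bookmarks make this incremental)
raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-company-data-lake/raw/spreadsheets/orders/"]},
    format="csv",
    format_options={"withHeader": True},
    transformation_ctx="raw_orders",
)

# Basic cleanup: cast the amount column and drop rows without an order id
cleaned = raw.resolveChoice(specs=[("amount", "cast:double")]) \
             .filter(lambda row: row["order_id"] is not None)

# Write query-friendly Parquet to the curated zone (assumes an ingest_date column exists)
glue_context.write_dynamic_frame.from_options(
    frame=cleaned,
    connection_type="s3",
    connection_options={
        "path": "s3://my-company-data-lake/curated/spreadsheets/orders/",
        "partitionKeys": ["ingest_date"],
    },
    format="parquet",
    transformation_ctx="curated_orders",
)
job.commit()

A Glue crawler (or an explicitly defined catalog table) can then expose the curated Parquet to Athena for the SQL exploration step.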



Ongoing replication of small to medium-sized Oracle or MS SQL Server databases to an AWS data lake

This is another popular use case for establishing a data warehouse and BI foundation on AWS. The architecture provides near-real-time replication of data from an on-premises database to the AWS data lake via DMS (Database Migration Service), ETL/ELT capability via Glue jobs, and data exploration using Athena/SQL.
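As a sketch of the exploration step, here is how a query against the replicated tables could be run with Athena via boto3; the database, table and output-location names are assumptions for illustration.

import time
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="""
        SELECT customer_id, COUNT(*) AS order_count, SUM(amount) AS total_amount
        FROM replicated_oracle.orders
        WHERE order_date >= date '2024-01-01'
        GROUP BY customer_id
        ORDER BY total_amount DESC
        LIMIT 20
    """,
    QueryExecutionContext={"Database": "replicated_oracle"},
    ResultConfiguration={"OutputLocation": "s3://my-company-athena-results/"},
)
query_id = response["QueryExecutionId"]

# Athena runs asynchronously; poll until the query reaches a terminal state
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows[1:]:  # the first row is the column header
        print([col.get("VarCharValue") for col in row["Data"]])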



Process and organize events in near real-time

Many organizations have adopted event-based or event-sourced architectures for their applications. This use case is appropriate for organizations that need to store and organize the events produced by those applications in the AWS data lake in near real time.
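On the producer side, one common choice (an assumption here, not the only option) is for applications to push JSON events to a Kinesis Data Firehose delivery stream, which buffers and delivers them to the S3 data lake. The stream name and event shape in this sketch are hypothetical.

import json
from datetime import datetime, timezone
import boto3

firehose = boto3.client("firehose")

def emit_event(event_type: str, payload: dict) -> None:
    """Send one newline-delimited JSON event to the delivery stream."""
    record = {
        "event_type": event_type,
        "occurred_at": datetime.now(timezone.utc).isoformat(),
        **payload,
    }
    firehose.put_record(
        DeliveryStreamName="app-events-to-data-lake",  # hypothetical stream name
        Record={"Data": (json.dumps(record) + "\n").encode("utf-8")},
    )

# Example: an application emits an order-placed event
emit_event("order_placed", {"order_id": "12345", "customer_id": "987", "amount": 42.50})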



Run ETL/ELT jobs and publish results to Redshift

Some organizations already have a way of ingesting data into Amazon S3, but need a proven way of transforming and loading (ETL) or loading and transforming (ELT) data into a Redshift data warehouse.
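One way to issue the load step is through the Redshift Data API, as in the sketch below; the cluster, database, IAM role, bucket and table names are hypothetical. In a Glue-based ELT, the same COPY statement could be issued from within the job instead.

import boto3

redshift_data = boto3.client("redshift-data")

# COPY curated Parquet from the data lake into a Redshift table
copy_sql = """
    COPY analytics.orders
    FROM 's3://my-company-data-lake/curated/spreadsheets/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS PARQUET;
"""

response = redshift_data.execute_statement(
    ClusterIdentifier="analytics-cluster",  # hypothetical cluster name
    Database="dw",
    DbUser="etl_user",
    Sql=copy_sql,
)

# The Data API is asynchronous; progress can be checked with describe_statement
print("Submitted COPY, statement id:", response["Id"])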



And now let's put it all together for a typical medium-complexity data platform on AWS, with both internal and external data sources.



Conclusion

There are many more use cases that organizations are implementing on AWS while following these best practices. The key to a successful implementation is choosing the right architectural patterns and technologies for the job.
