Best Practices for Data & Analytics Architecture on AWS
Igor Royzis
CTO | Software Engineering Leader in Cloud, Data & AI | Scaling Organizations, Driving Innovation, Delivering Results
"Best practice is a procedure that has been shown by research and experience to produce optimal results and that is established or proposed as a standard suitable for widespread adoption" - Merriam-Webster Dictionary
Data, Analytics, Web & Mobile on AWS
Here is how your architecture would look on AWS if you needed to implement most of the common data, analytics, web and mobile use cases.
Yes, it looks very busy and complex. The good news is that most organizations only need to implement part of this architecture for their specific use cases.
So let's get right into it and cover some of the popular use cases and architecture best practices. As you go through each use case, notice that they all share the same data lake foundation. In other words, regardless of what kind of data you're ingesting, your data lake structure stays the same. This provides a consistent approach to storing, organizing, securing and governing your data, and allows you to transform and analyze data from different sources and of different types using common technologies and even the same codebase.
Ingest, process and organize CSV files in near real-time on AWS
This is a straightforward and very popular use case for organizations that have many departments or lines of business with heavy spreadsheet use. At some point these organizations realize that spending days or weeks creating, combing through and aggregating data from 20, 50 or 100 spreadsheets just to produce end-of-month reports is very inefficient. This architecture lets you ingest and organize various spreadsheets into an AWS data lake, transform and aggregate the data in near real time using Glue jobs, and explore it with Athena/SQL queries.
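A key part of keeping the data lake structure consistent is a predictable S3 key layout. Here is a minimal sketch of a helper that builds Hive-style partitioned keys (the zone, source and dataset names are hypothetical; the year/month/day partition scheme is one common convention that Glue crawlers and Athena can discover automatically):

```python
from datetime import datetime, timezone
from typing import Optional

def lake_key(zone: str, source: str, dataset: str, filename: str,
             ts: Optional[datetime] = None) -> str:
    """Build a Hive-style partitioned S3 key (zone/source/dataset/date)
    so Glue crawlers and Athena can discover partitions automatically."""
    ts = ts or datetime.now(timezone.utc)
    return (f"{zone}/{source}/{dataset}/"
            f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/{filename}")

# Example: landing an incoming spreadsheet export in the raw zone.
key = lake_key("raw", "finance", "invoices", "invoices_2024-01-05.csv")
```

With a layout like this, every ingested CSV lands in the same structure, so the same Glue job and the same Athena table definitions work across departments.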
Ongoing replication of small to medium-sized Oracle or MS SQL Server databases to an AWS data lake
Another popular use case is establishing a data warehouse and BI foundation on AWS. This architecture provides near-real-time replication of data from an on-premises database to the AWS data lake via DMS (Database Migration Service), ETL/ELT capability via Glue jobs, and data exploration using Athena/SQL.
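DMS writes CDC files that include an `Op` column ('I' insert, 'U' update, 'D' delete) alongside the table's columns. A downstream Glue job typically merges those change records into the latest table state. Here is a minimal, in-memory sketch of that merge logic (the `id` primary-key column and the record shapes are assumptions for illustration; a real job would do this with Spark over S3 files):

```python
def apply_cdc(snapshot: dict, changes: list, key: str = "id") -> dict:
    """Merge DMS-style CDC records into a snapshot of the table.

    Each change record carries an 'Op' column: 'I' (insert),
    'U' (update) or 'D' (delete). Records are applied in order.
    """
    state = dict(snapshot)
    for rec in changes:
        op = rec.get("Op", "I")
        row_key = rec[key]
        if op == "D":
            state.pop(row_key, None)          # delete the row if present
        else:
            # insert/update: keep all columns except the Op marker
            state[row_key] = {k: v for k, v in rec.items() if k != "Op"}
    return state
```

The same idea scales up in Glue as a join-and-overwrite (or a format with merge support, such as Apache Iceberg or Hudi) rather than a Python dict.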
Process and organize events in near real-time
Many organizations have adopted event-based or event-sourced architectures for their applications. This use case is appropriate for organizations that need to store and organize application-produced events in an AWS data lake in near real time.
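When delivering events to the lake through Kinesis Data Firehose, producers have to respect the service's batching limits (`PutRecordBatch` accepts at most 500 records per call). A minimal sketch of the batching step, with the limit hard-coded as an assumption you would confirm against current service quotas:

```python
def batch_events(events: list, max_records: int = 500) -> list:
    """Split a stream of events into batches that fit within the
    Kinesis Data Firehose PutRecordBatch limit of 500 records per call."""
    return [events[i:i + max_records]
            for i in range(0, len(events), max_records)]

# Each batch would then be sent with boto3's
# firehose_client.put_record_batch(DeliveryStreamName=..., Records=batch),
# checking FailedPutCount in the response and retrying failed records.
```

Firehose then buffers and delivers the events to S3, where they land in the same partitioned data lake structure as the other use cases.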
Run ETL/ELT jobs and publish results to Redshift
Some organizations already have a way of ingesting data into Amazon S3, but need a proven way of transforming and loading (ETL) or loading and transforming (ELT) data into a Redshift data warehouse.
And now let's put it all together for a typical medium complexity data platform on AWS with both internal and external data sources
Conclusion
There are many more use cases that organizations are implementing on AWS while utilizing best practices. The key to a successful implementation is choosing the right architectural patterns and technologies for the job.