DAfR - Data Architect for Real {2}

~ 1

Don't let complexity lead you to complications

Complexity, the state of being confusing or complicated, is an unavoidable reality of data management: data management is complex by nature, but it doesn't need to be complicated

  • Complexity is an inherent characteristic of something
  • Complication is the result of actions taken (or not taken)

Things become complicated because we complicate them, so we can design an architecture that handles the complexities of data management without making the architecture itself complicated

~ 2

Think about scaling from the start

Modern applications must be able to scale (up and down) to meet the needs of a business's customers; this is true for all businesses and all applications

As an enterprise company, you want to enable business units to act on their own

Business units shouldn't rely on a central team to provision the environment, databases, and tools they need

In the cloud, you can start by provisioning the platform with only the services you need and extend it as you onboard new use cases (cost efficiency)

~ 3

A Data Lake Store is great for storing data, with benefits such as faster data loads and reloads and lower costs

There is no definitive guide to building a data lake, and each scenario is unique in terms of ingestion, processing, consumption and governance, but take the time to plan and design your data lake

The way you name and structure your data will determine how easy it is to use later

I always encourage everyone to think about the desired structure they would like to work with
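As one illustration of deciding on a structure up front, the sketch below builds lake paths from a fixed convention: zone, then domain, then dataset, then date partitions. The zone names (`raw`, `curated`, `consumption`), the `lake_path` helper and the `year=/month=/day=` partition layout are all hypothetical choices for this example, not a prescribed standard; the point is only that a single, predictable function makes every dataset land in a place consumers can guess.

```python
from datetime import date

# Hypothetical zone names; adapt them to your own lake conventions.
ZONES = ("raw", "curated", "consumption")


def lake_path(zone: str, domain: str, dataset: str, load_date: date) -> str:
    """Build a predictable, date-partitioned path for a dataset in the lake."""
    if zone not in ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    # Lower-case, underscore-separated names keep paths consistent across tools.
    domain = domain.strip().lower().replace(" ", "_")
    dataset = dataset.strip().lower().replace(" ", "_")
    # Partition folders by load date so reloads of a single day stay isolated.
    return (
        f"{zone}/{domain}/{dataset}/"
        f"year={load_date.year:04d}/month={load_date.month:02d}/day={load_date.day:02d}"
    )


print(lake_path("raw", "Sales", "Orders", date(2023, 5, 7)))
# → raw/sales/orders/year=2023/month=05/day=07
```

Centralizing the convention in one helper means a naming change is made once rather than in every ingestion job.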

~ 4

When it comes to data science, be careful about what you consider data

Data must be relevant and clean, and make sure to answer this question: is there any bias in the data?

As data professionals, we know that our data samples need to be statistically significant

Bias in the data behind analytical models can reduce their accuracy; correcting this bias throughout the data life cycle can also improve diversity and inclusion
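One simple first check for one kind of bias, under-representation of some groups, is to compare each group's share in a sample with its share in a reference population. The sketch below assumes you already know the reference shares; the `representation_gaps` helper, the group labels and the 0.05 tolerance are hypothetical. It only flags representation gaps and is not a test of statistical significance or a complete bias audit.

```python
from collections import Counter


def representation_gaps(sample, reference_shares, tolerance=0.05):
    """Return groups whose share in `sample` differs from the reference share
    by more than `tolerance` (absolute difference in proportions)."""
    counts = Counter(sample)
    total = len(sample)
    gaps = {}
    for group, expected in reference_shares.items():
        # Groups missing from the sample count as a share of 0.
        observed = counts.get(group, 0) / total if total else 0.0
        if abs(observed - expected) > tolerance:
            # Positive = over-represented, negative = under-represented.
            gaps[group] = round(observed - expected, 3)
    return gaps


sample = ["A"] * 70 + ["B"] * 27 + ["C"] * 3
reference = {"A": 0.5, "B": 0.3, "C": 0.2}
print(representation_gaps(sample, reference))
# → {'A': 0.2, 'C': -0.17}
```

Here group C makes up 20% of the reference population but only 3% of the sample, so a model trained on this sample would see very few C records.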

~ 5

Data is at the heart of everything, and becoming data-driven (using data at scale) remains a top priority for most organizations

Significant barriers include legacy and tightly interconnected systems, centralized monolithic platforms, and complex governance

The big shift with data mesh, which is gaining a lot of traction, is managing data as a set of products rather than as a collection of processes and pipelines: a democratized approach to data management in which individual domains operationalize their own data

Architecturally, data mesh is a shift from enterprise data management to domain data management with enterprise collaboration
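To make "data as a product" a little more concrete, the sketch below describes a data product with a small, self-checking descriptor: the owning domain and team, where consumers read it, its schema, and a freshness expectation. The `DataProduct` class, its fields and the validation rules are hypothetical illustrations, not part of any data mesh specification or tool; real platforms typically carry much richer contracts.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor for a domain-owned data product."""
    name: str
    domain: str
    owner: str                                   # accountable team inside the domain
    output_port: str                             # where consumers read it (table, path, API)
    schema: dict = field(default_factory=dict)   # column name -> type name
    freshness_sla_hours: int = 24                # maximum acceptable data age

    def validate(self) -> list:
        """Return a list of problems; an empty list means the descriptor is usable."""
        problems = []
        if not self.owner:
            problems.append("missing owner")
        if not self.schema:
            problems.append("missing schema")
        if self.freshness_sla_hours <= 0:
            problems.append("freshness SLA must be positive")
        return problems


orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team",
    output_port="curated/sales/orders",
    schema={"order_id": "string", "amount": "decimal"},
)
print(orders.validate())
# → []
```

The design choice is that ownership and the consumer-facing interface live with the domain, while shared validation rules like these are where enterprise collaboration comes in.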



Data mesh and data domains are interesting concepts to explore. Every time we distribute data, though, we need to carefully consider the latency we inherently introduce during data propagation and consumption. Data model design and its well-known tradeoffs (normalization vs. duplication, etc.) will always be front and center!
