DAfR - Data Architect for Real {2}
Andrea Benedetti
AI, data and all the things we can do with it @ Microsoft | TedX Speaker | Keynote Speaker
~ 1
Don't let complexity lead you to complications
Complexity, the state of being confusing or complicated, is an unavoidable reality of data management (complex by nature, but it doesn't need to be complicated)
Things are complicated because we complicate them, so we can design an architecture that handles the complexities of data management without becoming complicated itself
~ 2
Think about scaling from the start
Modern applications must be able to scale (up and down) to meet the needs of a business's customers; this is true for all businesses and all applications
As an enterprise company, you want to enable business units to act on their own
Business units shouldn't rely on a central team to provision the environment, databases, and tools they need
In the cloud, you can start by provisioning the platform with only the services you require, then extend it as you onboard new use cases (cost efficiency)
~ 3
A Data Lake Store is great for storing data, providing benefits such as faster data loads and reloads and lower costs
There is no definitive guide to building a data lake, and each scenario is unique in terms of ingestion, processing, consumption and governance, but take the time to plan and design your Data Lake
The way in which you name and structure your data will determine how easy it will be to use it later
I always encourage everyone to think about the desired structure they would like to work with
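One way to make a naming convention concrete is to generate every path from a single function, so the structure is decided once and applied everywhere. The sketch below is only an illustration: the zone names (`raw`, `curated`, `serving`), the `zone/domain/dataset/year=/month=/day=` layout and the `lake_path` helper are assumptions of mine, not a prescribed standard

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical zone names; replace them with the zones your own lake uses.
ALLOWED_ZONES = {"raw", "curated", "serving"}


def lake_path(zone: str, domain: str, dataset: str, day: date, filename: str) -> str:
    """Build a path of the form zone/domain/dataset/year=YYYY/month=MM/day=DD/filename."""
    if zone not in ALLOWED_ZONES:
        raise ValueError(f"unknown zone: {zone!r}")
    return str(
        PurePosixPath(
            zone,
            domain.lower(),   # lower-case names avoid case-sensitivity surprises
            dataset.lower(),
            f"year={day.year:04d}",
            f"month={day.month:02d}",
            f"day={day.day:02d}",
            filename,
        )
    )


example = lake_path("raw", "Sales", "Orders", date(2024, 3, 7), "orders.parquet")
print(example)
```

Date-based folders like these make it easy to reload or reprocess a single day without touching the rest of the dataset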
~ 4
Talking about data science, be careful with what you consider data
Data must be relevant and clean, and make sure to answer this question: is there any bias in the data?
As data professionals, we know that our data samples need to be statistically significant
Data bias in analytical models can impact their accuracy: correcting this bias throughout the data life cycle can also improve diversity and inclusion
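A very simple first check for bias is to compare how each group is represented in the sample against a reference population. The sketch below is a minimal illustration under my own assumptions: the `representation_gaps` helper, the group labels and the 10-point threshold are hypothetical, and a real analysis would need proper statistical tests

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Return, per group, (share in the sample) minus (share in the reference population)."""
    counts = Counter(sample_groups)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample is empty")
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Hypothetical data: group "a" is over-represented compared to a 50/50 reference.
sample = ["a"] * 70 + ["b"] * 30
reference = {"a": 0.5, "b": 0.5}

gaps = representation_gaps(sample, reference)

# Flag every group whose share deviates by more than 10 percentage points
# (an arbitrary threshold chosen for illustration).
flagged = [group for group, gap in gaps.items() if abs(gap) > 0.10]
print(gaps, flagged)
```

A positive gap means the group is over-represented in the sample, a negative gap means it is under-represented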
~ 5
Data is at the heart of everything and becoming data-driven (using data at scale) remains a top priority for most organizations
Significant barriers are legacy and tightly interconnected systems, centralized monolithic platforms, and complex governance
The big shift with data mesh, which is gaining a lot of traction, is in managing data as a set of products rather than as a collection of processes and pipelines: a democratized approach to managing data in which the various domains operationalize their own data
Architecturally, data mesh is a shift from enterprise data management to domain data management with enterprise collaboration
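To make "data as a product" a little more tangible, here is a minimal sketch in which each domain describes its own data product and registers it in a shared catalog that the rest of the enterprise can browse. The `DataProduct` and `Catalog` classes and all of their fields are hypothetical names I chose for illustration; they do not come from any data mesh tool or standard

```python
from dataclasses import dataclass


@dataclass
class DataProduct:
    """Hypothetical descriptor of a domain-owned data product."""
    name: str
    domain: str          # the business domain that owns and operates the product
    owner: str           # an accountable contact inside that domain
    output_schema: dict  # column name -> type name, the contract for consumers


class Catalog:
    """Hypothetical shared catalog: domains publish, everyone can discover."""

    def __init__(self):
        self._products = {}

    def register(self, product: DataProduct) -> None:
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"already registered: {key}")
        self._products[key] = product

    def by_domain(self, domain: str) -> list:
        """Return the sorted names of the products published by one domain."""
        return sorted(p.name for p in self._products.values() if p.domain == domain)


catalog = Catalog()
catalog.register(DataProduct("orders", "sales", "sales-data-team",
                             {"order_id": "string", "amount": "decimal"}))
catalog.register(DataProduct("invoices", "sales", "sales-data-team",
                             {"invoice_id": "string"}))
catalog.register(DataProduct("shipments", "logistics", "logistics-data-team",
                             {"shipment_id": "string"}))
print(catalog.by_domain("sales"))
```

The ownership stays with the domain, while the catalog is the piece of enterprise collaboration that keeps products discoverable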
Data mesh and data domains are interesting concepts to explore. Every time we distribute data, though, we need to carefully consider the latency implications that we inherently introduce during data propagation and consumption. Data model design and well-known tradeoffs (normalization vs. duplication, etc.) will always be front and center!