2022 Data journey
https://www.pexels.com/@mikael-blomkvist


The momentum behind data technologies should keep last year's pace, driven by Organizations that aim to make better decisions, boost productivity, and improve services, all supported by good data and the meaningful insights coming out of their data stack.

Digital transformation has changed the paradigm. Organizations are moving away from a legacy on-prem, siloed and disconnected approach toward one that is more integrated and modular: decoupled but connected, and built on cloud technologies.

My perception is that, in the coming months, discussions around data and analytics should cover data quality, integration, of course more cloud adoption, and strong operations.


Organizations still struggle when they fail to recognize that most of the painful problems they face around analytics are rarely technical; they usually stem from the quality of the data inside their platforms.

Data Lineage, Data Quality and Master Data Management: Some data projects still skip critical steps in pursuit of a faster ROI, going straight to implementing all sorts of architectures and forgetting to build a robust data catalog that bridges business concepts and the informational models.

Many tools on the market enable Organizations to fill this gap (e.g. Amundsen, DataHub, Unity Catalog, Purview, Collibra).
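To make the bridge between business concepts and informational models a bit more concrete, here is a minimal Python sketch that is not tied to any of the tools above: a glossary term points at the physical column that implements it, and a small lineage graph lets us walk back to every upstream source. The `GlossaryTerm` class, the `LINEAGE` dictionary, and all dataset names are hypothetical; real catalogs keep this metadata in their own models and APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    """A business concept and the physical columns that implement it."""
    name: str
    definition: str
    owner: str
    columns: tuple  # "dataset.column" identifiers


# Hypothetical lineage graph: each dataset lists the datasets it is built from.
LINEAGE = {
    "reporting.revenue_by_month": ["warehouse.fact_sales"],
    "warehouse.fact_sales": ["staging.orders", "staging.customers"],
    "staging.orders": ["erp.orders_raw"],
    "staging.customers": ["crm.customers_raw"],
}


def upstream_of(dataset, lineage):
    """Return every dataset that feeds `dataset`, directly or indirectly."""
    seen = set()
    stack = list(lineage.get(dataset, []))
    while stack:
        current = stack.pop()
        if current not in seen:  # guards against revisiting shared sources
            seen.add(current)
            stack.extend(lineage.get(current, []))
    return seen


revenue = GlossaryTerm(
    name="Net Revenue",
    definition="Invoiced amount minus discounts and returns.",
    owner="Finance",
    columns=("reporting.revenue_by_month.net_revenue",),
)

# Trace the business term down to its physical column and all of its sources.
for column in revenue.columns:
    dataset = column.rsplit(".", 1)[0]
    print(revenue.name, "->", column)
    print("  built from:", sorted(upstream_of(dataset, LINEAGE)))
```

In a real catalog the lineage would usually be harvested from pipelines and query logs rather than typed by hand, but the idea is the same: any number on a report can be traced back to where it came from.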

Any data project that skips a proper data management approach will lack quality, won't be trusted, will see low adoption, and probably won't last long.
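As an illustration of what "quality" can mean in practice, the sketch below applies three common rule types (completeness, uniqueness of a business key, and validity against a reference list) to a handful of customer records. The function name, the sample records, and the allowed country codes are assumptions made for this example only.

```python
def check_quality(rows, key, required, valid_values):
    """Return human-readable data-quality issues found in a list of dict records."""
    issues = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Completeness: required fields must be present and non-empty.
        for col in required:
            if row.get(col) in (None, ""):
                issues.append(f"row {i}: missing {col}")
        # Uniqueness: the business key must identify a single record (a core MDM concern).
        k = row.get(key)
        if k in seen_keys:
            issues.append(f"row {i}: duplicate {key}={k}")
        seen_keys.add(k)
        # Validity: when present, coded values must come from the agreed reference list.
        for col, allowed in valid_values.items():
            value = row.get(col)
            if value not in (None, "") and value not in allowed:
                issues.append(f"row {i}: invalid {col}={value!r}")
    return issues


# Hypothetical sample extracted from a source system.
customers = [
    {"customer_id": "C1", "email": "ana@example.com", "country": "PT"},
    {"customer_id": "C2", "email": "", "country": "ES"},
    {"customer_id": "C1", "email": "rui@example.com", "country": "XX"},
]

issues = check_quality(
    customers,
    key="customer_id",
    required=["customer_id", "email"],
    valid_values={"country": {"PT", "ES", "FR"}},
)
for issue in issues:
    print(issue)
# row 1: missing email
# row 2: duplicate customer_id=C1
# row 2: invalid country='XX'
```

What happens to the issues next (rejecting the load, quarantining rows, or alerting the data owner) is a governance decision rather than a coding one.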


Analytics presumes data integration, knowledge sharing, added value, and the collaboration that binds them together to deliver meaningful enterprise data analysis.

Collaboration: Old-school siloed models are still common and need to become more open and collaborative. It's not only about technology; it's a mindset.

Sharing objectives, investments, and commitment should be a more effective and efficient approach to implementing a data and analytics architecture.


When we consider data and analytics architectures (kappa or lambda), we now look for an integrated and collaborative solution that ensures quality and avoids the data swamp.

Lakehouse platforms: The term usually refers to modern data architectures that integrate data lakes, data warehouses, and data stores, enabling unified data governance and agile movement of data between layers, improving the whole data journey and speeding up the decision process. They will soon make old-school data warehouses obsolete.

Cloud solutions will try to offer bundles at a lower TCO to speed up adoption of the new scalable data architectures (e.g. Databricks, Snowflake, …). A more decoupled solution could be built using the service catalog of any of the major cloud vendors.


To have “the new oil” (Clive Humby) inside our platform, we need to acquire and ingest data.

Data Integration: Organizations are ingesting ever more data, in all formats, from online services, social media, and other sources, and the volume keeps growing. First, it's important to blueprint the framework we will use, and then choose a reliable data integration platform to implement and manage the catalog of data pipelines.

There are already plenty of solutions to ingest data (ETL / ELT) and perform full integration (e.g. Fivetran, MuleSoft, ADF, Data Pipeline...).
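Whatever platform is chosen, many pipelines in such a catalog follow the same incremental (high-watermark) pattern: extract only what changed since the last run, load it into the target, and remember the new watermark. The minimal Python sketch below shows that pattern with in-memory data; the `updated_at` column, the function names, and the dict standing in for a target table are assumptions, not the API of any of the tools listed.

```python
from datetime import datetime


def extract_incremental(source_rows, watermark):
    """Return rows modified after `watermark`, plus the watermark for the next run."""
    changed = [r for r in source_rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in changed), default=watermark)
    return changed, new_watermark


def upsert(target, rows, key):
    """Insert or overwrite rows by key; a dict stands in for the target table."""
    for row in rows:
        target[row[key]] = row


# Hypothetical source table (e.g. orders in an operational system).
source = [
    {"id": 1, "status": "new", "updated_at": datetime(2022, 1, 1, 9)},
    {"id": 2, "status": "new", "updated_at": datetime(2022, 1, 1, 10)},
]
target = {}
watermark = datetime.min  # first run: everything counts as new

# Run 1: both rows are extracted and loaded.
rows, watermark = extract_incremental(source, watermark)
upsert(target, rows, key="id")
first_run_count = len(rows)

# Between runs, order 1 is shipped in the source system.
source[0] = {"id": 1, "status": "shipped", "updated_at": datetime(2022, 1, 2, 8)}

# Run 2: only the changed row is extracted.
rows, watermark = extract_incremental(source, watermark)
upsert(target, rows, key="id")

print(first_run_count, len(rows), target[1]["status"], watermark)
# 2 1 shipped 2022-01-02 08:00:00
```

The strict `>` comparison avoids reloading the last row, but it assumes timestamps are reliable and that late-arriving rows never carry an older `updated_at`; change data capture (CDC) features in integration platforms exist partly to remove that assumption.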


Recently I read a very strong sentence on a Gartner summit landing page that caught my attention: "(…) harness the incredible power of people". Indeed, Organizations must democratize data and free users to consume and analyze it, but without losing control.

Business Intelligence platforms: Users need to be empowered with simpler, insightful tools to interpret data stored in data lakes or data hubs, but we must keep track of their creativity. Plenty of business intelligence tools exist on the market (e.g. Power BI, Tableau, …), and some already allow applying ML (AutoML, TinyML) and AI algorithms, conducting data mining, and supporting predictive decisions.


With the advent of the cloud, BI vendors make most of the capabilities mentioned above available only on Cloud Providers, which will push us further toward the Cloud.

Cloud adoption: The cloud lift-and-shift movement will keep growing, with many Organizations preferring cloud-native analytics solutions to gain competitiveness by streamlining their new analytics platforms.

It's advisable not to go freestyle and reinvent the wheel, but to follow recommended patterns such as one of the existing Cloud Adoption Frameworks (e.g. Azure CAF, AWS CAF, Google CAF, IBM CAS, …). Those best practices will help us avoid pitfalls, give us confidence, and speed up the adoption process.


With everything ready, now it's "just" a matter of operating, as if operations were that simple.

Operations: We need to define a robust operational framework that covers the entire data lifecycle, from acquisition (ingestion) to consumption (reporting), allowing teams to interact and deploy more easily, improve quality, and reduce the cycle time of data and analytics for a faster time to market.

There is already a set of best practices on the market, developed by independent user groups or Organizations, that could give you a starting point (e.g. DataKitchen:DataOps, StreamSets:DataOps, MLOps, …).
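One DataOps idea that translates directly into code is treating every stage of the lifecycle, from ingestion to reporting, as a step followed by an automated test, so that a bad batch stops the pipeline before it reaches a report and every run records its cycle time. The Python sketch below is a minimal, hypothetical runner written for this example; it is not the API of DataKitchen, StreamSets, or any MLOps tool, and all stage names and data are made up.

```python
import time


def run_pipeline(stages):
    """Run (name, step, test) stages in order; stop at the first failed test."""
    started = time.perf_counter()
    data = None
    for name, step, test in stages:
        data = step(data)
        if not test(data):
            return {"status": "failed", "stage": name,
                    "seconds": time.perf_counter() - started}
    return {"status": "passed", "output": data,
            "seconds": time.perf_counter() - started}


def ingest(_):
    # Acquisition: hypothetical raw extract; the negative amount is a bad record.
    return [
        {"region": "North", "amount": 120.0},
        {"region": "South", "amount": 80.0},
        {"region": "North", "amount": -5.0},
    ]


def clean(rows):
    # Transformation: drop records with negative amounts.
    return [r for r in rows if r["amount"] >= 0]


def to_report(rows):
    # Consumption: total amount per region.
    totals = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return totals


def no_negative_amounts(rows):
    return all(r["amount"] >= 0 for r in rows)


stages = [
    ("ingest", ingest, lambda rows: len(rows) > 0),
    ("transform", clean, no_negative_amounts),
    ("report", to_report, lambda report: set(report) == {"North", "South"}),
]
good = run_pipeline(stages)

# A faulty deployment that forgot the cleaning rule is caught at the transform stage.
broken = run_pipeline([
    ("ingest", ingest, lambda rows: len(rows) > 0),
    ("transform", lambda rows: rows, no_negative_amounts),
    ("report", to_report, lambda report: set(report) == {"North", "South"}),
])

print(good["status"], good["output"])
print(broken["status"], broken["stage"])
# passed {'North': 120.0, 'South': 80.0}
# failed transform
```

In a real setup the same checks would also run in CI before a change is deployed, and the recorded timings would be tracked over time to measure cycle time.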


Ufff, many concepts and even more roles. In the end, working on data and analytics will continue to be challenging, sometimes maybe overwhelming, but it should always be interesting. Data stack jobs will remain in high demand, but to be ready and stay at the knowledge edge, you need to update your skills regularly.

Hope we all enjoy the ride.


@Paulo Gonçalves
