2022 Data Analytics Predictions

At 9 friendly white rabbits we advise customers on data & analytics questions and exchange ideas with like-minded people. As a result, we hear and see a lot of what is currently being implemented in the data & analytics space. With that in mind, we dare to look into our crystal ball for developments in 2022.

Data First

Businesses operate a number of IT subsystems, each storing and transforming its own production data and exchanging it with the other subsystems. Data warehouses (DWHs) come to the rescue, breaking up the resulting data silos by copying production data into a common store and building data models on top of it to allow holistic insights into the business model.

Recently, more customers have been coming to us who are creating products where insights and predictions based on holistic data are at the core. Here we find ourselves developing the DWH core data model before the first part of the product is built.

Taking this further, it may lead to products whose components exchange data in real time via service buses like Kafka (which also ingests the data into the DWH) and then query the DWH directly to access base data, insights or predictions. This means unifying transactional and analytical databases (OLTP and OLAP), and newcomers like Firebolt or Materialize may provide the technology for this in future.

Data Ops manifestations

Solid development processes using source control systems and deployment strategies are becoming standard. Looking around our network, we assume that at least half of the community is already using tools like dbt or Dataform to build and version their data models. Setting up cloud environments and data governance with code, e.g. in Terraform, is also becoming more important. Orchestrating hundreds of integration and transformation tasks, together with governing access for hundreds of users, should not depend on a data engineer ticking the right check boxes in a UI for each of them. Tools like Talend, Fivetran, Stitch and the like need to adapt.
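As a rough illustration of governing access with code rather than check boxes, here is a minimal Python sketch. The role names, dataset names and the generic GRANT syntax are hypothetical and chosen for illustration only; in practice this would more likely live in Terraform or in the DWH's own tooling.

```python
# Hypothetical access policy kept in source control: role -> datasets it may read.
ACCESS_POLICY = {
    "analyst": ["reporting", "core"],
    "product_manager": ["reporting"],
}


def render_grants(policy):
    """Turn the declarative policy into a sorted list of GRANT statements.

    The SQL dialect is generic and illustrative; adapt it to your DWH.
    """
    statements = []
    for role in sorted(policy):
        for dataset in sorted(policy[role]):
            statements.append(f"GRANT SELECT ON SCHEMA {dataset} TO ROLE {role};")
    return statements


if __name__ == "__main__":
    for statement in render_grants(ACCESS_POLICY):
        print(statement)
```

Because the policy is plain data, a change in access becomes a reviewable diff in source control instead of a click in a UI.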

Data Mesh / Data Fabric

Recently, most DWH providers have introduced capabilities for federated queries, which allow tables from remote databases to be used in local queries (e.g. joining a table in your DWH with a table from your company's MySQL production server). In the past this was avoided because of the obvious performance impact, but advances in technology make it feasible.
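The shape of such a query can be sketched with Python's built-in sqlite3 module. This is only a stand-in: SQLite's ATTACH DATABASE plays the role of the federation layer, an in-memory database plays the DWH, and a separate database file plays the production server. The table names and figures are made up for illustration, and a real DWH uses its own syntax for external connections.

```python
import os
import sqlite3
import tempfile


def federated_revenue_by_segment():
    """Join a 'warehouse' table with a table living in a separate database file."""
    with tempfile.TemporaryDirectory() as workdir:
        production_path = os.path.join(workdir, "production.db")

        # Stand-in for the production server: a separate database with its own table.
        production = sqlite3.connect(production_path)
        production.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
        production.executemany(
            "INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (2, 35.0), (1, 5.0)]
        )
        production.commit()
        production.close()

        # Stand-in for the DWH: holds a modelled customer dimension.
        warehouse = sqlite3.connect(":memory:")
        warehouse.execute(
            "CREATE TABLE dim_customer (customer_id INTEGER, segment TEXT)"
        )
        warehouse.executemany(
            "INSERT INTO dim_customer VALUES (?, ?)",
            [(1, "retail"), (2, "wholesale")],
        )
        warehouse.commit()

        # ATTACH makes the other database's tables addressable in local queries.
        warehouse.execute("ATTACH DATABASE ? AS production", (production_path,))
        rows = warehouse.execute(
            """
            SELECT c.segment, SUM(o.amount) AS revenue
            FROM dim_customer AS c
            JOIN production.orders AS o ON o.customer_id = c.customer_id
            GROUP BY c.segment
            ORDER BY c.segment
            """
        ).fetchall()
        warehouse.close()
        return rows
```

The join itself looks like any local join; only the `production.` prefix reveals that one side lives elsewhere.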

Data Meshes and Data Fabrics are two loosely defined and largely overlapping concepts which allow holistic data models on top of distributed physical data sources from different departments in your organisation. Although they need a technical basis (e.g. Trino/Starburst or BigQuery with federated queries), we don't see these concepts as a technological innovation.

In fact, the real benefit is organisational: different departments can run and own their (siloed) data infrastructure and adapt it quickly to new needs, while agreeing with the whole organisation on a set of currency metrics / business objects that are provided to a central data mesh/fabric in a documented and stable schema. This allows central data teams to run company-wide analytics workloads without having to be experts in, or responsible for, every source department.
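One way to picture a "documented and stable schema" is as a contract that a department's data is checked against before it is published centrally. The following Python sketch assumes a hypothetical orders contract with made-up column names; it is an illustration of the idea, not a description of any particular data mesh product.

```python
# Hypothetical contract agreed between a department and the central data team.
ORDERS_CONTRACT = {"order_id": int, "customer_id": int, "net_revenue": float}


def contract_violations(rows, contract):
    """Return human-readable violations of the agreed schema, empty if none."""
    violations = []
    for index, row in enumerate(rows):
        for column, expected_type in contract.items():
            if column not in row:
                violations.append(f"row {index}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                violations.append(
                    f"row {index}: '{column}' is not {expected_type.__name__}"
                )
    return violations
```

Extra columns are deliberately ignored here, so a department stays free to extend its own data as long as the agreed columns remain stable.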

New Data Roles

With more business teams using data in their everyday routines, roles shift. While product people will generate more and more insights on their own with self-service analytics, the data analyst will concentrate on deep dives or even merge with the data scientist to do modelling. A new role, the analytics engineer, emerges (or splits off from the analyst), providing reporting layers that implement business logic and making sure that self-service analytics actually returns meaningful results. dbt (see above) will be their Swiss Army knife. The role of the data engineer pulls back to providing data up to the core models only.

Semi-/Unstructured Data

BigQuery recently announced that it will support JSON as a data type, allowing queries on semi-structured data where the schema is not known beforehand or at least isn't specified. Other DWHs support this already. Many people, including us, have built data models based purely on structured data, but given the effort of adapting to schema changes when working with an agile product team, it's probably something we will be using in future.
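The practical appeal is that a query can reach into a document without the schema being declared up front, and a missing field is not an error. A minimal Python sketch of that behaviour, with a hypothetical helper `json_value` and made-up events (this mimics the idea, not the JSON functions of any specific DWH):

```python
import json


def json_value(document, path, default=None):
    """Follow a dotted path through nested JSON objects; return default if absent."""
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical raw events as they might land in a JSON column.
RAW_EVENTS = [
    '{"event": "signup", "user": {"id": 1, "plan": "free"}}',
    '{"event": "signup", "user": {"id": 2}, "referrer": "newsletter"}',
]

events = [json.loads(raw) for raw in RAW_EVENTS]
plans = [json_value(event, "user.plan", default="unknown") for event in events]
```

The second event lacks a plan and carries a new `referrer` field, yet both events can be queried without any schema migration.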

Experimenting with ML

This isn't really new, but it will become reality for even more data projects in 2022. Once your data is in a DWH, you are usually just a few clicks away from the ML tools provided by your DWH's ecosystem.

The biggest challenge for ML projects is still that the outcome is not predictable. You usually don't know which model will work, whether the data provides the features needed to predict the label, or whether the number of observations is sufficient.

But if you are preparing data in the DWH anyway to feed some heuristic automations, it doesn't hurt to use the tools that are already set up to experiment with some scoring predictions that can support your automations.

GDPR vs Analytics

Admittedly, this one is more a wish than a prediction. Recently, courts and authorities, in particular in Germany, have been interpreting the GDPR more strictly, making analytics less accurate and meaningful. Unfortunately, the measures applied are largely independent of the actual level of privacy invasion that the data processing involves. Indeed, 1st party analytics, if implemented sensibly, has basically no invasive effect on user privacy, but still has to follow the same rules as, for example, the processing of contact or order data.

On the other hand, the sheer abundance of consent requests gives users the impression of having to be very selective with their data, even though they would actually consent to all relevant processing, as it is usually connected to a transaction where it is common to share everything that is "needed" for the transaction.

My wish here is that rulings become more reasonable with respect to 1st party analytics. Other countries (e.g. Italy) are already ahead here.
