9 Predictions for Data in 2023

Yesterday, at the Monte Carlo Impact Summit, I shared my 9 Predictions for Data in 2023. Here are the slides & I’ve embedded them below.

These are my 9 predictions. A year from now I’ll score them to see how I did.


  1. Cloud data warehouses (CDW) will process 75% of workloads by 2024. In the last five years, CDWs have grown from 20% of workloads to 50%, with on-prem databases constituting the remainder. Meanwhile, the industry has grown from $36b to $80b over that period.
  2. Data workloads will segment by use case into three groups. First, in-memory databases like DuckDB will grow to dominate local analysis, even for very large files (see the DuckDB sketch after this list). CDWs will retain classic BI & exploration uses. Cloud data lakehouses will serve jobs operating on massive data & jobs that don’t require the lowest latency - and do it at half the storage price.
  3. Metrics layers will unify the data stack. Today, there are two different forks in the data stack. The first fork uses ETL to pump data into a CDW, then to a BI or data exploration tool. The second fork, the machine learning stack, is identical save for the outputs: model serving & model training. The metrics layer will become the single place metrics & features are defined, unifying the stack & potentially moving model serving & training into the database.
  4. Large language models will change the role of data engineers. I recorded a video of myself writing code to produce charts in the presentation. The video shows GitHub Copilot magically creating a chart of DuckDB star growth. Copilot ingests a comment, writes the code, & even adds my custom theme function. When the code is executed, it works (a sketch of that comment-to-chart flow follows this list). Technologies like this will push data engineering work to a higher plane of abstraction.
  5. WebAssembly, or WASM, will become an essential part of end-user-facing data apps. WASM is a technology that accelerates software running in the browser. Pages load faster, data processing is speedier, & users are happier. Every major browser supports WASM, & consequently anyone producing a data app for an end user will use it.
  6. Notebooks will win 20% of Excel users. Of the 1b global Excel users, 20% will become prosumers, writing Python/SQL to analyze data. They will do it in notebooks like Jupyter, which are easily shared, reproducible & version-controlled. Those notebooks will become data apps used by end users inside companies, replacing brittle Excel & Google Sheets (a notebook-style sketch follows this list).
  7. SaaS applications will use the CDW as a backend for both reading & writing. Today, sales, marketing, & finance data exist in disparate systems. ETL systems use APIs to push that data into the CDW for analysis. In the future, software products will build their apps on top of the CDW to take advantage of centralized security, faster procurement processes, & adjacent data. These systems will also write back to the CDW.
  8. Data Observability becomes a Must-Have. Software engineers measure the success of their efforts through up-time. 99.9% up-time, or three nines, means roughly one bad hour per 1,000 hours. Today’s data teams see 70 incidents per 1,000 tables. Data teams will align on data uptime/accuracy metrics & drive to the three-nines equivalent, using data observability tools to measure their performance (the back-of-the-envelope arithmetic follows this list).
  9. The Decade of Data Continues. Data startups raised more than $60b in total in 2021, more than 20% of all venture dollars raised that year. We’re still in the early innings of this foundational movement.
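To make prediction 2 concrete, here is a minimal sketch of the local-analysis pattern DuckDB enables: an in-process SQL query over a large Parquet file, with no warehouse, server, or load step. The file name & schema are hypothetical.

```python
import duckdb

# In-process, in-memory database: no server to run.
# DuckDB scans the (hypothetical) Parquet file in place.
con = duckdb.connect()
top_users = con.execute("""
    SELECT user_id, COUNT(*) AS events
    FROM 'events.parquet'   -- hypothetical local file
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 10
""").fetchdf()
print(top_users)
```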
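The comment-to-chart flow in prediction 4 looks roughly like the sketch below: the first comment is the kind of prompt Copilot consumes, and the rest is the kind of code it emits. The CSV file & its columns are hypothetical stand-ins for the star-count data in the video, and the custom theme function is omitted.

```python
# plot DuckDB GitHub star growth over time
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export with columns: date, stars
stars = pd.read_csv("duckdb_stars.csv", parse_dates=["date"])

fig, ax = plt.subplots()
ax.plot(stars["date"], stars["stars"])
ax.set_title("DuckDB GitHub Star Growth")
ax.set_xlabel("Date")
ax.set_ylabel("Stars")
plt.show()
```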
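For prediction 6, the notebook workflow that displaces a spreadsheet is often just a few lines. A minimal sketch, assuming a hypothetical sales.csv export: the pandas equivalent of an Excel pivot table, but shareable, reproducible & version-controlled.

```python
import pandas as pd

# Hypothetical export with columns: region, month, revenue
sales = pd.read_csv("sales.csv")

# The notebook equivalent of an Excel pivot table
pivot = sales.pivot_table(index="region", columns="month",
                          values="revenue", aggfunc="sum")
print(pivot)
```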
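And the back-of-the-envelope arithmetic behind prediction 8. The units differ (bad hours vs. incidents per table), but the rough size of the gap is the point.

```python
uptime = 0.999                            # three nines
bad_hours_per_1000 = (1 - uptime) * 1000  # ~1 bad hour per 1,000 hours

incidents_per_1000_tables = 70            # where data teams sit today

# The gap data observability tooling needs to close: roughly 70x
print(incidents_per_1000_tables / bad_hours_per_1000)
```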

Thank you to the Monte Carlo team for the opportunity & the audience for the great questions at the end. I’ll post the video of the presentation when it’s live.

Akshay Toshniwal

Associate Principal at LTIMindtree | Thought Leader at Global AI Hub

2 years ago

Great insights Tomasz Tunguz

#7 is an interesting thought. Would this imply moving back from service meshes to monoliths? In the service-oriented architecture, services encapsulate their data stores and communicate via messages and APIs.

Trond Johannessen

Venture Developer, Board Member, Pre-Seed Investor

2 years ago

Point 6 is where you start placing in evidence the mounting risk levels in software/systems/AI. To some extent the risks have been there all along, as any code, or even formula sequences or statistical functions applied to data, may have errors. Yet, when you start writing Python/SQL at scale, you are replicating the processes that caused the major crisis in 2008 and beyond, when hedging programs distributed across portfolios managed on Wall Street caused every user to treat certain assets as hedges, aggravating the crisis, as the hedges were part of the problem. In general, softwarization/virtualization requires increasing levels of checks and balances, and in some of the industry that naturally happens as you commercialize the resulting code. Maybe not perfect, maybe even dodgy, but a lot of people scream when things are SNAFU. Instead, if you get a lot of custom code replacing some packaged software, and that propagates through large enterprises and ecosystems, risk levels increase. Do you see evidence of someone working on reducing enterprise development risk at scale?

Karl Waldman

helping B2B SaaS companies deliver superior customer value

2 years ago

Go look at the link in point 6 and watch the 5-minute demo to the end. https://hex.tech/ Mind blown.
