9 Predictions for Data in 2023

Yesterday, at the Monte Carlo Impact Summit, I shared my 9 Predictions for Data in 2023. Here are the slides & I’ve embedded them below.

These are my 9 predictions. A year from now I’ll score them to see how I did.


  1. Cloud data warehouses (CDW) will process 75% of workloads by 2024. In the last five years, CDWs have grown from 20% of workloads to 50%, with on-prem databases constituting the remainder. Meanwhile, the industry has grown from $36b to $80b over that period.
  2. Data workloads will segment by use case into three groups. First, in-memory databases like DuckDB will grow to dominate local analysis, even for very large files (see the DuckDB sketch after this list). CDWs will retain classic BI & exploration uses. Cloud data lakehouses will serve jobs operating on massive data & jobs that don’t require the lowest latency - and do it at half the storage price.
  3. Metrics layers will unify the data stack. Today, there are two different forks in the data stack. The first fork uses ETL to pump data into a CDW, then to a BI or data exploration tool. The second fork, the machine learning stack, is identical save for the outputs: model serving & model training. The metrics layer will become the single place metrics & features are defined, unifying the stack & potentially moving model serving & training into the database.
  4. Large language models will change the role of data engineers. I recorded a video of myself writing code to produce charts in the presentation. The video shows GitHub Copilot magically creating a chart of DuckDB star growth. Copilot ingests a comment, writes the code, & even adds my custom theme function. When the code is executed, it works (a sketch of that comment-to-chart flow follows this list). Technologies like this will push data engineering work to a higher plane of abstraction.
  5. WebAssembly, or WASM, will become an essential part of end-user-facing data apps. WASM is a technology that accelerates software running in the browser. Pages load faster, data processing is speedier, & users are happier. Every major browser supports WASM, & consequently anyone producing a data app for an end user will use it.
  6. Notebooks will win 20% of Excel users. Of the 1b global Excel users, 20% will become prosumers, writing Python/SQL to analyze data. They will do it in notebooks like Jupyter, which are easily shared, reproducible & version-controlled. Those notebooks will become data apps used by end users inside companies, replacing brittle Excel & Google Sheets (a notebook-style sketch follows this list).
  7. SaaS applications will use the CDW as a backend for both reading & writing. Today, sales, marketing, & finance data exist in disparate systems. ETL systems use APIs to push that data into the CDW for analysis. In the future, software products will build their apps on top of the CDW to take advantage of centralized security, faster procurement processes, & adjacent data. These systems will also write back to the CDW.
  8. Data Observability becomes a Must-Have. Software engineers measure the success of their efforts through up-time. 99.9% up-time, or three nines, means roughly one bad hour per 1,000 hours. Today’s data teams see 70 incidents per 1,000 tables. Data teams will align on data uptime/accuracy metrics & drive to the three-nines equivalent, using data observability tools to measure their performance (the back-of-the-envelope arithmetic follows this list).
  9. The Decade of Data Continues. Data startups raised more than $60b in total in 2021, more than 20% of all venture dollars raised that year. We’re still in the early innings of this foundational movement.
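To make prediction 2 concrete, here is a minimal sketch of the local-analysis pattern DuckDB enables: an in-process SQL query over a large Parquet file, with no warehouse, server, or load step. The file name & schema are hypothetical.

```python
import duckdb

# In-process, in-memory database: no server to run.
# DuckDB scans the (hypothetical) Parquet file in place.
con = duckdb.connect()
top_users = con.execute("""
    SELECT user_id, COUNT(*) AS events
    FROM 'events.parquet'   -- hypothetical local file
    GROUP BY user_id
    ORDER BY events DESC
    LIMIT 10
""").fetchdf()
print(top_users)
```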
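The comment-to-chart flow in prediction 4 looks roughly like the sketch below: the first comment is the kind of prompt Copilot consumes, and the rest is the kind of code it emits. The CSV file & its columns are hypothetical stand-ins for the star-count data in the video, and the custom theme function is omitted.

```python
# plot DuckDB GitHub star growth over time
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export with columns: date, stars
stars = pd.read_csv("duckdb_stars.csv", parse_dates=["date"])

fig, ax = plt.subplots()
ax.plot(stars["date"], stars["stars"])
ax.set_title("DuckDB GitHub Star Growth")
ax.set_xlabel("Date")
ax.set_ylabel("Stars")
plt.show()
```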
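For prediction 6, the notebook workflow that displaces a spreadsheet is often just a few lines. A minimal sketch, assuming a hypothetical sales.csv export: the pandas equivalent of an Excel pivot table, but shareable, reproducible & version-controlled.

```python
import pandas as pd

# Hypothetical export with columns: region, month, revenue
sales = pd.read_csv("sales.csv")

# The notebook equivalent of an Excel pivot table
pivot = sales.pivot_table(index="region", columns="month",
                          values="revenue", aggfunc="sum")
print(pivot)
```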
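And the back-of-the-envelope arithmetic behind prediction 8. The units differ (bad hours vs. incidents per table), but the rough size of the gap is the point.

```python
uptime = 0.999                            # three nines
bad_hours_per_1000 = (1 - uptime) * 1000  # ~1 bad hour per 1,000 hours

incidents_per_1000_tables = 70            # where data teams sit today

# The gap data observability tooling needs to close: roughly 70x
print(incidents_per_1000_tables / bad_hours_per_1000)
```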

Thank you to the Monte Carlo team for the opportunity & the audience for the great questions at the end. I’ll post the video of the presentation when it’s live.

Akshay Toshniwal

Associate Principal at LTIMindtree | Thought Leader at Global AI Hub

2 years ago

Great insights Tomasz Tunguz

#7 is an interesting thought. Would this imply moving back from service meshes to monoliths? In the service-oriented architecture, services encapsulate their data stores and communicate via messages and APIs.

Trond Johannessen

Venture Developer, Board Member, Pre-Seed Investor

2 years ago

Point 6 is where you start placing in evidence the mounting risk levels in software/systems/AI. To some extent the risks have been there all along, as any code, or even formula sequences or statistical functions applied to data, may have errors. Yet, when you start writing Python/SQL at scale, you are replicating the processes that caused the major crisis in 2008 and beyond, when hedging programs distributed across portfolios managed on Wall Street caused every user to treat certain assets as hedges, aggravating the crisis, as the hedges were part of the problem. In general, softwarization/virtualization requires increasing levels of checks and balances, and in some of the industry that naturally happens as you commercialize the resulting code. Maybe not perfect, maybe even dodgy, but a lot of people scream when things are SNAFU. Instead, if you get a lot of custom code replacing some packaged software, and that propagates through large enterprises and ecosystems, risk levels increase. Do you see evidence of someone working on reducing enterprise development risk at scale?

Karl Waldman

helping B2B SaaS companies deliver superior customer value

2 years ago

Go look at the link in point 6 and watch the 5-minute demo to the end. https://hex.tech/ Mind blown.
