Real-time Distributed Data Science is The Future!

Paul Golding

Hands-on R&D Multidisciplinary AI Leader | 30 patents in AI/ML | Edge AI | AI Chip Design | Robotics

Published: April 5, 2024

No! Data is NOT your differentiator...

The real edge is infotaxis, which I will explain shortly.

And it is only possible via real-time distributed data science (RTDDS), which I argue is the future of data-driven AI-first orgs.

We get bombarded with soundbite posts that try to tell us that the future of AI is your data: the only differentiator. This is a common claim. But it is false and often shows the naivety of those whose world view is that of models and tech rather than organizations and value creation.

Anyone who has attempted to achieve enterprise-scale ML operations knows that it is a non-trivial task well beyond the current GenAI hype. It takes an enormous engineering lift just to make it happen, well beyond the skill of most data scientists and orgs, except the heavyweights who specialize in it.

Indeed, I argue it will become the #1 AI/ML challenge and the key differentiator.

Everyone has their pet competitive-edge claim: data, DataOps, causality, agents, graphs, and so on. But 99.9% of these claims are from those selling wares, teaching courses or trying to stand out (which is not to be faulted). Of course, I am too, but I have created numerous AI/ML teams tasked with re-engineering the org, not building models. Models are incidental.

Infotaxis (from biology) is the process of gaining information that drives the seeker (hunter) nearer to a goal (prey), even if that goal is moving!! So-called "pivoting" is baked into the process. In a nutshell, this is the organizational challenge.
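
To make the idea concrete, here is a heavily simplified, one-dimensional sketch of an infotaxis-style step in Python. It is an illustration under stated assumptions, not a definitive implementation: the grid of cells, the `hit_prob` sensor model with its `scale` parameter, and all function names are hypothetical. The seeker holds a belief over where the prey is and moves to the neighbouring cell whose observation is expected to leave the least uncertainty (the lowest expected entropy).

```python
import math

def entropy(p):
    """Shannon entropy (in bits) of a discrete belief."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def hit_prob(seeker, source, scale=2.0):
    # Assumed sensor model: the chance of detecting a cue decays with distance.
    return math.exp(-abs(seeker - source) / scale)

def posterior(belief, seeker, hit):
    """Bayes update of the belief over source cells after a hit or a miss."""
    weights = [
        b * (hit_prob(seeker, s) if hit else 1.0 - hit_prob(seeker, s))
        for s, b in enumerate(belief)
    ]
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(belief)

def expected_entropy(belief, pos):
    """Entropy we expect to be left with after sensing at cell `pos`."""
    p_hit = sum(b * hit_prob(pos, s) for s, b in enumerate(belief))
    return (p_hit * entropy(posterior(belief, pos, True))
            + (1.0 - p_hit) * entropy(posterior(belief, pos, False)))

def infotaxis_step(belief, pos):
    """Move left, stay, or move right: whichever promises the most information."""
    moves = [p for p in (pos - 1, pos, pos + 1) if 0 <= p < len(belief)]
    return min(moves, key=lambda p: expected_entropy(belief, p))

# Usage: start with no idea where the prey is, seeker in the middle of 11 cells.
belief = [1 / 11] * 11
next_pos = infotaxis_step(belief, 5)
```

In a fuller loop, the seeker would sense at `next_pos`, replace `belief` with `posterior(...)`, and repeat, so the belief keeps being re-estimated as new observations arrive.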

However, unlike a single hunter and prey tuple, an organization is a composite of many infotaxis pathways that must ideally converge as an ensemble to produce a robust and, dare I say it, anti-fragile motion, possibly towards multiple competing goals (like net-zero vs. maximum profit).

The challenge is knowing what information to pay attention to and when, plus, of course, how to act upon it. It is a real-time game. Taking the definition of real-time from the world of real-time programming, this means whatever time window is needed to fulfill a task. It could be milliseconds or years. Indeed, we often overlook longer time horizons via organizational amnesia: a big mistake!

The tiniest of data packets might contain maximum entropy in terms of information gain for a particular task, maybe one you didn't even know needed attending to -- an anomaly, perhaps. I recently spoke to someone who realized, in hindsight, that their data was clearly showing the pending 2008 crash, but it didn't get picked up as anomalous (due to concept drift).
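
One way to put a number on "how much information is in this packet" is its surprisal: the negative log-probability of the event given what has been seen so far. The sketch below is a minimal illustration under assumptions of my own choosing, not a production detector: the `SurprisalMonitor` class, its `threshold_bits` parameter, and the event labels are all hypothetical, and probabilities are estimated from running counts with add-one smoothing.

```python
import math
from collections import Counter

class SurprisalMonitor:
    """Scores each incoming event by its surprisal (in bits) against running counts."""

    def __init__(self, threshold_bits=6.0):
        self.counts = Counter()
        self.total = 0
        self.threshold_bits = threshold_bits  # assumed cut-off for "worth a look"

    def surprisal(self, event):
        # Add-one smoothing, reserving one slot for "never seen before",
        # so unseen events get a finite (but large) score.
        vocab = len(self.counts) + 1
        p = (self.counts[event] + 1) / (self.total + vocab)
        return -math.log2(p)

    def observe(self, event):
        """Score the event first, then fold it into the running counts."""
        bits = self.surprisal(event)
        self.counts[event] += 1
        self.total += 1
        return bits, bits > self.threshold_bits

# Usage: a long run of routine events followed by one rare one.
monitor = SurprisalMonitor()
for _ in range(1000):
    monitor.observe("routine_trade")
bits, flagged = monitor.observe("margin_call")
```

Note that the counts here never forget. Under concept drift, the baseline itself moves, so a real system would need some form of windowing or decay, which this sketch deliberately leaves out.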

I designed one of the earliest stream-processing systems with the formidable Geoff McGrath when he was spinning up the technology division of McLaren. Back then (2010), we were promoting prescriptive analytics (prescribing decisions). But it was not a flippant marketing phrase: we had understood that only data-driven businesses can win. Unbeknownst to many, racing is an information game as much as an engineering one.

This is still the goal, but it is common for many orgs to be flailing around with 3rd-rate data practices and the wrong set of philosophical commitments.

Just as we talk of "digital natives", there are also "data natives", but seldom are they the ones in the driving seat. It is surprisingly common for large enterprise data teams to be run by long-time company functionaries who see it as a safe long-term gig, making sure to build plenty of tribal knowledge into the process.

A common philosophical mistake is to still believe that data is a kind of "oil" inside the machine, or beneath the ground. This is false. Information, in the infotaxis sense, is the mechanism itself, just as the cutting edge of material physics is also information: bits (or qubits) not atoms.

In other words, the organization is not like a machine, or system (per systems thinking), and certainly not a von Neumann machine -- but more like the collective intelligence that Michael Levin has revealed to be the fundamental building block of, well, everything, including the machine of morphogenesis -- which is the closest parallel to how we ought to think of organizations!

Only by rethinking the philosophy of science (Levin's lab hires philosophers) was his team able to reimagine what scientific discovery looks like, re-introducing the notion of telos (agency) that had been shunned by scientific materialism.

Ditto, we must rethink what organizational discovery actually is, moving beyond the worn-out metaphors of industrial markets and "economic actors" and mechanical philosophy. These metaphors started to fail long ago, but the failure has been masked by the "(cheap) capital as strategy" frenzy that was anything but a strategy, as the mass lay-offs and various free-falls are now revealing.

GenAI inserted into this old model will fail. Moreover, I predict it will do worse than fail -- it will accelerate failure via its massive amplification of old semantic paradigms and the promotion, per Iain McGilchrist, of extreme left-brain-ism viruses that turn society and organizations into dysfunctional "autistic" disorganizations, made worse by the forthcoming pincer of model monopolies (disguised as "AI safety").

Nowhere is left-brain-ism more apparent than the absurd world of crypto -- a kind of bastardized version of McLuhan's "the medium is the message" == "the block is the message". What are the blocks for? Who cares?

One of the challenges of RTDDS is the need to build models on the fly, perhaps thousands of them, if not millions in a large org. This was something I helped explore for TelOS, a bold start-up who were attempting to build a "knowledge observability" platform aimed at "mega-projects" (like entire cities). The motivating schema was: "What might sit behind that oracular-like screen that Tom Cruise uses in Minority Report?"

Many orgs are moving towards real-time processes, especially with the increasing importance of technologies like Change-Data Capture. But this is just a technology enabler, albeit a powerful one.

As the work at TelOS showed, the only real hope for "prescriptive analytics" and "organizational infotaxis" (at scale) is to migrate from a static organization to a dynamic one via the world of simulation -- the ability to continually run many alternative what-if futures, and even pasts. We are miles from this data reality.

For the data scientists, you can think of this as a kind of bootstrapping: continual resampling of the information space, spawning millions of "self-organizing decision trees" in an effort to "find the prey" (sorry for a hunting metaphor).
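
As a toy illustration of that bootstrapping picture, the sketch below resamples a small dataset with replacement and fits a one-level decision tree (a "stump") to each resample, then lets the trees vote. This is plain bagging, swapped in as a stand-in for the "self-organizing decision trees" described above, and everything in it is an assumption for illustration: the function names, the single numeric feature, and the toy data.

```python
import random

def fit_stump(sample):
    """One-level decision tree on one numeric feature: predict 1 when x > threshold."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in sample}):
        # Count misclassified points for this candidate threshold.
        err = sum((x > t) != bool(y) for x, y in sample)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_stumps(data, n_trees=200, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_trees)]

def vote(thresholds, x):
    """Fraction of stumps that predict class 1 for input x."""
    return sum(x > t for t in thresholds) / len(thresholds)

# Usage: toy signal where the label flips to 1 once x reaches 5.
data = [(x, int(x >= 5)) for x in range(10)] * 3
trees = bagged_stumps(data)
confidence_high = vote(trees, 8)
confidence_low = vote(trees, 1)
```

The spread of thresholds across the resampled stumps is itself informative: it shows how stable the decision boundary is under different views of the same data.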

Along the way, we must totally rethink what we mean by "data quality". There is no single source of truth. We need to move on from this delusion. The clue is in Levin's work on regenerative medicine. We will need regenerative organizations in which data fragmentation is a feature, not a bug.

What I am describing does not exist. But it is the future.

Jonathan Looney

Accelerating SATCOM with Networked Intelligence | NSF I-Corps Mentor | Business Development & GTM Leader

11 months ago

Great read. Loved the comparison of web3 space as a bastardization of McLuhan. Cryptography, trust architecture, and ledger technology are all important and great, but the tulip crowd got a bit lost in their own enjoyment of the tech. Regarding the rest of the article, I love that you touched on rethinking von Neumann's architecture. I'd love to learn more about your thoughts on neuromorphic computing and its impact on driving the enterprise paradigm shift you're championing. While I don't agree completely with the Julian Jaynes-flavored left/right brain example, I 100% agree with the sentiment. This is the GenAI future that I hope comes; my cynicism says we will fall somewhere short, with incumbent players (and economics of violence) getting in the way. Looking forward to following along with TelOS.

Erin W.

Production-Grade Data Pipeline Orchestration | Operational Excellence

11 months ago

Finally, someone who gets it

Nicholas Clarke

Chief AI Officer. Visionary technologist and lateral thinker driving market value in regulated, complex ecosystems.

11 months ago

I love this so much. I’d love to talk with you about this! I have many models for fully autonomous data science. Recursively, even. It looks a lot like what you’ve got here, actually. I love the animation so much.
