Real-time Distributed Data Science is *The* Future!
Paul Golding
Hands-on R&D Multidisciplinary AI Leader | 30 patents in AI/ML | Edge AI | AI Chip Design | Robotics
No! Data is NOT your differentiator...
The real edge is infotaxis, which I will explain shortly.
And it is only possible via real-time distributed data science (RTDDS), which I argue is the future of data-driven AI-first orgs.
We get bombarded with soundbite posts that try to tell us that the future of AI is your data: the only differentiator. This is a common claim. But it is false and often shows the naivety of those whose world view is that of models and tech rather than organizations and value creation.
Anyone who has attempted to achieve enterprise-scale ML operations knows that it is a non-trivial task well beyond the current GenAI hype. It takes an enormous engineering lift just to make it happen, well beyond the skill of most data scientists and orgs, except the heavyweights who specialize in it.
Indeed, I argue it will become the #1 AI/ML challenge and the key differentiator.
Everyone has their pet competitive-edge claim: data, DataOps, causality, agents, graphs, and so on. But 99.9% of these claims are from those selling wares, teaching courses or trying to stand out (which is not to be faulted). Of course, I am too, but I have created numerous AI/ML teams tasked with re-engineering the org, not building models. Models are incidental.
Infotaxis (from biology) is the process of gaining information that drives the seeker (hunter) nearer to a goal (prey), even if that goal is moving!! So-called "pivoting" is baked into the process. In a nutshell, this is the organizational challenge.
However, unlike a single hunter-prey pair, an organization is a composite of many infotaxis pathways that must ideally converge as an ensemble to produce a robust and, dare I say it, anti-fragile motion, possibly towards multiple competing goals (like net-zero vs. maximum profit).
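For readers who think in code, the single hunter-and-prey case can be sketched as a toy. This is a deliberately simplified illustration, not the full infotaxis algorithm from the literature: the one-dimensional grid, the `hit_probability` sensor model and all function names are assumptions made up for this example. The seeker keeps a belief over where the prey is and moves to whichever neighbouring position is expected to reduce its uncertainty the most.

```python
import math

def entropy(belief):
    """Shannon entropy (bits) of a discrete belief over prey positions."""
    return -sum(p * math.log2(p) for p in belief if p > 0)

def hit_probability(seeker, prey):
    """Hypothetical sensor model: detections get likelier near the prey."""
    return 0.9 * math.exp(-abs(seeker - prey) / 2.0)

def update(belief, seeker, hit):
    """Bayes update of the belief after a detection (hit) or a miss."""
    likelihood = [
        hit_probability(seeker, t) if hit else 1.0 - hit_probability(seeker, t)
        for t in range(len(belief))
    ]
    posterior = [l * p for l, p in zip(likelihood, belief)]
    total = sum(posterior)
    return [p / total for p in posterior]

def expected_entropy(belief, seeker):
    """Expected posterior entropy if the seeker senses from this position."""
    p_hit = sum(hit_probability(seeker, t) * p for t, p in enumerate(belief))
    result = 0.0
    for hit, prob in ((True, p_hit), (False, 1.0 - p_hit)):
        if prob > 0:
            result += prob * entropy(update(belief, seeker, hit))
    return result

def choose_move(belief, seeker):
    """Step left, stay, or step right: whichever is expected to teach us most."""
    candidates = [s for s in (seeker - 1, seeker, seeker + 1)
                  if 0 <= s < len(belief)]
    return min(candidates, key=lambda s: expected_entropy(belief, s))

# Usage: start with no idea where the prey is, then sense and move repeatedly.
belief = [1 / 11] * 11
seeker = choose_move(belief, seeker=0)
```

If the prey moves, the same loop still applies: the belief is simply diffused a little before each update, which is one way to read "pivoting is baked in".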
But the hard part is knowing what information to pay attention to and when, plus, of course, how to act upon it. It is a real-time game. Taking the definition of real-time from the world of real-time programming, this means whatever time window is needed to fulfill a task. It could be milliseconds or years. Indeed, we often overlook longer time horizons via organizational amnesia: a big mistake!
The tiniest of data packets might carry the maximum information gain for a particular task, maybe one you didn't even know needed attending to -- an anomaly, perhaps. I recently spoke to someone who realized, in hindsight, that their data had clearly been showing the impending 2008 crash, but the signal was not picked up as anomalous (due to concept drift).
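One way to make "tiny packet, large information" concrete is surprisal: the information content of an event is -log2 of its probability, so the rarer the event, the more bits it carries. The sketch below is only an illustration; the event names and the add-one smoothing scheme are assumptions chosen for the example, not anything from a real system.

```python
import math
from collections import Counter

def surprisal_bits(event, history):
    """Information content, -log2 p(event), estimated from past events.

    Add-one smoothing reserves probability mass for one never-seen event
    type, so a completely new event gets a finite (but large) surprisal.
    """
    counts = Counter(history)
    vocab = len(counts) + 1  # observed event types plus one unseen type
    p = (counts[event] + 1) / (len(history) + vocab)
    return -math.log2(p)

# Hypothetical event log: mostly routine, a couple of retries.
history = ["ok"] * 98 + ["retry"] * 2
routine = surprisal_bits("ok", history)
rare = surprisal_bits("retry", history)
novel = surprisal_bits("margin_call", history)  # never seen before
```

Here the routine event carries a fraction of a bit while the never-seen event carries several bits. The catch, as the 2008 anecdote suggests, is that the probabilities come from history: once the world drifts, the baseline used to score surprise can itself be stale.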
I designed one of the earliest stream-processing systems with the formidable Geoff McGrath when he was spinning up the technology division of McLaren. Back then (2010), we were promoting prescriptive analytics (prescribing decisions). But it was not a flippant marketing phrase: we had understood that only data-driven businesses can win. Unbeknownst to many, racing is an information game as much as an engineering one.
This is still the goal, but it is common for many orgs to be flailing around with 3rd-rate data practices and the wrong set of philosophical commitments.
Just as we talk of "digital natives", there are also "data natives", but seldom are they the ones in the driving seat. It is surprisingly common for large enterprise data teams to be run by long-time company functionaries who see it as a safe long-term gig, making sure to build plenty of tribal knowledge into the process.
A common philosophical mistake is to still believe that data is a kind of "oil" inside the machine, or beneath the ground. This is false. Information, in the infotaxis sense, is the mechanism itself, just as the cutting edge of material physics is also information: bits (or qubits) not atoms.
In other words, the organization is not like a machine, or system (per systems thinking), and certainly not a von Neumann machine -- but more like the collective intelligence that Michael Levin has revealed to be the fundamental building block of, well, everything, including the machine of morphogenesis -- which is the closest parallel to how we ought to think of organizations!
Only by rethinking the philosophy of science (Levin's lab hires philosophers) was his team able to reimagine what scientific discovery looks like, re-introducing the notion of telos (agency) that had been shunned by scientific materialism.
Ditto, we must rethink what organizational discovery actually is, moving beyond the worn-out metaphors of industrial markets and "economic actors" and mechanical philosophy. These metaphors started to fail long ago, but the failure has been masked by the "(cheap) capital as strategy" frenzy that was anything but a strategy, as the mass lay-offs and various free-falls are now revealing.
GenAI inserted into this old model will fail. Moreover, I predict it will do worse than fail -- it will accelerate failure via its massive amplification of old semantic paradigms and the promotion, per Iain McGilchrist, of extreme left-brain-ism viruses that turn society and organizations into dysfunctional "autistic" disorganizations, made worse by the forthcoming pincer of model monopolies (disguised as "AI safety").
Nowhere is left-brain-ism more apparent than the absurd world of crypto -- a kind of bastardized version of McLuhan's "the medium is the message" == "the block is the message". What are the blocks for? Who cares?
One of the challenges of RTDDS is the need to build models on the fly, perhaps thousands of them, if not millions in a large org. This was something I helped explore for TelOS, a bold start-up that was attempting to build a "knowledge observability" platform aimed at "mega-projects" (like entire cities). The motivating schema was: "What might sit behind that oracular-like screen that Tom Cruise uses in Minority Report?"
Many orgs are moving towards real-time processes, especially with the increasing importance of technologies like Change-Data Capture. But this is just a technology enabler, albeit a powerful one.
As the work at TelOS showed, the only real hope for "prescriptive analytics" and "organizational infotaxis" (at scale) is to migrate from a static organization to a dynamic one via the world of simulation -- the ability to continually run many alternative what-if futures, and even pasts. We are miles from this data reality.
For the data scientists, you can think of this as a kind of bootstrapping: continual resampling of the information space, spawning millions of "self-organizing decision trees" in an effort to "find the prey" (sorry for a hunting metaphor).
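To ground the bootstrapping analogy, here is a minimal sketch of bagging: many tiny decision trees, each fitted on its own resample of the data, then asked to vote. It is a standard textbook technique used purely as an illustration; the one-split "stump", the toy data and all function names are assumptions for this example and say nothing about how TelOS worked.

```python
import random
from collections import Counter

def fit_stump(sample):
    """Fit a one-split decision tree on (x, label) pairs with 0/1 labels."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        for left_label in (0, 1):
            # Count mistakes if x <= threshold predicts left_label.
            errors = sum(
                (left_label if x <= threshold else 1 - left_label) != y
                for x, y in sample
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label)
    _, threshold, left_label = best
    return threshold, left_label

def predict_stump(stump, x):
    threshold, left_label = stump
    return left_label if x <= threshold else 1 - left_label

def bagged_stumps(data, n_models, seed=0):
    """Fit each stump on a bootstrap resample (drawn with replacement)."""
    rng = random.Random(seed)
    return [fit_stump(rng.choices(data, k=len(data))) for _ in range(n_models)]

def vote(stumps, x):
    """Majority vote across the ensemble."""
    return Counter(predict_stump(s, x) for s in stumps).most_common(1)[0][0]

# Toy data: the label flips from 0 to 1 at x = 10.
data = [(x, int(x >= 10)) for x in range(20)]
ensemble = bagged_stumps(data, n_models=25)
```

Each resample sees a slightly different slice of the information space, so the stumps disagree at the margins and agree where the signal is strong. In the streaming setting described here, the resampling would never stop: new data arrives, old models are retired and new ones are spawned.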
Along the way, we must totally rethink what we mean by "data quality". There is no single source of truth. We need to move on from this delusion. The clue is in Levin's work on regenerative medicine. We will need regenerative organizations in which data fragmentation is a feature, not a bug.
What I am describing does not exist. But it is the future.
Accelerating SATCOM with Networked Intelligence | NSF I-Corps Mentor | Business Development & GTM Leader
(11 months ago) Great read. Loved the comparison of web3 space as a bastardization of McLuhan. Cryptography, trust architecture, and ledger technology are all important and great, but the tulip crowd got a bit lost in their own enjoyment of the tech. Regarding the rest of the article, I love that you touched on rethinking von Neumann's architecture. I'd love to learn more about your thoughts on neuromorphic computing and its impact on driving the enterprise paradigm shift you're championing. While I don't agree completely with the Julian Jaynes-flavored left/right brain example, I 100% agree with the sentiment. This is the GenAI future that I hope comes; my cynicism says we will fall somewhere short, with incumbent players (and economics of violence) getting in the way. Looking forward to following along TelOS
Production-Grade Data Pipeline Orchestration | Operational Excellence
(11 months ago) Finally, someone who gets it
Chief AI Officer. Visionary technologist and lateral thinker driving market value in regulated, complex ecosystems.
(11 months ago) I love this so much. I’d love to talk with you about this! I have many models for fully autonomous data science. Recursively even. It looks a lot like what you’ve got here, actually. I love the animation so much.