A Field of Streams
The real-world history of the data warehouse has largely reflected a “build it and they will come” faith that has only been partially justified. For every data warehouse living in the beating heart of an enterprise’s day-to-day operations, there is (at least) one ghostly equivalent – a deep freeze where data goes into a state of suspended animation. Available but permanently unused.
When I was at EY, enterprise data was one of my central concerns. While it's true that digital analytics tools provided standalone SaaS platforms that were almost universally adopted, I never believed that digital data should be siloed, or that those digital analytics tools provided the depth necessary to support sophisticated usage. Convincing organizations of that was a central tenet of the strategic consulting I delivered at both Semphonic and EY – and it's advice that led to its own share of failures as well as successes. At Digital Mortar, I didn't worry much about enterprise data. We weren't doing consulting and our product was a standalone tool (though with unusually rich APIs and exports – nearly all of our clients did take data from our system into their enterprise data platforms). Our data might or might not be getting used from there, but we were always more concerned about whether our platform was getting used.
At PMY, the pendulum has swung about halfway back. When it comes to crowd intelligence data, we're still mostly concerned with whether clients are using our platform. And in some ways, that's even more pronounced in the sports/live event world because so much of the focus is on real-time monitoring. On the other hand, PMY (unlike Digital Mortar) does a lot of technology consulting. The technology setup for many of the major stadiums in the U.S. was designed by PMY, and a big piece of what PMY does is build out (mostly cloud-based) enterprise data platforms. Naturally, we've built integrations from our people-measurement platform into that enterprise data architecture, and because those stores already hold data critical to intelligent crowd management, that integration tends to be a two-way street.
That's given me exposure to several large enterprise data platform clients and projects, and it's been interesting to see how much has changed in the work of building and activating an enterprise data platform. It used to be that most of the work in creating an enterprise data store was focused on moving the data into the store and shifting it into forms that were reasonably queryable. That work still exists, of course, but it's become almost trivial. The widespread availability of APIs, improvements in data engineering tools, and the use of low- or no-code solutions for pipelines have made the work of ingesting data much less complex and time-consuming. The sort of projects that used to be scheduled in months are now timelined in weeks or even days. Nor is query performance the issue it once was. Tools like BigQuery don't require the kind of careful tuning and schematization traditional SQL databases need to deliver efficient, scalable, cost-effective query performance. The guts of the data pipeline work are a LOT less trouble than they used to be.
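To make that concrete, here's a minimal sketch of what that kind of ingestion often looks like now: pull a page of records from a source's API, land them as newline-delimited JSON, and let the warehouse's bulk loader handle the rest. The endpoint, token handling, and field names here are hypothetical stand-ins; a real pipeline would add paging, retries, and scheduling.

```python
import json
import os
import urllib.request

# Hypothetical source API -- substitute a real endpoint and auth scheme.
SOURCE_URL = "https://api.example-ticketing.com/v1/sales?since=2024-01-01"
API_TOKEN = os.environ.get("API_TOKEN", "")  # from a secret manager in practice

def fetch_records(url: str, token: str) -> list[dict]:
    """Pull one page of records from the source API."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["records"]

def land_as_ndjson(records: list[dict], path: str) -> None:
    """Write newline-delimited JSON, a format warehouse bulk loaders
    (e.g. BigQuery's `bq load --source_format=NEWLINE_DELIMITED_JSON`)
    ingest directly -- schema auto-detection now does much of the
    tuning that used to be done by hand."""
    with open(path, "w") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")

if __name__ == "__main__":
    land_as_ndjson(fetch_records(SOURCE_URL, API_TOKEN), "ticket_sales.ndjson")
```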
On the other hand (and there does always seem to be an ‘other hand’), a lot of new stuff is expected from an enterprise data platform and those expectations mostly focus on data activation. Since that’s traditionally been where enterprise platforms struggle or fail, that’s a good thing.
The first thing I really noticed about the PMY enterprise data efforts was the focus on application activation. Part of bringing a data source onboard is creating an application API for it. No ifs, ands or buts. If it's in the enterprise data store, it's accessible with a standardized API. That's pretty awesome. It underscores how important application activation is as part of an enterprise data strategy. It also means that applications that already consume sources like Ticketing or Merchandising can add Crowd Management with very little effort.
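As an illustration of what "one standardized API per source" can mean in practice: every dataset gets the same read surface (same route shape, same filter and paging parameters) regardless of where it came from. This sketch uses FastAPI as a stand-in framework; the route, parameters, and in-memory query layer are all my hypothetical inventions, not PMY's actual interface.

```python
from fastapi import FastAPI, Query

app = FastAPI()

# Hypothetical in-memory stand-in for the platform's real query layer.
_STORE: dict[str, list[dict]] = {
    "ticketing": [{"ts": "2024-06-01T18:05:00Z", "gate": "A", "scans": 412}],
    "crowd": [{"ts": "2024-06-01T18:10:00Z", "zone": "concourse", "occupancy": 1875}],
}

def query_store(source: str, since: str | None, limit: int) -> list[dict]:
    rows = _STORE.get(source, [])
    if since:
        rows = [r for r in rows if r["ts"] >= since]
    return rows[:limit]

# One route shape for every source: /v1/{source}/records?since=...&limit=...
# Adding a new dataset (ticketing, merchandising, crowd management) means
# registering it in the store; consumers already know the contract.
@app.get("/v1/{source}/records")
def read_records(
    source: str,
    since: str | None = Query(default=None, description="ISO-8601 lower bound"),
    limit: int = Query(default=100, le=1000),
) -> list[dict]:
    return query_store(source, since, limit)
```

Run it with `uvicorn app:app` (assuming the file is named `app.py`), and an application that already reads `/v1/ticketing/records` can pick up `/v1/crowd/records` without learning anything new.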
A second aspect – deeply related to application activation – is much deeper support for real-time data. It's amazing how much your perspective on real-time depends on your vertical. When I was doing digital analytics, I mostly pooh-poohed the usefulness of real-time for analytics. I wasn't wrong. The need for real-time data in traditional analytics is minimal, and few organizations can or should be making real-time operational changes. But that turns out to be a narrow view. In some verticals, real-time data is the primary use-case. Sports and live events (not to mention landside aviation) top that list. And when it comes to people-measurement, even retail has far more use for real-time data than eCommerce sites do. With the exigencies of staffing and staff allocation, there are a bunch of real-time decisions a store can make with monitoring data that have no digital equivalent. Vertical aside, my negative view of real-time was also parochial in treating analytics as the application. With people measurement, I've come to appreciate that analytics may be a secondary part of the equation – with operational and security use-cases often leading the way. In those domains (and many others), real-time actioning is the primary use-case for data.
Given that the enterprise data platform is meant to be THE place for curated, useful data, it doesn’t make sense for most organizations to ignore the importance of real-time. We used to live in a world where populating data hourly or daily was routine. That still may be the case for some data sources, but a good enterprise platform should provide robust real-time support. Certainly, crowd management is a case in point. Our real-time integrations include wait times at queues, occupancy monitoring and alerting for any area, throughput monitoring and alerting, and individual journey alerting. This lets an enterprise land and operationalize crowd management on a single enterprise platform and it ensures seamless integration with complementary data sources like ticketing and PoS.
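To give a feel for the simplest of these patterns, here's a bare-bones sketch of occupancy alerting: consume a stream of entry/exit events per zone, keep a running count, and fire the moment a zone crosses its capacity threshold. The zone names, capacities, and event shape are invented for illustration; a production version would read from a real stream (Pub/Sub, Kafka) and debounce its alerts.

```python
from dataclasses import dataclass

# Hypothetical zone capacities -- in practice these come from venue config.
CAPACITY = {"north_concourse": 2000, "gate_a_queue": 350}

@dataclass
class Event:
    zone: str
    delta: int  # +1 entry, -1 exit (or batched counts from sensors)

def monitor(events, capacity=CAPACITY):
    """Track running occupancy per zone and yield an alert whenever
    a zone crosses its capacity threshold."""
    occupancy = {zone: 0 for zone in capacity}
    for ev in events:
        occupancy[ev.zone] += ev.delta
        if occupancy[ev.zone] > capacity[ev.zone]:
            yield f"ALERT: {ev.zone} at {occupancy[ev.zone]} (cap {capacity[ev.zone]})"

# Simulated feed -- a real deployment would read from a message stream.
feed = [Event("gate_a_queue", 200), Event("gate_a_queue", 149), Event("gate_a_queue", 2)]
for alert in monitor(feed):
    print(alert)
```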
The third change I've seen is the nearly routine integration of AI into back-end and activation strategies. In the broader market it's so hard to separate hype from reality around AI that even saying this can be dangerous. But particularly with cloud-based platforms, the use of AI is becoming ubiquitous in pipeline engineering and data activation. In just a few months at PMY I've seen AI used in everything from the expected (supporting customer access to data with chat) to the mildly surprising (using LLMs to assess GDPR and CCPA data risks and support richer MDM) to the completely unanticipated (automatically customizing sales presentations to clients with CRM and web data for an internal sales team). Most of these (except chat) were delivered as very low-code, fast instantiations using tools native to the environment and deeply integrated into the platform tech stack. I've also seen how heavily engineers now rely on AI for code generation and documentation when it comes to pipeline engineering. Based on my own experience, LLMs haven't quite gotten to the point of replacing (or even, in some cases, usefully supplementing) code in big, complex applications. But, damn, they make pipeline engineering easy. This isn't McKinsey crap – this is just the way things work nowadays.
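The GDPR/CCPA example is worth a sketch, since the pattern is so simple: hand the LLM your schema's field names and ask it to triage exposure risk. I'm using the OpenAI SDK here purely as a familiar stand-in (the work I actually saw used platform-native tooling), and the field list and prompt are invented; the output should feed a human-reviewed tagging step, not production directly.

```python
import json
from openai import OpenAI  # any LLM SDK works; this is just a common choice

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical schema fields pulled from the warehouse's information schema.
FIELDS = ["email_address", "seat_number", "device_mac", "purchase_total"]

PROMPT = (
    "For each field name below, label it HIGH, MEDIUM, or LOW risk for "
    "GDPR/CCPA personal-data exposure and give a one-line reason. "
    "Answer as a JSON object mapping field name to {risk, reason}.\n"
    + json.dumps(FIELDS)
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": PROMPT}],
)
# Feed this into an MDM tagging/review workflow, not straight to prod.
print(resp.choices[0].message.content)
```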
The funny thing about all this is that an enterprise can now have a heavily activated data platform that STILL suffers from very poor analytic usage. The website, the app, digital displays and a bunch of internal applications may be relying on the enterprise data platform for all their data needs. And every single new application may be saving time and adding functionality by dipping into that same well. Yet users may still not be doing much with the data!
It's not quite a case of "more of the same" when it comes to analytics and reporting. I've long encouraged clients to minimize traditional reporting and focus on analytic reports that embed predictive models. That's gotten a lot easier to do with AI tools that take much of the work out of basic data science, and the willingness of engineering teams to add that kind of predictive model to reporting pipelines is encouraging. That being said, those tools are not especially reliable, and the engineers deploying them don't always have the data science chops to provide adequate quality control. Compared to the underlying improvements in pipeline engineering and application activation, improvements in analytics and reporting seem rather paltry.
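For clarity on what "embedding a predictive model in a report" means, here's a deliberately tiny sketch: fit a regression on lagged queue wait times and publish the projection alongside the actuals. The numbers are synthetic and the model is toy-simple; the point is the shape of the pipeline step, and, per the caveat above, someone still needs to quality-check the model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic history: average queue wait (minutes) per 15-minute interval.
waits = np.array([4, 5, 7, 9, 12, 16, 21, 27, 30, 33], dtype=float)

# Features: the previous two intervals; target: the next interval.
X = np.column_stack([waits[:-2], waits[1:-1]])
y = waits[2:]

model = LinearRegression().fit(X, y)

# Project one interval ahead and report it next to the actuals.
next_wait = model.predict([[waits[-2], waits[-1]]])[0]
print(f"Current wait: {waits[-1]:.0f} min; projected next interval: {next_wait:.0f} min")
```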
The original problem remains what it has always been – most organizations don't have enough analytic culture built in. Managers don't know what they need, don't know what to ask for, and aren't necessarily committed to using the data to begin with. Solving those problems is critically important. The difference between an organization populated by people working to understand problems and optimize decisions and an organization populated by people just making decisions is profound. Still, when it comes to an enterprise data platform, the modern emphasis on application activation at least guarantees that your data investment is driving real value even before all your people are fully on board.