Data in Construction: How to Create Truly "Big" Data Architectures
Every week we publish an episode of the Data in Construction Podcast. This week, Ryan Gross, Partner at Credera, and Al Vazquez, CTO of Sweft, walk through their journeys as data architects. Along the way, we cover large-scale implementations, IoT and massive data-processing architectures, how to grow a data culture, and more.
Here's the conversation in total:
Hugh: Welcome to Data in Construction. I'm Hugh Seaton. Today I'm here with Ryan Gross, partner of the data practice at Credera, and Al Vazquez, CTO of Sweft. Gentlemen, welcome to the podcast.
Ryan Gross: Thanks for having us.
Hugh Seaton: So I'm going to ask each of you to talk a little bit about what you do and what your journey has been. Ryan, can we start with you?
Ryan Gross: Yeah. So my journey into data really started around the 2015 timeframe. Before that, I had a master's in computer science and a very software-engineering type of background, and I'd built a lot of different systems. Right prior to this, I had just built an enterprise portal and the whole BI suite around it for a company involved in fleet management.
So if you need 500 white vans, you rent them from this company and they give you the fuel, maintenance, all that kind of stuff. And then immediately after that, I started building the IoT data platform for one of the world's largest heavy equipment manufacturers that works in the agriculture space.
And that was this petabyte-scale system taking all the data that came off of tractors, harvesters, sprayers. Globally distributed geospatial data across a farm field, about both how the tractor was doing (is it performing well, is the engine too hot, and so on) as well as what the tractor was doing: planting seeds at a certain depth and a certain density and of a specific variety at each part of the field. And that really got me in love with building these data systems. Generally, you know, my start was on the cloud.
So I really skipped the whole data warehouse era and had to fill back in that knowledge over time. Then from there, I got into machine learning. That was really about, "Hey, I love doing this one project. How am I going to be able to do more of those?" And machine learning was really taking off in that 2016, 2017 timeframe as the reason that companies wanted to put these massive cloud data platforms in place.
And so I started to build a little bit of a business around that, building predictive models in a number of different industries. Probably the most interesting here was work with a large engineering firm on water filtration plants.
So doing predictive maintenance on water filters was one of those earlier projects. And then over time, starting to layer in more of the overall enterprise questions: what should the organizational structure be around these things? How do people govern data in different ways as the technology has evolved over time?
And then most recently, I took on a leadership role in a data practice, looking to really drive that data culture journey across clients. In addition to that, I spend a lot of time looking at emerging technology, trying to figure out, you know, what the next machine learning is going to be.
And that's taken me into these newer waves of how you build machine learning and IoT-style data platforms, similar to the first one that I worked on. The cloud technology providers have definitely caught up a lot over the last, call it, two or three years. And these systems are becoming, definitely not easy, but a lot easier to build with what I'll call a distributed cloud, as opposed to the edge device and then the backend all in some centralized data center on the cloud. I've spent a lot of time working that out over the last year or so.
Hugh: Awesome. So much to unpack there, but Al why don't you tell us about your journey and where you come from?
Al Vazquez: Sure, Hugh. So I've been working with computers since I was a kid. You can almost think of it as: I got into computers, programming, and technology as my trade. When I'd solve problems as a child for my household, it was computer problems and networking problems. Whether it was in schools, in friends' houses, or in my first jobs when I was 14 and 15, it was all technology. So I've been doing technology as my trade for my whole life. And I didn't think of it really as a career early on; it was just something I did to solve problems.
And I ended up studying philosophy, of all things, and you'd think that would put me up into the totally useless stratosphere of things to think about. But, as it turns out, being really practiced with technology and solving practical problems for my whole life is a pretty valuable thing. And my journey into big data really came from trying to figure out how to do the absolute most that we can do with the least amount of effort, enabling the greatest number of people very, very quickly.
So I got into agile software development methodologies, including Kanban and Kaizen and lean startup very early, and built software and business data platforms for any industry that needed the help.
So I've worked in retail, I've done community networks in big cities, I've done internet infrastructure projects; all sorts of things to apply technology quickly, affordably, and really enable some sort of force multiplication in whatever context was necessary.
A few years ago, when I was working, actually with Ryan, at a consulting firm, I got into a really heady project at a top-10 global hedge fund. We designed and built a petabyte-scale data lake for them to gather data incredibly quickly and then process that data at an incredibly rapid pace without a ton of overhead, so they could enable small teams to experiment and find value quickly, and then leverage that value in a context that had a ton of compliance problems and a ton of governance problems that really needed to be solved, but still enable people to move quickly and discover value as they went.
We ended up building a platform that was so fast that the teams were accidentally onboarding data sets that were larger than most other companies' entire data warehouses. Then they would roll them back and bring them back on as they discovered what value they really wanted. So it was very interesting.
And since then, I've moved on to taking on a C-suite role, building business systems and laying the foundation for big data platforms in retail. I'm also consulting independently with several AEC technology firms now. That's sort of my background in a nutshell, but I'm really excited to be here.
Hugh: Yeah. Well, I want to define a couple of terms and ideas you guys talked about. First of all, let's start with petabyte. How big is a petabyte?
Al: Ryan, you want to take that one?
Ryan: Yeah. If you think about the amount of space on your laptop, typically it's measured in gigabytes, and oftentimes on your large desktop computer the drive will hold one terabyte of data; that's enough to hold several thousand movies.
Now, a petabyte is the next level up in scale: it's 1,000 terabytes. So it's data that becomes computationally very hard to process using the typical database you might already have installed, whether that's for a data warehouse or as the backend for an application. And it's typically the barrier people think of these days when they're talking about big data systems.
I think when that term first got coined, if you had 10 terabytes of data, meaning more data than would fit on 10 desktop machines, that was considered big data. These days, that bar has certainly moved up over the last four or five years.
Hugh: And the key thing that you just talked about: it's not about storing it, although that's its own thing, it's about doing something with it, right?
So what you were able to build, both of you, at varying times and in varying places, are systems that are so fast and so well architected that they're managing data so it can be handled, and you can do something with it at that scale.
Ryan: Exactly.
Al: That's right. Yeah. Deep in the teams that do this type of work, there are always jokes about "that's not really big data." So there are a lot of people who are like, man, I'm writing a query on a database and it takes 25 minutes to run; this data is just too big. And the joke is, you know, computers are pretty powerful nowadays, right? Can I load your entire database into RAM?
If I can load your entire database into RAM, you don't have a big data problem, because at that point, once it's in RAM, you can do a lot of work very, very quickly.
But there aren't systems that can hold a petabyte in RAM today. You can get systems that'll hold a terabyte or two in RAM nowadays. So that's where things start to get kind of impossible. And, you know, eventually we'll break that limit and it'll be an exabyte that's considered big data. But right now, nobody can really handle a petabyte in RAM consistently for an affordable price.
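The "can I load your entire database into RAM?" joke is really a back-of-envelope test. A minimal sketch of that arithmetic, with the 2 TB figure from the conversation as an illustrative (not benchmarked) ceiling:

```python
# A rough sketch of the "does it fit in RAM?" heuristic described above.
# The 2 TiB default is illustrative, roughly the high end of a single
# large server today per the conversation; adjust for your hardware.

TB = 1024**4  # one tebibyte in bytes
PB = 1024**5  # one pebibyte in bytes


def fits_in_ram(data_bytes: int, ram_bytes: int = 2 * TB) -> bool:
    """Return True if the dataset could plausibly be held in memory."""
    return data_bytes <= ram_bytes


print(fits_in_ram(500 * 1024**3))  # 500 GiB: True, not a big data problem
print(fits_in_ram(10 * TB))        # 10 TiB: False, time to distribute
print(fits_in_ram(1 * PB))         # a petabyte: False, definitely big data
```

If the answer is False, you are in the partitioned, distributed-processing territory the speakers describe next.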
Hugh: And as a result you're breaking the data apart and you're managing it, right? How do you even do that? How do you think about a petabyte, which is, again, roughly a thousand times the size of what you can handle in RAM, just to make the math easy? What do you do?
Ryan: So, like you mentioned, you almost always end up parallelizing your processing, which means breaking down the data into smaller chunks and distributing it across a bunch of different computers in order to actually process it.
And in order to do that, you need to understand the data itself: what are the different fields or values in that data? In the example of the agriculture platform, both space and time were probably good ways to break things down as you moved across a field. Taking it to construction, you might have similar things.
If you're looking at an IoT platform in a building space, you have what floor of the building you're on versus what type of data is actually being captured, whether it's audio or video; those types of modalities will generally balloon your data size.
Al: Temperature, audio, security, transactions…
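The parallelization Ryan describes, breaking the data into chunks and fanning the work out across multiple workers, can be sketched in miniature. This is a toy example under assumed data: the field names (`floor`, `temp_c`) and the four-way split are hypothetical, and real systems would use a framework like Spark rather than a single machine's process pool.

```python
# A minimal sketch of "break the data into smaller chunks and distribute
# it across a bunch of different computers," scaled down to one machine's
# worker processes. Field names are hypothetical.
from multiprocessing import Pool

# Fake sensor readings standing in for IoT data off a building site.
readings = [{"floor": f % 4, "temp_c": 20 + f % 7} for f in range(1000)]


def chunk(data, n):
    """Split data into n roughly equal chunks."""
    size = (len(data) + n - 1) // n
    return [data[i:i + size] for i in range(0, len(data), size)]


def max_temp(batch):
    """The per-chunk computation each worker runs independently."""
    return max(r["temp_c"] for r in batch)


if __name__ == "__main__":
    with Pool(4) as pool:
        partials = pool.map(max_temp, chunk(readings, 4))
    # Combine the per-chunk partial results into the final answer.
    print(max(partials))  # prints 26
```

The same split-compute-combine shape underlies MapReduce-style systems; the hard part at petabyte scale is choosing the split so each chunk is independently useful, which is where the partitioning discussion below comes in.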
Hugh: So you're working with the business to understand how to segment it, right? So that things can be isolated and you can act on them?
Ryan: The other thing is what you're trying to do with it. So that side of it is, how can I segment it? And then the other side is, what am I trying to do with it?
If I'm trying to calculate something that would make the most sense on a per-floor basis on an active construction site, then partitioning (which would be the word we use for how you break down the data) by floor would make a ton of sense. If I'm trying to do something where I'm getting an aggregate over all of the buildings that are under construction right now, then putting that data together and storing it over time might be a better way to break it down.
And oftentimes what you end up doing is taking the same data and storing it in two different ways based on what you're trying to do with it. Which obviously adds complexity on the management side of things.
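Storing the same data under two partitioning schemes can be shown with a toy in-memory example. The field names (`floor`, `hour`, `temp_c`) are hypothetical; in a real lake these would be directory-style partition keys rather than Python dicts, but the trade-off is the same.

```python
# Toy illustration of keeping the same readings in two layouts:
# one keyed by floor (fast per-floor queries) and one keyed by hour
# (fast time-window aggregates). Field names are hypothetical.
from collections import defaultdict

readings = [
    {"floor": 1, "hour": 9,  "temp_c": 21},
    {"floor": 2, "hour": 9,  "temp_c": 24},
    {"floor": 1, "hour": 10, "temp_c": 22},
]

by_floor = defaultdict(list)
by_hour = defaultdict(list)
for r in readings:
    by_floor[r["floor"]].append(r)  # layout 1: per-floor scans
    by_hour[r["hour"]].append(r)    # layout 2: time-window scans

# Per-floor question: hottest reading on floor 1.
print(max(r["temp_c"] for r in by_floor[1]))                  # prints 22
# Aggregate question: average at 9 o'clock across all floors.
print(sum(r["temp_c"] for r in by_hour[9]) / len(by_hour[9]))  # prints 22.5
```

Each query only touches the partitions it needs; the cost, as Ryan notes, is that every write now has to land in both layouts and keep them consistent.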
Hugh: And translating this into what's reasonably on the plates of teams in construction: I think we can assume that a project is probably terabyte-sized, depending on what you're doing. But if you want to start doing things across projects, especially big projects that are high value, you're going to start running into maybe not petabytes, but still the same problem, where you can't handle it all in RAM. You're going to have to think about what you're trying to do with it, and get somebody to help you think through how to break the problem apart so that it's solvable with today's infrastructure.
Al: Yeah. I think one of the great insights that came out of what Ryan just said is that use case matters. Knowing how you're going to use the data is the key to unlocking the value in the data; just collecting a whole bunch of data is never really enough.
And we've seen that in many different contexts, right? We have systems that allow us to gather data, but the data never comes out and it's never useful. So really the key to unlocking the power of big data is understanding the value that you're going for, so that you can manipulate the data and do the work necessary to make these big data problems tractable.
But there's a way to over-correct, and over-correcting in that way looks like having to solve everything before you get started. What Ryan and I have both done a lot of work on is how to create data platforms that enable experimentation, so you're looking for those answers, not trying to design them before you get started.
You're creating data platforms that enable you to have small teams, right? Two-pizza teams of four to six people that can experiment cheaply and quickly, identify value, and then use that experimentation and those insights to drive how you're going to structure your next big data set.
And by doing that over the next two decades, you're really enabling value discovery and value compounding that you couldn't get if you tried to figure out exactly what you're going for first and then design a platform fit to purpose.
Hugh: I love this term you just used: value discovery. Having played around with data myself a little bit, you often don't know quite where the trends are going to be or where the insights are going to come from until you get in there a little bit.
So this idea of creating a testing platform, clearly not at the petabyte scale, where you can start to figure out where the magic is, where the gold is, and then scale it, is really cool. And I could see that in the construction space; we're early in some of this, where we're trying to understand what correlates with what, or what gives us which insights. So the idea of creating an experimentation platform for value discovery is pretty cool.
Ryan: Yeah, absolutely. And it's an experimentation platform coupled with a process side of things: continuously ideating using what you learn during one experiment. Maybe you're trying to figure out the optimal way to plan a particular type of project that's done a number of different times, and by collecting the data on each of the job properties from your project management software and adding in some IoT data to figure out what's actually valuable, you may realize that that IoT data is useful for a lot of use cases in the safety space.
And so by continuously bringing together people who are working on data to just spitball, if you will, about what else might be possible based on what you've seen in the data, you really unlock a lot of those value-generating use cases. One of the things that Al and I have both spent some time talking to people about is that about 2 to 3% of use cases (and these are stats from big tech firms, so not necessarily a hundred percent translatable) generated 90 to 95% of the return on investment for those companies. It's really about looking for the ones that are the absolute home run.
Hugh: I love that. That makes a lot of sense: you find something you didn't know that makes you change something that has measurable and clear returns. That's great.
I've got another question. You mentioned overhead in describing some of the ways of dealing with this. When you say overhead, what do you mean?
Ryan: So there are a couple of different levels to it, but one of the big things that I found in building these big data processing systems is there's overhead just based on the amount of data that's coming in.
Just to capture and reliably store that data without losing it has a fixed cost associated with it. And then anything you want to do downstream computationally to transform that data is going to increase that cost. So you're trying to avoid pre-computing, or spending a whole bunch of money building something that you only think you're going to use.
And instead, focusing, to the point Al made earlier and I was expanding on, on known value where you can track the ROI, and doing just enough data transformation to get that value, is going to keep the overhead from the sheer volume of data down.
Then the second part of overhead is, I think, more valuable, and that's really thinking about how you build the system in a way that's flexible, so that when you want to go after that second and third use case, and when you really discover that home-run use case, you don't have to start from scratch and throw away everything you already have in order to go after it.
So that's really the architectural overhead of thinking through the design. You may not actually build everything that you think about, but you want to keep things in mind. And I'll pause here; I do think it might be a good point to introduce the four D's when you're designing one of these systems.
So the four D's are really about the design of a system. With each potential feature of the system, you could develop it, which means build it right now, because it's going to help you solve the current use case. You could design it, so that you know what you want to do in the future and you're aware of the choices you're making: are they going to make it so you can't actually hit the design you want for something else? You could delegate it, which means you're either going to buy that part of your solution as packaged software of some kind, or some other team is going to do it, and you need to give them really solid requirements to make sure they do what you want or need.
And the last one is just defer: this is something that's really not important yet, and if it matters at a later stage of maturity, we'll come back to it. Really thinking about those different options for each of the things that might be important can help you get to a minimum viable system without it actually being a minimum viable throwaway system.
Hugh: I love that, "the four D's". And going back to your point about standing something up that's low overhead, it's worth noting that it's almost like AWS and some of the other cloud platforms are made for this.
They're not, but they make it so easy, right, to stand something up, configure something, try it out, plug in this API or that API, and at least get something stood up to start figuring things out before you code a lot.
Ryan: That's true. I always just assume that part of it, and it's definitely worth saying out loud: using tools that are pay-as-you-go, meaning you only pay for what you use versus having to buy a lot upfront, will dramatically reduce that overhead I was talking about before. I often get caught just assuming that everybody thinks that way these days.
Hugh: Well, it just makes possible things that you wouldn't even consider in the past, if you had to, God forbid, buy anything on-prem.
But the cloud providers are also clearly viewing that sort of capability as a way of growing their businesses. So it seems like they keep making it easier and easier to stand things up, even more and more complicated things, stand them up quickly, and then decommission them pretty quickly.
Ryan: Yes. And not only can you experiment with different types of data, you can experiment with different ways of storing the data and with different computational frameworks, and do all of that at a very low cost of entry, both in terms of what you need to know about setting up the infrastructure and just the amount of money that you're actually paying, as long as you do a good job of selecting the subset of data that you're going to start with.
Hugh: Well, here's a crazy question that you may or may not really be able to answer. But one of the questions that is constantly being looked at inside of construction companies is should we train someone who knows construction on various digital skills?
Or should we take someone who knows those digital skills and teach them construction? I tend to lean a little more toward the former: within limits, train the person who understands the complexity and the interconnectedness of construction, because it really is complex and interconnected.
There's just an intuition that you get, where you kind of assume we've got to check, we've got to make sure. But if you think about what it would take to get somebody to the point where they could at least be a good partner to a higher-level architect, how complex do you think some of what you're talking about is? Obviously the high-level ideation and all that. You know, you guys both have decades of experience with this.
What do you guys think?
Al: Sure. So I think that the insight that people with domain specific knowledge are incredibly valuable is the right one. I think that teaching programmers all about construction is pretty tough. Teaching data analysts all about construction is pretty tough.
In fact, if you talk to really strong data analysts and data engineers and data scientists, they're always looking for subject matter experts they can work with in partnership. And I do think that cross-functional partnership is really where the synergy comes in. You really need both; it's not an either/or.
But to the second part of your question, which is how do you get people started? It's certainly not going to be in some of the things we talked about earlier, like how to partition data so that it's performant in memory when doing heavy-duty mathematical computations. That's so specific and so experience-driven that it's very difficult to teach people in a short period of time.
However, I think there are at least two really clear areas where domain experts can get involved immediately, and one of them is even non-technical: data governance, meaning the rules of the road and the guardrails that people need to follow to ensure continuing value delivery and compliance with whatever regulations are necessary to keep operations moving. Governance ties very tightly to value propositions. If the rules of how we get things done, including standard operating procedure on the data management side as well as rules and regulations for data privacy and data security, get in the way of value discovery and value delivery, then generally speaking those rules will win out; and if they don't win out, you have a few years before you get caught out, and then you have a huge catastrophe. So having somebody in data governance who understands why you're looking for value in the data is super critical, because it helps make sure that you can solve problems before they become problems.
Governance is one of those spaces.
Hugh: And some of that, just to quickly add onto what you're saying, isn't just security. I think when people hear the term governance, the first thing they assume is, "oh, that's permissions and who can see what," but it's not just that, right? It's data quality, it's data recency, it's data frequency. It's making sure you're marrying the same level of quality, or at least are aware of the varying levels of quality of the data you're marrying together. Right?
Al: Yeah, that's right. I mean, it's everything that has to do with standards and conventions, all the way from laws and regulations, like this European GDPR law we're seeing about individual users and the privacy of their data, all the way down to something as simple as what we should name a thing so that we know what it is when we use it in six months.
Right, just coming up with a meaningful name for something is part of data governance, because if you can't understand the meaning of something that you stored, no one's ever going to look at it. So it's really just that: how do we get the data to be useful? Quality is part of that, and once you get into quality, data quality teams can also include experts in the field.
But I think primarily looking at: what are we gathering? Why are we gathering it? What should we call it? What should the standards be for what is good and what is bad? And when should we stop, right? When should we pump the brakes? Having somebody who's steeped in the industry, I think, is critical in that space.
The second area, which I think is actually more technical but still really achievable for anybody in an industry, is business intelligence: starting to work with data in a visual dashboarding tool, building pie charts and bar graphs and line graphs, and really exploring what's possible to build in terms of reports, KPIs, and early warning systems, and really experimenting with what's possible in data visualization, because everybody has an opinion on what looks good, what has meaning, and what's valuable.
That type of role will need to be supported by somebody with a bit more of an academic statistics background. But if you have no understanding of technology, you can still build a dashboard, and having people right at that initial experimentation of "how can we use this" is a really great place for somebody very knowledgeable about the industry.
Hugh: That's great. And we're seeing people increasingly take on the BI role, often using Power BI, but not only. So yeah, that's great to hear, and I think you're starting to see that, but hopefully we'll see it more.
Ryan: So just two rules of thumb that I use in that space. I actually do hold the opinion that you need at least a couple of people who come from a data technical background to supplement the folks who really know the domain. And maybe I shouldn't adjust the rule of thumb, because construction companies have more revenue per IT need, I think, than a lot of other industries.

But I would say for every billion dollars of business, you probably need 10 people who really know data, in your average industry. And the second thing, to what Al said earlier: pairing up people who really know the space, and training them to the point where they can be conversational with people who really know data, for some of these really big data or harder data science predictive problems, is the key I've seen differentiate the really successful from the just good across industries.
Hugh: Yeah, I love that. Great rule of thumb, 10 people per billion. Again, it's a rule of thumb, but it's a benchmark people can think about. And you're right: translating from the realities of the business and what context tells you things mean, and then being able to put that in terms a pure data scientist can do something with, is absolutely essential. I love that balance between industry knowledge and really deep understanding of the technologies and techniques being used, because it's moving so quickly. Which actually brings me to the area I wanted to get to next: I know both of you, but Ryan especially, have been thinking about distributed cloud. Tell me what you mean by that. And I think you take it a little further into IoT… I'll tell you what, rather than me making a mess of it, tell us what you mean.
Ryan: Sure. So I actually think of it as coming from the opposite direction from where you were headed there. It started with cloud computing, and then we added this concept of the internet of things, IoT, where edge devices are getting sensors put on them, with potentially some local computational power to read those sensors and send the data back to the cloud.
And when we started to think about how to build these systems, there were a lot of things that you really couldn't do. Again, I got started in this space in heavy equipment, right? Tractors that cost hundreds of thousands of dollars, so they could put a fair amount of compute and telecommunications on those devices without breaking the bank in terms of raising the cost. A lot of things, though, didn't really fit into that space; your average robot in the house, which is a vacuum, couldn't quite pack as much power. I think it's similar in the onsite construction industry. Actually, one of my first IoT talks was at the CribMaster conference; they make the cribs that hold the tools onsite in factories and construction sites.
And a lot of the challenges they were dealing with were around just tracking how many disposable parts you had, and how you actually get compute out to the site reliably, such that those systems didn't go down. Where distributed cloud comes in is bridging those two worlds together: the cloud on the backend with effectively infinite computational power, and the edge where a lot of the data collection happens, where you can get those petabytes of data without having to send every bit and byte of those petabytes all the way back to the cloud, incurring a lot of the overhead cost I was talking about before.
In the early days of this, it may have been called fog computing. It's possible you've heard that term; it's basically just saying, take the cloud and bring it all the way down to the ground, meaning out to the edge.
But even in those systems, you had to learn a totally new way of writing code and processing data in order to actually build them. With these distributed cloud systems, there are now a number of different layers. You have the edge that's really onsite: each individual drill, in the CribMaster example. Then you have local compute provided by Amazon or Microsoft or Google that can run onsite, but has the same exact programming approach and model as what you would deploy into one of their hyperscale cloud data centers.
So you can build the systems, test them and validate them up in the cloud, and then just deploy them out to the edge. Then there are other layers to that. Let's say you're really disconnected and you can't put IT infrastructure in place; I don't know exactly what the right example in construction would be for that.
But then you have 5G connectivity back to a data center that AWS runs at the 5G hotspot, co-located with T-Mobile or Verizon or whoever, to give you millisecond response time with full access to everything you'd expect running on cloud computing. So the key is that you can now build these systems, test them in the cloud, and then deploy your computational infrastructure out to the edge, which means you can now process data at a scale that maybe even takes us up to the next order of magnitude: you could process exabytes of data, which would be 1,000 petabytes, or a million times what you could fit into memory, meaning real-time analytics of video, audio, and so on.
Al: So Ryan, could you maybe elaborate on the fog when you're thinking in terms of construction? A lot of what we've been talking about is really gathering data into a central core, right?
Bringing it to the office. Even when we're talking about the field, we're talking about gathering data from the field, bringing it to the office, putting it into a data platform, and then having some smart people put together all these data products and come up with insights.
But what you're describing - sort of bringing the cloud down to the ground - can really, I think, make something very meaningful in construction, since we're constructing the environment, right? So could you maybe elaborate more on what that looks like in the built environment?
Ryan: Yeah, so I think there are several different levels to what can be done here. As you're thinking about building the construction site, what are you going to bring onsite? If you're a large-scale engineering firm, or, you know, building those multimillion-dollar projects, you can buy these appliances from Amazon or from Microsoft, bring them to each of your sites, and have your office design applications that will run directly on site. That gives you both privacy of the data - because it never goes back into some centralized location, which may be good for the folks who are actually working on the site - as well as real-time applications.
So a lot of those safety use cases that I hinted at before really require you to know when something is amiss right now, not in 15 minutes, not in five minutes. And that's actually really hard to engineer for when you're sending all of the data all the way from wherever it's being collected to the cloud, running a whole bunch of computation, then sending all of that back out to the sites.
Then I think downstream, there's planning for smart building infrastructure, where you're designing these buildings to have computational properties for the eventual owners - and I have done a lot of work with property management firms and the types of problems that they're facing there. How to enable a smart building future is definitely one of the things that is very much at the top of the list for those companies as they're looking at their technology strategy - both for, you know, your JLLs and CBREs, for their clients and their tenants, but also for companies that really just own and operate their own infrastructure, whether those are factories or data centers or whatever it might be.
Hugh: This is really exciting stuff. So if you think about what you just described - you described it very well, but there's a lot to it. If somebody wanted to go and learn a little bit more about this, Ryan, where might they go? Or just hear it again so they can process it!
Ryan: Yeah, yeah, absolutely. It's definitely something that is very much in the emerging technology space. This is not something that everyone's already doing today, where you're behind if you're not thinking it through or having a strategy around it. But that being said, there are really good talks at the main conferences for each of the cloud providers.
So at the Microsoft Build conference or AWS re:Invent, going and seeing their state-of-IoT type of talks would be a good baseline for your technology folks looking to understand the implementation side of this. You know, I guess I'll give a shameless plug: I wrote an article in the BI journal earlier this year that's published on a blog - maybe we can give the link out there - that goes into a lot more detail on this. And I've done a couple of conference talks on it as well, where there are some visuals to back up what I was just mentioning. Then the last company that's really been doing a lot of the work in this space has been Cisco, the router company, because obviously they're concerned with setting up all the networking infrastructure to enable this.
And so they've got a lot of thought leadership out there as well.
Hugh: Fantastic. So as we kind of bring this one in for a landing, I wanted to talk a little bit about data culture. One of the things that is visibly happening in construction is people shifting from, you know, a kind of low-level distrust or just disinterest in data, to asking for it - to understanding that, look, I don't want to be a data scientist and I don't want to do this for a living, but that data point you told me last time really helped us.
So I want more, I want a dashboard, I want whatever. So it's happening. It's a huge industry, so obviously there's a long way to go. But how have you seen data... well, how would you even define data culture? Let's start with that.
Al: So first off, I think it's actually pretty interesting that the distrust is finally starting to melt away in construction. I think that might be a testament to the new way of doing things. The old way - plan everything ahead of time, build a data warehouse, put a data governance team in place, and do everything by hand - leads to a lot of distrust, because nothing ever works in that model. The new ways of doing things - value discovery, rapid experimentation, kind of move fast and break things, reduce overhead, increase speed to market - those things are trust-building exercises. So the fact that the ice is starting to break, I think, is a really good sign that people are starting to do it right in the construction industry.
A data culture… in my view, a data culture is a culture where people don't look at data as some sort of objective truth, but one where people look at data as something that's valuable and useful for some purpose. Whether it's decision support - we need to make decisions, we need some perspective that we can't get just from having a bunch of smart people in the room, so we also want to get some data into this discussion about decision support.
It can be about a value use case of risk management, where we're aware that there are risks. We could hire more people and train them better, and that's one way to manage risk, but data has a role to play. And a data culture is going to be a group of people who all see data as a teammate, as another technique, as another perspective that they can mix in - as opposed to hard facts, objective reality, everybody has to agree to it, this is what's real, and if the data doesn't match, then we're going to go with the data, right?
That's how you kill data cultures.
So it's an openness to seeing data as a participant in whatever it is that you're working on, and a tool that you can add to your toolbox.
That's, for me, the beginning of what a data culture is.
Ryan: Yeah. And I guess the way that I've thought about it and been thinking about this a lot recently is that data culture is about changing the way that your business is run, such that data provides an underpinning for just about everything that you do.
And that typically means taking into account a couple of capabilities. First one, which I think we've hit ad nauseam here, is identifying those use cases and finding the things that you're going to be using data for. At first that's much more targeted; eventually that becomes much more distributed, out into more of a self-service environment.
Second one is around the platforms. And again, I think we've talked quite a bit about the technology underlying this such that when people want to go figure something out, they don't have to go spend six months building a system to enable it.?
Third one is around governance, such that when people are enabled to go do a lot of things in parallel, they don't run afoul of either the trust of the business or your customers, or, you know, in the worst case, a regulatory body of some kind. And then the last one is really the tracking of what you're doing. Running real experiments becomes that last component: can I gather data in a way such that I understand what is causing the results that I'm seeing?
The real data culture companies - the Ubers and the Facebooks, the tech firms typically - are run that way: everything's an experiment all the time, in order to continuously improve.
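At its simplest, the "run real experiments" readout Ryan describes comes down to comparing an outcome rate between a control group and a variant group and asking whether the difference is bigger than noise. A minimal sketch, using a standard two-proportion z-test; the scenario and all the numbers are invented for illustration.

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Return the z statistic for the difference between two proportions
    (pooled standard error, as in the standard two-proportion z-test)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# Hypothetical experiment: did a new safety-checklist app change the
# incident-free-day rate across sites that got it vs. sites that didn't?
z = two_proportion_z(success_a=180, n_a=200,   # control: 90% incident-free
                     success_b=475, n_b=500)   # variant: 95% incident-free
print(round(z, 2))  # |z| > 1.96 suggests the difference isn't just noise
```

This is the whole mechanism behind "everything's an experiment": gather the two groups deliberately, so the comparison tells you what caused the change rather than what merely correlated with it.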
Now, I do see it as kind of a journey. If we talk about six stages of a journey to data culture, the first stage is exactly what you just said for the construction industry: identifying the need for better data and analytics capabilities and the strategic importance of those capabilities. Then it proceeds through a number of different stages, where the earlier ones are about identifying use cases and building out your initial platform, and the later ones are really where you're adding in that experimentation capability.
Hugh: Fantastic. Okay, gentlemen, thank you for a ton of insight. This has really been great. I want to end with what you think a listener might do next. What should they go read or watch? We talked about AWS re:Invent. We talked about Google and Microsoft Build. Where should they start?
Ryan: I can go first on that one, I guess. I think it's really about the stage you're in. If you haven't done anything in this space and you're just trying to figure out where the data strategy for your organization should go, I strongly recommend focusing in on the use cases first, and there are a couple of different places you can go to read and get insight into that.
Most of the major consulting firms have published information. One that I actually recommend to people regularly is from McKinsey, where they've broken down individual use cases by industry, along with the impact that they can drive and some stories to show what they really look like.
But there are also articles from BCG and Bain, and so forth. From there, bringing in a consulting firm or someone like that to help you ideate within your specific context, with the specific data that you have available, would be the next step to really get to something actionable. And then, once you have the use cases in place, it's really about the technology platform to enable experimentation as a first step - meaning being able to take static data sets and figure out where the value lives within them before you go building these huge systems that Al and I were describing earlier in the show.
Hugh: Makes sense. Like anything, it's a journey. Al, any thoughts?
Al: Yeah. I'll take it in a totally different direction to try to give everybody options here.
I think that if you're going to get started in getting value out of data, the place to start is with your team. And there's a little bit of prep work before you get to your team: making sure you have a serious thinking session about questions and about perspectives.
The obvious, pedestrian way of thinking about it is who, what, when, where, and why. But some perspectives can be thought of as historical/future; internal/external; our people/our subcontractors/other people, right? There are different perspectives and different places where you can source insight, and where you can really get at why we're going to do something with data. It's about sitting with your team, and with multiple different teams, and having a frank conversation that really tries to get the juices flowing: if we could have any data we wanted, if we could do anything we want with it, what would we want to use it for? How could you use data? What have you already been thinking about? And then really bringing that conversation into context, and as a leader, really leading that discussion through the different perspectives. Because a lot of people will get hung up on, well, what data do we have, or what data can we get? So as a leader, you need enough context to really say: we can get data from the government, we can get data from the industry, we can get data from our previous projects, we can get data from our future projects if we plan for it now, right?
Knowing all of the different perspectives - across time, space, and people - from which you can bring data in and start using it for different purposes. Really starting with your team, and then continually revisiting that team to talk through: how can we do this fast? How can we do it cheap? How can we do it faster and cheaper?
What experiments can we run to really see if we can find some of that value? Once that process has started, figuring out who to work with, what technologies to work with, and what data to capture starts to fall into place. I think it's really important to start with why we're going to do anything, and that would be my recommendation.
Hugh: Well, gentlemen, thank you for being on the podcast. This has been amazing.
Al: Thanks for having us, Hugh. I'd love to come back.
Ryan: Likewise. It was a great conversation.
Hugh: Fantastic.
Sign up for news about the upcoming "Data in Construction" book
Sign up for Data in Construction skills webinars
Follow Hugh here: https://www.dhirubhai.net/in/hughseaton/
Buy The Construction Technology Handbook here: https://www.amazon.com/gp/product/B08PNHBB1M/ref=dbs_a_def_rwt_bibl_vppi_i0