Creating Data Infrastructure for AI and BI At Scale

In business today, the difference between success and failure often comes down to an organization's ability to leverage data. Data lets us understand our customers, our markets, our competition, and our own processes and operations. By applying analytics to this data – from basic business intelligence (BI) all the way up to cutting-edge artificial intelligence (AI) technologies like machine learning – we extract insights that help us drive growth, innovation, and efficiency.

Any business can work with data, but as with anything in life, the better prepared we are, and the more in-depth our understanding is of the platforms and processes involved, the better our results are likely to be. As they become more proficient at extracting insights and turning them into business growth, businesses move along what is often referred to as an "analytics journey," becoming more mature in their ability to deploy the technological infrastructure needed to make the magic happen.

The aim is to reach a level of maturity where an organization can genuinely consider itself data-driven. I’ve worked with businesses across many industries to help them along this path, and in my experience, while many of them like to say they are data-driven, or that they follow data-driven business practices, far fewer are actually at the stage where they can apply data at scale throughout the whole organization. That means using it for all of the objectives listed above: understanding customers, understanding markets and competition, understanding internal operations and processes, and, ultimately, creating better products and services.

For every business, the journey will be unique, and the way it’s completed depends on the desired outcomes, the strategic objectives of the business, and the resources – including skills – that are available. However, there are certainly some core principles that apply to any business setting out on this journey. In this article, I’m going to cover some of the most important ones, so let’s dive in!

The semantic layer - giving data meaning

Firstly, it’s important to understand that while data may be the fuel of the information age, when it comes to working with it at scale to drive organization-wide growth, it's not that useful on its own.

For a data strategy to be effective, a business needs to implement a "semantic layer" – a layer of processes that sits between the data and the people who make decisions, and that helps them understand what the data is telling them.

Say you have a business that sells 100 different products, and you’re able to measure how many of each item are sold. The person in charge of buying might be able to use that information to make basic decisions about what stock is needed. But there’s very little there that will give a marketing person a clue about which customers the business should target with its advertising, and even less that will tell an HR person which employees the company should hire.

In traditional BI, the solution can be as simple as creating charts and visualizations that put the data into context and highlight the key findings, along with the recommended course of action.

In more advanced cases, such as when we are looking towards using data at scale, organization-wide, to enable machine learning, the semantic layer needs to be tailored towards the specific user that the insights are intended for. These people – sales staff, marketing staff, HR staff – may very well not be data professionals themselves, but it’s clear that they can benefit from having better access to data or, more precisely, better access to the insights it contains. An intelligent semantic layer imparts meaning to the data end-user in a way that is specifically helpful to them.
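To make the idea more concrete, here is a minimal Python sketch of a semantic layer, assuming a simple in-memory list of sales records. The record fields, the metric names and the `query` and `describe` helpers are all hypothetical and not any particular product's API; real semantic layers usually sit on top of a warehouse or lakehouse and translate business terms into database queries. The point is that "revenue" is defined once, in business language, and every team asks for it by name instead of reworking the raw columns themselves.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical raw sales records, as a transactional system might store them.
RAW_SALES = [
    {"sku": "P001", "qty": 12, "unit_price": 4.50, "region": "north"},
    {"sku": "P002", "qty": 3, "unit_price": 20.00, "region": "south"},
    {"sku": "P001", "qty": 5, "unit_price": 4.50, "region": "south"},
]


@dataclass(frozen=True)
class Metric:
    """A business-facing metric: a friendly label, a plain-language meaning, and a formula."""
    label: str
    description: str
    compute: Callable[[list[dict]], float]


# The semantic layer itself: one shared set of definitions, so every team
# computes "revenue" the same way instead of re-deriving it from raw columns.
SEMANTIC_LAYER = {
    "revenue": Metric(
        label="Revenue",
        description="Total sales value (quantity x unit price)",
        compute=lambda rows: sum(r["qty"] * r["unit_price"] for r in rows),
    ),
    "units_sold": Metric(
        label="Units sold",
        description="Total number of items sold",
        compute=lambda rows: sum(r["qty"] for r in rows),
    ),
}


def query(metric_key: str, rows: list[dict], region: Optional[str] = None) -> float:
    """Answer a business question by metric name, optionally filtered by region."""
    if region is not None:
        rows = [r for r in rows if r["region"] == region]
    return SEMANTIC_LAYER[metric_key].compute(rows)


def describe(metric_key: str) -> str:
    """Explain a metric in the user's terms rather than the database's."""
    m = SEMANTIC_LAYER[metric_key]
    return f"{m.label}: {m.description}"
```

With the sample data, `query("revenue", RAW_SALES)` returns 136.5 and `query("units_sold", RAW_SALES, region="south")` returns 8, while `describe("revenue")` gives a marketing or HR user the plain-language meaning of the number rather than a column name.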

Data for everyone

When planning data infrastructure, a guiding principle should be that all information, regardless of where in the business it originates, needs to be accessible to the whole business. Traditionally, businesses have often fallen into the trap of keeping data “siloed” within the department or operation where it’s generated. Without a unified structure – such as a data warehouse or data lake strategy – for storing information, it can end up stuck in databases or data formats that others who could benefit from it are unable to use or access – or don't even know exist!

To illustrate this, data scientists – including Kirk Borne, chief science officer at DataPrime – sometimes use an analogy involving an elephant in a room full of people wearing blindfolds. With only their hands to work out what is in the room with them, one might feel the trunk and say, "It's a snake," another might feel the legs and say, "It's a tree trunk," and another might feel the tusks and say, "It's a spear."

Until they start putting together what they know, it’s very difficult for any of them to tell what they are dealing with.

In business, we often have marketing datasets, financial datasets and manufacturing datasets, all valuable within their own departments. But putting them together – breaking down silos and taking a unified approach to data strategy – can make them much more valuable to the business as a whole.
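As a rough illustration of what breaking down silos buys you, the Python sketch below combines two hypothetical departmental datasets keyed on customer segment. Neither the marketing figures nor the finance figures can answer "how much revenue do we get per marketing dollar?" on their own; joined, they can. The segment names, the numbers and the `unify` helper are invented for the example.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical departmental "silos", each keyed by customer segment.
marketing_spend = {"students": 1_000.0, "families": 4_000.0, "retirees": 500.0}
finance_revenue = {"students": 3_000.0, "families": 10_000.0, "small_business": 7_000.0}


def unify(marketing: dict[str, float], finance: dict[str, float]) -> dict[str, dict[str, Optional[float]]]:
    """Combine both silos into one view per segment; gaps are kept as None."""
    combined: dict[str, dict[str, Optional[float]]] = {}
    for segment in sorted(marketing.keys() | finance.keys()):
        spend = marketing.get(segment)
        revenue = finance.get(segment)
        # Revenue per marketing dollar needs data from BOTH departments.
        ratio = revenue / spend if spend and revenue is not None else None
        combined[segment] = {"spend": spend, "revenue": revenue, "revenue_per_dollar": ratio}
    return combined


view = unify(marketing_spend, finance_revenue)
for segment, row in view.items():
    print(segment, row)
```

In the combined view, "students" returns 3.0 in revenue per marketing dollar and "families" 2.5, while segments that appear in only one silo show `None`, which is itself useful: it tells you where one department knows something the other doesn't.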

Two approaches to achieving this are known as the data warehouse and the data lake. To put it simply, a data warehouse is a unified repository for processed data that conforms to a standardized structure and labeling, ready for use in BI. The concept was first defined by Bill Inmon – known as the father of the data warehouse – and it is often the foundation of enterprise BI and analytics strategies. However, as a model, it isn’t always flexible enough to handle the new and exotic types of unstructured data that businesses need to work with today (more on this in the next section).

A data lake, on the other hand, is a unified repository for raw, generally unstructured data that data scientists might find any number of ongoing uses for. Today, Inmon likes to talk about an approach termed the “data lakehouse,” which attempts to build some of the architecture of the data warehouse model onto the data lake, thereby preventing it from becoming a “data swamp”!
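The difference between the two models is often described as schema-on-write versus schema-on-read. The Python sketch below shows the idea, assuming a hypothetical order schema and plain in-memory lists standing in for real storage: the warehouse validates records against an agreed structure before accepting them, while the lake stores raw payloads as they arrive and imposes structure only when a query reads them. It illustrates the concept only and is not how any specific warehouse or lake product works.

```python
from __future__ import annotations

import json

# Hypothetical schema the warehouse enforces up front.
ORDER_SCHEMA = {"order_id": str, "amount": float}

warehouse: list[dict] = []  # processed, validated records only
lake: list[str] = []        # raw payloads, stored exactly as received


def load_to_warehouse(record: dict) -> bool:
    """Schema-on-write: reject anything that doesn't match the agreed structure."""
    if set(record) != set(ORDER_SCHEMA):
        return False
    if not all(isinstance(record[k], t) for k, t in ORDER_SCHEMA.items()):
        return False
    warehouse.append(record)
    return True


def load_to_lake(raw: str) -> None:
    """The lake keeps the raw payload; no structure is imposed at ingest time."""
    lake.append(raw)


def read_lake_orders() -> list[dict]:
    """Schema-on-read: interpret raw payloads only when someone queries them."""
    orders = []
    for raw in lake:
        try:
            doc = json.loads(raw)
        except json.JSONDecodeError:
            continue  # e.g. free text; a different consumer might parse it another way
        if isinstance(doc, dict) and "order_id" in doc:
            try:
                amount = float(doc.get("amount", 0))
            except (TypeError, ValueError):
                continue
            orders.append({"order_id": str(doc["order_id"]), "amount": amount})
    return orders
```

For example, `load_to_warehouse({"order_id": "A1", "amount": 9.99})` succeeds, but a record with an extra field or a string amount is rejected. The lake accepts both `'{"order_id": 7, "amount": "12.5"}'` and a line of free text such as a call note, and `read_lake_orders()` later recovers `[{"order_id": "7", "amount": 12.5}]` while leaving the text for other uses. A lakehouse, in this picture, adds some of the warehouse's structure on top of the lake's raw storage.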

Use new types of data

Most businesses have some proficiency at getting insights from very straightforward, simple data, such as structured transactional data. But for the really valuable insights – the sort that can be a differentiator between an innovation leader and an also-ran in a competitive marketplace – we have to be a bit more adventurous these days!

Unstructured data is the sort of data that doesn’t fit neatly into rows and columns of a traditional computer spreadsheet – it includes picture and video data, audio data such as recordings of conversations and telephone calls, and written text, like emails, customer comment slips, and even handwritten doctors’ notes.

Structuring this data in order to analyze it at scale involves working with advanced, AI-based technologies like computer vision and natural language processing. But considering that this type of data is by far the most abundant – accounting for over 80% of the data generated by businesses – ignoring it means overlooking what is potentially your most valuable source of insights.
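Production systems use trained natural language processing or computer vision models for this step. Purely to show what "structuring" unstructured data means, the Python sketch below uses a deliberately simple keyword match as a stand-in for such a model: each free-text customer comment becomes a row of topic flags that can then be counted like ordinary tabular data. The topics, keywords, sample comments and helper names are all hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical topic keywords. A keyword match is only a stand-in for a trained
# NLP model: it misses synonyms, negation ("not late") and context.
TOPIC_KEYWORDS = {
    "delivery": {"late", "delivery", "shipping", "arrived"},
    "price": {"price", "expensive", "cheap", "cost"},
    "quality": {"broken", "quality", "faulty", "excellent"},
}

SAMPLE_COMMENTS = [
    "The parcel arrived late and the box was broken.",
    "Great quality, but a bit expensive.",
    "Excellent service!",
]


def structure_comment(comment_id: int, text: str) -> dict:
    """Turn one free-text comment into a structured row: an id, a length, and topic flags."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    row = {"comment_id": comment_id, "word_count": len(text.split())}
    for topic, keywords in TOPIC_KEYWORDS.items():
        row[topic] = bool(words & keywords)
    return row


def topic_counts(comments: list[str]) -> Counter:
    """Aggregate the structured rows so they can be analysed like any other table."""
    counts: Counter = Counter()
    for i, text in enumerate(comments):
        row = structure_comment(i, text)
        counts.update(topic for topic in TOPIC_KEYWORDS if row[topic])
    return counts
```

For the three sample comments, `topic_counts(SAMPLE_COMMENTS)` counts quality three times and delivery and price once each. The limitations noted in the comments are exactly why real pipelines replace the keyword step with a language model, while keeping the same idea of turning text into rows and columns.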

Building a data culture

In these days of cloud platforms and services that can quickly be configured to fill just about any data requirement a business may have, getting the technology right is the easy part when it comes to leveraging data at scale.

Trickier is getting the human elements right – and this is where culture comes in. Building a data culture means creating an environment where everyone is a stakeholder in moving towards data-driven decision-making, innovation, and growth. Many well-intentioned data initiatives have stalled because of insufficient buy-in, either at the executive leadership level or among the wider workforce that has to put them into action, or because of a lack of belief in the value they will bring.

Good ways to start building this culture include ensuring that data is available to everyone and has meaning for everyone (as discussed above), and focusing on "quick win" initiatives that demonstrate value with a minimal investment of time and resources. Data infrastructure should be designed to foster a culture of experimentation and innovation at all levels, so employees can quickly test ideas and measure results, whatever their role.

Andrew Sohn is VP of Enterprise Data & Data Products Delivery at Inspire Brands, the second-largest restaurant company in the US and owner of brands including Arby's, Dunkin, and Rusty Taco. After the company nearly doubled in size in under a year through a series of acquisitions, Sohn is now leading several initiatives aimed at embedding analytics and data-driven decision-making throughout the business.

You can hear Sohn speak about this analytics journey – as well as hear insights from Bill Inmon (father of the data warehouse), Kirk Borne (DataPrime) and Soham Bhatt (Databricks – key architects of the data lakehouse concept) during a panel talk hosted by AtScale on Thursday, November 18, 2021 at 11:00 AM PT (2:00 PM ET). Register to take part by clicking here.

Thank you for reading my post. Here at LinkedIn and at Forbes I regularly write about management and technology trends. To read my future posts, simply join my network here or click 'Follow'. Also feel free to connect with me via Twitter, Facebook, Instagram, Slideshare or YouTube.

About Bernard Marr

Bernard Marr is a world-renowned futurist, influencer and thought leader in the field of business and technology. He is the author of 18 best-selling books, writes a regular column for Forbes and advises and coaches many of the world’s best-known organisations. He has over 2 million social media followers and was ranked by LinkedIn as one of the top 5 business influencers in the world and the No 1 influencer in the UK.
