Data Nugget May 2023

30 May, 2023

A new month, a new nugget! We welcome everyone to the latest episode of our #DataNugget. So before you go out on a vacation, grab a cup of coffee and enjoy this month's edition.

First and foremost, we have brought an interesting read on the importance of data governance and information security for unstructured data. Second, we are thankful to our guest contributor for an interesting overview of API strategy. Third, we have a quick overview of data-driven culture across organizations. And last but not least, we bring the next episode of the podcast series, on sustainability and AI in transportation.

Enjoy reading!

Let's grow Data Nugget together. Forward it to a friend. They can sign up here to get a fresh version of Data Nugget on the last day of every month.



Importance of data governance on unstructured data

This nugget is the first part of a two-part series. Contribution by Sylwia Harewska.

Even though data is a valuable asset for businesses today and many aim to make data-driven decisions, it is quite a complex topic. Firstly, it needs to be acknowledged that today’s data is much more complex than before. Organizations must understand and analyze it, know how to use it legally, and manage it properly. Secondly, not all data is created equal. Data can be structured, that is, highly organized and easy to access, or unstructured, such as text or audio. Some of the data captured and created by organizations is structured, but most of it is unstructured.

Since #UnstructuredData represents 80% to 90% of all new enterprise data (Gartner, 2022), businesses are trying to adapt to handle the increasing amount of it. Examples of unstructured data include text such as an e-mail or a customer service chat, audio such as call center recordings, contracts, Internet of Things (#IoT) sensor data, and more. It is a source of potential competitive advantage for organizations that know how to use it.

According to Komprise's 2022 survey "State of Unstructured Data Management", 65% of organizations plan to deliver, or are already delivering, unstructured data to their big #DataAnalytics platforms (Komprise, 2022). This is a huge change compared to the survey conducted by Deloitte in 2019, where only 18% of organizations were able to take advantage of such data (Deloitte, 2019). The COVID-19 pandemic was most probably one of the reasons for this significant shift.

Besides big opportunities, unstructured data can bring big challenges. The challenge companies struggle with is searching, managing, and analyzing unstructured data (MIT, 2021). In addition, unstructured data poses several risks to organizations, both technical and compliance related. The most common risk is a data breach: unstructured data is often stored in decentralized locations, which makes it challenging to secure and monitor. For companies in heavily regulated industries, such as the financial sector, the big risk is compliance issues. Organizations can expect unstructured data to often contain sensitive information. This type of information is regulated by the European Union’s General Data Protection Regulation (#GDPR), the Health Insurance Portability and Accountability Act (#HIPAA), Payment card industry compliance (#PCI) and other regulations. Failure to comply can result in heavy fines and legal penalties, which can have a significant impact on the company’s future.

Unstructured data is also more often affected by data loss, because backup and recovery during a disaster are challenging. This can lead to major business disruptions if critical information is lost. Unstructured data can also pose risks to #DataQuality, as it is often entered inconsistently or inaccurately, leading to data inconsistencies and errors. Poor data quality can cause poor decision-making, financial losses, and customer dissatisfaction. Lastly, unstructured data is often not governed by #DataGovernance policies and procedures. This makes it more challenging to ensure data accuracy, consistency, and security. The lack of data governance can also limit collaboration and block #DataDriven decision-making (Egli, 2016).

In order to identify, understand, and govern unstructured data, two disciplines of the #DataManagement framework from the #DAMA DMBOK are essential: Data Governance and #InformationSecurity.



API Strategy: Conway's law and the inverse Conway manoeuvre

Guest contribution by Mikael Wallén.

API Strategy

Defining an IT and digitalization strategy is crucial regardless of whether you are a small, midsize, or large company. It gives guidance, priorities, and a path on where, how, and what your business should focus on over the next 1 year and the next 3 to 5 years. In addition, defining an #APIstrategy connected to your IT and digitalization strategy gives guidance on how you as a company can make the most of your digital assets.

There are thousands of ways to write a strategy, and an API strategy in particular. During the API era, which spans the past 20 years or so, many companies have built their API strategy around #ConwaysLaw. Conway’s Law holds that systems end up closely reflecting an organisation’s internal structures. It was formulated more than 50 years ago by programmer Melvin Conway, who stated:

“Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.”

Trying to describe this in a simple picture – where we mirror our system design to our company structure – would look something like this:

Figure 1: Conway's Law illustrated

On paper, this looks pretty good. To further emphasise this, we can illustrate what your organisation is likely to produce in terms of software under Conway’s law:

Figure 2: Conway’s Law, paraphrased

Let’s look at an example. Suppose your organization deals in both wine and beer, and as part of your #digitalization strategy, you are looking to build a few customer-facing inventory APIs. If the structure of your organization is heavily divided between “beer people” and “wine people” – each unit having separate tech teams – then you’re likely to end up with two APIs, one for each product, coded very differently.

Critics of this model point out that the communication structure does not always reflect the organisational structure. People inside and outside the organisation chart have different relationships and communication structures, and the result is a complex system environment.

The inverse Conway manoeuvre

The concept of Conway’s Law can also be applied in reverse. With a target architecture in mind, we can challenge and shape the communication structures within the organisation and, by doing that, achieve our planned design. If an organisation uses a service-based architecture, each component is developed and managed independently of the others. Each team is responsible for an individual service and has complete and independent decision-making capabilities.

Looking at our “beer and wine” example above, and applying a service-based architecture, our organisation will use the same product, marketing, and tech teams for both beverages. This will more than likely result in just one API, beverage, that does it all in a more controlled manner.
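
To make the contrast concrete, here is a minimal sketch in Python of what a single beverage inventory API could look like once one team owns both product lines. All names (`Beverage`, `BeverageInventory`, `stock_for`, `total_units`) and the sample data are hypothetical illustrations and are not taken from any real system.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Beverage:
    """One inventory item; `category` replaces separate beer and wine models."""
    sku: str
    name: str
    category: str  # hypothetical values: "beer" or "wine"
    units_in_stock: int


class BeverageInventory:
    """A single inventory service shared by every beverage category."""

    def __init__(self, items: List[Beverage]) -> None:
        self._items = list(items)

    def stock_for(self, category: Optional[str] = None) -> List[Beverage]:
        """Return all items, or only those in the given category."""
        if category is None:
            return list(self._items)
        return [item for item in self._items if item.category == category]

    def total_units(self, category: Optional[str] = None) -> int:
        """Sum the units in stock, optionally restricted to one category."""
        return sum(item.units_in_stock for item in self.stock_for(category))


# Hypothetical sample data covering both product lines.
inventory = BeverageInventory([
    Beverage("B-001", "Pilsner", "beer", 120),
    Beverage("W-001", "Riesling", "wine", 40),
    Beverage("W-002", "Merlot", "wine", 25),
])

wine_units = inventory.total_units("wine")
all_units = inventory.total_units()
```

The design choice worth noticing is that beer and wine differ only in a `category` value, so a customer-facing client calls the same operations for either product instead of learning two differently coded APIs.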

Understanding and applying Conway’s law is a crucial building block of a modern company where we want to bring organisation and technology together as closely as possible. We want to break down communication silos to speed up productivity and quality. This is, from experience, more an organisational challenge than a technical one. Technical challenges tend to be the easiest ones to change. But without being prepared to change the organisation and the communication structures, I promise you this: You are doomed to fail.



Data-driven culture: A short overview

Nugget by Isa Oxenaar.

A data-driven culture can be summarized as a culture in which everyone involved is able to interact with data to make better decisions. A key characteristic of the users, i.e., the employees, is therefore #DataLiteracy.

Some useful steps for creating a data culture are listed in the 2020 article “10 steps to create a #DataCulture” by David Waller.

  • Step 1. Data-driven culture starts at the top.
  • Step 2. Choose what metrics to use with care.
  • Step 3. Make the boundaries between data scientists and the business porous and insist on employees being code literate and conceptually fluent in quantitative topics.
  • Step 4. Fix basic data access issues quickly.
  • Step 5. Quantify the uncertainty level of the data.
  • Step 6. Make proofs of concept practical and robust, not promising and fancy.
  • Step 7. Time specialized training for employees wisely.
  • Step 8. Empower employees through data fluency.
  • Step 9. Pick canonical metrics and programming languages for your business.
  • Step 10. Let data scientists explain their analytical choices.

Which of these, if any, are still useful in 2023? A big shift from 2020 is that businesses are now aware and convinced that data is essential to decision-making; a data-driven culture is no longer seen as elusive.

The case study in the article “What does it actually take to build a data-driven culture?”, written by Mai B. AlOwaish and Thomas C. Redman in May 2023, describes the key components of the successful transition to a data-driven culture at Kuwait's Gulf Bank. Two years of dedication have already made a difference. The two authors of the article led the transition, and this is what helped:

  • They focused on #DataQuality. This somewhat aligns with step 6: focus on quality, robustness, and practical solutions instead of promising ideas or quick fixes.
  • A way to get everyone involved in the cultural shift at the company was to help them realize that each of them is a data customer and a data creator: no one can do their tasks without data. This realization empowered employees, as in step 8.
  • They assigned data ambassadors to teams, a top-down approach as addressed in step 1, and used world-class training, media, and branding for the ambassadors to lower their skepticism about the extra work their new position entailed.
  • After getting the ambassadors aboard, they got the rest involved by creating a “Data 101 program”. This aligns with step 7: they timed specialized training wisely. As a consequence, all the employees started using the new skills to innovate on their own.

So, some of the steps listed in the 2020 article were indeed still useful in the transition made over the last two years at the bank. Comparing the 2020 and 2023 articles, what seems most important is empowering employees through data literacy, getting everyone involved, and choosing data quality over quick fixes.

Read more here.



MetaDAMA 2#9: Sustainability & AI in Transportation

Nugget by Winfried Adalbert Etzel.

How can we use AI to work more sustainably and optimize our operations for less pollution and more efficiency? I talked with Umair M. Imam, Head of Data Science, Data Warehouse and Artificial Intelligence at Ruter AS, the public transport authority for the region around Oslo, the Norwegian capital. Umair is also an associate professor at OsloMet – storbyuniversitetet, teaching #AI to bachelor's degree students, founder and CTO of Bineric Crowdsourcing, and founder of the volunteer organization Offentlig AI. Here are some of the highlights of the conversation:

Public Transportation

  • Public transportation is complex because data is coming in from many different internal and external sources.
  • A bus can have at least 20 sensors, all sending real-time data.
  • A huge amount of data is collected through external sources.
  • Ruter is going away from centralized data management teams to more of a mesh approach.
  • #DataMesh will give you a more complex data function, with a need for more people and more organization and coordination.
  • One team cannot have the full ownership for the entire data-driven prerogative of a company.
  • Two factors that helped Ruter succeed with data and AI:
      - End-to-end (E2E) responsibility for the whole AI algorithm
      - Create in production and don’t overdo PoCs

Sustainability & AI

Capacity prediction

  • Capacity is predicted three days in advance; predictions up to one month in advance are possible.
  • This gives passengers the possibility to plan their trips better.
  • An operations team that sees a peak in traffic in real time can send additional buses to ensure enough capacity.
  • Through the prediction algorithm, fleet capacity can be reduced, and it is easy to plan for balancing the load beforehand.
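
As a rough illustration of the idea, the sketch below forecasts the passenger load for a departure as the average of past loads observed for that same departure, then works out how many extra buses would be needed to seat everyone. This is a deliberately naive stand-in and not the algorithm Ruter uses; the function names, the 50-seat bus, and the sample counts are all assumptions made for the example.

```python
import math
from statistics import mean
from typing import Sequence


def forecast_load(history: Sequence[int]) -> float:
    """Naive forecast: average of past passenger counts for the same departure."""
    if not history:
        raise ValueError("need at least one past observation")
    return mean(history)


def extra_buses_needed(expected_load: float, seats_per_bus: int, buses_planned: int) -> int:
    """Number of additional buses required to seat the expected passengers."""
    capacity = seats_per_bus * buses_planned
    shortfall = expected_load - capacity
    if shortfall <= 0:
        return 0
    # Round up: a partly filled bus is still a whole bus.
    return math.ceil(shortfall / seats_per_bus)


# Hypothetical counts from the last four Mondays on one morning departure.
past_mondays = [62, 58, 70, 66]
expected = forecast_load(past_mondays)
extra = extra_buses_needed(expected, seats_per_bus=50, buses_planned=1)
```

A real forecast would draw on far richer inputs, but even this toy version shows why knowing the expected load days ahead lets the fleet be balanced beforehand rather than corrected in real time.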

Fleet management

  • The long-term vision for Ruter is to work more with order services. In the future, you would not have to walk to a bus stop, but can order a bus to your home.
  • The existing solution is for seniors (67+) and is tested in the Viken area.
  • But how can you ensure that the buses are close to a potential future order?
  • Ruter is training an algorithm to predict where orders might come from to ensure a bus is parked close by. This results in less driving and fewer emissions.
  • To train the algorithm, Ruter mainly uses historical information, combined with, for example, weather information.

Analysis of customer feedback

  • Sentiment analysis to see how happy or unsatisfied a customer is.

Explainable AI

  • AI is just statistics on steroids!
  • It is hard to explain how a probability output is achieved.
  • That makes an AI algorithm a #BlackBox.
  • Developers create a set of tools and applications, which can give insight on different factors, such as how a decision was achieved by an AI algorithm.

Quantum Computing

  • Traditional computing is expensive and needs more time and resources to reach a specific output.
  • Volumes of data are constantly increasing.
  • This was a research project together with OsloMet.
  • #QuantumComputing was cheaper to work with than traditional computing.
  • Quantum computers are more sustainable and energy efficient.

You can listen to the podcast here or on any of the common streaming services (Apple Podcast, Google Podcast, Spotify, etc.)



Thank you for reading this edition of Data Nugget. We hope you liked it.

Data Nugget was delivered with vision, zeal and courage from the editors and the collaborators.

You can visit our website here, or write us at [email protected].

I would love to hear your feedback and ideas.


Nazia Qureshi

Data Nugget Head Editor
