Data Nugget January 2025

January 31, 2025

Welcome to the first 2025 edition of the Data Nugget.

This edition of the DAMA Norway newsletter covers insights on AI, data modeling and data governance. We have an interesting read about the role of data governance in the AI race, then a nugget about data modeling specifically for data warehouses. There is a new episode of DAMA's podcast series discussing AI in Norway, and lastly a nugget on NARS, a general intelligence system.

Happy reading!

Let's grow Data Nugget together. Forward it to a friend. They can sign up here to get a fresh version of Data Nugget on the last day of every month.



The AI race and the role of Data Governance in it

Nugget by Achillefs Tsitsonis

2025 has kicked off, and the AI race continues to intensify in new directions. Large language model development is reported to be reaching its limits in terms of available training data, and the focus is shifting towards artificial/synthetic data to overcome this constraint. Moreover, this is a year in which AI agent development is expected to gain significant traction and boost the overall growth of the field. Already, it is easy to find posts online about glorified automation workflows posing as “AI agents”, promising new capabilities and speed while disregarding basics like quality. The global data community thus often finds itself torn between boosting innovation, balancing regulatory and ethical concerns, and ensuring the quality of products and services.

In this landscape, what is the role of Data Governance, and which AI perspectives need to be taken into consideration? I touched on the subject of Data vs. AI governance in a previous “nugget”, and I still believe that what is often referred to these days as AI governance is but a subset of Data Governance, and that the basic principles to be applied remain the same. The only real difference is the domain of application.

So data governance finds itself in a rather peculiar spot right now, where it must juggle the roles of innovation promoter and regulation enforcer. This has been a constant challenge for all data professionals working with Data Governance. I would, however, argue that there is little to no need for data governance to strike such a balance. It all boils down to positioning data governance as an enabler rather than an enforcer in your organization. Data governance is not here to slow down growth or innovation within AI. It is here to help organisations understand their data better and succeed with their endeavours faster, while eliminating potential risks that might prove extremely costly or even dangerous in the development of AI solutions and services. By promoting the enabler role of data governance, we provide a key element of success in working with data: an answer to one of the most common questions data professionals face when working with business users. Why should I care about Data Governance, and what's in it for me?

To achieve these goals, people play a key role, and they are expected to play an even more important one going forward in terms of AI development. Here are some application areas where the people element is crucial for data governance and the overall success of the solutions developed:

  • Definition of policies and data governance frameworks, making sure that AI applications are developed with fairness, transparency, and accountability as core principles.
  • Data collection and curation, ensuring data is both high quality and relevant for its intended use, thus reducing bias risks and improving effectiveness and transparency.
  • Ensuring regulatory compliance via collaboration with legal teams, while using the opportunity to provide extra business value by establishing processes that go beyond the regulations themselves.
  • Establishing an ethical decision-making mindset, through cross-functional teams of both engineers and business users. Aim to define ethical principles that reflect both the developer and the consumer of the application.
  • Monitoring and improvement, by establishing a human-in-the-loop design throughout the development of the application to constantly evaluate and improve it against its purpose. This should also apply during the post-launch phase.

As AI development continues, and especially as more and more synthetic data is used for training machine learning models, the human aspect becomes ever more critical in ensuring AI applications are ethical, effective, and aligned with societal values. Furthermore, transparency and active engagement of all stakeholders, internal and external, will improve the AI development cycle of such applications, building trust and driving innovation while mitigating risks and related costs.


Data Modelling for a DWH

Nugget by Gaurav Sood

A data model for a data warehouse (DWH) is a conceptual representation of the data elements that make up the DWH and the relationships between them. This model is the starting point for informing stakeholders about how data is organized and accessed within the organization. The notation and elements of the entity-relationship diagram (ER diagram or ERD) are commonly used to model data warehouses. There are different types of data models:

Transactional Data Models

Transactional models are designed to optimize the management of transactions in business applications. Their primary objective is to maintain the consistency and integrity of the information stored in the database. Another key objective is to guarantee efficiency and agility in the processing of operations originating in transactional applications, such as e-commerce or online banking.

Dimensional Data Models

Dimensional models are designed to facilitate the analysis of large volumes of data for informed decision-making. Their primary objective is to provide summarized information to business intelligence and data science systems – and to do it at maximum speed. Very often, dimensional databases are fed with historical and synthesized information from transactional databases. It is also common for these databases to combine information from different sources to establish correlations between facts that, at first glance, may seem unrelated.

Designing a Data Model for a Warehouse

In transactional databases, the design is based on the normalization technique. This technique ensures that the database structure itself prevents data inconsistencies arising from insertions, modifications, or deletions. When it comes to actually creating the data model for a DWH, you may be tempted to open your favorite SQL client and start creating tables right away. This may work for a very small data warehouse, but data warehouses are not exactly known for being small. For this reason, using the “quick & dirty” methodology to create a data warehouse is decidedly a bad idea, mainly because, once the data warehouse is full of data, modifying its structure is a cumbersome task. We must keep in mind that a data warehouse – even more than a database – is a strategic and critical tool for the business, so we must minimize all possible risks during its construction. To that end, we follow the steps below to create a robust DWH.
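To make the idea concrete, here is a minimal sketch of normalization in generic SQL; the table and column names are illustrative, not part of the warehouse model built later in this article:

```sql
-- Unnormalized: customer details are repeated on every order row, so a
-- change of address must be applied to many rows (an update anomaly).
CREATE TABLE orders_flat (
    order_id      INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_city VARCHAR(100),
    order_date    DATE
);

-- Normalized: customer details live in one place and are referenced by key,
-- so inserts, updates, and deletes cannot introduce inconsistencies.
CREATE TABLE customer (
    customer_id   INT PRIMARY KEY,
    customer_name VARCHAR(100),
    customer_city VARCHAR(100)
);

CREATE TABLE orders (
    order_id    INT PRIMARY KEY,
    customer_id INT NOT NULL REFERENCES customer (customer_id),
    order_date  DATE
);
```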

Step 1: Understand Business Objectives and Processes

The first step is requirements engineering, in which you gain an overall understanding of the information and results you expect from using the data warehouse. The outcome of this phase should be a detailed description of the data warehouse requirements; this serves as input for the next phase.

Step 2: Create a Conceptual Model

Using the detailed requirements obtained in the initial phase, start building a conceptual model that provides an overview of the two main types of tables in any data warehouse, fact tables and dimension tables, and the relationships between them.

FACT tables

The fact tables are at the center of the data model. They contain two fundamental types of attributes: numerical measures and dimension identifiers. Numerical measures are aggregate values (totals, averages, etc.) for each combination of dimension identifiers. Dimension identifiers are usually foreign keys to the dimension tables surrounding the fact tables. There are many different types of dimension tables; which types you use depends on how the tables are maintained and the kind of information they store. For simplicity, you can think of dimension tables as lookup tables for identifiers that appear in fact tables, such as product SKUs, customer codes, vendor codes, etc.


Picture 1: The basic structure of a fact table consists of a set of dimensions and a set of measures containing aggregate values.

DIMENSION tables

Dimension tables hold information about aspects of the business that do not change very often. Dimensions are attributes that enrich the information provided by a transaction: for example, the product description and product name in the example above. A minimal sketch of both table types follows below.
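As a hedged sketch in generic SQL (the names are illustrative, not taken from the article's model), a dimension table and a fact table referencing it could be declared like this:

```sql
-- Dimension table: descriptive attributes that change rarely.
CREATE TABLE dim_product (
    product_id   INT PRIMARY KEY,
    sku          VARCHAR(20),
    product_name VARCHAR(100),
    description  VARCHAR(255)
);

-- Fact table: dimension identifiers (foreign keys) plus numerical measures.
CREATE TABLE fact_sales (
    product_id   INT NOT NULL REFERENCES dim_product (product_id),
    date_id      INT NOT NULL,  -- would reference a time dimension in a full model
    units_sold   INT,
    total_amount DECIMAL(12, 2)
);
```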

Step 3: Define the Shape of the Data Model

The conceptual model should show the shape that the data model will have. This shape is determined by the distribution of the fact and dimension tables. Three fundamental types of data models are recognized globally, named for the similarity of their shapes: star, snowflake, and constellation.

Star Schema

In a star data warehouse schema, a single fact table is placed in the center of the diagram. All dimension tables are related to the fact table by a foreign key in the fact table. In the diagram, the dimension tables surround the fact table, giving it a star-like shape.


Picture 2: Basic form of a star schema with one fact table (green) and five dimension tables (blue).
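One practical consequence of the star shape is that queries stay simple: every dimension is a single join away from the fact table. A minimal sketch, reusing the illustrative tables from the previous example:

```sql
-- A typical star query: join the fact table to a dimension and
-- aggregate the measures by a descriptive attribute.
SELECT
    p.product_name,
    SUM(f.units_sold)   AS total_units,
    SUM(f.total_amount) AS total_revenue
FROM fact_sales AS f
JOIN dim_product AS p
  ON p.product_id = f.product_id
GROUP BY p.product_name
ORDER BY total_revenue DESC;
```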


Snowflake Schema

In a snowflake schema, the fact table is surrounded by small clusters formed by hierarchies of tables. Each of these hierarchies is a normalized sub-schema that is associated with a dimension of the fact table.


Picture 3: Basic form of a snowflake schema with one fact table (green) and two dimension tables (blue), each related to two grouping tables (red).
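In code, snowflaking means the dimension itself is normalized into a hierarchy. A minimal sketch in generic SQL, again with illustrative names:

```sql
-- Grouping table at the top of the normalized hierarchy.
CREATE TABLE dim_category (
    category_id   INT PRIMARY KEY,
    category_name VARCHAR(100)
);

-- The dimension no longer stores category details itself; it references
-- the grouping table, which is what gives the schema its snowflake shape.
CREATE TABLE dim_product_sf (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100),
    category_id  INT NOT NULL REFERENCES dim_category (category_id)
);
```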


Constellation Schema

In constellation schemas – also called galaxy schemas – several fact tables appear. Each of them responds to a different business information need and has a set of dimensions surrounding it. Some dimension tables may be shared between the different fact tables. In the model we will build below, you will see an example of a constellation schema. It will have two fact tables: one for sales and one for procurement (i.e. purchases made by the company).

Step 4: Design the Conceptual Data Model

For the construction of the conceptual model, we do not need to define the diagram down to the smallest detail. It is enough to show the fact tables, the dimension tables, and the relationships between them. This diagram helps us explain the model to users and stakeholders. The goal is to obtain feedback and approval so that there is no room for misunderstandings and complaints once the database is running.


Picture 4: In this constellation schema, fact tables are green, shared dimension tables are yellow, and the rest of the dimension tables are blue.

Step 5: Create the Logical Data Model

The conceptual model allows everyone involved in the process to verify that the model meets the requirements and to give their approval for further development. To complete the logical model for our data warehouse, we need an entity for each fact table and for each dimension table. Then we need to establish the corresponding relationships between them. In the logical model, we must include all the entities and all the attributes. Don't overlook any of them!


The logical model includes all the entities and all the attributes that make up our data warehouse. Our data model for a warehouse includes two fact tables: one for Sales and one for Procurement. Both share the dimensions Time and Product, since the same data is used to characterize both sales and procurement facts. In addition, each fact table is related to dimensions that are specific to its business process, as the sketch after this list illustrates:

  • The Sales fact table relates to the SalesPerson and Client dimension tables.
  • The Procurement fact table relates to the BuyerAgent and Provider dimension tables.
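Here is a minimal DDL sketch of that constellation in generic SQL. The table names and relationships come from the model described above; all column lists are illustrative assumptions:

```sql
-- Shared dimensions, referenced by both fact tables.
CREATE TABLE Time (
    time_id INT PRIMARY KEY,
    day     DATE,
    month   INT,
    year    INT
);

CREATE TABLE Product (
    product_id   INT PRIMARY KEY,
    product_name VARCHAR(100)
);

-- Dimensions specific to each business process.
CREATE TABLE SalesPerson (salesperson_id INT PRIMARY KEY, full_name     VARCHAR(100));
CREATE TABLE Client      (client_id      INT PRIMARY KEY, client_name   VARCHAR(100));
CREATE TABLE BuyerAgent  (buyer_agent_id INT PRIMARY KEY, full_name     VARCHAR(100));
CREATE TABLE Provider    (provider_id    INT PRIMARY KEY, provider_name VARCHAR(100));

-- Two fact tables, each surrounded by its own set of dimensions.
CREATE TABLE Sales (
    time_id        INT NOT NULL REFERENCES Time (time_id),
    product_id     INT NOT NULL REFERENCES Product (product_id),
    salesperson_id INT NOT NULL REFERENCES SalesPerson (salesperson_id),
    client_id      INT NOT NULL REFERENCES Client (client_id),
    units_sold     INT,
    sales_amount   DECIMAL(12, 2)
);

CREATE TABLE Procurement (
    time_id        INT NOT NULL REFERENCES Time (time_id),
    product_id     INT NOT NULL REFERENCES Product (product_id),
    buyer_agent_id INT NOT NULL REFERENCES BuyerAgent (buyer_agent_id),
    provider_id    INT NOT NULL REFERENCES Provider (provider_id),
    units_bought   INT,
    purchase_cost  DECIMAL(12, 2)
);
```

Note that Time and Product appear only once and are shared, which is what makes this a constellation rather than two separate stars.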

At this point, it is important to submit the data model to a validation process that gives us the greatest possible assurance that the database can be implemented without errors. This validation process will also minimize risks to information integrity. Some of the common issues caught at this point are:

  • Entity name repetition.
  • Attribute name repetition within the same entity.
  • Entities without attributes.
  • Entities without primary identifiers.
  • Attributes with different data types involved in a relation.
  • And many other possible errors that could cause future problems.


Step 6: Create the Physical Data Model

Now we need to generate a physical model from the logical one. At this point, we need to think about the database engine we want to use for building and querying our DWH.


In this case, we picked Google BigQuery as the target database engine. Some issues might arise at this point because of the choice of database engine; a typical example is columns with data types that are incompatible with the warehouse tool on which the model will be implemented.
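As a hedged sketch of what part of the physical model might look like in BigQuery Standard SQL (the dataset name dwh and the column lists are assumptions): generic logical types such as VARCHAR and DECIMAL have no direct BigQuery equivalent and map to STRING and NUMERIC, exactly the kind of engine-specific incompatibility mentioned above.

```sql
-- Physical DDL for BigQuery. Logical types are mapped to engine types:
-- INT -> INT64, VARCHAR -> STRING, DECIMAL -> NUMERIC.
CREATE TABLE dwh.Product (
    product_id   INT64 NOT NULL,
    product_name STRING
);

-- BigQuery does not enforce foreign keys, so the dimension references
-- are plain columns here; integrity must be guaranteed by the load process.
CREATE TABLE dwh.Sales (
    time_id        INT64 NOT NULL,
    product_id     INT64 NOT NULL,
    salesperson_id INT64 NOT NULL,
    client_id      INT64 NOT NULL,
    units_sold     INT64,
    sales_amount   NUMERIC
)
CLUSTER BY product_id;  -- a physical-level optimization specific to BigQuery
```

The resulting script can be run in the BigQuery console or with the bq command-line tool.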

Step 7: Implement the Model

The last step required to build our data warehouse is to implement the physical model on the target DBMS. This step consists of generating the scripts that must be executed on the DBMS to create the database. If you did not skip any steps and your models were properly validated, the scripts will run smoothly and your data warehouse will be ready to fill with information. If all these steps are followed, the resulting DWH will be near perfect. Remember, though, that the needs of the business are always changing, so it is quite possible that between the time you gathered the requirements and the time you delivered the DWH, things have changed and you will have to adjust your DWH model accordingly.
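Since engines like BigQuery do not enforce foreign-key relationships, a useful sanity check after loading is to look for fact rows whose dimension keys have no match. A minimal sketch against the illustrative tables above:

```sql
-- Count Sales rows whose product_id has no match in the Product dimension.
-- Anything greater than zero signals a referential-integrity problem in the load.
SELECT COUNT(*) AS orphan_sales_rows
FROM dwh.Sales AS s
LEFT JOIN dwh.Product AS p
       ON p.product_id = s.product_id
WHERE p.product_id IS NULL;
```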


MetaDAMA 3#5 - Norway & AI (Eng) with Alex Moltzau

Nugget by Winfried Adalbert Etzel

“How well are we rigged in Norway to handle this?” What a fantastic talk! With so much happening in Norway in autumn 2023, I brought Alex Moltzau on for a chat about AI policy and Norway. Alex Moltzau is a Senior Policy Advisor at the Norwegian Artificial Intelligence Consortium (Nora.ai) and one of the most outspoken experts on AI policy and ethics in Norway.

  • Throughout the last years, there has been a significant change in public attention to AI, even though AI has been part of our lives for quite some time.
  • There is a great AI community in Norway, with great research being done.

What is NORA.ai?

  • NORA is a Norwegian collaboration between 8 universities, 3 university colleges and 5 research institutes within AI, machine learning and robotics.
  • NORA is strengthening Norwegian research, education and innovation within these fields.
  • NORA's ambition is international recognition of Norwegian AI research, education and innovation.
  • NORA's vision is excellence in AI research, education and innovation.
  • NORA is active in the Nordics but also collaborates broadly on the international stage: exchange programs for Ph.D. students, collaboration with other national institutes, contributions to e.g. the OECD, and even helping shape bilateral agreements.

Why AI policy?

  • There is growing concern in society about AI and its impact on our lives: how it affects elections, misinformation, and our work.
  • How can AI help us to handle information on our citizens more effectively?
  • How does AI affect our children, their learning?
  • There is a misconception that we don't have sufficient regulations for AI. Existing laws apply to AI as much as to other methods and technologies.
  • What kind of infrastructure do we need to build in society? Is language an important infrastructure for our society?
  • What is the public infrastructure, the public good we need to invest in as a nation?

State of AI in Norway

  • What Government mechanisms are we going to build to handle artificial intelligence?
  • Three major announcements have shaped the state of AI in Norway during the last weeks and months: (1) The AI Billion: the Norwegian Prime Minister has announced that the Norwegian Government will invest 1 billion NOK in AI over the course of 5 years. (2) The Ministry of Defense has published its AI strategy. (3) A new Ministry of Digitization and Governance has been established in the Norwegian Government, with responsibility for AI.

  • Internationally there are two concerns around AI that are predominant: (1) Security - how to ensure cyber security and reliability in models. (2) Bias - how to tackle bias in AI systems, work with fairness and trust.

  • We need to ensure that the possibilities opened up by AI fit our Norwegian society.
  • We need to think about the values we have built our society on, and how AI can support these values.
  • Norway is earlier than most countries in actively working on regulating AI, e.g. in relation to privacy.
  • AI is about implementation - it is about trying, failing and trying again.
  • We need to minimize the possibility of disaster by learning from other countries.
  • There need to be mechanisms to ensure that the cost of compliance with regulations is not too high.

The role of Data Professionals

  • We would love to see data folks take a more active role in society, helping everyone understand the challenges within data and AI better.
  • Data Management professionals can ensure safety and trust in our society going forward, and should therefore have a more active role in politics.

You can listen to the podcast here or on any of the common streaming services (Apple Podcasts, Spotify, etc.). Note: The podcasts in our monthly newsletters are behind the actual airtime of the MetaDAMA podcast series.



NARS - a general intelligence system

Nugget by Isa Oxenaar

The NARS project was started by Dr. Pei Wang, who works at the Department of Computer and Information Sciences at Temple University in Philadelphia, USA. In the web-book “A Logical Model of Intelligence” he explains the project in more detail. The article “Self in NARS, an AGI System”, by him and two co-authors, explains the use cases for the NARS system. There is also an informational page on how to apply NARS that covers practical use cases of the project. For about seven years, an open-source version of the NARS project called open-NARS has been maintained on GitHub, and a more recent open-source project focuses on NARS for applications.

These sources can help those working in data management who are interested in an alternative view on AI to dive into this general intelligence system, which aims to support the growth of AI in a possibly more data-efficient way.


Thank you for reading this edition of Data Nugget. We hope you liked it.

Data Nugget was delivered with vision, zeal and courage by the editors and collaborators.

You can visit our website here, or write us at [email protected].

I would love to hear your feedback and ideas.

Isa Oxenaar

Data Nugget Head Editor

