The Model Openness Framework: A Practical Approach to AI Transparency

The original MOF research paper (March 2024) defined a three-tiered classification for AI models based on their openness:

  • Class 1: Open Science Model – Full access to model weights, training data, code, and architecture under open licenses.
  • Class 2: Open Tooling Model – Some components (e.g., training data) are withheld, but model weights, code, and architecture remain open.
  • Class 3: Open Model – Only certain details, such as weights or inference code, are available.

While this research set a foundation for defining AI openness, the newly released MOF Implementation Framework translates it into practical evaluation criteria that organizations can adopt. It strengthens transparency requirements, provides guidance on licensing and reproducibility, and introduces an assessment process to standardize how AI models are classified.
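
To make the tiers concrete, here is a minimal sketch of how an organization might map the components a model release discloses to an MOF-style class. The `ModelRelease` fields and the `classify_mof` helper are illustrative assumptions for this article, not part of the official MOF tooling, which evaluates a much longer list of components.

```python
from dataclasses import dataclass

@dataclass
class ModelRelease:
    """Which artifacts a release makes available under open licenses (illustrative fields only)."""
    weights: bool
    architecture: bool
    training_code: bool
    training_data: bool

def classify_mof(release: ModelRelease) -> str:
    """Map disclosed components to an MOF-style class (simplified approximation, not the official checklist)."""
    if release.weights and release.architecture and release.training_code and release.training_data:
        return "Class 1: Open Science Model"
    if release.weights and release.architecture and release.training_code:
        return "Class 2: Open Tooling Model"
    if release.weights:
        return "Class 3: Open Model"
    return "Closed / insufficient disclosure"

# Example: weights, code, and architecture released, but training data withheld
print(classify_mof(ModelRelease(weights=True, architecture=True,
                                training_code=True, training_data=False)))
# -> Class 2: Open Tooling Model
```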


AI Openness Is Not Binary: Moving Beyond Open vs. Closed Models

I believe that from a practical point of view, AI models exist on a spectrum of openness, not a simple open vs. closed divide. Some models release code but not training data; others open-source weights but restrict commercial use. The MOF framework acknowledges this complexity and provides structured criteria to evaluate transparency levels accurately.

Rather than enforcing full openness, the framework promotes completeness of disclosure, ensuring that enterprises can make informed decisions about the AI models they adopt.

Key examples of this are DeepSeek-V3 and DeepSeek-R1, two recent open-source AI models that push the boundaries of open innovation while also demonstrating the limits of transparency in AI development.


DeepSeek-V3 and DeepSeek-R1: Advancing Open-Source AI with Partial Transparency

Technical Innovation Through Open Research

DeepSeek has contributed significant advancements in reinforcement learning (RL) and distillation techniques through its latest models:

  • DeepSeek-V3 has introduced improvements in RL-based fine-tuning, leveraging reinforcement learning from AI feedback (RLAIF) to enhance alignment and control.
  • DeepSeek-R1, designed primarily for advanced reasoning tasks such as mathematics and coding, has demonstrated effective distillation techniques that transfer its reasoning capabilities to smaller models, lifting their performance well beyond that of their original base versions.

These contributions accelerate innovation in the open-source AI ecosystem by allowing researchers and developers to build upon state-of-the-art training methodologies.
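
For readers who want to see the underlying idea, below is a minimal, generic sketch of a teacher-student distillation loss in PyTorch. It is not DeepSeek's published recipe (the R1 report reportedly distills via supervised fine-tuning on reasoning traces generated by the larger model); the temperature and weighting values here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Generic knowledge-distillation objective: blend hard-label cross-entropy
    with a soft-label KL term that pulls the student toward the teacher's distribution."""
    # Soft targets: teacher and student distributions at a raised temperature
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against ground-truth labels
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1 - alpha) * ce_term
```

The key design choice is the temperature: raising it softens the teacher's distribution so the student also learns from the relative probabilities of incorrect answers rather than only the top prediction.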

Limited Transparency in Training and Tuning Data

Despite their technical openness, DeepSeek-V3 and DeepSeek-R1 do not fully disclose their training and fine-tuning datasets, which is a common issue in today’s AI landscape. The primary concerns include:

  • Lack of clarity on data provenance – The specifics of how DeepSeek fine-tunes its models remain undisclosed. While the base model training process may be partially documented, the fine-tuning data, which plays a crucial role in model alignment, safety, and performance, is not described.
  • Limited reproducibility – Without full access to training data, independent researchers cannot verify potential biases, fairness issues, or ethical considerations in the data used.
  • Compliance and enterprise risk – Organizations adopting DeepSeek-based models must assess whether the lack of transparency in tuning data affects their governance, compliance, or regulatory requirements.

The MOF framework provides a structured approach to evaluating such models, recognizing their contributions to open-source innovation while also flagging areas where transparency is incomplete.


The Reality of Data Management: Why Full Openness Is Unrealistic

While transparency is critical, it is not always realistic for AI models to fully disclose their training data due to:

  • Privacy and Regulation – Compliance with laws like GDPR and HIPAA often prevents dataset release.
  • Intellectual Property – Proprietary datasets give businesses a competitive edge.
  • Security Risks – Public data disclosure can expose models to adversarial attacks.

Instead of requiring full data openness, the MOF framework emphasizes clear documentation of dataset sources, bias considerations, and processing methods. This enables organizations to assess risks, compliance, and ethical considerations without needing direct access to training data.
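
As an illustration of what "documentation without disclosure" can look like in practice, here is a minimal sketch of a datasheet-style record. The field names are assumptions chosen for this example, not an official MOF schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DatasetDisclosure:
    """Datasheet-style record documenting a training dataset without releasing the data itself.
    Field names are illustrative, not an official MOF schema."""
    name: str
    sources: List[str]                    # provenance: where the data came from
    collection_period: str                # when it was collected
    preprocessing: List[str]              # cleaning, filtering, deduplication steps
    known_bias_considerations: List[str]  # documented gaps or skews
    license_or_terms: str
    pii_handling: str                     # how personal data was removed or protected

example = DatasetDisclosure(
    name="fine-tuning-corpus-v1",
    sources=["licensed news archive", "public-domain books", "synthetic Q&A"],
    collection_period="2022-2024",
    preprocessing=["language filtering", "near-duplicate removal", "toxicity filtering"],
    known_bias_considerations=["English-heavy", "under-represents low-resource domains"],
    license_or_terms="mixed; see per-source terms",
    pii_handling="regex and NER-based redaction prior to training",
)
```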

The DeepSeek models illustrate this balance—contributing significantly to the AI research community while retaining commercial and strategic control over fine-tuning data.


Transparency as a Foundation for AI Risk Management

AI models that lack full openness require stronger system-level guardrails. The MOF framework helps organizations manage AI risks in line with their AI usage and risk tolerance by promoting:

  1. Disclosure Over Access – Companies can assess AI risks without needing raw data if key attributes (e.g., dataset sources, preprocessing steps, licensing restrictions) are clearly documented.
  2. Bias and Fairness Audits – Transparency in data origins and composition allows enterprises to evaluate and mitigate potential biases, backed by additional guardrails in system testing and post-deployment validation (a minimal fairness-metric sketch follows this list).
  3. Independent Validation – Where data cannot be released, enterprises can adopt third-party audits and red-teaming approaches to assess a model’s behavior, reliability, and security. This includes external evaluations of how models perform across different demographic groups, industries, or regulatory environments.
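
As a minimal illustration of the kind of bias audit mentioned in item 2, the sketch below computes a demographic parity difference over model predictions. The function is a simplified assumption; real audits typically combine several fairness metrics with statistical tests and domain review.

```python
import numpy as np

def demographic_parity_difference(predictions: np.ndarray, groups: np.ndarray) -> float:
    """Largest gap in positive-prediction rate between any two groups.
    0.0 means every group receives positive predictions at the same rate."""
    rates = [predictions[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

# Example: binary predictions for two demographic groups
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])
grp = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(demographic_parity_difference(preds, grp))  # 0.75 - 0.25 = 0.5
```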

Additionally, organizations can implement AI system-level controls such as:

  • Robust monitoring frameworks to track model drift and detect unintended biases over time (a minimal drift-check sketch follows this list).
  • Human-in-the-loop interventions in high-risk applications where automated decisions must be reviewed or overridden.
  • Automated policy enforcement to ensure that AI outputs adhere to ethical guidelines, regulatory requirements, and corporate risk frameworks.
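
As one concrete example of drift monitoring, the sketch below computes a Population Stability Index (PSI) between a baseline score distribution and current production scores. The thresholds in the comment are common rules of thumb, not MOF requirements, and a single metric is only one piece of a monitoring framework.

```python
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference (e.g., validation-time) score distribution and a current
    production distribution. Rules of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero and log(0) for empty bins
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Example: compare model confidence scores from validation against last week's traffic
rng = np.random.default_rng(0)
baseline = rng.beta(2, 5, size=10_000)
recent = rng.beta(2, 3, size=10_000)   # the score distribution has shifted
print(population_stability_index(baseline, recent))
```

In practice such a check would run on a schedule and feed alerting, alongside bias, performance, and safety metrics.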

By integrating disclosure-driven risk assessments and system-level guardrails, enterprises can ensure accountability in AI deployments without relying on full data openness, making AI systems more transparent, compliant, and aligned with ethical and operational goals.

AI ‘openness’ is starting to feel like a marketing buzzword rather than a true commitment to transparency. If companies pick and choose what to disclose, are we really any better off than with fully closed models?

Ajinkya Nikam

Technical Support Manager | Support Engineering Manager | Customer Success Leader | Linux Expertise, Troubleshooting, Coaching & Career Development | IT & Customer Support Operations Expert

1 month ago

The shift from viewing AI models as simply open or closed to recognizing a spectrum of openness is revolutionizing how we approach transparency and innovation. DeepSeek-V3 and DeepSeek-R1 highlight this complexity—advancing AI with significant technical contributions while navigating the limitations of full transparency due to privacy, intellectual property, and security concerns. The MOF framework offers a practical solution by emphasizing disclosure and risk assessment over unattainable full openness. This balance allows organizations to innovate responsibly, ensuring compliance and ethical integrity without stifling progress. As we stand at this crossroads, the question arises: Can embracing nuanced transparency drive accountability while still fueling the evolution of AI?

Dr. Prateek Thapar

Data and Artificial Intelligence | Engineering Director | DataLake | MLOps | Big Data Analytics | Natural Language Processing | Computer Vision | Generative AI | LLM | CISO | Indian Air Force | Quantum Computing | QML

1 month ago

Indeed, the MOF promotes reproducibility of AI models, which is crucial for democratising AI, accelerating innovation cycles across the world, and building trust in AI. Thanks for sharing.

Sam Johnston

AI Leader · CEO/CTO · MBA · Founder · Xoogler

1 month ago

Open Source is (necessarily) binary though, with the part of the spectrum of openness we care about being the various license styles that all meet the OSD’s requirements (e.g. permissive MIT vs copyleft GPL).

Tushar Katarki

Head of Product, GenAI Foundation Model Platforms

1 month ago

Good stuff. For the Open Science Model, do you mean full transparency about the training data, or do you mean full access? Those could be two different things, and while full transparency is manageable, granting full access to all the training data may not be practical.
