The Model Openness Framework: A Practical Approach to AI Transparency

The original MOF research paper (March 2024) defined a three-tiered classification for AI models based on their openness:

  • Class 1: Open Science Model – Full access to model weights, training data, code, and architecture under open licenses.
  • Class 2: Open Tooling Model – Some components (e.g., training data) are withheld, but model weights, code, and architecture remain open.
  • Class 3: Open Model – Only certain details, such as weights or inference code, are available.

While this research set a foundation for defining AI openness, the newly released MOF Implementation Framework translates it into practical evaluation criteria that organizations can adopt. It strengthens transparency requirements, provides guidance on licensing and reproducibility, and introduces an assessment process to standardize how AI models are classified.
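
To make the tiers concrete, here is a minimal sketch of how an organization might map the components a model release discloses to an MOF-style class. The `ModelRelease` fields and the `classify_mof` helper are illustrative assumptions for this article, not part of the official MOF tooling, which evaluates a much longer list of components.

```python
from dataclasses import dataclass

@dataclass
class ModelRelease:
    """Which artifacts a release makes available under open licenses (illustrative fields only)."""
    weights: bool
    architecture: bool
    training_code: bool
    training_data: bool

def classify_mof(release: ModelRelease) -> str:
    """Map disclosed components to an MOF-style class (simplified approximation, not the official checklist)."""
    if release.weights and release.architecture and release.training_code and release.training_data:
        return "Class 1: Open Science Model"
    if release.weights and release.architecture and release.training_code:
        return "Class 2: Open Tooling Model"
    if release.weights:
        return "Class 3: Open Model"
    return "Closed / insufficient disclosure"

# Example: weights, code, and architecture released, but training data withheld
print(classify_mof(ModelRelease(weights=True, architecture=True,
                                training_code=True, training_data=False)))
# -> Class 2: Open Tooling Model
```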


AI Openness Is Not Binary: Moving Beyond Open vs. Closed Models

I believe that from a practical point of view, AI models exist on a spectrum of openness, not a simple open vs. closed divide. Some models release code but not training data; others open-source weights but restrict commercial use. The MOF framework acknowledges this complexity and provides structured criteria to evaluate transparency levels accurately.

Rather than enforcing full openness, the framework promotes completeness of disclosure, ensuring that enterprises can make informed decisions about the AI models they adopt.

Key examples of this are DeepSeek-V3 and DeepSeek-R1, two recent open-source AI models that push the boundaries of open innovation while also demonstrating the limits of transparency in AI development.


DeepSeek-V3 and DeepSeek-R1: Advancing Open-Source AI with Partial Transparency

Technical Innovation Through Open Research

DeepSeek has contributed significant advancements in reinforcement learning (RL) and distillation techniques through its latest models:

  • DeepSeek-V3 has introduced improvements in RL-based fine-tuning, leveraging reinforcement learning from AI feedback (RLAIF) to enhance alignment and control.
  • DeepSeek-R1, designed primarily for advanced reasoning tasks such as mathematics and coding, has demonstrated effective distillation techniques that transfer its reasoning capabilities to smaller models, lifting their performance well beyond that of their original base versions.

These contributions accelerate innovation in the open-source AI ecosystem by allowing researchers and developers to build upon state-of-the-art training methodologies.
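
For readers who want to see the underlying idea, below is a minimal, generic sketch of a teacher-student distillation loss in PyTorch. It is not DeepSeek's published recipe (the R1 report reportedly distills via supervised fine-tuning on reasoning traces generated by the larger model); the temperature and weighting values here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Generic knowledge-distillation objective: blend hard-label cross-entropy
    with a soft-label KL term that pulls the student toward the teacher's distribution."""
    # Soft targets: teacher and student distributions at a raised temperature
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd_term = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (temperature ** 2)
    # Hard targets: ordinary cross-entropy against ground-truth labels
    ce_term = F.cross_entropy(student_logits, labels)
    return alpha * kd_term + (1 - alpha) * ce_term
```

The key design choice is the temperature: raising it softens the teacher's distribution so the student also learns from the relative probabilities of incorrect answers rather than only the top prediction.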

Limited Transparency in Training and Tuning Data

Despite their technical openness, DeepSeek-V3 and DeepSeek-R1 do not fully disclose their training and fine-tuning datasets, which is a common issue in today’s AI landscape. The primary concerns include:

  • Lack of clarity on data provenance – The specifics of how DeepSeek fine-tunes its models remain undisclosed. While the base model training process may be partially documented, the fine-tuning data, which plays a crucial role in model alignment, safety, and performance, is not described.
  • Limited reproducibility – Without full access to training data, independent researchers cannot verify potential biases, fairness issues, or ethical considerations in the data used.
  • Compliance and enterprise risk – Organizations adopting DeepSeek-based models must assess whether the lack of transparency in tuning data affects their governance, compliance, or regulatory requirements.

The MOF framework provides a structured approach to evaluating such models, recognizing their contributions to open-source innovation while also flagging areas where transparency is incomplete.


The Reality of Data Management: Why Full Openness Is Unrealistic

While transparency is critical, it is not always realistic for AI models to fully disclose their training data due to:

  • Privacy and Regulation – Compliance with laws like GDPR and HIPAA often prevents dataset release.
  • Intellectual Property – Proprietary datasets give businesses a competitive edge.
  • Security Risks – Public data disclosure can expose models to adversarial attacks.

Instead of requiring full data openness, the MOF framework emphasizes clear documentation of dataset sources, bias considerations, and processing methods. This enables organizations to assess risks, compliance, and ethical considerations without needing direct access to training data.
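
As an illustration of what "documentation without disclosure" can look like in practice, here is a minimal sketch of a datasheet-style record. The field names are assumptions chosen for this example, not an official MOF schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DatasetDisclosure:
    """Datasheet-style record documenting a training dataset without releasing the data itself.
    Field names are illustrative, not an official MOF schema."""
    name: str
    sources: List[str]                    # provenance: where the data came from
    collection_period: str                # when it was collected
    preprocessing: List[str]              # cleaning, filtering, deduplication steps
    known_bias_considerations: List[str]  # documented gaps or skews
    license_or_terms: str
    pii_handling: str                     # how personal data was removed or protected

example = DatasetDisclosure(
    name="fine-tuning-corpus-v1",
    sources=["licensed news archive", "public-domain books", "synthetic Q&A"],
    collection_period="2022-2024",
    preprocessing=["language filtering", "near-duplicate removal", "toxicity filtering"],
    known_bias_considerations=["English-heavy", "under-represents low-resource domains"],
    license_or_terms="mixed; see per-source terms",
    pii_handling="regex and NER-based redaction prior to training",
)
```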

The DeepSeek models illustrate this balance—contributing significantly to the AI research community while retaining commercial and strategic control over fine-tuning data.


Transparency as a Foundation for AI Risk Management

AI models that lack full openness require stronger system-level guardrails. The MOF framework helps organizations manage AI risks in line with their AI usage and risk tolerance by promoting:

  1. Disclosure Over Access – Companies can assess AI risks without needing raw data if key attributes (e.g., dataset sources, preprocessing steps, licensing restrictions) are clearly documented.
  2. Bias and Fairness Audits – Transparency in data origins and composition allows enterprises to evaluate and mitigate potential biases, backed by additional guardrails in system testing and post-deployment validation (a minimal fairness-metric sketch follows this list).
  3. Independent Validation – Where data cannot be released, enterprises can adopt third-party audits and red-teaming approaches to assess a model’s behavior, reliability, and security. This includes external evaluations of how models perform across different demographic groups, industries, or regulatory environments.
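
As a minimal illustration of the kind of bias audit mentioned in item 2, the sketch below computes a demographic parity difference over model predictions. The function is a simplified assumption; real audits typically combine several fairness metrics with statistical tests and domain review.

```python
import numpy as np

def demographic_parity_difference(predictions: np.ndarray, groups: np.ndarray) -> float:
    """Largest gap in positive-prediction rate between any two groups.
    0.0 means every group receives positive predictions at the same rate."""
    rates = [predictions[groups == g].mean() for g in np.unique(groups)]
    return float(max(rates) - min(rates))

# Example: binary predictions for two demographic groups
preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])
grp = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(demographic_parity_difference(preds, grp))  # 0.75 - 0.25 = 0.5
```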

Additionally, organizations can implement AI system-level controls such as:

  • Robust monitoring frameworks to track model drift and detect unintended biases over time (a minimal drift-check sketch follows this list).
  • Human-in-the-loop interventions in high-risk applications where automated decisions must be reviewed or overridden.
  • Automated policy enforcement to ensure that AI outputs adhere to ethical guidelines, regulatory requirements, and corporate risk frameworks.
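
As one concrete example of drift monitoring, the sketch below computes a Population Stability Index (PSI) between a baseline score distribution and current production scores. The thresholds in the comment are common rules of thumb, not MOF requirements, and a single metric is only one piece of a monitoring framework.

```python
import numpy as np

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """PSI between a reference (e.g., validation-time) score distribution and a current
    production distribution. Rules of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Avoid division by zero and log(0) for empty bins
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

# Example: compare model confidence scores from validation against last week's traffic
rng = np.random.default_rng(0)
baseline = rng.beta(2, 5, size=10_000)
recent = rng.beta(2, 3, size=10_000)   # the score distribution has shifted
print(population_stability_index(baseline, recent))
```

In practice such a check would run on a schedule and feed alerting, alongside bias, performance, and safety metrics.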

By integrating disclosure-driven risk assessments and system-level guardrails, enterprises can ensure accountability in AI deployments without relying on full data openness, making AI systems more transparent, compliant, and aligned with ethical and operational goals.

AI ‘openness’ is starting to feel like a marketing buzzword rather than a true commitment to transparency. If companies pick and choose what to disclose, are we really any better off than with fully closed models?

Ajinkya Nikam

Technical Support Manager | Support Engineering Manager | Customer Success Leader | Linux Expertise, Troubleshooting, Coaching & Career Development | IT & Customer Support Operations Expert

1 month ago

The shift from viewing AI models as simply open or closed to recognizing a spectrum of openness is revolutionizing how we approach transparency and innovation. DeepSeek-V3 and DeepSeek-R1 highlight this complexity—advancing AI with significant technical contributions while navigating the limitations of full transparency due to privacy, intellectual property, and security concerns. The MOF framework offers a practical solution by emphasizing disclosure and risk assessment over unattainable full openness. This balance allows organizations to innovate responsibly, ensuring compliance and ethical integrity without stifling progress. As we stand at this crossroads, the question arises: Can embracing nuanced transparency drive accountability while still fueling the evolution of AI?

Dr. Prateek Thapar

Data and Artificial Intelligence | Engineering Director | DataLake | MLOps | Big Data Analytics | Natural Language Processing | Computer Vision | Generative AI | LLM | CISO | Indian Air Force | Quantum Computing | QML

1 month ago

Indeed, the MOF promotes reproducibility of AI models, which is crucial for democratising AI, accelerating innovation cycles across the world, and building trust in AI. Thanks for sharing.

Sam Johnston

AI Leader · CEO/CTO · MBA · Founder · Xoogler

1 month ago

Open Source is (necessarily) binary though, with the part of the spectrum of openness we care about being the various license styles that all meet the OSD’s requirements (e.g. permissive MIT vs copyleft GPL).

Tushar Katarki

Head of Product, GenAI Foundation Model Platforms

1 month ago

Good stuff. For the Open Science Model, do you mean full transparency about the training data, or do you mean full access? Those could be two different things, and while full transparency is manageable, granting full access to all the training data may not be practical.
