Best Practices in AI Documentation: The Imperative of Evidence from Practice
Center for Democracy & Technology
By: CDT's Amy Winecoff & Miranda Bogen
Recent AI incidents have underscored the urgent need for robust governance to mitigate risks and ensure responsible development of AI-powered systems. For example, 4chan users leveraged AI tools to create violent and explicit images of female celebrities, and Google’s Gemini generated offensive images of “historical” figures. These incidents are part of a long history of AI failures, from chatbots spewing hate speech to algorithms exacerbating racial disparities in healthcare. Ongoing AI incidents raise a crucial question: Why do AI failures persist? The answer, while complex, centers on the inadequacy of current AI governance procedures.
Effective risk management and oversight of AI hinge on a critical, yet underappreciated tool: comprehensive documentation. Often, documentation is conceptualized as a tool for achieving transparency into AI systems, enabling accountability to external oversight bodies and the public. However, third-party visibility and accountability are only two of the many goals that documentation can facilitate. Documentation also serves as the backbone for effective AI risk management and governance, and helps practitioners assess potential failure modes and proactively address these issues throughout the development and deployment lifecycle. Well-maintained documentation offers organizations ongoing insights into their systems’ strengths and weaknesses, fostering iterative improvements. And documentation informs decisions about whether to launch systems at all, given the potential benefits and risks they stand to pose. In essence, documentation is a tool that has the potential to, and is indeed necessary to, facilitate both external accountability and internal risk management practices.
Notwithstanding that potential, approaches that seem beneficial in theory are not always successful in practice. To ensure documentation can fully support robust AI governance, researchers, policymakers, and advocacy groups should consider insights from public and private-sector practitioners experienced in creating and using documentation, as well as evidence of its efficacy in real-world AI contexts.
The Theory of Documentation
In its ideal form, AI documentation records fundamental details about AI systems, including the sources of training data, the hardware and software used to train the component AI models, and the evaluation methodologies used to assess the systems for efficacy and errors. Documentation can also describe the procedures a company has followed in designing, developing, and deploying these systems, such as the original motivation for developing the system, whether the system underwent an impact assessment or ethics review, and how training or evaluation data were labeled.
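To make this concrete, the sketch below shows, in Python, the kinds of fields such a documentation record might capture. It is a minimal, hypothetical illustration; the field names and structure are our own assumptions for exposition, not a standard schema prescribed by any of the frameworks discussed here.

```python
from dataclasses import dataclass

@dataclass
class ModelDocumentation:
    """Hypothetical minimal record of the details AI documentation
    typically captures; field names are illustrative, not a standard."""
    model_name: str
    intended_use: str                  # what the system is for, and for whom
    training_data_sources: list[str]   # provenance of training data
    data_labeling_procedure: str       # how training/evaluation data were labeled
    hardware: str                      # hardware used to train component models
    software_dependencies: list[str]   # frameworks and library versions
    evaluation_methods: list[str]      # how efficacy and errors were assessed
    known_limitations: list[str]       # failure modes surfaced during review
    ethics_review_completed: bool      # whether an impact/ethics review occurred

# Illustrative instance for a hypothetical content-quality model
doc = ModelDocumentation(
    model_name="content-quality-classifier-v2",
    intended_use="Rank article quality for volunteer moderators",
    training_data_sources=["editor-labeled articles, 2019-2021"],
    data_labeling_procedure="Three-annotator majority vote",
    hardware="8x A100 GPUs",
    software_dependencies=["pytorch==2.1", "transformers==4.38"],
    evaluation_methods=["held-out accuracy", "per-genre error analysis"],
    known_limitations=["underperforms on short articles"],
    ethics_review_completed=True,
)
```

In practice, frameworks such as model cards and datasheets prescribe richer, more structured prompts than a flat record like this, but the underlying idea is the same: make the system’s provenance and evaluation legible.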
Good documentation can provide insight into an AI system’s risks and improve AI development more generally. For example, Wikipedia developers participating in a research study were asked to use a documentation framework to guide their development of a machine learning system for predicting quality in community content moderation applications. Through their engagement in the documentation process, practitioners identified accuracy metrics that were more closely aligned with the priorities of the system’s target users. Navigating this process also provided them with a deeper understanding of the system they were working on, which could facilitate more efficient development in the future.
On the other hand, when AI systems or components are not sufficiently documented before deployment, errors or problems with these components may go undetected, potentially deteriorating system performance and posing ethical and legal risks. For instance, when one group of AI researchers documented BookCorpus, a previously undocumented dataset used to train popular large language models (LLMs), they revealed numerous duplications of books and an overrepresentation of specific genres, like romance, which could potentially skew outputs of models trained on that dataset. Researchers also found that BookCorpus may have violated copyright restrictions, highlighting significant legal and ethical implications of using the dataset. This case illustrates the importance of thorough documentation in identifying hidden problems and promoting responsible AI development and use.
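As an illustration of how this kind of dataset audit can surface hidden problems, the Python sketch below flags exact-duplicate texts by hashing and tallies genre representation. It is a hypothetical simplification for exposition, not the methodology the BookCorpus researchers actually used.

```python
import hashlib
from collections import Counter

def audit_corpus(books: list[dict]) -> dict:
    """Flag exact-duplicate texts and tally genre representation.

    Each book is assumed to be a dict with 'text' and 'genre' keys;
    this is an illustrative simplification, not the published method.
    """
    seen: set[str] = set()
    duplicates = 0
    genres: Counter = Counter()
    for book in books:
        digest = hashlib.sha256(book["text"].encode("utf-8")).hexdigest()
        if digest in seen:
            duplicates += 1          # exact copy of a book already in the corpus
        else:
            seen.add(digest)
        genres[book["genre"]] += 1   # e.g., reveals overrepresentation of romance
    return {"duplicates": duplicates, "genre_counts": dict(genres)}

# Tiny illustrative corpus with one duplicated entry
corpus = [
    {"text": "Once upon a time...", "genre": "romance"},
    {"text": "Once upon a time...", "genre": "romance"},  # duplicate
    {"text": "The ship drifted...", "genre": "sci-fi"},
]
print(audit_corpus(corpus))
# {'duplicates': 1, 'genre_counts': {'romance': 2, 'sci-fi': 1}}
```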
Numerous AI researchers and governance groups have proposed frameworks for AI documentation. As part of an ongoing effort to understand themes that emerge from documentation research, we identified and reviewed 37 different approaches to documenting AI data, models, systems, and processes that have been proposed in the academic and gray literature. These proposals have significantly influenced academic researchers and policymakers seeking to define best practices for responsible AI development. For instance, the concept of model cards, a method adopted by a number of AI developers for documenting AI models, has been cited nearly 1,800 times in academic publications in the last five years. The National Institute of Standards and Technology (NIST) references datasheets for datasets, a method for documenting AI training data, 26 times in its guide for companies on implementing effective AI risk management. The technical documentation requirements for high-risk AI systems in the European Union’s AI Act draw on a number of AI documentation proposals, including both datasheets and model cards.
Governance outcomes can be improved both through the documentation artifacts practitioners create for data, models, and systems, and through practitioners’ active participation in the documentation process. Specific outputs like datasheets, model cards, and system cards can help downstream stakeholders understand the intended and unintended uses of these components. These artifacts enable stakeholders to assess whether their planned uses comply with organizational or legal requirements, preventing non-compliant development efforts. They can also alert downstream stakeholders to instances where risk mitigation techniques may be necessary before systems can be safely deployed.
Beyond the benefits of the artifacts themselves, the documentation process can foster a healthy risk management culture within an organization. Regularly documenting risks helps practitioners better understand responsible AI principles and practices, influencing their behavior beyond the documentation process itself. Documentation can also serve as a forcing function, encouraging practitioners to follow best practices in software development and adopt more rigorous scientific approaches, since documentation invites increased scrutiny and accountability from other internal stakeholders. Moreover, documentation can facilitate collaboration among different stakeholders by establishing a common knowledge base about systems. This shared understanding helps cross-functional teams collectively examine the strengths, weaknesses, and risks of their approaches from multiple perspectives.
From Theory to Practice
While proposed documentation frameworks have laid a foundation for AI development norms, the emerging policy attention to defining more specific documentation standards necessitates a careful evaluation of which practices have supported, or will most effectively support, governance goals in real-world settings. Documentation is a tool, and the success of that tool depends on the human and social context in which it is used. Without adequate evidence about the social, organizational, and institutional dynamics that could influence the utility documentation provides, unproven and potentially less effective methods could become the norm, while more robust approaches that better serve governance and accountability goals go unadopted.
To contribute to an evidence-based understanding of effective documentation strategies, the AI Governance Lab reviewed 21 research papers that present empirical findings related to documentation and convened stakeholders from technology companies, AI governance and compliance consulting firms, nonprofit organizations, and government agencies to identify insights. We also consulted individually with stakeholders who use documentation in their work. In each effort, we sought to identify challenges and opportunities for translating the theoretical approaches to documentation that researchers have proposed into real-world practice.
Empirical studies identify a variety of challenges to implementing documentation effectively.
Our convening and consultations with documentation stakeholders likewise underscored that implementing documentation in real-world development environments is more complex than it may appear, and that the assumptions of proposed documentation frameworks often diverge from the conditions that shape real-world practice. Workshop participants raised a range of such considerations.
To be sure, none of these concerns excuses poor governance practices. They do, however, present factors for organizations to consider when designing and implementing documentation practices to maximize their likelihood of success, and for policymakers to weigh when deciding what guidance or requirements would effectively incentivize useful documentation practices.
The Path Forward
Given the urgency of preventing AI-driven harms, stakeholders will need to balance some uncertainty around which practices will best support risk management with the consequences of continued inconsistency and insufficiency in many current documentation approaches. In the meantime, successes and failures of past AI documentation research and implementations offer valuable lessons. One important lesson is that proposed approaches to documentation are more valuable when they are accompanied by empirical evidence about how these approaches can and can’t respond to applied development contexts. While extensive real-world investigations in multiple contexts may not be practical in the short term, smaller qualitative studies and focus groups can still provide helpful insights. For example, researcher Karen Boyd conducted a study to assess the effectiveness of data documentation in raising ethical awareness among AI practitioners. In her study, 23 AI practitioners participated, but only 11 were given data documentation artifacts to consult. Results indicated that those with access to data documentation were more likely to recognize ethical issues than those without it. Although the findings are not definitive, Boyd’s study represents some of the best available evidence on the impact of data documentation on practitioners’ ethical deliberation.
In some instances, empirical evidence on documentation exists but remains unpublished, which is regrettable. Take, for example, documentation frameworks developed through an approach known as “co-design.” In co-design studies, researchers iterate between proposing a framework, gathering feedback from relevant stakeholders or data from pilot implementations, and updating the framework design until they converge on a final product. Co-design is effective for developing empirically informed frameworks that are responsive to organizational context. However, the data collected during these studies has not generally been shared in publications, most likely because the authors believe the primary contribution of their work is the framework rather than the findings. Yet those findings can provide crucial context as to why the authors adapted their initial designs or where the framework might best succeed. Without this context, practitioners and policymakers might not understand the rationale behind specific design choices that can be necessary for effective implementation.
Moving forward, researchers should prioritize evaluating proposed approaches in practice and publicly sharing findings to provide insight into where there may be subtle tradeoffs with significant implications for governance and accountability efforts. Yet even if researchers embrace empirical evaluations as a necessary step when proposing documentation approaches, reaching consensus on evidence-based best practices will likely take time, and in the meantime, a lack of or ambiguity in evidence for such proposals risks being weaponized by actors looking to water down rules they are ultimately expected to follow.
Nevertheless, policymakers and government agencies should scrutinize AI documentation proposals with a weaker empirical basis more carefully than those informed by robust evidence. They should also build in processes to review and revisit the effectiveness of recommendations or guidelines over time to ensure gaps are spotted and filled. Doing so will help documentation live up to its promise as a risk mitigation strategy. Policymakers and government entities like NIST and the EU AI Office should also prioritize enhancing the evidence base for documentation practices. Guidance to companies, for instance, could recommend mechanisms for evaluating and disclosing the success of particular documentation frameworks. As companies adopt documentation frameworks to manage AI risks, they could then assess both the usability and effectiveness of these practices, both of which are instrumental to the success of any responsible AI tool.
Grantmaking organizations like the National Science Foundation (NSF) and AI safety research initiatives could promote evidence-based research on documentation by requiring grant applicants to detail how they will empirically evaluate their proposed approaches. Just as academic conferences and journals have nudged researchers toward open science and toward considering the potential ethical implications of their work, publication venues could further encourage empirical evaluation by adjusting review guidelines for research on documentation. Reviewers could then place significant emphasis on the presence or absence of empirical evaluations when making acceptance decisions.
As AI systems become increasingly integral to technology products, establishing effective governance practices is critical. Robust documentation practices are essential to these efforts. While the need to document AI systems is evident, it is equally important for AI companies to adopt empirically validated methods that improve outcomes in real-world development and risk management settings. Researchers, policymakers, and other stakeholders defining best practices for AI governance can learn from the successes and failures of past efforts, empirical studies on documentation, and insights from practitioners. By doing so, they can develop and refine documentation approaches that fulfill their theoretical potential in practice. Moving forward, collaborative efforts that emphasize evidence-based practices will be crucial in harnessing the benefits of AI while minimizing its risks. At the AI Governance Lab, we will continue to investigate this important area and highlight promising practices within policy and practitioner communities.