Toward a Turing Audit: A Proposal for Systematic Authenticity Assessments in Academic Writing
Abstract
The purpose of a writing assessment, when divorced from "staple" writing instruction like grammar, punctuation, formatting, or style guides, is, ostensibly, to "view a student's thought process": from research origination through the synthesis of that research into original thought. However, in our increasingly augmented era, shaped by no shortage of AI (specifically, LLMs), determining the authenticity of academic prose, and thus of the original thinking that underpins it, is both more challenging and more critical than ever before. Simplified content creation via augmentation has undermined both trust and transparency. Traditional anti-plagiarism methods now run parallel with emergent “AI-detection” strategies, none of which are yet reliable in isolation. To address these concerns, and as an extension of an article I published here last Spring, I propose the concept of “Turing Auditing.”
I should take a brief aside here to note that the bulk of my Digital Humanities research is dedicated to the use, exploration, and edge-case deployment of AI tools (and I leverage them for both lesson design and activity creation in most of my classrooms), but I do recognize the inherent danger of over-reliance on these tools, as well as their less-than-universal acceptance among faculty. While I personally argue for a "coaching, not catching" viewpoint when examining student writing in light of AI, there absolutely exist a multitude of use cases wherein it is both prudent and necessary to separate the automated from the original.
Turing Auditing leverages multiple complementary methods—automated AI-detection tools, linguistic frequency analysis, close reading techniques, and consultation with AI models themselves (the “AI fan” test)—and integrates them into a coherent scoring system designed to help instructors, editors, and grant panels gauge whether a given text is likely to be human-authored, machine-generated, or a blend of the two.
This piece explores the concept and may, potentially, serve as a rough draft toward a fully fleshed-out grant proposal seeking funding to formally develop, test, and refine the Turing Audit methodology and to implement it as a standardizable academic protocol. To that end, I've formatted this article like a grant proposal.
---
Introduction and Rationale
The academic community’s longstanding trust in the integrity of authorship has been shaken by the rise of AI-generated writing. While tools such as Turnitin, GPTZero, and Copyleaks have emerged that claim to signal AI involvement, these tools produce false positives and false negatives with distressing frequency, offering accuracy rates barely above chance (ranging between 40–55% in many cases; audit information available). This instability points to the need for a far more reliable protocol, specifically, one that combines automated examination with traditional scholarly scrutiny and rigorous qualitative analysis. In short, the correct approach here is a blend of both the Digital and the Humanities.
The Concept of Turing Auditing
Turing Auditing is inspired, of course, by the Turing Test, but it resembles that test in name only. Turing Auditing establishes authenticity by assembling multiple analytic tracks and combining them via a basic arithmetic formula into a single composite score. Note that this approach necessarily requires human oversight at several steps; that intentional introduction of humanities close reading is a core facet, one might argue the most critical facet, of the methodology. Note also that a sophisticated and/or experienced prompt engineer, or someone using an edge model (or modifying text ex post facto), is far less likely to be detected. I recognize this inherent weakness, and as such, I do not argue that this methodology should be, for example, the basis of an academic integrity charging letter, but rather another tool in the toolkit of the writing assessor.
I define Turing Auditing through five key components:
1. Automated AI-Detection Tools (Baseline Screening)
This facet runs at least three AI-detection tools in parallel, such as GPTZero and Copyleaks. Each tool’s reliability alone is limited, but combining their outputs into a meta-assessment reduces the probability of misleading conclusions. While no tool currently exceeds 55% accuracy consistently (again, I have conducted an audit of these systems and can provide the data to back this conclusion), a consensus approach might yield a more balanced initial filter. I will develop a weighting system that accounts for the average confidence scores and commonalities among tool outputs.
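To make the meta-assessment concrete, here is a minimal Python sketch of one way the consensus step could work. The tool names, confidence values, and weights are placeholders I've invented for illustration (the real detectors don't expose a common API), and the mapping onto the -10 to +10 range used for S1 later in this proposal is my own assumption, not a finalized formula.

```python
# Illustrative sketch only: "confidences" are hypothetical numbers an assessor
# would copy by hand from each detector's report (0.0 = "human", 1.0 = "AI").
# The weights and the mapping to the -10..+10 S1 range are my own assumptions.

def s1_from_detectors(confidences: dict[str, float],
                      weights: dict[str, float] | None = None) -> float:
    """Combine per-tool 'likely AI' confidences into a single S1 score."""
    if weights is None:
        weights = {name: 1.0 for name in confidences}  # equal weighting by default
    total_weight = sum(weights[name] for name in confidences)
    # Weighted average of the tools' AI-likelihood estimates (0..1).
    consensus = sum(confidences[name] * weights[name] for name in confidences) / total_weight
    # Map 0..1 onto +10 (human-leaning) .. -10 (AI-leaning).
    return round((0.5 - consensus) * 20, 1)

# Example: three tools disagree mildly; the consensus leans slightly toward "AI".
print(s1_from_detectors({"GPTZero": 0.62, "Copyleaks": 0.55, "ToolC": 0.48}))  # -1.0
```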
2. Frequency Analysis of Stylistic Lexicon
AI-generated texts often rely on certain stylistic markers: words that signal a polished but generic rhetorical style. Current research has identified a set of “tell-tale” terms that appear with disproportionate frequency in AI outputs. These include: Elevate, Tapestry, Leverage, Journey, Headache, Resonate, Testament, Explore, Delve, Enrich, Seamless, Multifaceted, Foster, Convey, Beacon, Interplay, Navigate, Adhere, Paramount, Comprehensive, Placeholder, Realm, and Symphony. By quantifying occurrences of these terms, adjusting for document length and disciplinary norms, I can develop a probability indicator that the text aligns with known AI-generated linguistic patterns. This approach mirrors Moretti's Distant Reading (computational linguistics) and is a well-established aspect of text analysis. There are myriad tools to accomplish this, some of which, ironically, are themselves driven by AI text extraction.
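A minimal sketch of the counting step follows, assuming the lexicon above (truncated in the code for brevity) and a placeholder disciplinary baseline; the per-1,000-word normalization and the mapping onto the -5 to +5 range used for S2 below are illustrative assumptions rather than calibrated values.

```python
import re

# A subset of the "tell-tale" lexicon listed above (truncated for brevity).
TELL_TALE = {"elevate", "tapestry", "leverage", "journey", "resonate", "testament",
             "delve", "seamless", "multifaceted", "foster", "beacon", "interplay",
             "navigate", "paramount", "comprehensive", "realm", "symphony"}

def s2_from_frequency(text: str, baseline_per_1k: float = 1.5) -> float:
    """Score tell-tale term density against an assumed disciplinary baseline.

    baseline_per_1k is a placeholder: the expected rate of these words per
    1,000 words of ordinary human prose in the discipline in question.
    """
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in TELL_TALE)
    rate = hits / len(words) * 1000  # occurrences per 1,000 words
    # Rates above baseline push toward -5 (AI-leaning); below push toward +5.
    return max(-5.0, min(5.0, (baseline_per_1k - rate) * 2))

print(s2_from_frequency("We delve into a rich tapestry of multifaceted findings "
                        "that elevate and foster a seamless journey."))  # -5.0
```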
3. Turing Auditing Close-Reading
The cornerstone of Turing Auditing is close reading by a human expert, typically the assessor (instructor). This approach scrutinizes coherence, argumentation quality, the proper and consistent use of citations, and the presence of discipline-specific nuance. Hallmarks of AI-generated text, especially text from poorly guided, free, or legacy models (often the type students are likely to employ), include:
- Vague and overly general statements juxtaposed with flawless syntax.
- Citations that are either non-existent, improperly formatted, or suspiciously generic (e.g., referencing a well-known author without attributing a verifiable source).
- Structural uniformity that feels “templated” rather than organically reasoned.
By systematically examining content for intellectual depth, subtlety, and citation authenticity, close reading can detect patterns that machine learning models struggle to forge convincingly. This facet holds the greatest weight in the composite scoring system, as it embodies the humanistic and scholarly judgment that no tool can (yet) fully replicate.
4. AI “Fan” Analysis
Ironically, LLMs can serve as meta-critics. While these models can be biased toward praising their own kind (write something with 4o, for example, then ask that same 4o, while still in the context window, how good the text is; AI is a big fan of itself), comparing how multiple LLMs evaluate a text’s quality can yield insights. If they uniformly deem a piece “well-structured” yet fail to pinpoint original thought or specific intellectual contributions, their enthusiasm might be indicative of AI generation. On the other hand, if an LLM struggles to categorize the text, or points to highly nuanced, human-like reasoning consistent with established scholarly discourse, it may suggest human authorship. This facet is experimental.
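Because this facet is experimental, the sketch below stays deliberately simple: it assumes the assessor has already collected short written evaluations of the text from several models (through whatever interface they prefer) and merely looks for the pattern described above, namely uniform generic praise with no mention of specific intellectual contributions. The marker phrases and the mapping onto the -10 to +10 range for S4 are my own illustrative assumptions.

```python
# Illustrative heuristic only: "evaluations" maps a model's name to the free-text
# critique it returned. The marker phrases and scoring are my own assumptions.

GENERIC_PRAISE = ("well-structured", "well structured", "clearly written",
                  "compelling", "engaging", "polished")
SPECIFIC_SIGNALS = ("original argument", "novel claim", "primary source",
                    "counterexample", "specific evidence", "cites")

def s4_from_fan_analysis(evaluations: dict[str, str]) -> float:
    """Score cross-model enthusiasm: specificity minus generic praise."""
    score = 0.0
    for critique in evaluations.values():
        lowered = critique.lower()
        praise = sum(p in lowered for p in GENERIC_PRAISE)
        specific = sum(s in lowered for s in SPECIFIC_SIGNALS)
        # Generic praise with no specificity nudges the score toward "AI".
        score += specific - praise
    return max(-10.0, min(10.0, score))

print(s4_from_fan_analysis({
    "model_a": "A well-structured and compelling essay.",
    "model_b": "Polished, engaging prose throughout.",
}))  # -4.0
```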
5. Mathematical Integration and Scoring Formula
To translate these qualitative and quantitative observations into a standardized measure, I propose a unified scoring system that aggregates each facet’s output into a single composite score. This score ranges from -50 (most likely AI-generated) to +50 (most likely human-generated), with 0 serving as a neutral midpoint.
Let:
- S1 = Score from automated detection tools (range: -10 to +10)
- S2 = Score from frequency analysis of key terms (range: -5 to +5)
- S3 = Score from close reading (range: -25 to +25)
- S4 = Score from AI “fan” analysis (range: -10 to +10)
The final Turing Audit Score (TAS) is:
TAS = S1 + S2 + S3 + S4
Because close reading demands the greatest share of the weighting (it is the most critical and reliable method), it has the largest scoring range (-25 to +25), ensuring it significantly influences the final outcome.
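As a minimal sketch of the arithmetic, assuming each facet score has already been produced on its stated range (the clamping simply enforces those ranges, and the interpretation comment is illustrative rather than part of a formal protocol):

```python
def turing_audit_score(s1: float, s2: float, s3: float, s4: float) -> float:
    """Sum the four facet scores after enforcing their stated ranges."""
    def clamp(value: float, limit: float) -> float:
        return max(-limit, min(limit, value))
    return clamp(s1, 10) + clamp(s2, 5) + clamp(s3, 25) + clamp(s4, 10)

# Example: mildly AI-leaning detectors and lexicon, but a close reading that
# found genuine, well-cited argumentation.
tas = turing_audit_score(s1=-1.0, s2=-5.0, s3=18.0, s4=-4.0)
print(tas)  # 8.0 -> leans human-authored, but not decisively
```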
---
Expected Outcomes
By developing and testing the Turing Auditing protocol, I expect:
- A quantifiable, transparent measure to identify AI-generated text in academia.
- A stronger sense of trust among scholars, editors, and grant committees in the authenticity of submissions.
- The establishment of a community standard that can adapt as AI evolves.
Long-Term Vision
Once validated, Turing Auditing can be integrated into academic workflows—journal submissions, grant proposal reviews, admissions essays, and peer review processes. The methodology will remain open-source and regularly updated to respond to advances in AI language generation. Over time, a community-driven feedback loop will improve the protocol’s accuracy, fairness, and utility.
Conclusion
In a scholarly ecosystem increasingly intertwined with AI-generated language, we must safeguard the authenticity and intellectual rigor of academic communication. This grant proposal outlines Turing Auditing: an integrative, evidence-based approach that combines technology, linguistic insights, and human expertise to restore and maintain confidence in academic authorship. By funding this initiative, we lay the groundwork for a new standard that meets the evolving demands of the information age.
I welcome your feedback - or your enthusiasm, if you'd like to be a part of this (speculative) project!