Generative AI Seeped into Research Peer Reviews

A while ago, Wired wrote about how #ChatGPT and other similar #GenerativeAI tools are now being deployed to mass-produce scammy books that flood online book markets [1]. The story seems straightforward: there is a profit motive and there is a tool enabling it, voilà.

But that can never happen to ethics-conscious, integrity-driven researchers, can it? It turns out it absolutely can, albeit, as of this writing, probably at a smaller scale [2].

But if authors of research papers do it, surely the group that should hold them accountable, a.k.a. the reviewers, would not take the same shortcuts? Well, according to a recent paper, they do too [3]:

  • The authors’ goal is to estimate what percentage (“alpha”) of reviews has been significantly modified by AI (i.e., beyond minor edits to correct grammatical errors). They discovered that among #CoRL, #ICLR, #EMNLP, #NeurIPS, and #NaturePortfolio, all but the last exhibit sharp increases in alpha after the launch of ChatGPT (November 2022). In particular, an estimated 10.6% of ICLR 2024 and 16.9% of EMNLP 2023 reviews were substantially modified by AI (screenshot 1).


  • They also correlated the alpha spikes with other factors: how close to the deadline the reviews were submitted, how many citations the reviews contained, how often the reviewers replied to author rebuttals, how homogeneous the review content was, and how confident the reviewers were. None of these findings is surprising (screenshot 2).


  • They use a simple, corpus-based method to estimate the fraction, alpha, of the given documents that are modified by AI (screenshot 3): P and Q stand for the probability distributions of docs written by humans and AI, respectively, and can be estimated empirically from reference corpora. The target corpus is then modeled as the mixture (1 − alpha)·P + alpha·Q, and alpha is recovered by maximum likelihood (see the sketch after this list).


  • For datasets, they collected papers and reviews from #ICLR 2018-2024, #NeurIPS 2017-2023, #CoRL 2021-2023, and #EMNLP 2023 (screenshot 4).


  • To validate the MLE estimation of alpha, they created document mixtures for a range of alphas, performed the estimation, and computed the errors. They found the method is robust in both in-domain and out-of-domain settings (screenshot 5); a toy reproduction of this check also follows the list.


  • Remarkably, when compared to much more sophisticated instance-based detection methods, their corpus-based method is 3.4x and 4.6x better in in-domain and out-of-domain settings, respectively (screenshot 6).
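To make the estimation step concrete, here is a minimal sketch of the mixture MLE in Python. This is not the authors’ code: the function name estimate_alpha and the use of SciPy’s bounded scalar optimizer are my choices, and the sketch assumes you already have per-observation log-likelihoods under the empirically estimated P and Q.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def estimate_alpha(log_p, log_q):
    """MLE of alpha in the corpus-level mixture (1 - alpha)*P + alpha*Q.

    log_p, log_q: arrays of log-likelihoods of each observation (token
    or document) under the human-written distribution P and the
    AI-generated distribution Q, both estimated from reference corpora.
    """
    log_p = np.asarray(log_p, dtype=float)
    log_q = np.asarray(log_q, dtype=float)

    def neg_log_likelihood(alpha):
        # Numerically stable log((1 - alpha)*P(x) + alpha*Q(x)), summed
        # over all observations in the target corpus.
        mix = np.logaddexp(np.log1p(-alpha) + log_p,
                           np.log(alpha) + log_q)
        return -mix.sum()

    # Keep alpha strictly inside (0, 1) so the logs stay finite.
    result = minimize_scalar(neg_log_likelihood,
                             bounds=(1e-6, 1 - 1e-6), method="bounded")
    return result.x
```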
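And a toy version of their validation protocol: draw tokens from a known mixture of two synthetic distributions, run the estimator above, and compare the recovered alpha against the ground truth. The vocabulary size, sample count, and true alpha below are arbitrary placeholders, not values from the paper.

```python
rng = np.random.default_rng(0)
vocab_size, n_tokens, true_alpha = 50, 20_000, 0.15  # placeholders

# Synthetic stand-ins for the empirically estimated human (P) and
# AI (Q) token distributions.
P = rng.dirichlet(np.ones(vocab_size))
Q = rng.dirichlet(np.ones(vocab_size))

# Each token comes from Q with probability true_alpha, else from P.
from_ai = rng.random(n_tokens) < true_alpha
tokens = np.where(from_ai,
                  rng.choice(vocab_size, n_tokens, p=Q),
                  rng.choice(vocab_size, n_tokens, p=P))

alpha_hat = estimate_alpha(np.log(P[tokens]), np.log(Q[tokens]))
print(f"true alpha = {true_alpha:.2f}, estimate = {alpha_hat:.3f}")
```

Note that the estimate sharpens as the corpus grows, which is why this works at the population level even when classifying any single review as AI-written is unreliable.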


So the genie is out of the bottle; what now? All of these venues already have policies in place for authors regarding the use of AI writing assistance (e.g., #ACL 2023 [4]), and some even give guidance to reviewers (e.g., #ACL 2023 [5]). Perhaps using these tools is not a sin, as long as they are not used to replace a genuine understanding of the work reviewers are entrusted to evaluate.

REFERENCES

[1] Kate Knibbs. Jan 10, 2024. Scammy AI-Generated Book Rewrites Are Flooding Amazon. Wired. https://www.wired.com/story/scammy-ai-generated-books-flooding-amazon/

[2] Previous post on research papers claiming “I am an AI language model”: https://www.dhirubhai.net/posts/benjaminhan_elen-le-foll-elenlefoll-activity-7174624268310261760-eW0M

[3] Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, and James Y. Zou. 2024. Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews. https://arxiv.org/abs/2403.07183

[4] Jordan Boyd-Graber, Naoaki Okazaki, Anna Rogers. 2023. ACL 2023 Policy on AI Writing Assistance. https://2023.aclweb.org/blog/ACL-2023-policy/

[5] Anna Rogers, Jordan Boyd-Graber, Naoaki Okazaki. 2023. ACL’23 Peer Review Policies. https://2023.aclweb.org/blog/review-acl23/#faq-can-i-use-ai-writing-assistants-to-write-my-review
