Comparing AI-Generated Text with Human Language
Illustration of the Self-Attention Mechanism from Vaswani et al. (2017) by Janosh Riebesell


Large Language Models (LLMs)

Recent advances in artificial intelligence (AI) have led to the development of large language models (LLMs) with the transformative ability to generate fluent, grammatical text by predicting the likelihood of word sequences learned from enormous amounts of training data (Brown et al., 2020; Radford et al., 2019). The rapid progress in natural language generation (NLG) over a relatively short time makes it increasingly difficult to distinguish whether content across various tasks was produced by an AI algorithm or a human (Guo et al., 2023). OpenAI's ChatGPT is remarkably good at mimicking human language in response to user prompts, but it can also confidently produce inaccurate information about people, places, or facts. An important open question is how to evaluate such complex AI models (Celikyilmaz et al., 2021). The increase in AI-generated content has raised concerns about academic integrity, detection of artificial content, and the spread of inaccurate or misleading information (Uzun, 2023). Although LLMs can generate high-quality, grammatically correct text that resembles human language, a gap remains between the level of detail and overall quality of text produced by AI language models and that of human writing (Liao et al., 2023; Ma et al., 2023).
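The core mechanism mentioned above, predicting the likelihood of the next word, can be illustrated with a minimal sketch. The three-word vocabulary and the raw scores below are made up for demonstration; real models operate over tens of thousands of tokens with scores produced by a neural network.

```python
import math
import random

def softmax(scores):
    """Convert raw model scores (logits) into a probability distribution."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates and scores for the next word after "The cat sat on the"
vocab = ["mat", "moon", "carburetor"]
logits = [3.2, 1.1, -2.0]

probs = softmax(logits)
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")

# Sampling from the distribution (rather than always taking the top word)
# is what makes generated text varied instead of deterministic.
next_word = random.choices(vocab, weights=probs, k=1)[0]
```

Repeating this predict-and-sample step token by token is, at a high level, how an LLM produces a passage of text.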

Detecting AI-Generated Content

A growing concern in educational settings is that students could misuse AI technologies to cheat on assignments and exams (Dou et al., 2022). Developing tools and strategies to detect artificially generated content is an active area of research with applications in education, journalism, and social media (e.g., GPTZero, Turnitin, metadata analysis, stylometric analysis). Detection methods are limited by the possibility of manipulated metadata and by their reliance on machine learning algorithms that require large amounts of training data. Moreover, detector systems depend on systematic differences between human and machine text, while the explicit goal of AI development is to make machine-generated text as close to human language as possible. LLMs have even been used to solve introductory-level programming assignments while bypassing plagiarism detection tools (Biderman & Raff, 2022). Methods for marking AI-generated content, such as embedded 'watermarks' or 'accents', have been proposed to facilitate the detection of artificial content and reduce the potential for misuse (Kirchenbauer et al., 2023).

How Accurate is AI-Generated Content?

It is easy to get caught up in the excitement about powerful new AI models, but how well do LLMs actually perform on challenging tasks? Researchers at Purdue University analyzed ChatGPT's answers to programming questions from Stack Overflow in terms of correctness, consistency, comprehensiveness, and conciseness (Kabir et al., 2023). Nearly half of the AI-generated responses were correct, yet almost forty percent of human reviewers were persuaded by the lengthy, detailed responses ChatGPT generated. Its comprehensive, articulate answers and polite, authoritative style made some completely wrong answers seem correct. Human reviewers were better at identifying errors in ChatGPT responses when the error was obvious; when it was not easily identified, users often failed to detect or underestimated the errors in AI-generated responses. The confident way ChatGPT conveys information gained users' trust, leading them to accept and even prefer answers that were incorrect. Jakesch and colleagues (2023) proposed that innate heuristics from human communication and self-presentation (e.g., first-person pronouns, contractions, family topics) can undermine judgments about AI-generated content that may be inaccurate or misleading.

Comparable versus Equivalent

LLMs are transformer models with a self-attention mechanism (Vaswani et al., 2017) whose outputs are generated by processing terabytes of data to produce the most probable sequence of words. Currently, AI-generated text is comparable to human language in grammar and fluency, but it does not appear to be equivalent in factual accuracy or overall quality. A recent study by Chen et al. (2023) examined how ChatGPT is changing over time and found that performance varied across tasks: gains on one task occurred alongside decreased performance on another. In part, this may be due to model tuning, that is, how learning on new tasks affects performance on previously learned tasks. Evaluation of AI models is limited because they are "black boxes": the complex computations of their deep learning architectures are not interpretable at a human level. Model predictions on novel inputs can also retain the biases of the data a model was trained on, and LLMs may be overfit to their training data. Furthermore, AI models can generate content that is incorrect or misleading without the knowledge required to understand why an error is wrong. AI applications in sensitive areas such as health care and medicine have prompted calls to reconsider the use of LLMs until they have been more thoroughly evaluated (Armitage, 2023; Liao et al., 2023). AI is a powerful change agent that will have a lasting effect on human communication, and it will remain important to understand how AI content is generated and evaluated as the distinction between human and AI content becomes less clear.
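The self-attention mechanism referred to above computes, for each token, a weighted mix of all other tokens' values: Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, following Vaswani et al. (2017). A minimal pure-Python sketch, using tiny made-up 3-token, 2-dimensional matrices in place of learned projections:

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_T)                       # query-key similarities
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]    # each row sums to 1
    return matmul(weights, V), weights

# Hypothetical queries, keys, and values for a 3-token sequence
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

output, weights = self_attention(Q, K, V)
```

Each output row is a convex combination of the value rows, which is why attention lets every position draw on context from the whole sequence.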

References

Armitage, H. (2023). Rethinking large language models in medicine. Stanford Scope Blog. https://scopeblog.stanford.edu/2023/08/07/rethinking-large-language-models-in-medicine/

Biderman, S., & Raff, E. (2022). Fooling MOSS detection with pretrained language models. arXiv:2201.07406. https://doi.org/10.48550/arXiv.2201.07406

Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., … & Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877-1901. https://doi.org/10.48550/arXiv.2005.14165

Celikyilmaz, A., Clark, E., & Gao, J. (2021). Evaluation of text generation: A survey. arXiv:2006.14799. https://doi.org/10.48550/arXiv.2006.14799

Chen, L., Zaharia, M., & Zou, J. (2023). How is ChatGPT's behavior changing over time? arXiv:2307.09009. https://doi.org/10.48550/arXiv.2307.09009

Dou, Y., Forbes, M., Koncel-Kedziorski, R., Smith, N., & Choi, Y. (2022). Is GPT-3 text indistinguishable from human text? Scarecrow: A framework for scrutinizing machine text. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Association for Computational Linguistics, pp. 7250–7274. https://doi.org/10.48550/arXiv.2107.01294

Guo, B., Zhang, X., Wang, Z., Jiang, M., Nie, J., Ding, Y., Yue, J., & Wu, Y. (2023). How close is ChatGPT to human experts? Comparison corpus, evaluation, and detection. arXiv:2301.07597. https://doi.org/10.48550/arXiv.2301.07597

Jakesch, M., Hancock, J., & Naaman, M. (2023). Human heuristics for AI-generated language are flawed. PNAS, 120(11), e2208839120. https://doi.org/10.1073/pnas.2208839120

Kabir, S., Udo-Imeh, D. N., Kou, B., & Zhang, T. (2023). Who answers it better? An in-depth analysis of ChatGPT and Stack Overflow answers to software engineering questions. arXiv:2308.02312. https://doi.org/10.48550/arXiv.2308.02312

Kirchenbauer, J., Geiping, J., Wen, Y., Katz, J., Miers, I., & Goldstein, T. (2023). A watermark for large language models. arXiv:2301.10226. https://doi.org/10.48550/arXiv.2301.10226

Liao, W., Liu, Z., Dai, H., Xu, S., Wu, Z., Zhang, Y., Huang, X., Zhu, D., Cai, H., Liu, T., & Li, X. (2023). Differentiate ChatGPT-generated and human-written medical texts. arXiv:2304.11567. https://doi.org/10.48550/arXiv.2304.11567

Ma, Y., Liu, J., & Yi, F. (2023). Is this abstract generated by AI? A research for the gap between AI-generated scientific text and human-written scientific text. arXiv:2301.10416. https://doi.org/10.48550/arXiv.2301.10416

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI Blog, 1, 9.

Riebesell, J. Self-attention: Illustration of the transformer attention mechanism. TikZ. Accessed August 21, 2023. https://tikz.net/self-attention/

Uzun, L. (2023). ChatGPT and academic integrity concerns: Detecting artificial intelligence generated content. Language, Education, and Technology, 3(1). https://www.langedutech.com/letjournal/index.php/let/article/view/49/36

Sean Shiverick, MS, PhD