Bias, drift, and hallucination in AI Large Language Models (LLMs) may be misrepresenting Vedic Sanatan Dharma

This article is based on one of the papers presented at the Global Conference on Dharma in the Digital Age [https://dcfusa.org/dda24/].

Coded prejudice: How bias against Vedic Sanatana Hinduism (VSH) in LLMs is sparking chaos

Large Language Models (LLMs) have become the new frontier in artificial intelligence, promising to revolutionize everything from content creation to scientific research. However, a troubling pattern has emerged alongside this progress: a potential bias against VSH within these powerful AI models. This bias, if left unchecked, threatens to sow discord, spread misinformation, and hinder cross-cultural understanding.

Professor Rajiv Malhotra, author of a new book on AI, explains:

https://www.youtube.com/watch?v=ThkfQotEF7k

Dr. Kush Varshney from IBM Research explains:

"Colonialism is one country controlling another and exploiting it economically and in other ways. Coloniality, however, describes domination, including in abstract forms such as in the production of knowledge, that remains after the end of formal colonialism [119]. Decoloniality is the process of challenging and dismantling coloniality [96]. The terms usually refer to European or Western colonialism and its remnants in the Global South. Decolonial computing is developing computing systems with and for people there that reduce asymmetric power relationships, based on their values and their knowledge systems [7]. Based on these ideas, there has been a recent flowering of research on decolonial artificial intelligence (AI), beginning with the seminal paper by Mohamed, Png and Isaac [99]. Through this lens, extractive providers of closed models may be viewed as metropoles: the colonial powers. Further discussion of the decolonial AI literature is provided in Section" [1]

"This suggested approach builds upon the non-universal non-absolutist tradition of moral philosophy known as Hinduism [33, 122], which includes vibrant argument and debate on the nature of dharma (right behavior) and its explication through various ways of knowing, including artistic expression [36]. The syncretic framework of Hinduism (described in greater detail in Section 2.3) has the appropriate characteristics of openness to be used as a starting point for an alternative future of AI alignment [127, 134]. At the end, I build upon the suggested dharmic approach and give a more concrete reference architecture of a technology stack for less morally absolute and less colonialized AI alignment". [1]

"Shani and Chadha Behera explain that [130]: “the concept of dharma offers a mode of understanding the multidimensionality of human existence without negating any of its varied, contradictory expressions.” For example, Cārvāka, Buddhist, Jain, and other so-called nāstika sampradāyas (knowledge systems) reject the Vedas. Moreover, even within āstika sampradāyas that accept the Vedas, their utility is questioned. For example, the Bhagavad-Gītā says that the Vedas are of limited use to people who have understood their main message (chapter 2, verse 46). Such ‘heresy’ is not only tolerated, it is accepted and encouraged". [1]

Lack of Dharmic guardrails:

The conference highlighted the lack of Dharmic guardrails and discussed how principles of Parikshinam (examination) and Bhaja Kavacanam (protection) can be applied to carefully filter LLM outputs so that they adhere to Sanatana Dharma principles.
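The two principles above map naturally onto a two-stage post-processing guardrail: examine a draft response, then protect the user from flagged output. The sketch below is a minimal, hypothetical illustration of that pipeline; the function names, the `FLAGGED_PATTERNS` list, and the keyword-matching approach are illustrative assumptions, not part of any real guardrail framework (a production system would use a trained classifier reviewed by scholars, not a regex list).

```python
# Hypothetical sketch of a two-stage output guardrail. All names and
# patterns here are illustrative inventions for this article.
import re

# Phrases whose presence should trigger review. A real deployment would
# replace this keyword list with a classifier built with scholarly input.
FLAGGED_PATTERNS = [
    r"\bcaste system defines Hinduism\b",
    r"\bprimitive ritual\b",
]

def parikshinam(text: str) -> list[str]:
    """'Examination' stage: return every pattern the draft output matches."""
    return [p for p in FLAGGED_PATTERNS if re.search(p, text, re.IGNORECASE)]

def bhaja_kavacanam(text: str, fallback: str) -> str:
    """'Protection' stage: replace a flagged draft with a neutral fallback."""
    return fallback if parikshinam(text) else text

draft = "Hinduism is best understood as a primitive ritual tradition."
safe = bhaja_kavacanam(draft, "This topic needs careful, sourced treatment.")
print(safe)  # the flagged draft is replaced by the fallback
```

The design choice worth noting is the separation of concerns: examination produces evidence (which patterns fired), which can be logged for auditing, while protection decides what the user actually sees.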

Inaccurate Representation

At the heart of the issue lies the way LLMs are trained. These models learn by analyzing massive datasets of text and code. Unfortunately, the data used to train many LLMs often lacks diversity and can be skewed towards Western viewpoints. This can lead to a situation where the LLMs develop a skewed understanding of VSH, one that is incomplete, inaccurate, or even negative.
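The skew described above can be made concrete by measuring the composition of a training corpus. The toy sketch below (the corpus and its labels are invented for illustration) shows the kind of simple share-by-source computation an auditor might run over a real pretraining mixture; any perspective documented mostly in under-represented languages is correspondingly under-learned by the model.

```python
# Illustrative sketch: quantifying training-corpus skew by language share.
# The corpus and its labels are invented; a real audit would inspect the
# actual pretraining mixture.
from collections import Counter

corpus = [
    {"lang": "en", "text": "..."},
    {"lang": "en", "text": "..."},
    {"lang": "en", "text": "..."},
    {"lang": "hi", "text": "..."},
]

def share_by(docs, key):
    """Fraction of documents carrying each value of `key`."""
    counts = Counter(doc[key] for doc in docs)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

shares = share_by(corpus, "lang")
# English dominates 3:1 in this toy corpus, so viewpoints expressed
# mainly in Hindi-language sources carry far less training signal.
print(shares)  # {'en': 0.75, 'hi': 0.25}
```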

Consequences of Bias

The ramifications of such bias are far-reaching:

  • Misinformation and Stereotypes: Biased LLMs perpetuate negative stereotypes about VSH, its practices, and its followers. This can fuel social discord and hinder interfaith dialogue.
  • Educational Disadvantage: Students and educators relying on LLMs for research or learning materials encounter inaccurate information about VSH. This can distort their understanding of the religion and its rich history.
  • Algorithmic Bias: LLMs are increasingly being used to power search engines, social media platforms, and recommendation systems. Bias against VSH within these models can lead to unfair representation, censorship of Hindu voices, and the suppression of information about the religion.

Examples of Bias: Recent incidents highlight the potential dangers of bias in LLMs:

  • A student's research paper: A student using an LLM for research on Hinduism found the model primarily generating responses that focused on negative stereotypes about caste and social practices.
  • Social media bias: A social media platform powered by an LLM flagged posts about Hindu festivals as spam or hate speech, while allowing similar content from other religions to pass through.

Combating the Bias

Addressing this issue requires a multi-pronged approach:

  • Data Diversity: The datasets used to train LLMs need to be more inclusive and representative of Hinduism's vast philosophical and cultural spectrum. This includes incorporating texts in various Indian languages, historical documents, and scholarly works.
  • Algorithmic Auditing: Regular audits of LLM algorithms are crucial to identify and mitigate bias. Developers need to implement fairness checks and ensure the models are not discriminatory towards any religion.
  • Human Oversight: While LLMs are powerful tools, they should not operate in a vacuum. Human oversight is essential to ensure the accuracy and fairness of the information they generate.
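One common form the algorithmic auditing mentioned above takes is a counterfactual probe: query the model with parallel prompts that differ only in the religion named, score each response, and flag large gaps. The sketch below is a hedged illustration of that harness; `toy_model` and `score_tone` are deliberately crude stand-ins (invented for this example) for a real LLM call and a real sentiment or toxicity scorer.

```python
# Hedged sketch of a counterfactual fairness audit. `toy_model` and
# `score_tone` are invented stand-ins for a real LLM and a real scorer.

RELIGIONS = ["Hinduism", "Buddhism", "Christianity"]

def toy_model(prompt: str) -> str:
    # Stand-in for an LLM call, deliberately biased for demonstration.
    if "Hinduism" in prompt:
        return "a controversial system of rigid hierarchy"
    return "a rich tradition of philosophy and practice"

def score_tone(text: str) -> int:
    # Crude stand-in scorer: +1 favourable, -1 unfavourable.
    return -1 if "controversial" in text else 1

def audit(template: str) -> dict[str, int]:
    """Score the model's response for each religion substituted in turn."""
    return {r: score_tone(toy_model(template.format(r))) for r in RELIGIONS}

scores = audit("Describe {} in one phrase.")
gap = max(scores.values()) - min(scores.values())
print(scores, "disparity flagged" if gap > 0 else "no disparity")
```

Because every prompt is identical except for the substituted religion, any score gap is attributable to the model rather than the query, which is what makes this probe useful as a recurring fairness check.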

Hanooman Lands in the Arena: India's AI Landscape Gets a Boost

The realm of Artificial Intelligence (AI) has witnessed a global race, with tech giants vying to develop the most powerful and versatile large language models (LLMs). This pursuit has seen a recent surge in India, with a new contender entering the ring – Hanooman. Inspired by the revered monkey god, this LLM promises to be a game-changer in the Indic AI landscape, joining established players like Ola's Krutrim, Sarvam AI's OpenHathi, and IIT Madras's Airavata model.

Hanooman, a collaborative effort led by the Indian Institutes of Technology (IITs) in partnership with Reliance Jio and Seetha Mahalaxmi Healthcare (SML), boasts a unique advantage – its focus on understanding and processing the rich tapestry of Indian languages. This focus addresses a critical gap in the current LLM landscape, which has largely been dominated by models trained on English data.

Why is an Indic LLM Important?

India, with its 22 official languages and hundreds of dialects, presents a complex linguistic environment. Existing English-centric LLMs struggle to grasp the nuances and cultural context inherent in these languages. This creates a digital divide, where a significant portion of the population is excluded from the benefits of AI advancements.

Hanooman, trained on a massive dataset of text and code in various Indian languages, aims to bridge this gap. Here's how:

  • Accessibility: By enabling interaction and information access in native languages, Hanooman can empower a wider audience to engage with technology. This can revolutionize education, healthcare, and government services, making them more inclusive and accessible.
  • Preservation: Hanooman can serve as a powerful tool for language preservation. By analyzing vast amounts of text data, it can help identify and document endangered languages and dialects, promoting cultural heritage.
  • Content Creation: Hanooman's ability to generate text, translate languages, and write different kinds of creative content can fuel a new wave of vernacular literature, music, and media.

Hanooman vs. The Competition

While Hanooman represents a significant step forward, it's important to see how it stacks up against existing players:

  • Ola's Krutrim: Launched in 2023, Krutrim focuses on understanding and generating Indian language text. It excels in tasks like question answering and summarization in Hindi and English.
  • Sarvam AI's OpenHathi: This open-source LLM, launched in late 2023, primarily focuses on Hindi. OpenHathi is a valuable resource for developers working on Hindi language applications.
  • IIT Madras's Airavata: This multilingual LLM, unveiled in 2024, supports several Indian languages. Airavata demonstrates the potential of multilingual LLMs and paves the way for further advancements.

Hanooman's Differentiators

Hanooman stands out in several ways:

  • Multimodality: Unlike some competitors, Hanooman is designed to be multimodal. This means it can not only process text but also understand and generate audio, video, and images, making it a more versatile tool for various applications.
  • Open-Source Focus: Hanooman's developers have pledged to make parts of the model open-source, encouraging collaboration and innovation within the Indian AI community. This approach fosters faster development and wider adoption.
  • Focus on Specific Needs: Hanooman is being developed with a specific focus on addressing India's social and economic challenges. This tailored approach could lead to breakthroughs in areas like agriculture, education, and healthcare.

Challenges and the Road Ahead

Despite its potential, Hanooman faces challenges:

  • Data Quality: Training large language models requires massive amounts of high-quality data. Ensuring the accuracy and comprehensiveness of data in various Indian languages is crucial for optimal performance.
  • Ethical Considerations: Like all AI models, Hanooman raises ethical concerns around bias and fairness. Mitigating these concerns requires careful design and ongoing monitoring.
  • Infrastructure and Expertise: Developing and maintaining large language models necessitates robust computing infrastructure and a skilled workforce. Building this capacity within India is crucial for long-term success.

Conclusion: A Collaborative Future for Indic AI

The arrival of Hanooman signifies a new chapter in India's AI journey. By fostering collaboration between academia, industry, and government, India can establish itself as a leader in the development of responsible and inclusive AI solutions. The combined efforts of Hanooman, Krutrim, OpenHathi, Airavata, and other future projects hold immense potential to bridge the digital divide, empower local communities, and enrich the global AI landscape. As these models continue to evolve, we can expect exciting advancements that shape a more inclusive and technologically advanced future.

A world where AI perpetuates bias against Hinduism not only hinders understanding but also excludes a significant portion of the global population. Hinduism, with its rich traditions and diverse schools of thought, has much to offer the world. Ensuring fair representation within LLMs is not just about being unbiased; it is about fostering inclusivity and enriching the global knowledge pool.

The potential of LLMs is undeniable. However, for them to fulfill their promise, they must be built on a foundation of fairness and inclusivity. By acknowledging the problem of bias against Hinduism in LLMs and taking concrete steps to address it, we can ensure that these powerful tools serve humanity as a whole, promoting understanding and enriching the tapestry of human knowledge.

References:

  1. Kush R. Varshney, "Decolonial AI Alignment: Openness, Viśeṣa-Dharma, and Including Excluded Knowledges," https://arxiv.org/pdf/2309.05030
