Sanskrit & AI: Languages, Ambiguity and Efficiency


I can't even begin to express how blown away I am by the response to my very first article on Sanskrit & AI. Seriously, you guys rock! So, here I am, back with a follow-up, all thanks to a super intriguing comment from Harsh Raj.

"People often claim that Sanskrit is the least ambiguous language. So can you tell me about its intermediate representation that happens in the hidden layers? How is it different than that of English for the same semantic sentence? I am really curious whether we can augment the other language NLP training with Sanskrit if it is less ambiguous."

Here, let's look at the premise of the comment: "Sanskrit is the least ambiguous language". That raises the question: what are the most ambiguous natural languages? More importantly, how do we measure ambiguity and then rank languages by it? The nerdy computer scientist in me would be tempted to go into the literature on undecidability and build upon the work on context-free grammars (CFGs) by Ginsburg and Ullian, where they show that determining ambiguity in such languages is undecidable.
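To make the notion of grammatical ambiguity concrete, here is a minimal sketch that counts the parse trees of one specific sentence under a toy grammar. Deciding ambiguity for a grammar as a whole is the undecidable part; counting parses of a given sentence is not. The grammar (E -> E + E | a, rewritten in Chomsky normal form) and the function name `count_parses` are illustrative assumptions, not anything from the literature mentioned above.

```python
from collections import defaultdict

# Toy grammar for E -> E + E | a, in Chomsky normal form (illustrative only).
BINARY_RULES = [("E", "E", "R"), ("R", "PLUS", "E")]  # A -> B C
LEXICAL_RULES = [("E", "a"), ("PLUS", "+")]           # A -> token


def count_parses(tokens, start="E"):
    """Count distinct parse trees of `tokens` using a CYK-style chart."""
    n = len(tokens)
    # chart[(i, j)][A] = number of ways nonterminal A derives tokens[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, tok in enumerate(tokens):
        for lhs, word in LEXICAL_RULES:
            if word == tok:
                chart[(i, i + 1)][lhs] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for lhs, left, right in BINARY_RULES:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][lhs] += ways
    return chart[(0, n)][start]


# "a + a + a" can be grouped as (a + a) + a or a + (a + a): two parses.
print(count_parses("a + a + a".split()))
```

A sentence with more than one parse tree is ambiguous under that grammar; a natural-language analogue would need a far richer grammar, which is exactly where measurement gets hard.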

Let's step back and take a wider view. Is it truly ideal for a language to be completely devoid of ambiguity? From where I'm standing, definitely not. I would say, a natural language's expressive power should enable humans to produce both ambiguous and unambiguous statements. For instance, would anyone actually enjoy spending an evening or a vacation leisurely reading legal documents or contracts? I highly doubt it. But why are these texts often so painstakingly pedantic to the point of being a bit dull or excessive? It all boils down to this: these documents are being optimised to be as unambiguous as possible. An alternative interpretation of a sentence in a contract would be costly for the stakeholders.

If we look at social contexts, unambiguity need not be the only objective to optimise for. Efficiency also comes into the picture. We want our communication to be smooth, with minimal effort from both the sender and the receiver. Often, we rely on context to iron out any potential misunderstandings. But balancing efficiency and clarity isn't always a walk in the park. Optimising for both efficiency and unambiguity often forces trade-offs in the way we communicate. Languages evolve and incorporate various linguistic tools and techniques to manage these trade-offs. For instance, most languages keep their common words short, including function words. However, not every such property is universal across languages. A language like English is vocabulary-heavy and maintains a large inventory of static words. Languages like Sanskrit (or German), on the other hand, rely more on productivity, the generative process of forming new words from existing ones. For instance, the following is a single "word" in Sanskrit, or rather a compound word:

pravaramukuṭamaṇimarīcimañjarīcayacarcitacaraṇayugala
It means, "O! the one whose dual feet are covered by the cluster of brilliant rays from the gems of the best crowns", and comes from the Panchatantra. However, it is created by combining 9 simple word stems into a single compound word:
pravara-mukuṭa-maṇi-marīci-mañjarī-caya-carcita-caraṇa-yugala
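As a rough illustration of how such a compound relates to its stems, the sketch below enumerates every way of splitting the compound into entries of a given lexicon. It assumes plain concatenation with no sandhi changes at the boundaries (which happens to hold for this example, since every stem begins with a consonant), and the function name `segmentations` and the nine-entry lexicon are hypothetical choices for this sketch.

```python
from functools import lru_cache

# The nine stems of the compound, in IAST transliteration.
STEMS = ("pravara", "mukuṭa", "maṇi", "marīci", "mañjarī",
         "caya", "carcita", "caraṇa", "yugala")
COMPOUND = "".join(STEMS)


def segmentations(text, lexicon):
    """Return every way to split `text` into consecutive lexicon entries."""
    lexicon = frozenset(lexicon)

    @lru_cache(maxsize=None)
    def split(start):
        if start == len(text):
            return [[]]  # one way to segment the empty remainder
        results = []
        for end in range(start + 1, len(text) + 1):
            piece = text[start:end]
            if piece in lexicon:
                for rest in split(end):
                    results.append([piece] + rest)
        return results

    return split(0)


for option in segmentations(COMPOUND, STEMS):
    print("-".join(option))
```

With only these nine stems in the lexicon there is a single split. With a realistic lexicon and sandhi rules the number of candidate splits can grow, which is one place where ambiguity enters the analysis of a productive language.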

Sanskrit relies on productivity rather than on maintaining a large inventory of words. Now, which approach is more efficient? Which approach makes a language less ambiguous? It is quite difficult to say. English and Sanskrit use two different linguistic tools to achieve quite similar outcomes in communication. Take sentence structure as another example. English leans on word order to convey meaning, while Sanskrit relies on its morphology. Hence one finds (seemingly) arbitrary, free word order in Sanskrit sentences, especially in the literature of the classical era. Again, both languages tackle the efficiency-ambiguity trade-off in their own ways.
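The contrast between word order and morphology can be sketched with a toy example. Everything here is an assumption made for illustration: the three-word sentence "rāmaḥ sītām paśyati" (Rāma sees Sītā), the tiny lookup table standing in for real morphological analysis, and the rigid subject-verb-object rule standing in for English syntax.

```python
from itertools import permutations

# Hypothetical mini-lexicon: inflected form -> (lemma, role signalled by the ending).
SANSKRIT_FORMS = {
    "rāmaḥ": ("Rāma", "agent"),     # nominative singular
    "sītām": ("Sītā", "patient"),   # accusative singular
    "paśyati": ("see", "action"),   # third person singular, present
}


def roles_from_morphology(tokens):
    """Read each word's role off its inflected form; position is ignored."""
    return {SANSKRIT_FORMS[t][1]: SANSKRIT_FORMS[t][0] for t in tokens}


def roles_from_word_order(tokens):
    """Assume a rigid subject-verb-object order, as in a simple English clause."""
    subject, verb, obj = tokens
    return {"agent": subject, "action": verb, "patient": obj}


# Collect the distinct readings across all six orderings of the Sanskrit words.
sanskrit_readings = {
    tuple(sorted(roles_from_morphology(order).items()))
    for order in permutations(["rāmaḥ", "sītām", "paśyati"])
}
print(len(sanskrit_readings))  # every ordering yields the same role assignment

print(roles_from_word_order(["Rama", "sees", "Sita"]))
print(roles_from_word_order(["Sita", "sees", "Rama"]))  # swapping words swaps roles
```

Real Sanskrit morphology is far less tidy than a lookup table (some endings are shared across cases), so this only shows the principle that roles can travel with the word rather than with its position.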

Finally, let me showcase a few scenarios where ambiguity is not only desirable but celebrated.

Sri Raghava Yadhaveeyam is a 30-stanza "bidirectional" poem in Sanskrit, which narrates the story of Rāma when read forwards and, when read backwards, plunges into a story from Krishna's life. It is a display of linguistic and prosodic mastery. Similarly, Avadhanam is a literary improv performance that demands mastery of various cognitive capabilities, including observation, memory, multitasking, task switching, retrieval, reasoning and creativity. It is nothing short of mental gymnastics. It was a prevalent form of entertainment, performed in various Indian languages. Please refer to one such performance in the video given below.

Now, let's cap things off with one last example of celebrating ambiguity. Consider the following conversation between Sri Krishna and Satyabhama, his wife:

(Source)

[Sanskrit shloka, originally in Devanagari script]
According to this shloka, Lord Krishna visits his wife Satyabhama when she is upset. Finding the door closed, he knocks. Pretending to not know, Satyabhama asks her aide Vishikha to check who it is. Krishna introduces himself with his name but Satyabhama finds another meaning for the word. Krishna starts describing himself with other words but each time Satyabhama teases him by finding the other meanings of the words.

The conversation goes as follows:

  • Satyabhama: O Vishikha, who knocks on the door?
  • Krishna: I am Madhava. | Satyabhama: Is it the spring season?
  • Krishna: I am Chakri, the holder of a disc. | Satyabhama: A potter, then?
  • Krishna: No, I am the one who holds the Earth. | Satyabhama: Is it Adi Shesha, the serpent king, who carries the earth on his head?
  • Krishna: No, I am the one who suppressed the poisonous snake Kaliya. | Satyabhama: Is it Garuda, the king of birds?
  • Krishna: No, I am Hari. | Satyabhama: Is it a monkey?
  • Finally, the poet says: May Lord Krishna, thus defeated by Satyabhama in wordplay, protect you.

I hope I've covered most of what Harsh brought up here. But there are still some AI/NLP bits in his comment that need addressing. Let's save those for next week. I've got two keywords to tease you with until then: Behaviourism and Cognitive processes.

Amrith Krishna

AI Researcher and Entrepreneur | Alum at UniCambridge, ITU | PhD at IITKgp | AI Researcher | Youtuber - 100K+ Subs

10 months ago
Kavya Manohar

Building AI for Justice Systems | PhD in Speech Technology | Language Technology | Research | Scientific Writing

10 months ago

Amrith Krishna, Loved how this article covered various aspects of the aesthetics of linguistic ambiguity. Thanks for introducing Sri Raghava Yadhaveeyam.

Nishant Jha

IEEE Member (Student) | Master's Student (Data Science) @ UNSW Sydney | Artificial Intelligence, Blockchain & Cryptography Researcher | Womanium Quantum Scholar' 23 | ACM ICPC' 23 (South Pacific) Regionalist

10 months ago

Hello Mr. Krishna, I'm currently working on the same topic, specifically on CFG design for Sanskrit and compilers, and I would be glad to get any resources or input from your end. Thanks.

Raviraja Bhat

Data Science Engineer - AVP @ Swiss Re | AI/ML Specialist | LLM and NLP Expert

10 months ago

Amrith Krishna, thanks for these detailed insights. I am really curious to know your thoughts on using Vedic chanting recitation styles (Samhita, Pada, Krama, Jata, Maalaa, Sikha, Rekha, Dhwaja, Danda, Rathaa, Ghana) to enhance our approach to language understanding tasks.
