Sanskrit & AI: Languages, Ambiguity and Efficiency
Amrith Krishna
AI Researcher and Entrepreneur | Alum at UniCambridge, ITU | PhD at IITKgp | AI Researcher | Youtuber - 100K+ Subs
I can't even begin to express how blown away I am by the response to my very first article on Sanskrit & AI. Seriously, you guys rock! So here I am, back with a follow-up, all thanks to a super intriguing comment from Harsh Raj:
"People often claim that Sanskrit is the least ambiguous language. So can you tell me about its intermediate representation that happens in the hidden layers? How is it different from that of English for the same semantic sentence? I am really curious whether we can augment NLP training for other languages with Sanskrit if it is less ambiguous."
Let's look at the premise of the comment: "Sanskrit is the least ambiguous language". That raises the question: which natural languages are the most ambiguous? More importantly, how do we measure ambiguity and then rank languages by it? The nerdy computer scientist in me would be tempted to dive into the literature on undecidability and build upon Ginsburg and Ullian's work on context-free grammar (CFG) languages, where they show that determining whether a context-free language is inherently ambiguous is undecidable.
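Deciding ambiguity for a whole grammar is undecidable in general, but counting how many parses a given grammar assigns to one particular sentence is easy. The sketch below assumes a small, hypothetical English grammar in Chomsky normal form (the rules and the `count_parses` helper are illustrative only, not from any library) and uses a CYK chart to count parse trees for the classic prepositional-phrase attachment example.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # PP attaches to the verb phrase ("saw ... with the telescope")
    ("NP", "NP", "PP"),   # PP attaches to the noun phrase ("the man with the telescope")
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words, start="S"):
    """Count distinct parse trees for `words` rooted in `start` (CYK with counts)."""
    n = len(words)
    # chart[i][j][A] = number of parse trees rooted in A that span words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICON.get(word, []):
            chart[i][i + 1][category] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between left and right child
                for parent, left, right in BINARY_RULES:
                    left_count = chart[i][k].get(left, 0)
                    right_count = chart[k][j].get(right, 0)
                    if left_count and right_count:
                        chart[i][j][parent] += left_count * right_count
    return chart[0][n].get(start, 0)


sentence = "I saw the man with the telescope".split()
print(count_parses(sentence))  # prints 2
```

The two parses correspond to "the telescope" being the instrument of seeing or a possession of the man; dropping the prepositional phrase ("I saw the man") leaves exactly one parse.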
Let's step back and take a wider view. Is it truly ideal for a language to be completely devoid of ambiguity? From where I'm standing, definitely not. I would say that a natural language's expressive power should enable humans to produce both ambiguous and unambiguous statements. For instance, would anyone actually enjoy spending an evening or a vacation leisurely reading legal documents or contracts? I highly doubt it. But why are these texts often so painstakingly pedantic, to the point of being a bit dull or excessive? It all boils down to this: these documents are optimised to be as unambiguous as possible, because an alternative interpretation of a sentence in a contract could be costly for the stakeholders.
In social contexts, unambiguity need not be the only objective to optimise for. Efficiency also comes into the picture: we want our communication to be smooth, requiring minimal effort from both sender and receiver, and we often rely on context to iron out potential misunderstandings. Balancing efficiency and clarity isn't always a walk in the park, though; optimising for both often forces trade-offs in the way we communicate.
Languages evolve and incorporate various linguistic tools and techniques to optimise for efficiency and unambiguity. For instance, in most languages the frequently used words, including function words, tend to be short. However, not every such property is universal across languages. English, for example, is vocabulary-heavy and maintains a large inventory of static words, whereas languages like Sanskrit (or German) rely more on the generative process of productivity. The following, for instance, is a single "word" in Sanskrit, or rather a compound word:
pravaramukuṭamaṇimarīcimañjarīcayacarcitacaraṇayugala
It means "O, the one whose pair of feet is covered by the clusters of brilliant rays from the gems of the finest crowns" and comes from the Panchatantra. It is formed by combining nine simple word stems into a single compound word:
pravara-mukuṭa-maṇi-marīci-mañjarī-caya-carcita-caraṇa-yugala
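Going the other way, from the compound back to its stems, is a segmentation problem. The following is a minimal sketch that assumes a hypothetical lexicon containing only the nine stems above and ignores sandhi, the sound changes that often occur at stem boundaries in real Sanskrit compounds. The boundaries in this particular compound happen to be plain concatenations. The `segmentations` helper enumerates every split of the string into lexicon entries using memoised recursion.

```python
import unicodedata
from functools import lru_cache


def nfc(text):
    """Normalise to NFC so precomposed diacritics (ṭ, ṇ, ñ, ī) compare consistently."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy lexicon: only the nine stems of the Panchatantra compound.
STEMS = frozenset(nfc(s) for s in [
    "pravara", "mukuṭa", "maṇi", "marīci", "mañjarī",
    "caya", "carcita", "caraṇa", "yugala",
])

COMPOUND = nfc("pravaramukuṭamaṇimarīcimañjarīcayacarcitacaraṇayugala")


def segmentations(word, lexicon):
    """Return every split of `word` into lexicon entries (no sandhi handling)."""
    word = nfc(word)

    @lru_cache(maxsize=None)
    def from_position(i):
        if i == len(word):
            return ((),)  # exactly one way to segment the empty remainder
        results = []
        for j in range(i + 1, len(word) + 1):
            piece = word[i:j]
            if piece in lexicon:
                for rest in from_position(j):
                    results.append((piece,) + rest)
        return tuple(results)

    return [list(seg) for seg in from_position(0)]


for seg in segmentations(COMPOUND, STEMS):
    print("-".join(seg))
```

With this lexicon there is exactly one segmentation, which prints as pravara-mukuṭa-maṇi-marīci-mañjarī-caya-carcita-caraṇa-yugala. A realistic segmenter would need a full stem inventory plus sandhi rules, and would typically find many competing splits.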
Sanskrit relies on productivity rather than on maintaining an inventory of a large number of words. Now, which approach is more efficient? Which approach makes a language less ambiguous? It is quite difficult to say. English and Sanskrit use two different linguistic tools to achieve quite similar outcomes in communication. Sentence structure is another example: English leans on word order to convey meaning, while Sanskrit relies on its morphology. Hence one finds sentences with (seemingly) arbitrary, free word order in Sanskrit, especially in classical-era literature. Again, both languages tackle the efficiency-ambiguity trade-off in their own ways.
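The contrast between word order and morphology can be made concrete with a deliberately tiny sketch. It assumes a hypothetical fragment of Sanskrit: singular masculine a-stem nouns only, where the nominative ends in -ḥ (rāmaḥ) and the accusative in -m (rāvaṇam), a single verb (hanti, "kills"), and no sandhi. The functions `roles_by_case` and `roles_by_position` are illustrative, not a real parser.

```python
import unicodedata
from itertools import permutations


def nfc(text):
    """Normalise to NFC so 'ḥ' is a single code point."""
    return unicodedata.normalize("NFC", text)


# Hypothetical toy fragment: one verb, masculine a-stem singular nouns, no sandhi.
VERBS = {nfc("hanti"): "kills"}
NOMINATIVE = nfc("ḥ")  # ending that marks the agent, e.g. rāmaḥ
ACCUSATIVE = "m"       # ending that marks the patient, e.g. rāvaṇam


def roles_by_case(words):
    """Sanskrit-style: roles come from case endings, so word order is irrelevant."""
    roles = {}
    for word in map(nfc, words):
        if word in VERBS:
            roles["action"] = VERBS[word]
        elif word.endswith(NOMINATIVE):
            roles["agent"] = word[: -len(NOMINATIVE)]
        elif word.endswith(ACCUSATIVE):
            roles["patient"] = word[: -len(ACCUSATIVE)]
    return roles


def roles_by_position(words):
    """English-style: subject-verb-object position decides who does what."""
    subject, verb, obj = words
    return {"agent": subject, "action": verb, "patient": obj}


sanskrit = ["rāmaḥ", "rāvaṇam", "hanti"]
for order in permutations(sanskrit):
    print(" ".join(order), "->", roles_by_case(order))

print(roles_by_position(["Rama", "kills", "Ravana"]))
print(roles_by_position(["Ravana", "kills", "Rama"]))
```

All six orderings yield the same roles under the case-based reading, whereas swapping the two nouns flips the meaning under the positional reading. Real Sanskrit has many more declension classes, and sandhi would render the sentence as rāmo rāvaṇaṃ hanti.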
Finally, let me showcase a few scenarios where ambiguity is not only desirable but celebrated.
Sri Raghava Yadhaveeyam is a 30-stanza "bidirectional" poem in Sanskrit that narrates the story of Rāma when read forwards and, when read backwards, plunges into a story from Krishna's life. It is a display of linguistic and prosodic mastery. Similarly, Avadhanam is a literary improv performance that demands mastery of various cognitive capabilities, including observation, memory, multitasking, task switching, retrieval, reasoning and creativity: nothing short of mental gymnastics. It was a popular form of entertainment in various Indian languages. Please refer to one such performance in the video given below.
Now, let's cap things off with one last example of celebrating ambiguity: the following conversation between Sri Krishna and his wife, Satyabhama.
(Source)
अङ्गुल्या कः कवाटं प्रहरति कुटिले माधवः किं वसन्तो नो चक्री किं कुलालो न हि धरणिधरः किं द्विजिह्वः फणीन्द्रः। नाहं घोराहिमर्दी किमसि खगपतिर्नो हरिः किं कपीन्द्रः इत्येवं सत्यभामाप्रतिवचनजितः पातु वश्चक्रपाणिः॥१॥
According to this shloka, Lord Krishna visits his wife Satyabhama when she is upset. Finding the door closed, he knocks. Pretending not to know who it is, Satyabhama asks her aide Vishikha to check. Krishna introduces himself by name, but Satyabhama finds another meaning for the word. Krishna then describes himself with other epithets, and each time Satyabhama teases him by finding another meaning for the word.
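The conversation goes this way:
"Who is knocking at the door with a finger?"
"It is Mādhava."
"What, the spring season?" (Mādhava is also a name for spring.)
"No, it is Cakrī, the discus-bearer."
"What, a potter?" (A potter, too, works a wheel.)
"No, it is Dharaṇidhara, the bearer of the earth."
"What, the double-tongued king of serpents?" (Śeṣa, the serpent king, also bears the earth.)
"No, I am the crusher of the terrible serpent."
"Then are you the lord of birds?" (Garuḍa, the king of birds, devours serpents.)
"No, it is Hari."
"What, the king of monkeys?" (Hari also means monkey.)
Thus bested by Satyabhama's replies, may Cakrapāṇi, the discus-wielder, protect you.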
I hope I've covered most of what Harsh brought up here. But there are still some AI/NLP bits in his comment that need addressing. Let's save those for next week. I've got two keywords to tease you with until then: behaviourism and cognitive processes.
10 months ago: Sanskrit & AI Part 3: https://www.dhirubhai.net/pulse/sanskrit-ai-part-3-aint-thing-free-word-order-really-amrith-krishna-ggcfc
Building AI for Justice Systems | PhD in Speech Technology | Language Technology | Research | Scientific Writing
10 months ago: Amrith Krishna, loved how this article covered various aspects of the aesthetics of linguistic ambiguity. Thanks for introducing Sri Raghava Yadhaveeyam.
IEEE Member (Student) | Master's Student (Data Science) @ UNSW Sydney | Artificial Intelligence, Blockchain & Cryptography Researcher | Womanium Quantum Scholar' 23 | ACM ICPC' 23 (South Pacific) Regionalist
10 months ago: Hello Mr. Krishna, I'm currently working on the same topic, specifically on CFG design for Sanskrit and compilers, and I would be glad to get any resources or input from your end. Thanks.
Data Science Engineer - AVP @ Swiss Re | AI/ML Specialist | LLM and NLP Expert
10 months ago: Amrith Krishna, thanks for these detailed insights. I am really curious to know your thoughts on using Vedic chanting recitation styles (Samhita, Pada, Krama, Jata, Maalaa, Sikha, Rekha, Dhwaja, Danda, Rathaa, Ghana) to enhance our approach to language understanding tasks.