Advances in non-generative Machine Learning
Generative AI has dominated the news for the past couple of years. In the background, however, significant work has happened in the non-generative ML space. This matters because most products we build are not pure generative AI applications. A RAG application, for example, needs good search to work well, so search quality can become the limiting factor for the whole system.
The excitement around generative AI has also channeled money into many previously underfunded areas of machine learning. In the rest of this article, I will elaborate on some topics where I see rapid improvement.
Search
A lot of funding and effort has gone into search. ChatGPT showed the potential of generative models, but it was immediately clear that ChatGPT had to be augmented with fresh knowledge if its responses were to be useful. Search was the best way to get there.
Dense retrieval saw innovations in entailment tuning and multi-variate dense retrieval. Research has shown that the choice of retrieval unit (document, passage, sentence, or proposition) significantly impacts the performance of dense retrieval systems. One novel proposal is to use propositions, compact expressions that each encapsulate a distinct fact, as the retrieval unit. Sparse retrieval similarly saw a spurt of interesting ideas, one example being the learned sparse retrieval model SPLADE.
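To make the sparse side concrete, here is a toy sketch of SPLADE-style scoring. In learned sparse retrieval, queries and documents are represented as sparse term-weight vectors and relevance is their dot product; real SPLADE learns those weights with a transformer, whereas the weights below are invented purely for illustration.

```python
# Toy sketch of learned-sparse-retrieval scoring: queries and documents
# are sparse term-weight vectors (term -> weight), and relevance is
# their dot product over shared terms.

def sparse_dot(query: dict, doc: dict) -> float:
    """Dot product of two sparse term-weight vectors."""
    return sum(w * doc.get(term, 0.0) for term, w in query.items())

def rank(query: dict, docs: dict) -> list:
    """Doc ids sorted by descending relevance score."""
    return sorted(docs, key=lambda d: sparse_dot(query, docs[d]), reverse=True)

docs = {
    "d1": {"neural": 1.2, "search": 0.8, "retrieval": 1.5},
    "d2": {"cooking": 2.0, "recipe": 1.1},
}
query = {"retrieval": 1.0, "search": 0.5}
print(rank(query, docs))  # ['d1', 'd2']
```

Because the vectors are sparse, this scoring can be served by an ordinary inverted index, which is a large part of SPLADE's appeal.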
Search infrastructure is another area of innovation. Vector databases, in particular, have improved markedly in indexing efficiency and query performance.
Applications
Voice and Visual Search Integration: Multimodal Representations
Multimodal embedding integrates multiple types of data, such as text, images, and sometimes audio or video, into a unified vector space. Representing different modalities in the same space lets models process them together and capture relationships and semantics across varied forms of information.
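Once text and images live in one shared space, cross-modal search reduces to similarity between vectors. In the sketch below, the embeddings and filenames are made up; a real system would produce the vectors with a model such as CLIP.

```python
# Cross-modal retrieval in a shared embedding space: a text query
# ranks images directly by cosine similarity. Vectors are invented
# for illustration; a real model (e.g. CLIP) would compute them.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

image_embeddings = {
    "dog_photo.jpg": np.array([0.9, 0.1, 0.0]),
    "city_skyline.jpg": np.array([0.0, 0.2, 0.9]),
}
text_query = np.array([0.8, 0.2, 0.1])  # pretend: embed("a dog playing")

best = max(image_embeddings, key=lambda k: cosine(text_query, image_embeddings[k]))
print(best)  # dog_photo.jpg
```

The same mechanism powers voice search when speech is first transcribed (or directly embedded) into the shared space.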
Applications
Multimodal embeddings have a wide range of applications, including cross-modal search (retrieving images from text queries), image captioning, and visual question answering.
Federated Learning for Privacy-Preserving Search
Federated learning (FL) is a decentralized approach to machine learning that enables multiple parties to collaboratively train models without sharing their raw data. This method addresses critical privacy concerns by allowing data to remain on local devices, thus enhancing security and compliance with regulations such as GDPR and HIPAA.
Here are a couple of papers discussing the advances and open challenges in Federated Learning.
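The basic aggregation step behind much of FL is federated averaging (FedAvg): each client trains locally and sends only its model parameters, and the server averages them weighted by local dataset size, so no raw data ever leaves a device. A minimal sketch, with parameters as plain lists:

```python
# Federated averaging (FedAvg): the server combines client model
# parameters as a weighted average, weighted by local dataset size.
# Raw training data never leaves the clients.

def fed_avg(client_weights: list, client_sizes: list) -> list:
    """Weighted average of per-client parameter vectors."""
    total = sum(client_sizes)
    dims = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dims)
    ]

clients = [[1.0, 2.0], [3.0, 4.0]]   # parameters from two clients
sizes = [100, 300]                   # local dataset sizes
print(fed_avg(clients, sizes))       # [2.5, 3.5]
```

Real deployments layer secure aggregation and differential privacy on top of this step, since even shared parameters can leak information about local data.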
Conclusion
While generative AI captures most of the attention, a wealth of exciting research and applications is emerging across the rest of machine learning. Every sub-field is seeing significant advances, and new areas of exploration keep opening up.