Probabilistic Nearest Neighbors: The Swiss Army Knife of GenAI

ANN — Approximate Nearest Neighbors — is at the core of fast vector search, which is itself central to GenAI, especially GPT and other LLMs. My new methodology, abbreviated as PANN, has many other applications: clustering, classification, measuring the similarity between two datasets (images, soundtracks, time series, and so on), tabular data synthesis (improving poor synthetic data), model evaluation, and even detecting extreme observations.

Just to give an example, you could use it to categorize time series without any statistical theory. Statistical models tend to be redundant, math-heavy, and harder to explain, leading to definitions that are less useful to developers. PANN avoids that.

Fast and simple, PANN (for Probabilistic ANN) does not involve training or neural networks, and it is essentially math-free. Its versatility comes from four features:

  • Most algorithms aim at minimizing a loss function. Here I also explore what you can achieve by maximizing the loss.
  • Rather than working within a single dataset, I use two sets S and T. For instance, K-NN looks for nearest neighbors within one set S. What about looking, for each observation in S, for its nearest neighbors in T? This leads to far more applications than the one-set approach.
  • Some algorithms are very slow and may never converge, so no one pays attention to them. But what if the loss function drops very fast at the beginning, fast enough that by stopping early you get better results in a fraction of the time than with the “best” method?
  • In many contexts, a good approximate solution obtained in little time from an otherwise non-converging algorithm may be as good for practical purposes as a more accurate solution obtained after far more steps with a more sophisticated algorithm. A minimal sketch illustrating the two-set search and early stopping follows this list.
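To make the two-set search and the early-stopping idea concrete, here is a minimal sketch in Python. It is not the actual PANN algorithm described in the full article; it simply compares each point of a set S against random samples of a set T, keeps the best neighbor found so far, and tracks the loss (average distance to the current approximate neighbor). The function name probabilistic_nn and all parameters are illustrative assumptions.

```python
import numpy as np

def probabilistic_nn(S, T, n_iter=200, sample_size=50, tol=1e-4, seed=0):
    """Toy iterative nearest-neighbor search from set S into set T.

    Each iteration compares every point of S against a small random
    sample of T and keeps the closest neighbor found so far. The loss
    is the average distance to the current approximate neighbor.
    """
    rng = np.random.default_rng(seed)
    n_S, n_T = len(S), len(T)
    best_dist = np.full(n_S, np.inf)   # distance to best neighbor found so far
    best_idx = np.full(n_S, -1)        # index (in T) of that neighbor
    loss_history = []

    for it in range(n_iter):
        cand = rng.choice(n_T, size=min(sample_size, n_T), replace=False)
        # pairwise distances between all of S and the sampled subset of T
        d = np.linalg.norm(S[:, None, :] - T[cand][None, :, :], axis=2)
        d_min = d.min(axis=1)
        better = d_min < best_dist
        best_dist[better] = d_min[better]
        best_idx[better] = cand[d.argmin(axis=1)[better]]

        loss_history.append(best_dist.mean())
        # stop early once the loss curve flattens out
        if it > 5 and loss_history[-6] - loss_history[-1] < tol:
            break

    return best_idx, best_dist, loss_history

# Example: 500 query points, 5,000 reference points in 10 dimensions
rng = np.random.default_rng(1)
S = rng.normal(size=(500, 10))
T = rng.normal(size=(5000, 10))
idx, dist, losses = probabilistic_nn(S, T)
```

There is no index construction and no training step: the only tuning knobs are the sample size per iteration and the early-stopping tolerance, which captures the idea of stopping as soon as the loss stops improving.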

The figure below shows how quickly the loss function drops at the beginning. In this case, the loss represents the average distance to the approximate nearest neighbor obtained so far in the iterative algorithm. The X-axis represents the iteration number. Note the excellent fit of the orange curve to the loss function, allowing you to predict its baseline (minimum loss, or optimum) after only a small number of iterations. To see what happens if you maximize the loss instead, read the full technical document.

Figure: Fast convergence of PANN at the beginning (Y-axis is the loss function, X-axis is the iteration number).
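The article does not specify the functional form of the fitted orange curve, so the sketch below assumes a simple power-law decay, loss(t) ≈ a + b·t^(-c), fitted with SciPy to the first few iterations of the loss history returned by the sketch above. The asymptote a then serves as an early estimate of the baseline (minimum loss). The function names and the model choice are assumptions, not the author's method.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay_model(t, a, b, c):
    # a is the asymptote (predicted minimum loss); b and c control the decay
    return a + b * np.power(t, -c)

def predict_baseline(loss_history, n_fit=30):
    """Fit the first n_fit iterations of the loss curve and return the
    asymptote of the fitted curve, i.e. the predicted minimum loss."""
    t = np.arange(1, len(loss_history) + 1, dtype=float)
    y = np.asarray(loss_history, dtype=float)
    k = min(n_fit, len(t))
    p0 = (y.min(), max(y[0] - y.min(), 1e-6), 0.5)   # rough starting values
    (a, b, c), _ = curve_fit(decay_model, t[:k], y[:k], p0=p0, maxfev=10000)
    return a

# Example: predict the optimum from the first 30 iterations only
# baseline = predict_baseline(losses, n_fit=30)
```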

Read the full article on GitHub, here. It is included as project 8.1 in my coursebook State of the Art in GenAI & LLMs — Creative Projects, with Solutions, available here. This 200+ page coursebook features many related projects, case studies, Python code, and datasets.

Prasenjit Singh

Technologist | Digital Innovation & Management

5 months ago

Intricate graph! The loops and twists represent the complex relationships within knowledge graphs. Fascinating to see how 346.18 and 346.21 are connected.

Abdelhakim M.

Research And Development Engineer

5 months ago

Spoiler alert: maximizing a loss function is the same as minimizing the negative of that loss function.

Stanley Waite - Inventor

Time is Everything and I Help Teach that You can take control of it, for better Mental & Physical Wellbeing - Thinking outside the Box with Business, Relationships and Pleasure.

5 months ago

Without understanding the context of the question that produces the answers/data points, all generative AI results are just garbage in the end: not able to create, just giving answers at the mean.
