All Databases are Equal, but Some Databases are More Equal than Others

Register here.

In this session, we will discuss the following, with an emphasis on AI applications:

  • Advanced features to optimize performance, security, and scalability.
  • How to seamlessly migrate to modern databases, with a case study: MariaDB.
  • A live demonstration of using modern database solutions to unlock these advanced capabilities.

Today, it is possible to switch to a different platform while keeping your data and even your slow queries unchanged, whether they are written in traditional SQL or deal with JSON.

Quick tips to increase performance

  • Switch to a different architecture with a better query engine, for instance from JSON or SQL to a vector DB. The new engine may also optimize configuration parameters.
  • Efficiently encode your fields, with minimum or no loss, especially for long text elements. This is done automatically when switching to a high-performance database.
  • Eliminate features or rows that are never used. Work with smaller vectors.
  • Leverage the cloud, distributed architecture, and GPU.
  • Optimize queries to avoid expensive operations. This can be done automatically with AI, transparently to the user, for instance when switching to this platform.
  • Use a cache for common queries, or for the rows/columns accessed most frequently (illustrated in the sketch after this list).
  • Load parts of the database in memory and perform in-memory queries. That's how I get queries running at least 100 times faster in my LLM app, compared to vendor solutions.
  • Use techniques such as approximate nearest neighbor search for faster retrieval, especially in RAG apps. This is done automatically when switching to a high-performance platform.
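
To make the caching and in-memory tips concrete, here is a minimal sketch using only Python's standard library; the table, schema, and data are invented for illustration, not taken from any particular platform.

    import sqlite3
    from functools import lru_cache

    # Load data into an in-memory SQLite database; in practice you would
    # copy the hot tables from your disk-based source.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, category TEXT, score REAL)")
    conn.executemany(
        "INSERT INTO docs (category, score) VALUES (?, ?)",
        [("ai", 0.91), ("db", 0.87), ("ai", 0.78), ("db", 0.66)],
    )
    conn.commit()

    @lru_cache(maxsize=1024)  # serve the most common queries from a cache
    def top_scores(category, limit=2):
        rows = conn.execute(
            "SELECT id, score FROM docs WHERE category = ? "
            "ORDER BY score DESC LIMIT ?",
            (category, limit),
        ).fetchall()
        return tuple(rows)  # tuples are immutable, hence safe to cache

    print(top_scores("ai"))  # first call hits the in-memory database
    print(top_scores("ai"))  # repeat call is answered from the cache

The first call pays the query cost once; identical calls afterwards are answered from the cache without touching the database at all.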

Some common types of databases

  • Vector and graph databases are among the most popular these days, especially for GenAI and LLM apps. Most can also handle tasks performed by traditional databases and understand SQL and other languages (NoSQL, NewSQL). Some are optimized for fast search and real-time processing. See here for one of the most efficient and versatile.
  • In vector DBs, features (the columns in a tabular dataset) are processed jointly and encoded, rather than column by column (see the toy sketch after this list). Graph DBs store information as nodes and node connections, for instance knowledge graphs and taxonomies with related categories and sub-categories. JSON and bubble databases deal with unstructured data such as text and web content. In my case, I use key-value schemas, also known as hash tables or dictionaries in Python.
  • Some DBs are column-oriented while the standard is based on rows. Some fit in memory: they are called in-memory databases, achieving faster execution. Another way to increase performance is via distributed architecture, for instance Hadoop.
  • In object-oriented databases, data is stored as objects, similar to object-oriented programming languages. It allows for direct mapping of objects in code to objects in the database.
  • Hierarchical databases are good at representing tree structures, a special kind of graph. Network databases go one step further, allowing more complex relationships than hierarchical databases, in particular multiple parent-child relationships.
  • For special needs, consider time series, geospatial and multimodel databases (not to be confused with multimodal). Multimodel DBs support multiple data models (document, graph, key-value) within a single engine. Image and soundtrack repositories can also be organized as databases.
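
As a toy illustration of the vector DB idea from the list above, the sketch below encodes rows as embedding vectors and retrieves the closest match with a brute-force cosine-similarity scan; production vector databases replace this linear scan with approximate nearest neighbor indexes. The labels and 3-dimensional embeddings are invented for the example.

    import numpy as np

    # Toy "vector DB": each row is encoded as a single embedding vector,
    # rather than being stored column by column.
    labels = ["cat", "dog", "car"]
    vectors = np.array([
        [0.9, 0.1, 0.0],  # invented 3-d embeddings; real ones have
        [0.8, 0.2, 0.1],  # hundreds of dimensions
        [0.0, 0.1, 0.9],
    ])

    def nearest(query, k=1):
        # Cosine similarity between the query and every stored vector.
        sims = vectors @ query / (
            np.linalg.norm(vectors, axis=1) * np.linalg.norm(query)
        )
        return [labels[i] for i in np.argsort(-sims)[:k]]

    print(nearest(np.array([0.85, 0.15, 0.05])))  # -> ['cat']

Swapping the linear scan for an approximate nearest neighbor index, as in the retrieval tip earlier, is what makes this fast at scale.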

This hands-on workshop is for developers and AI professionals, featuring state-of-the-art technology, case studies, code sharing, and live demos. The recording and GitHub material will be available to registrants who cannot attend the free 60-minute session.

Register here.



Hamza Bakh

Seeking PFE Internship | Data Analyst | Aspiring Data Engineer | Expertise in Data Warehousing, Cloud Platforms (AWS, Azure), Python, SQL, and Data Visualization (Power BI).

1 month

Invaluable insights, Mr Vincent Granville! The emphasis on modern databases, particularly vector and graph DBs, resonates strongly with the AI and data-driven workflows I’ve been involved in. The move from traditional SQL or JSON to vector databases, especially in the context of LLM apps and RAG, is transformative for optimizing speed and accuracy in real-time applications. Your point on leveraging in-memory databases for increased query efficiency mirrors some of the breakthroughs we’ve seen with cloud-based architectures. I’m especially interested in the live demo and learning how to seamlessly migrate to MariaDB while preserving query integrity. Thanks for the opportunity to explore cutting-edge database solutions!

Nigel Goodwin

Looking for new opportunities.

2 months

It is important to understand the type of queries needed. For example, I do a lot of work with arrays of floats, and generally there is no need to query within the array. So I encode the array and store it as text. I transfer it to the front end as the encoded text and decode it there. This gives amazing performance and low storage. I also do further compaction by noting repeated values and storing a value together with the number of repeats. PostgreSQL is ideal for this kind of work, no need to go to the latest toys. We can use the same ideas to store segments of a time series. For example, each day of 10-minute data is stored as a separate encoded text. They can be put together at the front end. I am staggered how often I find float arrays stored in JSON strings or files, which have huge overheads and bring down distributed systems. Worse, you see garbage like [{time=0.4, value = 0.98765}{.....}]. What were they thinking?
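
A minimal Python sketch of the encode/compact scheme described in this comment; the text format and function names are illustrative, not the commenter's actual implementation:

    # Run-length encode a float array as compact text: each token is
    # "value" or "value*count" (format is illustrative).
    def encode_floats(values):
        tokens, i = [], 0
        while i < len(values):
            j = i
            while j < len(values) and values[j] == values[i]:
                j += 1  # extend the run of repeated values
            run = j - i
            text = repr(values[i])  # repr round-trips floats exactly
            tokens.append(text if run == 1 else f"{text}*{run}")
            i = j
        return ",".join(tokens)

    def decode_floats(text):
        out = []
        for token in text.split(","):
            value, _, count = token.partition("*")
            out.extend([float(value)] * (int(count) if count else 1))
        return out

    data = [0.5, 0.5, 0.5, 0.98765, 0.4]
    encoded = encode_floats(data)  # '0.5*3,0.98765,0.4'
    assert decode_floats(encoded) == data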

Vamsi Kethu

Start-Up Spirit & Mindset | Microsoft Azure OpenAI | Power Platform | Azure IoT | Edge | TinyML | DS | Data Analytics | Computer Vision | ML | Deep Learning | NLP | RPA | QA Automation | Gen AI | AI Multi-Agents | AiBots

2 个月

Vincent Granville I like the article content and the title. Thanks for sharing!
