In this session, we will discuss the following, with an emphasis on AI applications:
- Advanced features to optimize performance, security, and scalability.
- How to seamlessly migrate to modern databases, with MariaDB as a case study.
- A live demonstration of using modern database solutions to unlock these advanced capabilities.
Today, it is possible to switch to a different platform while keeping your data and even your slow queries unchanged, whether they are written in traditional SQL or deal with JSON.
Quick tips to increase performance
- Switch to a different architecture with a better query engine, for instance from JSON or SQL to a vector DB. The new engine may also optimize configuration parameters automatically.
- Efficiently encode your fields, with minimal or no loss, especially long text elements. This is done automatically when switching to a high-performance database; a small quantization sketch after this list shows the idea.
- Eliminate features or rows that are never used. Work with smaller vectors.
- Leverage the cloud, distributed architecture, and GPU.
- Optimize queries to avoid expensive operations. This can be done automatically with AI, transparently to the user, for instance when switching to such a platform.
- Use a cache for common queries, or for the rows/columns most frequently accessed.
- Load parts of the database in memory and perform in-memory queries. That's how I get queries running at least 100 times faster in my LLM app, compared to vendors. A sketch combining caching and in-memory queries follows this list.
- Use techniques such as approximate nearest neighbor (ANN) search for faster retrieval, especially in RAG apps. This is done automatically when switching to a high-performance platform; a toy ANN sketch also appears below.
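To make the field-encoding tip concrete, here is a minimal sketch of lossy int8 quantization for embedding vectors, written in plain numpy. The function names and the per-vector scaling scheme are my own illustration, not the mechanism of any particular platform.

```python
import numpy as np

# Illustration: compress float32 embedding vectors to int8 with a
# per-vector scale factor (lossy, but often accurate enough).
def encode_int8(vectors: np.ndarray):
    scales = np.abs(vectors).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0          # avoid division by zero
    quantized = np.round(vectors / scales).astype(np.int8)
    return quantized, scales           # 4x smaller than float32

def decode_int8(quantized: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return quantized.astype(np.float32) * scales

vectors = np.random.rand(1000, 384).astype(np.float32)   # fake embeddings
q, s = encode_int8(vectors)
recovered = decode_int8(q, s)
print("max absolute error:", np.abs(vectors - recovered).max())
```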
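The caching and in-memory tips can be combined in a few lines. The sketch below uses Python's built-in sqlite3 in-memory mode as a stand-in for whatever engine you use, with functools.lru_cache serving repeated queries; the table and rows are made up for illustration.

```python
import sqlite3
from functools import lru_cache

# Load a table into an in-memory SQLite database and cache common queries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, category TEXT, body TEXT)")
conn.executemany("INSERT INTO docs (category, body) VALUES (?, ?)",
                 [("faq", "how to reset password"),
                  ("faq", "how to export data"),
                  ("blog", "release notes v2")])
conn.commit()

@lru_cache(maxsize=1024)               # repeated queries hit the cache, not the DB
def docs_in_category(category: str):
    rows = conn.execute("SELECT id, body FROM docs WHERE category = ?",
                        (category,)).fetchall()
    return tuple(rows)                 # tuples are hashable and cacheable

print(docs_in_category("faq"))         # first call queries the database
print(docs_in_category("faq"))         # second call is served from the cache
```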
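And here is a toy version of approximate nearest neighbor search, using random-projection hashing (the idea behind LSH). Production RAG apps would use a dedicated library such as FAISS or an HNSW index; this pure-numpy sketch only shows why searching one hash bucket is cheaper than scanning all rows.

```python
import numpy as np

# Toy ANN search via random-projection hashing (the LSH principle).
rng = np.random.default_rng(0)
data = rng.normal(size=(10_000, 128)).astype(np.float32)  # fake embeddings
planes = rng.normal(size=(8, 128))                        # 8 random hyperplanes

def bucket(v: np.ndarray) -> int:
    # Nearby vectors tend to fall on the same side of each hyperplane,
    # so they hash to the same 8-bit signature.
    bits = planes @ v > 0
    return int(np.packbits(bits)[0])

buckets: dict[int, list[int]] = {}
for i, v in enumerate(data):
    buckets.setdefault(bucket(v), []).append(i)

query = rng.normal(size=128).astype(np.float32)
candidates = buckets.get(bucket(query), [])  # scan one bucket, not all 10,000 rows
if candidates:
    dists = np.linalg.norm(data[candidates] - query, axis=1)
    print("approximate nearest neighbor:", candidates[int(np.argmin(dists))])
```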
Some common types of databases
- Vector and graph databases are among the most popular these days, especially for GenAI and LLM apps. Most can also handle tasks performed by traditional databases, and understand SQL as well as other query paradigms (NoSQL, NewSQL). Some are optimized for fast search and real-time workloads. See here for one of the most efficient and versatile.
- In vector DBs, features (the columns in a tabular dataset) are processed and encoded jointly, rather than column by column. Graph DBs store information as nodes and connections between nodes, for instance knowledge graphs and taxonomies with related categories and sub-categories. JSON and bubble databases deal with unstructured data such as text and web content. In my case, I use key-value schemas, also known as hash tables or dictionaries in Python; a minimal key-value sketch follows this list.
- Some DBs are column-oriented, while the standard is row-based. Some fit entirely in memory: these in-memory databases achieve faster execution. Another way to increase performance is a distributed architecture, for instance Hadoop.
- In object-oriented databases, data is stored as objects, similar to object-oriented programming languages. This allows for direct mapping between objects in code and objects in the database; see the mapping sketch after this list.
- Hierarchical databases are good at representing tree structures, a special kind of graph. Network databases go one step further, allowing more complex relationships than hierarchical databases, in particular multiple parent-child relationships.
- For special needs, consider time series, geospatial and multimodel databases (not to be confused with multimodal). Multimodel DBs support multiple data models (document, graph, key-value) within a single engine. Image and soundtrack repositories can also be organized as databases.
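As a small illustration of the key-value schemas mentioned above, the sketch below uses Python's built-in shelve module as a disk-backed hash table; the keys, values, and file name are hypothetical.

```python
import shelve

# A key-value schema is essentially a persistent hash table; shelve
# gives a minimal disk-backed version of the idea.
with shelve.open("kv_store") as db:       # hypothetical file name
    db["user:42"] = {"name": "Ada", "embedding": [0.1, 0.3, 0.7]}
    db["user:43"] = {"name": "Alan", "embedding": [0.2, 0.1, 0.9]}

with shelve.open("kv_store") as db:
    print(db["user:42"]["name"])          # direct lookup by key, no SQL needed
```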
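And to illustrate the direct object mapping offered by object-oriented databases, here is a toy in-memory store where application objects are saved and loaded as-is. A real OODB adds persistence, indexing, and transactions on top of this idea; the class and store here are my own example.

```python
from dataclasses import dataclass, asdict

# Application objects are stored directly, with no relational mapping layer.
@dataclass
class Product:
    sku: str
    name: str
    price: float

store: dict[str, Product] = {}            # object key doubles as primary key

def save(obj: Product) -> None:
    store[obj.sku] = obj

def load(sku: str) -> Product:
    return store[sku]

save(Product("A-100", "GPU server", 9999.0))
print(asdict(load("A-100")))              # round-trips with the same object shape
```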
This hands-on workshop is for developers and AI professionals, featuring state-of-the-art technology, case studies, shared code, and live demos. The recording and GitHub material will be available to registrants who cannot attend the free 60-minute session.
Invaluable insights, Mr Vincent Granville! The emphasis on modern databases, particularly vector and graph DBs, resonates strongly with the AI and data-driven workflows I’ve been involved in. The move from traditional SQL or JSON to vector databases, especially in the context of LLM apps and RAG, is transformative for optimizing speed and accuracy in real-time applications. Your point on leveraging in-memory databases for increased query efficiency mirrors some of the breakthroughs we’ve seen with cloud-based architectures. I’m especially interested in the live demo and learning how to seamlessly migrate to MariaDB while preserving query integrity. Thanks for the opportunity to explore cutting-edge database solutions!
It is important to understand the type of queries needed. For example, I do a lot of work with arrays of floats, and generally there is no need to query within the array. So I encode the array and store it as text. I transfer the encoded text to the front end and decode it there. This gives amazing performance and low storage. I also do further compaction by noting repeated values and storing a value together with the number of repeats. PostgreSQL is ideal for this kind of work; no need to go to the latest toys. The same ideas work for storing segments of a time series: for example, each day of 10-minute data is stored as a separate encoded text, and the segments are put together at the front end. I am staggered how often I find float arrays stored in JSON strings or files, which have huge overheads and bring down distributed systems. Worse, you see garbage like [{time=0.4, value = 0.98765}{.....}]. What were they thinking?
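For readers who want to try the encode-as-text idea described in this comment, here is a minimal sketch with run-length compaction. The exact format (comma separators, the "x" run notation) is my assumption, not necessarily the commenter's actual scheme.

```python
# Encode a float array as compact text, collapsing runs of repeated values.
def encode(values: list[float]) -> str:
    parts, i = [], 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[j + 1] == values[i]:
            j += 1                         # extend the run of repeated values
        run = j - i + 1
        parts.append(f"{values[i]:g}x{run}" if run > 1 else f"{values[i]:g}")
        i = j + 1
    return ",".join(parts)                 # store this string in a TEXT column

def decode(text: str) -> list[float]:
    out = []
    for token in text.split(","):
        if "x" in token:                   # run-length token: value x count
            value, run = token.split("x")
            out.extend([float(value)] * int(run))
        else:
            out.append(float(token))
    return out

series = [0.5, 0.5, 0.5, 1.25, 1.25, 3.0]
encoded = encode(series)
print(encoded)                             # "0.5x3,1.25x2,3"
assert decode(encoded) == series
```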
Vincent Granville, I like the article content and the title. Thanks for sharing!