In the rapidly evolving AI era with large language models (LLMs) at the core, making LLMs more trustworthy and efficient, especially during output generation (inference), has gained significant attention. The goals are to reduce plausible but faulty LLM outputs (a.k.a. hallucinations) and to meet rapidly growing inference demands. Our work explores such efforts and makes them transparent to the database community. Understanding these efforts is essential for harnessing LLMs in database tasks and for adapting database techniques to LLMs. Furthermore, we delve into the synergy between LLMs and databases, highlighting new opportunities and challenges at their intersection. We also take a broader view of integrating traditional AI and machine learning (ML) with databases, enhancing database performance with AI/ML and improving the AI/ML experience with databases.

“LLM Inference Systems Behave as Database Systems”

OpenAI and other LLM service providers spend millions of dollars per month to serve LLM inference on GPUs. They process a tremendous number of LLM requests every second, but because GPU compute and GPU memory are limited, not all requests can be processed right away, and some must be evicted and restarted. Following database principles, we model the cost of LLM inference the way database query optimizers model query cost, analyze various scheduling policies, and cast finding the best policy as a multi-query optimization problem. This yields significant savings in GPU hours.

Beyond scheduling LLM requests, a critical challenge is how we leverage the underlying hardware (CPU, GPU, and SSD), overlapping and hiding their overheads for seamless pipelining without bubbles. Here, too, we can adapt ideas the database community has studied for years.

“New Type of Database-Aware LLM Inference”

With the rise of retrieval-augmented generation (RAG), it is now popular to add examples and documents to LLM prompts. We can do similar things in databases to harness both the information in relational databases and the world knowledge in LLMs. Semantic operators extend relational operators to filter, join, or aggregate rows using natural language predicates instead of traditional relational predicates. For example, one can join a table of animals with a table of countries, where an LLM decides which animal lives in which country. This widens the usage of databases and triggers new applications on top of them.

Such semantically rich queries and query plans can be generated automatically to answer complex user questions. For example, we can construct multiple reasoning paths, where each path corresponds to a sequence of queries with semantic operators, and extend the most promising path, as in recent reasoning LLMs. Compared to typical natural language tasks, we have a strong advantage: label annotation via query execution requires no human labor.

However, these new queries are far slower than the database queries we have used for decades, so they need careful optimization and integration alongside database systems.

“Vector Database to Enhance Inference”

If a natural language predicate is as simple as finding similar items between two tables, we can replace expensive LLM calls with a more efficient vector similarity search. This is also standard practice in RAG and is gaining significant attention in the database community.
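To make the contrast concrete, here is a minimal, illustrative sketch (not our system): a semantic join that asks an LLM a natural-language question per row pair, and the cheaper vector-similarity variant for predicates that reduce to similarity. The `ask_llm` and `embed` functions are toy stand-ins for a real LLM client and embedding model.

```python
import hashlib
import numpy as np

# Toy stand-ins so the sketch runs end to end; a real system would call an
# LLM service and an embedding model here.
KNOWN_HABITATS = {("kangaroo", "Australia"), ("panda", "China"), ("moose", "Canada")}

def ask_llm(animal: str, country: str) -> bool:
    # Stand-in for prompting an LLM: "Does the {animal} live in {country}?"
    return (animal, country) in KNOWN_HABITATS

def embed(text: str) -> np.ndarray:
    # Stand-in for an embedding model: a deterministic pseudo-random unit vector.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "little")
    v = np.random.default_rng(seed).normal(size=64)
    return v / np.linalg.norm(v)

def semantic_join(animals, countries):
    """Semantic join: keep an (animal, country) pair whenever the LLM says yes.
    Cost: one LLM call per row pair -- far more expensive than a relational join."""
    return [(a, c) for a in animals for c in countries if ask_llm(a, c)]

def vector_similarity_join(left, right, threshold=0.8):
    """Cheaper variant for similarity-style predicates: pair items whose
    embeddings have cosine similarity above `threshold` (vectors are unit length)."""
    lv = {x: embed(x) for x in left}
    rv = {y: embed(y) for y in right}
    return [(x, y) for x in left for y in right
            if float(np.dot(lv[x], rv[y])) >= threshold]

print(semantic_join(["kangaroo", "panda", "moose"],
                    ["Australia", "China", "Canada"]))
```

In a real deployment the per-pair LLM calls dominate the cost, which is exactly why pushing such predicates down to vector indexes inside the database matters.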
Current vector searches using vector indexes and vector databases are largely disconnected from relational database systems and face large overheads, especially when combined with relational operators. Our goal is to seamlessly integrate vector operations inside database systems and mitigate those bottlenecks. We also leverage GPUs for their massive parallelism, with careful load balancing and memory access mechanisms.

“Broader View of AI/ML for Databases”

We also apply more lightweight AI/ML models to optimize real-time workloads in databases. Besides the examples below, we have expertise in applying AI/ML to other parts of databases, such as query optimization and indexing.
Dalton: Learned Partitioning for Distributed Data Streams
2022. International Conference on Very Large Data Bases (VLDB 2022), Sydney, Australia, September 5-9, 2022. p. 491–504. DOI: 10.14778/3570690.3570699.

In our previous work, we proposed a scalable yet lightweight partitioning operator for distributed data streams that leverages contextual bandits. We show that contextual bandits can be used to learn and quickly adapt partitioning policies at runtime with reduced memory requirements.
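As a rough illustration of the idea (not Dalton's actual algorithm), the sketch below routes stream tuples with an epsilon-greedy contextual bandit: for each tuple it observes a small context, picks a worker, and updates a per-(context, worker) reward estimate from feedback such as downstream load. All names here are hypothetical.

```python
import random
from collections import defaultdict

class BanditPartitioner:
    """Epsilon-greedy contextual bandit for routing tuples to workers."""

    def __init__(self, num_workers, epsilon=0.1):
        self.num_workers = num_workers
        self.epsilon = epsilon
        self.counts = defaultdict(int)    # (context, worker) -> observations
        self.values = defaultdict(float)  # (context, worker) -> mean reward

    def choose(self, context):
        if random.random() < self.epsilon:                  # explore
            return random.randrange(self.num_workers)
        return max(range(self.num_workers),                 # exploit
                   key=lambda w: self.values[(context, w)])

    def update(self, context, worker, reward):
        key = (context, worker)
        self.counts[key] += 1
        # Incremental mean of the observed rewards for this (context, worker).
        self.values[key] += (reward - self.values[key]) / self.counts[key]

# Hypothetical usage: context is whether the key is currently "hot";
# the reward penalizes routing to a worker that is already loaded.
router = BanditPartitioner(num_workers=4)
load = [0] * 4
for key in ["a", "b", "a", "c", "a", "d", "a", "b"]:
    ctx = "hot" if key == "a" else "cold"
    w = router.choose(ctx)
    load[w] += 1
    router.update(ctx, w, reward=-load[w])  # fuller workers yield lower reward
print(load)
```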
Scalable Multi-Query Execution using Reinforcement Learning
2021. ACM SIGMOD International Conference on Management of Data (SIGMOD 2021), Virtual Event, China, June 20-25, 2021. DOI: 10.1145/3448016.3452799.

RouLette is a specialized intelligent engine for multi-query execution that adapts execution plans at runtime. RouLette scales by replacing sharing-aware optimization with adaptive query processing, and it chooses which opportunities to explore and exploit using reinforcement learning.
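For intuition only (this is not RouLette's algorithm), the sketch below shows the kind of explore/exploit loop such an engine runs: before each batch of input tuples it picks one of a few candidate operator orderings with UCB1 and rewards orderings that processed the batch cheaply. All names and costs are hypothetical.

```python
import math
import random

class UCBPlanSelector:
    """UCB1 over a fixed set of candidate operator orderings (plans)."""

    def __init__(self, num_plans):
        self.counts = [0] * num_plans
        self.mean_reward = [0.0] * num_plans
        self.rounds = 0

    def select(self):
        self.rounds += 1
        for plan, c in enumerate(self.counts):    # try every plan once first
            if c == 0:
                return plan
        return max(range(len(self.counts)),
                   key=lambda p: self.mean_reward[p]
                   + math.sqrt(2 * math.log(self.rounds) / self.counts[p]))

    def feedback(self, plan, batch_cost):
        reward = -batch_cost                      # cheaper batches => higher reward
        self.counts[plan] += 1
        self.mean_reward[plan] += (reward - self.mean_reward[plan]) / self.counts[plan]

# Hypothetical usage: plan 1 is cheapest on this workload, so it gets picked most.
true_cost = [5.0, 2.0, 4.0]
selector = UCBPlanSelector(num_plans=3)
picks = [0, 0, 0]
for _ in range(200):
    p = selector.select()
    picks[p] += 1
    selector.feedback(p, batch_cost=true_cost[p] + random.random())
print(picks)
```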