SIGMOD/PODS is the flagship ACM conference on data management. SIGMOD 2019 was held in the lively Dutch city of Amsterdam, and more than 1000 participants registered for the conference. Along with the research paper presentations and demonstrations, the conference had outstanding keynotes, a lively panel discussion and an excellent banquet accompanied by a canal cruise. Anastasia received the SIGMOD E.F. Codd Innovation Award and gave an inspirational talk detailing her journey as a database researcher.
The following research areas stood out at the conference.
Modern Hardware
The DaMoN workshop was well attended and featured several excellent talks. Michaela Blott’s keynote focused on using custom architectures to achieve more scalable performance given the explosion of data and the end of Moore’s law. She discussed the implementation of a key-value store using a customized network stack and system architecture, and a deep learning system that uses customized compute. The workshop had two Fresh Thinking Talks: Holger Pirk advocated the use of dark/dim silicon for data management, and Tianzheng Wang discussed the challenges in building database systems on non-volatile memory. The presentations in the DaMoN workshop focused mainly on query processing with the help of GPUs and on the use of non-volatile memory (NVM) for data management. Georgios presented his paper on improving latency for NVM by interleaving parallel work using coroutines.
The research track on modern hardware in the main conference had six papers. One interesting paper designed tree-based indices for RDMA. Since large datasets are stored across multiple nodes, the paper distributes the index across nodes as well and accesses it efficiently using RDMA. The authors present different index designs and discuss the trade-offs among them. The Border-Collie paper presents an algorithm for database logging on multicore hardware. The authors show that their algorithm is wait-free and read-optimal and that it is applicable to both centralized and decentralized logging. The Concurrent Prefix Recovery paper shows a scalable recovery mechanism for multicore key-value stores that does not rely on a write-ahead log. Since write-ahead logging is expensive for update-intensive workloads, the paper uses asynchronous incremental checkpointing and, instead of tracking each commit individually, only guarantees that all operations up to a given time t are persisted.
Blockchains
Mohan’s keynote provided a good overview of the current state of blockchains and attempted to distinguish the facts from the hype surrounding public blockchains. His presentation focused mostly on permissioned blockchains and the concept of smart contracts. Mohan highlighted how blockchain systems differ from databases and how techniques used in databases could be adapted to blockchain systems. He mentioned that IBM is working on the open-source Hyperledger Fabric blockchain (https://www.hyperledger.org/projects/fabric).
There was a blockchain tutorial which was divided into two parts with one part focused on public blockchains and the other on permissioned blockchains. The tutorial explained technical details on consensus protocols in blockchains, proof of work based block creation, branching and reconciliation in blockchains, and smart contracts. The presenters also discussed off-chain solutions to alleviate some bottlenecks of blockchains and techniques for transacting across different blockchains.
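To make proof-of-work-based block creation concrete, here is a minimal Python sketch. It is not taken from the tutorial; the header layout, the function names `mine_block` and `verify_block`, and the "leading zero hex digits" difficulty rule are simplifying assumptions chosen for illustration.

```python
import hashlib
from typing import Tuple


def block_hash(prev_hash: str, payload: str, nonce: int) -> str:
    """Hash a toy block header (hypothetical layout: prev_hash|payload|nonce)."""
    header = f"{prev_hash}|{payload}|{nonce}".encode()
    return hashlib.sha256(header).hexdigest()


def mine_block(prev_hash: str, payload: str, difficulty: int = 3) -> Tuple[int, str]:
    """Try nonces until the hash starts with `difficulty` zero hex digits."""
    target_prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = block_hash(prev_hash, payload, nonce)
        if digest.startswith(target_prefix):
            return nonce, digest
        nonce += 1


def verify_block(prev_hash: str, payload: str, nonce: int,
                 digest: str, difficulty: int = 3) -> bool:
    """Verification is a single hash, which is what makes the work easy to check."""
    return (block_hash(prev_hash, payload, nonce) == digest
            and digest.startswith("0" * difficulty))
```

Creating a block requires many hash attempts on average, while checking it requires one. Changing the payload of an already mined block invalidates its hash, which is why tampering with history forces the work to be redone.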
Query Processing and Optimization
With two sessions at SIGMOD, query processing and optimization remains a hot topic. Cardinality estimates are often inaccurate, which leads to suboptimal query plans, and two papers tried to address this issue. One paper proposed pessimistic cardinality estimation to find upper bounds on the cardinalities of intermediate join results. These upper bounds can be injected into the optimizer, which helps it avoid expensive physical join plans. Another paper focused on performing exact cardinality query optimization (ECQO) at runtime. To reduce the overheads of ECQO, the authors propose techniques to prune relations that are too large to be part of an optimal plan.
One paper focused on using interpolation search instead of binary search in databases. The authors showed that interpolation search can take advantage of modern hardware trends by performing complex calculations efficiently. They discuss three variations of interpolation search to cater to non-uniform data distributions. Another paper focused on iterative query processing for queries that are executed from imperative code. The authors discuss techniques to move SQL queries in and out of loops and define a cost model for such execution.
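As a rough illustration of what an upper bound on a join cardinality looks like, the following Python sketch uses a simple degree-based bound for an equi-join on one key: each tuple of R can match at most as many tuples of S as the most frequent key in S, and vice versa. This is a textbook-style bound chosen for illustration, not necessarily the technique used in the paper, and the function names are hypothetical.

```python
from collections import Counter


def max_degree(keys):
    """Highest frequency of any single join-key value (0 for an empty input)."""
    return max(Counter(keys).values(), default=0)


def join_upper_bound(r_keys, s_keys):
    """Bound |R join S| by min(|R| * maxdeg(S), |S| * maxdeg(R))."""
    return min(len(r_keys) * max_degree(s_keys),
               len(s_keys) * max_degree(r_keys))


def exact_join_size(r_keys, s_keys):
    """Exact equi-join size, used here only to check the bound."""
    s_counts = Counter(s_keys)
    return sum(s_counts[k] for k in r_keys)


r = [1, 1, 2, 3]
s = [1, 2, 2, 2, 4]
print(exact_join_size(r, s), "<=", join_upper_bound(r, s))
```

The bound can be far from tight, but it never underestimates, so a plan chosen with it cannot be surprised by an intermediate result larger than expected.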
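For readers unfamiliar with the technique, below is a minimal Python sketch of classic interpolation search over a sorted list of integers. It is the textbook variant, not one of the three variations from the paper, and the function name is an assumption.

```python
def interpolation_search(keys, target):
    """Return the index of target in the sorted list keys, or -1 if absent."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[lo] == keys[hi]:
            # All remaining keys are equal; the loop condition implies a match.
            return lo if keys[lo] == target else -1
        # Guess the position by assuming keys are spread uniformly in [lo, hi].
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


uniform = list(range(0, 1000, 5))
print(interpolation_search(uniform, 495))
```

On uniformly distributed keys the first guess usually lands on or near the target. On skewed keys such as `[1, 2, 4, 8, 16, 1024]` the guesses creep forward one slot at a time, which is the kind of non-uniform case the variations in the paper are meant to handle.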
Machine Learning
Some papers at SIGMOD aimed to improve machine learning workflows. The BlinkML paper uses sampling techniques similar to those used in online approximate query processing to sample the data used to train models. BlinkML allows users to choose the trade-off between model training time and prediction accuracy, and it provides quality guarantees for the trained model. The Alpine Meadow paper described an automated system for interactive machine learning. It adapts several ideas from query optimization and uses cost-based pruning strategies. The authors carried out a comprehensive evaluation on over 300 datasets and show that their system significantly outperforms similar systems.
A few papers also focused on using machine learning for query execution in databases. SkinnerDB uses reinforcement learning to learn optimal join orders during query execution. To learn the join order, the system divides the execution into time slices, uses different join orders in different time slices and, by measuring the progress made per slice, finds promising join orders. The CDBTune paper discussed deep reinforcement learning techniques for finding database configuration parameters to perform end-to-end automated tuning of database systems.
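The time-slicing idea can be sketched as a multi-armed bandit. The Python snippet below uses plain UCB1 over a small, fixed set of candidate join orders and simulated progress measurements. This is a deliberate simplification for illustration and not SkinnerDB's actual algorithm; the function name, the reward callables and the table names are all hypothetical.

```python
import math


def pick_join_order(progress_per_slice, num_slices=200, exploration=math.sqrt(2)):
    """Choose a join order per time slice with UCB1.

    progress_per_slice maps a join order (tuple of table names) to a callable
    that runs one slice with that order and returns its progress in [0, 1].
    """
    orders = list(progress_per_slice)
    counts = {o: 0 for o in orders}
    totals = {o: 0.0 for o in orders}

    def ucb(order, t):
        mean = totals[order] / counts[order]
        return mean + exploration * math.sqrt(math.log(t) / counts[order])

    for t in range(1, num_slices + 1):
        untried = [o for o in orders if counts[o] == 0]
        # Try every order once, then balance exploration and exploitation.
        choice = untried[0] if untried else max(orders, key=lambda o: ucb(o, t))
        counts[choice] += 1
        totals[choice] += progress_per_slice[choice]()

    best = max(orders, key=lambda o: counts[o])
    return best, counts


# Simulated progress speeds (assumed values, not measurements).
simulated = {
    ("R", "S", "T"): lambda: 0.9,
    ("S", "T", "R"): lambda: 0.3,
    ("T", "R", "S"): lambda: 0.1,
}
best, counts = pick_join_order(simulated)
print(best, counts)
```

Slow join orders still receive a few slices so that they are not ruled out on too little evidence, but most of the execution time goes to the order that makes the fastest progress.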
Text by Bikash Chandra