The past decade has witnessed the proliferation of new ways to ingest, store, index, and query data. This explosion was driven by the needs of modern applications, including social media, popular web services, and IoT sensors, which are characterized by high data volumes and rapid rates of incoming data. To cope with such high arrival rates, modern systems rely on the Log-Structured Merge-tree (LSM) storage model, which uses sequential I/O instead of in-place updates. In addition to handling incoming data, LSM-based systems are also expected to provide useful analytics to their users, which in turn requires accurate statistics about the stored data.
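To make the setting concrete, the following is a minimal, illustrative sketch of the LSM write path described above: incoming records are buffered in a sorted in-memory component and periodically spilled as immutable sorted runs (written sequentially), rather than being updated in place. The class and method names are hypothetical and do not correspond to the API of any particular system.

import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Minimal LSM-style write path: updates are buffered in a sorted in-memory
// component and flushed as immutable, sorted runs (sequential I/O),
// instead of modifying records in place.
class TinyLsm {
    private final TreeMap<String, String> memTable = new TreeMap<>();
    private final List<TreeMap<String, String>> diskRuns = new ArrayList<>();
    private final int memBudget;

    TinyLsm(int memBudget) { this.memBudget = memBudget; }

    void put(String key, String value) {
        memTable.put(key, value);   // in-memory update, no random disk I/O
        if (memTable.size() >= memBudget) {
            flush();                // spill the component as a new sorted run
        }
    }

    private void flush() {
        // In a real system this run would be written sequentially to disk.
        diskRuns.add(new TreeMap<>(memTable));
        memTable.clear();
    }

    String get(String key) {
        // Search newest to oldest: the in-memory component first, then flushed runs.
        if (memTable.containsKey(key)) return memTable.get(key);
        for (int i = diskRuns.size() - 1; i >= 0; i--) {
            String v = diskRuns.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}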
The first contribution of this thesis is a lightweight approach to collecting and maintaining concise statistical representations of the data in LSM-based systems, so that these statistics can later be used to drive cost-based optimizer decisions. In particular, we consider two problems: collecting statistics on indexed columns (for which the stream of records is ordered on the index key) and on non-indexed (unordered) columns. For each case, we choose appropriate statistical summaries so that the overhead on the system's critical path remains low and the ingestion process is not affected.
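As a minimal sketch of what such a lightweight summary could look like, consider an equi-width histogram over an integer key that is updated while records stream through a flush, at the cost of one counter increment per record. The bucket layout and names here are illustrative assumptions, not the specific synopses developed in the thesis.

// Minimal sketch: an equi-width histogram over an integer key, updated while
// records stream through a flush (one counter increment per record), so the
// cost added to the ingestion path stays negligible. Field and method names
// are illustrative; the thesis's actual summaries may differ.
class EquiWidthHistogram {
    private final long min, max;
    private final long[] counts;

    EquiWidthHistogram(long min, long max, int numBuckets) {
        this.min = min;
        this.max = max;
        this.counts = new long[numBuckets];
    }

    // Called once per record on the flush path.
    void add(long key) {
        int b = (int) ((key - min) * counts.length / (max - min + 1));
        counts[Math.min(Math.max(b, 0), counts.length - 1)]++;
    }

    // Used later by a cost-based optimizer to estimate range selectivities.
    double estimateCount(long lo, long hi) {
        double total = 0;
        double width = (double) (max - min + 1) / counts.length;
        for (int b = 0; b < counts.length; b++) {
            double bLo = min + b * width, bHi = bLo + width;
            double overlap = Math.max(0, Math.min(hi + 1, bHi) - Math.max(lo, bLo));
            total += counts[b] * (overlap / width);
        }
        return total;
    }
}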
Recent hardware trends, such as growing main memory capacity, stagnating CPU speeds, and the increasing use of parallel architectures, have also influenced the design of data processing systems. These advances make it possible to analyze large datasets entirely in memory, which in turn requires specialized algorithms for processing memory-resident data. Apart from algorithms implemented on traditional CPU architectures, implementations leveraging fine-grained hardware multithreading on FPGAs have recently been proposed. However, to date there has been a lack of studies directly comparing the performance of these two approaches.
The second part of the thesis is devoted to a comparative study of common analytical operations (joins, aggregations, selections) for in-memory workloads, using both multicore CPU and hardware-multithreading approaches. We present a thorough experimental evaluation and show that hardware-multithreaded implementations outperform state-of-the-art CPU-based algorithms in terms of raw throughput in many cases, while achieving much better memory bandwidth utilization.
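For illustration, the following is a minimal sketch of one of the compared operation types on the multicore CPU side: an in-memory group-by aggregation in which each thread builds a thread-local hash table over its chunk of the data and the partial tables are merged at the end. The algorithms actually evaluated in the thesis (and their hardware-multithreaded FPGA counterparts) are considerably more elaborate; this code only conveys the general pattern.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Multicore, in-memory aggregation (sum per key): thread-local partial hash
// tables built over disjoint chunks of the input, merged into a final result.
class ParallelAggregation {
    static Map<Integer, Long> groupSum(int[] keys, long[] values) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (keys.length + threads - 1) / threads;
        List<Future<Map<Integer, Long>>> partials = new ArrayList<>();

        for (int t = 0; t < threads; t++) {
            int lo = t * chunk, hi = Math.min(keys.length, lo + chunk);
            partials.add(pool.submit(() -> {
                // Each thread aggregates its own chunk without synchronization.
                Map<Integer, Long> local = new HashMap<>();
                for (int i = lo; i < hi; i++) {
                    local.merge(keys[i], values[i], Long::sum);
                }
                return local;
            }));
        }

        // Merge the thread-local partial results.
        Map<Integer, Long> result = new HashMap<>();
        for (Future<Map<Integer, Long>> f : partials) {
            f.get().forEach((k, v) -> result.merge(k, v, Long::sum));
        }
        pool.shutdown();
        return result;
    }
}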