Skip to main content
eScholarship
Open Access Publications from the University of California

DART

Published Web Location

https://sdm.lbl.gov/pdc/pubs/201811_PACT_final_DART_Wei.pdf
No data is associated with this publication.
Abstract

Affix-based search is a fundamental functionality for storage systems. It allows users to find desired datasets, where attributes of a dataset match an affix. While building inverted index to facilitate efficient affix based keyword search is a common practice for standalone databases and for desktop file systems, building local indexes or adopting indexing techniques used in a standalone data store is insufficient for highperformance computing (HPC) systems due to the massive amount of data and distributed nature of the storage devices within a system. In this paper, we propose Distributed Adaptive Radix Tree (DART), to address the challenge of distributed affix-based keyword search on HPC systems. This trie-based approach is scalable in achieving efficient affix-based search and alleviating imbalanced keyword distribution and excessive requests on keywords at scale. Our evaluation at different scales shows that, comparing with the "full string hashing" use case of the most popular distributed indexing technique - Distributed Hash Table (DHT), DART achieves up to 55× better throughput with prefix search and with suffix search, while achieving comparable throughput with exact and infix searches. Also, comparing to the "initial hashing" use case of DHT, DART maintains a balanced keyword distribution on distributed nodes and alleviates excessive query workload against popular keywords.

Item not freely available? Link broken?
Report a problem accessing this item