Grid Collector: Using an event catalog to speed up user analysis in distributed environment
Abstract
Nuclear and High Energy Physics experiments such as STAR at BNL generate millions of files containing petabytes of data each year. In most cases, analysis programs have to read all events in a file in order to find the interesting ones. Since the interesting events may be a small fraction of the events in the file, a significant portion of the computer time is wasted on reading unwanted events. To address this issue, we developed a software system called Grid Collector. The core of Grid Collector is an Event Catalog, which can be searched efficiently with compressed bitmap indices. Tests show that Grid Collector can index and search STAR event data much faster than database systems. It is fully integrated with an existing analysis framework, so minimal effort is required to use it. In addition, by taking advantage of existing file catalogs, Storage Resource Managers (SRMs), and GridFTP, Grid Collector automatically downloads the needed files from anywhere on the Grid without user intervention. Grid Collector can significantly improve user productivity. For a user who typically performs computation on 50 percent of the events, using Grid Collector could reduce the turnaround time by 30 percent. The improvement is more significant when searching for rare events, because only a small number of events with the appropriate properties are read into memory, and the necessary files are automatically located and downloaded through the best available route.
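To illustrate how a bitmap index can pick out events without reading the event data, here is a minimal sketch in Python. It is not Grid Collector's implementation: the bitmaps are uncompressed Python integers rather than compressed bitmaps, and the class `BitmapIndex`, the helper `rows`, the attribute names (`n_tracks`, `energy`), the bin edges, and the sample values are all hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


class BitmapIndex:
    """Binned bitmap index over one event attribute (uncompressed sketch)."""

    def __init__(self, values, bin_edges):
        self.bin_edges = sorted(bin_edges)
        # One bitmap per bin; bit i is set when event i falls in that bin.
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(values):
            self.bitmaps[self._bin(value)] |= 1 << row

    def _bin(self, value):
        # Bin number = count of edges that are <= value.
        return bisect_right(self.bin_edges, value)

    def at_least(self, edge):
        """Bitmap of events with value >= edge (edge must be a bin edge)."""
        first_bin = self.bin_edges.index(edge) + 1
        result = 0
        for bin_number, bits in self.bitmaps.items():
            if bin_number >= first_bin:
                result |= bits  # OR together all qualifying bins
        return result


def rows(bitmap):
    """Return the positions of the set bits, i.e. the selected event numbers."""
    selected = []
    row = 0
    while bitmap:
        if bitmap & 1:
            selected.append(row)
        bitmap >>= 1
        row += 1
    return selected


# Hypothetical per-event attributes for six events.
n_tracks = [120, 850, 40, 990, 500, 1300]
energy = [1.5, 7.2, 0.3, 9.9, 2.0, 8.1]

tracks_index = BitmapIndex(n_tracks, bin_edges=[100, 500, 1000])
energy_index = BitmapIndex(energy, bin_edges=[1.0, 5.0])

# "n_tracks >= 500 AND energy >= 5.0" becomes a bitwise AND of two bitmaps.
selected = tracks_index.at_least(500) & energy_index.at_least(5.0)
print(rows(selected))  # [1, 3, 5]
```

Only the events whose bits survive the AND would then need to be read from the files, which is the source of the savings described above when the selected fraction is small.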