Accurate modeling of cache replacement policies in a Data-Grid.
Caching techniques have been used to improve the performance gap of storage hierarchies in computing systems. In data intensive applications that access large data files over wide area network environment, such as a data grid,caching mechanism can significantly improve the data access performance under appropriate workloads. In a data grid, it is envisioned that local disk storage resources retain or cache the data files being used by local application. Under a workload of shared access and high locality of reference, the performance of the caching techniques depends heavily on the replacement policies being used. A replacement policy effectively determines which set of objects must be evicted when space is needed. Unlike cache replacement policies in virtual memory paging or database buffering, developing an optimal replacement policy for data grids is complicated by the fact that the file objects being cached have varying sizes and varying transfer and processing costs that vary with time. We present an accurate model for evaluating various replacement policies and propose a new replacement algorithm referred to as "Least Cost Beneficial based on K backward references (LCB-K)." Using this modeling technique, we compare LCB-K with various replacement policies such as Least Frequently Used (LFU), Least Recently Used (LRU), Greedy DualSize (GDS), etc., using synthetic and actual workload of accesses to and from tertiary storage systems. The results obtained show that (LCB-K) and (GDS) are the most cost effective cache replacement policies for storage resource management in data grids.