UC San Diego
Opportunistically Reconstructing a Network's Failure History
- Author(s): Turner, Daniel Joseph
- et al.
Of the major factors affecting end-to-end service availability, network component failure is perhaps the least well understood. How often do failures occur, how long do they last, what are their causes, and how do they affect customers? Traditionally, answering such questions has required dedicated (and often expensive) instrumentation deployed broadly across a network, which is rarely available in practice. This dissertation demonstrates that a combination of commonly available data sources can be used instead. In particular, opportunistically stitching together data from router configuration files, syslog messages, and trouble tickets makes it possible to reconstruct an accurate picture of historical network failures. We support this claim through a detailed evaluation of the approach's fidelity, comparing its results with high-quality "ground truth" data derived from an analysis of contemporaneous IS-IS routing protocol messages. In doing so, we highlight areas of agreement and disparity between these data sources, as well as ways to correct the disparities where possible.
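To make the idea of "stitching together" syslog data and checking it against ground truth more concrete, here is a minimal sketch, not the dissertation's actual method. It assumes syslog link-state messages have already been parsed into hypothetical `(timestamp, link_id, state)` tuples (with interface names mapped to links, e.g. via router configuration files), pairs each link's "down" message with the next "up" message to form a failure interval, and then measures what fraction of ground-truth failures (such as intervals derived from IS-IS messages) are overlapped by a reconstructed failure on the same link. The function names, the event format, and the overlap-based matching rule are all illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event format (an assumption, not from the source):
# (timestamp_seconds, link_id, state), where state is "down" or "up".
Event = Tuple[float, str, str]
Interval = Tuple[float, float]


def reconstruct_failures(events: Iterable[Event]) -> Dict[str, List[Interval]]:
    """Pair each link's 'down' message with the next 'up' message on that link."""
    open_down: Dict[str, float] = {}
    failures: Dict[str, List[Interval]] = defaultdict(list)
    for ts, link, state in sorted(events):
        if state == "down":
            # Keep the earliest 'down' if duplicates arrive (e.g. one per endpoint).
            open_down.setdefault(link, ts)
        elif state == "up" and link in open_down:
            failures[link].append((open_down.pop(link), ts))
        # An 'up' with no preceding 'down' (e.g. a lost message) is ignored,
        # and a link still down at the end of the log is not reported.
    return dict(failures)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open time intervals intersect."""
    return a[0] < b[1] and b[0] < a[1]


def recall(reconstructed: Dict[str, List[Interval]],
           ground_truth: Dict[str, List[Interval]]) -> float:
    """Fraction of ground-truth failures overlapped by a reconstructed
    failure on the same link (hypothetical matching rule)."""
    total = matched = 0
    for link, truth in ground_truth.items():
        for t in truth:
            total += 1
            if any(overlaps(t, r) for r in reconstructed.get(link, [])):
                matched += 1
    return matched / total if total else 1.0


if __name__ == "__main__":
    events = [
        (100.0, "r1-r2", "down"),
        (101.0, "r1-r2", "down"),  # duplicate report from the other endpoint
        (160.0, "r1-r2", "up"),
        (500.0, "r2-r3", "up"),    # orphan 'up', ignored
    ]
    recon = reconstruct_failures(events)
    truth = {"r1-r2": [(99.0, 158.0)], "r2-r3": [(400.0, 450.0)]}
    print(recon)                 # {'r1-r2': [(100.0, 160.0)]}
    print(recall(recon, truth))  # 0.5
```

In this toy example the syslog data captures the `r1-r2` failure but misses the `r2-r3` one entirely, which is the kind of disparity between data sources that an evaluation against routing-protocol ground truth is meant to expose.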