In-place reconstruction of version differences

In-place reconstruction of differenced data allows information on devices with limited storage capacity to be updated efficiently over low-bandwidth channels. Differencing encodes a version of data compactly as a set of changes from a previous version. Transmitting updates to data as a version difference saves both time and bandwidth. In-place reconstruction rebuilds the new version of the data in the storage or memory the current version occupies; no scratch space is needed for a second version. By combining these technologies, we support highly mobile applications on space-constrained hardware. We present an algorithm that modifies a differentially encoded version to be in-place reconstructible. The algorithm trades a small amount of compression to achieve this property. Our treatment includes experimental results that show our implementation to be efficient in space and time and verify that compression losses are small. Also, we give results on the computational complexity of performing this modification while minimizing lost compression.


Introduction
We develop a system for data distribution and version management to be used in highly mobile and resource-limited computers operating on low-bandwidth networks. The system combines differencing with a technology called in-place reconstruction. Differencing encodes a file compactly as a set of changes from a previous version. The system sends the difference encoding to a target computer in order to update the file, saving bandwidth and transfer time when compared with transmitting the whole file. In-place reconstruction brings the benefits of differencing to the computers that need it the most: resource-constrained devices such as wireless handhelds and cellular phones.
Differencing has been widely used to reduce latency and lower bandwidth requirements in distributed systems. The original applications of differencing focused on reducing the storage required to maintain sequences of versions. Examples include source code control systems [24], [29], [18], editors [9], and databases [25]. In the last decade, researchers have realized that these algorithms compress data quickly and can be used to reduce bandwidth requirements and transfer time for applications that exchange data across networks. Examples include backup and restore [4], database consistency [6], and Internet protocols [2], [21], [5].
For completeness, we often group delta compression [11], [12], [5] with differencing [1], [18]. Delta compression is a generalization of differencing and data compression [31], in that a version of a file may be compressed with respect to matching strings from within the file being encoded, as well as from the other version. Although the results of this paper concern differential compression, our methods apply to delta encoding as well.
To date, differencing has not been employed effectively for resource-constrained mobile and wireless devices. While the problem space is ideal, differencing has not been used because reconstructing a differential encoding requires storage space (disk or memory) to manifest a new version of data while keeping the old version as a reference. This problem is particularly acute for mass-produced devices that use expensive nonvolatile memories, such as personal digital assistants, wireless handhelds, and cellular phones. For these devices, it is important to keep manufacturing costs low. Therefore, it is not viable to add storage to a device solely for the purpose of differencing.
In-place reconstruction makes differential compression available to resource-constrained devices on any network. Mobile and wireless networks are the most natural and interesting application. In-place reconstruction allows a version to be updated by a differential encoding in the memory or storage that it currently occupies; reconstruction does not need additional scratch space for a second copy. An in-place reconstructible differential encoding is a permutation and modification of the original encoding. This conversion comes with a small compression penalty. In-place reconstruction brings the latency and bandwidth benefits of differencing to the space-constrained, mass-produced devices that need them the most. The combination of differencing and in-place reconstruction keeps the cost of manufacturing mobile devices low by reducing the demand on networking and storage hardware.
For one example application, we choose updating/patching the operating system of phones in a cellular network. Currently, the software and firmware in cellular phones remain the same over the life of the phone, or at least until the customer brings a phone in for service. Suppose that the authentication mechanism in the phone was compromised; perhaps the crypto was broken [15] or, more likely, keys were revealed [27]. In either case, updating software becomes essential for correct operation of the system. In particular, without trustworthy authentication, billing cannot be performed reliably. Using in-place reconstruction, the system patches the software quickly over the cellular network. The update degrades performance minimally by making the update size as small as possible. This example fits our system model well. Phones are mass-produced and, therefore, resource-constrained in order to keep manufacturing costs low. Also, cellular networks are low-bandwidth and cellular devices compete heavily for bandwidth. In-place reconstruction makes these devices manageable over networks, instead of immutable.
For another example, we choose a distributed inventory management system based on mobile-handheld devices. Many limited-capacity devices track quantities throughout an enterprise. To reduce latency, these devices cache portions of the database for read-only and update queries. Each device maintains a radio link to update its cache and runs a consistency protocol. In-place reconstruction allows the devices to keep their copies of data consistent using differencing without requiring scratch space, thereby increasing the cache utilization at target devices. We observe that in-place reconstruction applies to both structured data (databases) and unstructured data (files) because it manipulates a differential encoding, as opposed to the original data. Algorithms for differencing structured data [6] employ encodings that are suitable for in-place techniques.
Any application that has multiple resource-constrained computers sharing data interactively will want to use this technology and, in particular, applications that involve computer-human workflows using cellular or radio-frequency devices. Examples include security and law enforcement, property management, airport services, health care, and shipping/delivery.

Differencing and In-Place Reconstruction
We modify a differentially encoded file so that it is suitable for reconstructing the new version of the file in place. A difference file encodes a sequence of instructions, or commands, for a computer to materialize a new file version in the presence of a reference version, the old version of the file. When rebuilding a version encoded by a difference file, data are both copied from the reference version to the new version and added explicitly when portions of the new version do not appear in the reference version.
If we were to attempt naively to reconstruct an arbitrary difference file in-place, the resulting output would often be corrupt. This occurs when the encoding instructs the computer to copy data from a file region where new file data has already been written. The data the algorithm reads have been altered and the algorithm rebuilds an incorrect file.
We present a graph-theoretic algorithm for modifying difference files that detects situations where an encoding attempts to read from an already written region and permutes the order in which the algorithm applies commands in a difference file to reduce the occurrence of such conflicts. The algorithm eliminates any remaining conflicts by removing commands that copy data and adding these data to the encoding explicitly. Eliminating data copied between versions increases the size of the encoding but allows the algorithm to output an in-place reconstructible difference file.
Experimental results verify the viability and efficiency of modifying difference files for in-place reconstruction. Our findings indicate that our algorithms exchange a small amount of compression for in-place reconstructibility.
Experiments also reveal an interesting property of these algorithms not expressed by algorithmic analysis. We show in-place reconstruction algorithms to be I/O bound. In practice, the most important performance factor is the output size of the encoding. Two heuristics for eliminating data conflicts were studied in our experiments, and they show that the heuristic that loses less compression is superior to the more time-efficient heuristic that loses more compression.
The graphs constructed by our algorithm form an apparently new class of directed graphs, which we call CRWI (conflicting read-write interval) digraphs. Our modification algorithm is not guaranteed to minimize the amount of lost compression, but we do not expect an efficient algorithm to have this property because we show that minimizing the lost compression is an NP-hard problem. We also consider the complexity of finding an optimally compact, in-place reconstructible difference "from scratch," i.e., directly from a reference file and a version file. We show that this problem is NP-hard. In contrast, without the requirement of in-place reconstructibility, an optimally compact difference file can be found in polynomial time [28], [20], [23].

Related Work
Encoding versions compactly by detecting altered regions of data is a well-known problem. The first applications of differential compression found changed lines in text data for analyzing the recent modifications to files [13]. Considering data as lines of text fails to encode a minimum-sized difference, as it does not examine data at a fine granularity and finds only matching data that are aligned at the beginning of a new line.
The problem of representing the changes between versions of data was formalized as string-to-string correction with block move [28]: detecting maximally matching regions of a file at an arbitrarily fine granularity without alignment. However, differencing continued to rely on the alignment of data, as in database records [25], and the grouping of data into block or line granules, as in source code control systems [24], [29], to simplify the combinatorial task of finding the common and different strings between versions.
Efforts to generalize delta compression to unaligned data and to minimize the granularity of the smallest change resulted in algorithms for compressing data at the granularity of a byte. Early algorithms were based upon either dynamic programming [20] or the greedy method [23] and performed this task using time quadratic in the length of the input files.
Differential compression algorithms were improved to run in linear time and linear space. Algorithms with these properties have been derived from suffix trees [30], [19], [17]. Like algorithms based on greedy methods and dynamic programming, these algorithms generate optimally compact delta encodings [28].
Delta compression is a more general form of differencing. It includes the concept of finding matching data within the file being encoded as well as comparing that file to other similar files [11], [12], [5]. Delta compression runs in linear time. Related to delta compression is a coding technique that unifies differential and general compression [16].
Recent advances produced differencing algorithms that run in linear time and constant space [1]. These algorithms trade a small amount of compression in order to improve performance.
The linear-time algorithms allow differencing to scale to large inputs without known structure and permit the application of differential compression to data management systems. These include binary source code control [18] and backup and restore [4].
Applications distributing HTTP objects using delta files have emerged [21,2]. They permit Web servers to both reduce the amount of data transmitted to a client and reduce the latency associated with loading Web pages. Efforts to standardize delta files as part of the HTTP protocol and the trend toward making small network devices HTTP compliant indicate the need to distribute data to network devices efficiently.

Encoding Delta Files
Differencing algorithms encode the changes between two file versions compactly by finding strings common to both versions. We term the first file a version file that contains the data to be encoded and the second a reference file to which the version file is compared. Differencing algorithms encode a file by partitioning the data in the version file into strings that are encoded using copies from the reference file and strings that are added explicitly to the version file (Figure 1). Having partitioned the version file, the algorithm outputs a difference that encodes this version. This encoding consists of an ordered sequence of copy commands and add commands. An add command is an ordered pair, 〈t, l〉, where t (to) encodes the string offset in the file version and l (length) encodes the length of the string. The l bytes of data to be added follow the command. A copy command is an ordered triple, 〈f, t, l〉, where f (from) encodes the offset in the reference file from which data are copied, t encodes the offset in the new file where the data are to be written, and l encodes the length of the data to be copied. The copy command moves the string data in the interval [f, f + l − 1] in the reference file to the interval [t, t + l − 1] in the version file.
Figure 1: Encoding difference files. Common strings are encoded as copy commands 〈f, t, l〉 and new strings in the new file are encoded as add commands 〈t, l〉 followed by the string of length l of added data. (Panels: Reference File, Version File.)
In the presence of the reference file, a difference file rebuilds the version file with add and copy commands. The intervals in the version file encoded by these commands are disjoint. Therefore, any permutation of the command execution order materializes the same output version file.
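The conventional (scratch-space) reconstruction described above can be illustrated with a minimal Python sketch. The tuple layouts `("copy", f, t, l)` and `("add", t, data)` and the name `apply_delta` are assumptions made for this illustration only; they are not the on-disk codeword format of any particular differencing tool.

```python
import random

def apply_delta(reference: bytes, commands, version_length: int) -> bytes:
    """Rebuild the version file in a separate buffer (scratch space)."""
    version = bytearray(version_length)
    for cmd in commands:
        if cmd[0] == "copy":
            _, f, t, l = cmd
            version[t:t + l] = reference[f:f + l]   # read reference, write version
        else:                                       # ("add", t, data)
            _, t, data = cmd
            version[t:t + len(data)] = data
    return bytes(version)

reference = b"ABCDEFGH"
commands = [
    ("copy", 4, 0, 4),   # version[0:4] = reference[4:8]
    ("add", 4, b"xy"),   # version[4:6] = "xy"
    ("copy", 0, 6, 2),   # version[6:8] = reference[0:2]
]
expected = apply_delta(reference, commands, 8)

# The write intervals are disjoint and the reference is never modified,
# so a permuted command order materializes the same version file.
shuffled = commands[:]
random.Random(0).shuffle(shuffled)
assert apply_delta(reference, shuffled, 8) == expected
```

The order independence shown here holds only because the reference and the version occupy separate buffers.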

In-Place Modification Algorithms
An in-place modification algorithm changes an existing difference file into a difference file that reconstructs correctly a new file version in the space the current version occupies. At a high level, our technique examines the input difference file to find copy commands that conflict, in which one command reads data from the write interval (file address range to which the command writes data) of other copy commands. It represents these potential conflicts in a digraph and topologically sorts the digraph to produce an ordering on copy commands that reduces conflicts. It eliminates the remaining conflicts by converting copy commands to add commands. The algorithm outputs the permuted and converted commands as an in-place reconstructible difference. Actually, as described in more detail below, the algorithm performs permutation and conversion of commands concurrently.

Conflict Detection
Since we reconstruct files in-place, we concern ourselves with ordering commands that attempt to read a region to which another command writes. For this, we adopt the term write before read (WR) conflict [3]. For copy commands $\langle f_i, t_i, l_i \rangle$ and $\langle f_j, t_j, l_j \rangle$, with $i < j$ in execution order, a WR conflict occurs when

$$[t_i, t_i + l_i - 1] \cap [f_j, f_j + l_j - 1] \neq \emptyset. \quad (1)$$

In other words, copy commands i and j conflict if i writes to the interval from which j reads data. By denoting, for each copy command $\langle f_k, t_k, l_k \rangle$, the command's read interval as $\mathit{Read}_k = [f_k, f_k + l_k - 1]$ and its write interval as $\mathit{Write}_k = [t_k, t_k + l_k - 1]$, we write the condition (1) for a WR conflict as $\mathit{Write}_i \cap \mathit{Read}_j \neq \emptyset$. In Figure 2, commands C1 and C2 executed in that order generate a conflict (blacked area) that corrupts data were the file reconstructed in place. This definition considers only WR conflicts between copy commands and neglects add commands. Add commands write data to the version file; they do not read data from the reference file. Consequently, an algorithm avoids all potential WR conflicts from adding data by placing add commands at the end of an encoding. In this way, the algorithm completes all reads from copy commands before executing the first add command.
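To make the WR conflict concrete, the following hedged Python sketch tests the intersection condition for two copy commands and shows how one execution order corrupts an in-place reconstruction while the other does not. Copy commands are represented as `(f, t, l)` tuples; the function names are illustrative assumptions.

```python
def intervals_intersect(a_start, a_len, b_start, b_len):
    """True if [a_start, a_start+a_len-1] and [b_start, b_start+b_len-1] overlap."""
    return a_start < b_start + b_len and b_start < a_start + a_len

def wr_conflict(ci, cj):
    """True if the write interval of ci intersects the read interval of cj."""
    fi, ti, li = ci
    fj, tj, lj = cj
    return intervals_intersect(ti, li, fj, lj)

def apply_copies_in_place(buf: bytearray, copies):
    """Naively run copy commands, in the given order, inside one buffer."""
    for f, t, l in copies:
        buf[t:t + l] = buf[f:f + l]   # the slice is read fully before the write

reference = b"ABCDEF"
c1 = (2, 0, 2)   # reads [2,3], writes [0,1]
c2 = (0, 4, 2)   # reads [0,1], writes [4,5]
# Intended bytes at the copied offsets: 0-1 = "CD", 4-5 = "AB".

bad = bytearray(reference)
apply_copies_in_place(bad, [c1, c2])    # c1 overwrites the bytes c2 must read
good = bytearray(reference)
apply_copies_in_place(good, [c2, c1])   # c2 reads before c1 writes

print(wr_conflict(c1, c2), bytes(bad))   # True b'CDCDCD'
print(wr_conflict(c2, c1), bytes(good))  # False b'CDCDAB'
```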
Additionally, we define WR conflicts so that a copy command cannot conflict with itself, even though a single copy command's read and write intervals sometimes intersect and would seem to cause a conflict. We deal with read and write intervals that overlap by performing the copy in a left-to-right or right-to-left manner. For command 〈f, t, l〉, if f ≥ t, we copy the string byte by byte starting at the left-hand side when reconstructing a file. Since the f (from) offset is never less than the t (to) offset in the new file, a left-to-right copy never reads a byte overwritten by a previous byte in the string. When f < t, a symmetric argument shows that we should start our copy at the right-hand edge of the string and work backward. For this example, we performed the copies in a byte-wise fashion. However, the notion of a left-to-right or right-to-left copy applies to moving a read/write buffer of any size.
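The directional copy can be sketched in a few lines of Python. This is a byte-wise illustration under the stated rule (left to right when f ≥ t, right to left otherwise); the name `copy_within` is an assumption, and a practical implementation would move larger buffers per step.

```python
def copy_within(buf: bytearray, f: int, t: int, l: int) -> None:
    """Copy buf[f:f+l] to buf[t:t+l] byte by byte, with no temporary buffer."""
    if f >= t:
        for k in range(l):                # left to right
            buf[t + k] = buf[f + k]
    else:
        for k in range(l - 1, -1, -1):    # right to left
            buf[t + k] = buf[f + k]

left = bytearray(b"ABCDEFGH")
copy_within(left, 2, 0, 6)     # f >= t: reads [2,7], writes [0,5]
right = bytearray(b"ABCDEFGH")
copy_within(right, 0, 2, 6)    # f < t: reads [0,5], writes [2,7]

print(bytes(left))    # b'CDEFGHGH'
print(bytes(right))   # b'ABABCDEF'
```

In both cases the destination receives the original contents of the source interval, even though the two intervals overlap.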
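A difference file suitable for in-place reconstruction obeys the property

$$(\forall j)\quad [f_j, f_j + l_j - 1] \cap \Big( \bigcup_{i=1}^{j-1} [t_i, t_i + l_i - 1] \Big) = \emptyset, \quad (2)$$

indicating the absence of WR conflicts. Equivalently, it guarantees that every copy command reads and transfers data from the original file.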
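As a small illustration of this property, the sketch below checks a sequence of copy commands, taken in execution order, for the absence of WR conflicts: no command may read from a region that an earlier command has written. The function name and the quadratic scan are assumptions chosen for brevity, not the method used in our implementation.

```python
def is_in_place_reconstructible(copies):
    """copies: (f, t, l) tuples in execution order."""
    written = []                          # write intervals executed so far, as (t, l)
    for f, t, l in copies:
        if any(f < wt + wl and wt < f + l for wt, wl in written):
            return False                  # this command reads an already written region
        written.append((t, l))
    return True

print(is_in_place_reconstructible([(2, 0, 2), (0, 4, 2)]))   # False
print(is_in_place_reconstructible([(0, 4, 2), (2, 0, 2)]))   # True
```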

CRWI Digraphs
To find a permutation that reduces WR conflicts, we represent potential conflicts between the copy commands in a digraph and topologically sort this digraph. A topological sort on digraph G = (V, E) produces a linear order on all vertices so that if G contains edge $\overrightarrow{uv}$, then vertex u precedes vertex v in topological order. Our technique constructs a digraph so that each copy command in the difference file has a corresponding vertex in the digraph. On this set of vertices, we construct an edge relation with a directed edge $\overrightarrow{uv}$ from vertex u to vertex v when copy command u's read interval intersects copy command v's write interval. Edge $\overrightarrow{uv}$ indicates that by performing command u before command v, the difference file avoids a WR conflict. We call a digraph obtained from a delta file in this way a conflicting read write interval (CRWI) digraph. A topologically sorted version of this graph adheres to the requirement for in-place reconstruction (Equation 2). To the best of our knowledge, the class of CRWI digraphs has not been defined previously. While we know little about its structure, it is clearly smaller than the class of all digraphs. For example, the CRWI class does not include any complete digraphs with more than two vertices.
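The edge relation can be sketched as follows. This is a minimal Python illustration, assuming copy commands are `(f, t, l)` tuples with pairwise disjoint write intervals and l ≥ 1; the name `build_crwi_digraph` and the adjacency-list representation are choices made for the example.

```python
from bisect import bisect_left

def build_crwi_digraph(copies):
    """Return (sorted_copies, edges), where edges[u] lists every v != u such
    that the read interval of u intersects the write interval of v."""
    cs = sorted(copies, key=lambda c: c[1])       # sort by write offset t
    write_ends = [t + l - 1 for _, t, l in cs]    # increasing, since writes are disjoint
    edges = []
    for u, (f, _, l) in enumerate(cs):
        out = []
        v = bisect_left(write_ends, f)            # first write interval ending at or after f
        while v < len(cs) and cs[v][1] <= f + l - 1:
            if v != u:                            # a command never conflicts with itself
                out.append(v)
            v += 1
        edges.append(out)
    return cs, edges

# Acyclic case: vertex 1 must run before vertex 0.
print(build_crwi_digraph([(0, 4, 2), (2, 0, 2)])[1])   # [[], [0]]
# Swapping two halves of a file yields a cycle of length two.
print(build_crwi_digraph([(4, 0, 4), (0, 4, 4)])[1])   # [[1], [0]]
```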

Strategies for Breaking Cycles
As total topological orderings are possible only on acyclic digraphs and CRWI digraphs may contain cycles, we enhance a standard topological sort to break cycles and output a total topological order on a subgraph. A depth-first search implementation of topological sort [7] is modified to detect cycles. Upon detecting a cycle, our modified sort breaks the cycle by removing a vertex. The sort outputs a digraph containing a subset of all vertices in topological order and a set of vertices that were removed. The algorithm re-encodes the data contained in the copy commands of the removed vertices as add commands in the output.
We define the amount of compression lost upon deleting a vertex to be the cost of deletion. Based on this cost function, we formulate the optimization problem of finding the minimum cost set of vertices to delete to make a digraph acyclic. Replacing a copy command (〈f, t, l〉) with an add command (〈t, l〉) increases the encoding size by l − ∥f∥, where ∥f∥ is the size of the encoding of offset f. Thus, the vertex that corresponds to the copy command 〈f, t, l〉 is assigned cost l − ∥f∥. When turning a digraph into an acyclic digraph by deleting vertices, an in-place conversion algorithm could minimize the amount of compression lost by selecting a set of vertices with the smallest total cost. This problem, called the FEEDBACK VERTEX SET problem, was shown by Karp [14] to be NP-hard for general digraphs. In Section 8 we show that it remains NP-hard even when restricted to CRWI digraphs. Thus, we do not expect an efficient algorithm to minimize the cost in general. In our implementation, we examine two efficient, but not optimal, policies for breaking cycles. The constant-time policy picks the "easiest" vertex to remove, based on the execution order of the topological sort, and deletes this vertex. This policy performs no extra work when breaking cycles. The local-minimum policy detects a cycle and loops through all vertices in the cycle to determine and then delete the minimum cost vertex. The local-minimum policy may perform as much additional work as the total length of cycles found by the algorithm. Although these policies perform well in our experiments, we note in Section 4.7 that they do not guarantee that the total cost of deletion is within a constant factor of the optimum.
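The cost expression follows from comparing the sizes of the two codewords. As a brief worked derivation, write ∥x∥ for the encoded size of a field x and assume, for illustration only, that the t and l fields are encoded identically in both command types:

```latex
\begin{align*}
\text{size of copy } \langle f, t, l \rangle &= \|f\| + \|t\| + \|l\| \\
\text{size of add } \langle t, l \rangle \text{ plus its data} &= \|t\| + \|l\| + l \\
\text{cost of deletion} &= (\|t\| + \|l\| + l) - (\|f\| + \|t\| + \|l\|) = l - \|f\|
\end{align*}
```

For instance, with a hypothetical 4-byte offset field, converting a copy command of length 100 into an add command enlarges the encoding by 96 bytes.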

Generating Conflict Free Permutations
Our algorithm for converting difference files into in-place reconstructible difference files takes the following steps to find and eliminate WR conflicts between a reference file and a version file.
Algorithm
1. Given an input difference file, we partition the commands in the file into a set C of copy commands and a set A of add commands.
2. Sort the copy commands by increasing write offset, $C_{\mathit{sorted}} = \{c_1, c_2, \ldots, c_n\}$. For $c_i$ and $c_j$, this set obeys $i < j \iff t_i < t_j$. Sorting the copy commands allows us to perform binary search when looking for a copy command at a given write offset.
3. Construct a digraph from the copy commands. For the copy commands $c_1, c_2, \ldots, c_n$, we create a vertex set $V = \{v_1, v_2, \ldots, v_n\}$. Build the edge set E by adding an edge from vertex $v_i$ to vertex $v_j$ when copy command $c_i$ reads from the interval to which $c_j$ writes: $\overrightarrow{v_i v_j} \in E \iff [f_i, f_i + l_i - 1] \cap [t_j, t_j + l_j - 1] \neq \emptyset$, for $i \neq j$.
4. Perform a topological sort on the vertices of the digraph. This sort also detects cycles in the digraph and breaks them. When breaking a cycle, select one vertex on the cycle, using either the local-minimum or constant-time policy, and remove it. Replace the data encoded in its copy command with an equivalent add command, which is put into set A.
5. Output the remaining copy commands to the difference file in topologically sorted order.
6. Output all add commands in the set A to the difference file.
The resulting difference file reconstructs the new version out of order, both out of write order in the version file and out of the order that the commands appeared in the original delta file.
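The six steps can be put together in a compact, hedged Python sketch. It assumes commands of the form `("copy", f, t, l)` and `("add", t, data)`, uses only the constant-time cycle-breaking policy, and relies on recursion for brevity (so it is unsuitable for very deep graphs). The names `make_in_place` and `apply_in_place` are illustrative, and the reference bytes are passed in so that a removed copy command can be re-encoded as an add command.

```python
from bisect import bisect_left

def make_in_place(reference: bytes, commands):
    # Step 1: partition into copy commands and add commands.
    copies = [c[1:] for c in commands if c[0] == "copy"]    # (f, t, l)
    adds = [c for c in commands if c[0] == "add"]
    # Step 2: sort copy commands by increasing write offset.
    copies.sort(key=lambda c: c[1])
    ends = [t + l - 1 for _, t, l in copies]
    # Step 3: edge u -> v when the read interval of u meets the write interval of v.
    edges = []
    for u, (f, _, l) in enumerate(copies):
        out, v = [], bisect_left(ends, f)
        while v < len(copies) and copies[v][1] <= f + l - 1:
            if v != u:
                out.append(v)
            v += 1
        edges.append(out)
    # Step 4: DFS topological sort; on a cycle, delete the current vertex
    # (constant-time policy) and re-encode its copy as an add.
    UNVISITED, ON_STACK, DONE = 0, 1, 2
    state = [UNVISITED] * len(copies)
    reverse_order = []

    def visit(u):
        state[u] = ON_STACK
        for w in edges[u]:
            if state[w] == UNVISITED:
                visit(w)
            elif state[w] == ON_STACK:            # cycle through u
                f, t, l = copies[u]
                adds.append(("add", t, reference[f:f + l]))
                state[u] = DONE                   # removed; never output as a copy
                return
        state[u] = DONE
        reverse_order.append(u)

    for u in range(len(copies)):
        if state[u] == UNVISITED:
            visit(u)
    # Steps 5 and 6: copies in topological order, followed by all adds.
    return [("copy",) + copies[u] for u in reversed(reverse_order)] + adds

def apply_in_place(buf: bytearray, commands):
    """Run the converted commands inside the buffer holding the old version.
    Assumes, for simplicity, that old and new versions have equal length."""
    for cmd in commands:
        if cmd[0] == "copy":
            _, f, t, l = cmd
            buf[t:t + l] = buf[f:f + l]
        else:
            _, t, data = cmd
            buf[t:t + len(data)] = data

ref = b"ABCDEFGH"
delta = make_in_place(ref, [("copy", 4, 0, 4), ("copy", 0, 4, 4)])  # swap halves
buf = bytearray(ref)
apply_in_place(buf, delta)
print(delta, bytes(buf))
```

In this example the two copy commands form a cycle, so one of them is converted to an add command carrying four bytes of data, which is the compression given up for in-place reconstructibility.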
For completeness, we give a brief description of how a standard depth-first search (DFS) algorithm was modified to perform Step 4 in our implementation, as these details affect both the results of our experiments and the asymptotic worst-case time bounds. A DFS algorithm outputs the unremoved copy commands in reverse topologically sorted order. A topological order is achieved by reversing the output of the DFS algorithm or, alternatively, by reversing the edge relation before the search. A DFS algorithm uses a stack to visit the vertices of a digraph in a certain order. The algorithm marks each vertex either unvisited, on-stack, or finished. Initially, every vertex is marked unvisited. Until no more unvisited vertices exist, the algorithm chooses an unvisited vertex u and calls VISIT(u). The procedure VISIT(u) marks u as on-stack, pushes u on the stack, and examines each vertex w for which there is an edge $\overrightarrow{uw}$ in the graph. For each such w: (1) if w is marked finished, then w is not processed further; (2) if w is marked unvisited, then VISIT(w) is performed; (3) if w is marked on-stack, then the vertices between u and w on the stack form a directed cycle, which must be broken. For the constant-time policy, u is popped from the stack and removed from the graph. Letting p denote the new top of the stack, the execution of VISIT(p) continues as though u were marked finished. For the local-minimum policy, the algorithm loops through all vertices on the cycle to find one of minimum cost, that is, one whose removal causes the smallest increase in the size of the difference file; call this vertex r. Vertices r through u are popped from the stack and marked unvisited, except r, which is removed. If there is a vertex p on the top of the stack, then the execution of VISIT(p) continues as though r were marked finished.
Recall that we are describing an execution of VISIT(u) by examining all w such that there is an edge $\overrightarrow{uw}$. After all such w have been examined, u is marked finished, u is popped from the stack, and the copy command corresponding to vertex u is written in reverse sorted order. Using the constant-time policy, this procedure has the same running time as DFS, namely, O(|V| + |E|). Using the local-minimum policy, when the algorithm removes a vertex, it retains some of the work (marking) that the DFS has done. However, in the worst case, the entire stack pops after each vertex removal, causing running time proportional to $|V|^2$. (While we can construct examples where the time is proportional to $|V|^2$, we do not observe this worst-case behavior in our experiments.)
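The modified DFS can be sketched with an explicit stack, which makes the "pop r through u" step of the local-minimum policy straightforward. This is an illustrative Python sketch on a generic adjacency list, not our implementation: the function name, the `policy` argument, and the caller-supplied `cost` list are assumptions, and locating w on the stack is done by a linear scan for simplicity.

```python
def topo_sort_breaking_cycles(edges, cost, policy="local-min"):
    """edges[u]: vertices v with an edge u -> v; cost[u]: cost of deleting u.
    Returns (order, removed); order places u before v for every surviving edge."""
    UNVISITED, ON_STACK, FINISHED, REMOVED = range(4)
    state = [UNVISITED] * len(edges)
    finished_order, removed = [], []
    for root in range(len(edges)):
        if state[root] != UNVISITED:
            continue
        state[root] = ON_STACK
        stack = [(root, iter(edges[root]))]
        while stack:
            u, it = stack[-1]
            w = next(it, None)
            if w is None:                         # every edge out of u examined
                stack.pop()
                state[u] = FINISHED
                finished_order.append(u)
            elif state[w] == UNVISITED:
                state[w] = ON_STACK
                stack.append((w, iter(edges[w])))
            elif state[w] == ON_STACK:            # stack entries w..u form a cycle
                pos_w = next(i for i, (x, _) in enumerate(stack) if x == w)
                if policy == "constant":
                    pos_r = len(stack) - 1        # remove u itself
                else:                             # minimum-cost vertex on the cycle
                    pos_r = min(range(pos_w, len(stack)),
                                key=lambda i: cost[stack[i][0]])
                for x, _ in stack[pos_r + 1:]:
                    state[x] = UNVISITED          # popped; will be visited again
                r = stack[pos_r][0]
                state[r] = REMOVED
                removed.append(r)
                del stack[pos_r:]                 # resume at the new top of the stack
            # finished or removed neighbours need no further processing
    finished_order.reverse()                      # DFS emits reverse topological order
    return finished_order, removed

cycle = [[1], [2], [0]]                           # 0 -> 1 -> 2 -> 0
print(topo_sort_breaking_cycles(cycle, [5, 1, 5], "local-min"))   # ([2, 0], [1])
print(topo_sort_breaking_cycles(cycle, [5, 1, 5], "constant"))    # ([0, 1], [2])
```

On this three-vertex cycle the local-minimum policy deletes the cheapest vertex, whereas the constant-time policy deletes whichever vertex happens to close the cycle.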

Algorithmic Performance
Suppose that the algorithm is given a difference file consisting of a set C of copy commands and a set A of add commands. The presented algorithm uses time O(|C| log |C|) both for sorting the copy commands by write order and for finding conflicting commands, using binary search on the sorted write intervals for the |V| vertices in V; recall that |V| = |C|. Additionally, the algorithm separates and outputs add commands using time O(|A|) and builds the edge relation using time O(|E|). Letting n denote the total number of commands in the difference file, the graph contains as many vertices as copy commands. Therefore, |V| = |C| = O(n). The same is true of add commands, |A| = O(n). However, we have no bound for the number of edges, except the trivial bound $O(|V|^2)$ for general digraphs. (In Section 4.6, we demonstrate by example that our algorithm can generate a digraph having a number of edges meeting this bound.) On the other hand, we also show that the number of edges in digraphs generated by our algorithm is at most the length of the version file V that the delta file encodes (Lemma 1). We denote the length of V by $L_V$.
After substituting these bounds on |E| into the performance expressions, for an input difference file containing n commands encoding a version file of length $L_V$, the worst-case running time of our algorithm is $O(n \log n + \min(L_V, n^2))$ using the constant-time policy and $O(n^2)$ using the local-minimum policy. In either case, the space is $O(n + \min(L_V, n^2))$.

Bounding the Size of the Digraph
The performance of digraph construction, topological sorting, and cycle breaking depends upon the number of edges in the digraphs our algorithm constructs. We asserted previously (Section 4.5) that the number of edges in a CRWI digraph can grow quadratically with the number of copy commands and is bounded above by the length of the version file. We now verify these assertions.
No digraph has more than $O(|V|^2)$ edges. To establish that this bound is tight for CRWI digraphs, we show an example of a difference file whose CRWI digraph realizes this bound. Consider a version file of length L that is broken up into blocks of length $\sqrt{L}$ (Figure 3). There are $\sqrt{L}$ such blocks, $b_1, b_2, \ldots, b_{\sqrt{L}}$. Assume that all blocks excluding the first block in the version file, $b_2, b_3, \ldots, b_{\sqrt{L}}$, are copies of the first block in the reference file.
Also, the first block in the version file consists of $\sqrt{L}$ copies of length 1 from any location in the reference file. A difference file for this reference and version file consists of $\sqrt{L}$ "short" copy commands, each of length 1, and $\sqrt{L} - 1$ "long" copy commands, each of length $\sqrt{L}$. Since each short command writes into each long command's read interval, a CRWI digraph for this difference file has an edge from every vertex representing a long command to every vertex representing a short command. This digraph has $\sqrt{L} - 1$ vertices each with out-degree $\sqrt{L}$, for total edges in $\Omega(L) = \Omega(|C|^2)$.
The Ω(L) bound also turns out to be the maximum possible number of edges.
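The quadratic example can be checked numerically. The sketch below builds the short and long copy commands for a version file of length L = m·m and counts CRWI edges by brute force. It is illustrative only: the text lets the short commands read from any location, and here each short command is assumed to read from its own write offset so that the short commands add no edges among themselves.

```python
def quadratic_example(m: int):
    """Copy commands (f, t, l) for a version file of length L = m * m."""
    short = [(t, t, 1) for t in range(m)]           # m copies of length 1 filling block 1
    long_ = [(0, b * m, m) for b in range(1, m)]    # m - 1 copies of reference block 1
    return short + long_

def count_edges(copies):
    """Ordered pairs (u, v), u != v, whose read and write intervals intersect."""
    count = 0
    for u, (fu, _, lu) in enumerate(copies):
        for v, (_, tv, lv) in enumerate(copies):
            if u != v and fu < tv + lv and tv < fu + lu:
                count += 1
    return count

m = 4
copies = quadratic_example(m)
print(len(copies), count_edges(copies), m * m)   # 7 12 16
```

With m = 4 there are 2m − 1 = 7 copy commands and m(m − 1) = 12 edges, which grows quadratically in the number of commands yet stays below the version length L = 16.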
Lemma 1 For a difference file that encodes a version file V of length $L_V$, the number of edges in the digraph representing potential WR conflicts is at most $L_V$.
Proof. The CRWI digraph has an edge representing a potential WR conflict from copy command i to copy command j when $[f_i, f_i + l_i - 1] \cap [t_j, t_j + l_j - 1] \neq \emptyset$. The copy command i has a read interval of length $l_i$. Recalling that the write intervals of all copy commands are disjoint, there are at most $l_i$ edges directed out of copy command i; this occurs when the region $[f_i, f_i + l_i - 1]$ in the version file is encoded by $l_i$ copy commands of length 1. We also know that, for any encoding, the sum of the lengths of all read intervals is less than or equal to $L_V$. As all read intervals sum to at most $L_V$, and no read interval generates more out-edges than its length, the number of edges in the digraph from a difference file encoding V is less than or equal to $L_V$. ■
If each copy command in the delta file encodes a string of length at least ℓ, then a similar proof shows that there are at most $L_V/\ell$ edges.
Bounding the number of edges in CRWI digraphs, we verify the performance bounds presented in Section 4.5.

Nonoptimality of the Local-Minimum Policy
An adversarial example shows that the cost of a solution (a set of deleted vertices) found using the local-minimum policy is not bounded above by any constant times the optimal cost. Consider the digraph of Figure 4; Lemma 2 in Section 7 shows that this is a CRWI digraph. The local-minimum policy for breaking cycles looks at the k cycles $(v_0, \ldots, v_i, v_0)$ for i = 1, 2, ..., k. For each cycle, it chooses to delete the minimum cost vertex, vertex $v_i$ with cost C. As a result, the algorithm deletes vertices $v_1, v_2, \ldots, v_k$, incurring total cost kC. However, deleting vertex $v_0$, at cost C + 1, is the globally optimal solution. If we further assume that the original difference file contains only the 2k − 1 copy commands in Figure 4 and that the size of each copy command is c, then the size of the difference file generated by the local-minimum solution is (2k − 1)c + kC, the size of the optimal difference file is (2k − 1)c + C + 1, and the ratio of these two sizes approaches 1 + C/(2c) for large k. As C/c can be arbitrarily large, this ratio is not bounded by a constant.
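The limiting ratio can be checked with a few lines of arithmetic. The following Python sketch simply evaluates the two size expressions given above; the sample values of C and c are arbitrary assumptions chosen for illustration.

```python
def size_ratio(k: int, C: int, c: int) -> float:
    """Ratio of the local-minimum encoding size to the optimal encoding size."""
    local_min = (2 * k - 1) * c + k * C      # deletes v_1 .. v_k at cost C each
    optimal = (2 * k - 1) * c + C + 1        # deletes only v_0 at cost C + 1
    return local_min / optimal

C, c = 1000, 10
limit = 1 + C / (2 * c)                      # 51.0 for these sample values
for k in (10, 1_000, 1_000_000):
    print(k, size_ratio(k, C, c), limit)
```

As k grows, the printed ratio approaches the limit from below, and the limit itself grows without bound as C/c increases.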
The merit of the local-minimum solution, as compared to breaking cycles in constant time, is difficult to determine. On difference files whose digraphs have sparse edge relations, cycles are infrequent and looping through cycles saves compression at little cost. However, worst-case analysis indicates no preference for the local-minimum solution when compared to the constant-time policy. This motivates a performance investigation of the runtime and compression associated with these two policies (Section 5).

Experimental Results
As in-place reconstruction is used for distributing data to mobile and resource-limited devices, we extracted a large body of experimental data that consists of versions of software intended for handhelds and personal digital assistants. Files include applications, boot loaders, and operating system components. In-place differencing was measured against these data with the goals of: • determining the compression loss due to making difference files in-place reconstructible, • comparing the constant-time and local-minimum policies for breaking cycles, • showing in-place conversion algorithms to be efficient when compared with differencing algorithms, and • characterizing the graphs created by the algorithm.
In all cases, we obtained the original difference files using the correcting 1.5-pass differential compression algorithm [1].

The experimental data we collected and employed are characteristic of the intended application. Because our interest lies in distributing files to resource-limited devices, we collected versions of open-source software intended for the Compaq iPAQ handheld device, a personal digital assistant that can run versions of the Linux operating system. Data were obtained in April 2002 from www.handhelds.org, a Web site designed to facilitate the "creation of open source software for use on handheld and wearable computers." To collect data, we downloaded the software archive and ran scripts that search the archive for multiple versions of the same files. The original and processed data are available from the Hopkins Storage Systems lab at http://hssl.cs.jhu.edu/ipdata/. All experimental data are files that are distributed to handheld devices: boot loaders, applications, flash updates, and their associated data files. We did not include source code or other data not intended for distribution to handhelds.
We categorize the difference files in our experiments into three groups that describe what operations were required to make files in-place reconstructible. Experiments were conducted on 1,959 files totaling more than 87.4 megabytes, an average file size of approximately 44 kilobytes. Of these files (Figure 5), 33 percent contained cycles that needed to be broken. Sixty-five percent did not have cycles, but needed to have copy commands reordered. The remaining two percent of files were trivially in-place reconstructible; i.e., none of the copy commands conflicted. For trivial files, performing copies before adds creates an in-place difference.
The amount of data in files is distributed differently across the three categories than are the file counts. The distribution of files and data across the three categories confirms that efficient algorithms for cycle breaking and command reordering are needed to deliver differentially compressed data in-place. While most difference files do not contain cycles, those that do have cycles contain the majority of the data.

We group compression results into the same categories. Figure 6(a) shows compression (size of difference files as a fraction of the original file size) and Figure 6(b) shows the total size of the difference files. For each category and for all files, we report data for three algorithms, all of which are derived from the correcting 1.5-pass differencing algorithm (HPDelta) [1]. These algorithms are: the correcting 1.5-pass differencing algorithm modified so that codewords are in-place reconstructible (IP-HPDelta), the in-place modification algorithm using the local-minimum cycle breaking policy (IP-LMin), and the in-place modification algorithm using the constant-time cycle breaking policy (IP-Const). The HPDelta algorithm is a linear-time, constant-space algorithm for generating differentially compressed files.
The IP-HPDelta algorithm is a modification of HPDelta to output codewords that are suitable for in-place reconstruction. Throughout this paper, we have described add commands 〈t, l〉 and copy commands 〈f, t, l〉, where both commands encode explicitly the "to" or write offset t in the version file. However, differencing algorithms, such as HPDelta, reconstruct data in write order and do not encode a write offset; an add command can simply be 〈l〉 and a copy command 〈f, l〉. Since commands are applied in write order, the end offset of the previous command implies the write offset of the current command. The codewords of IP-HPDelta are modified to make the write offset explicit, allowing our algorithm to reorder commands. This extra field in each codeword introduces a per-command overhead in a difference file. The amount of compression loss varies, depending upon the number of commands and the original size of the difference file. Overhead in these experiments ran to more than 4.4 percent, which corresponds to output delta files that are 16 percent larger than with HPDelta. The codewords used in these experiments are not well tuned for in-place reconstruction, spending 4 bytes per codeword to describe a write offset. In the future, in-place differencing will require the careful codeword design that has been done for delta compression [16] to minimize these losses. For now, our experiments focus on compression loss from cycle breaking, i.e., compression loss attributable to in-place algorithms.

From the IP-HPDelta algorithm, we derive the IP-Const and IP-LMin algorithms. They run the IP-HPDelta algorithm to generate a difference file and then permute and modify the commands according to our techniques. The IP-Const algorithm implements the constant-time policy and the IP-LMin algorithm implements the local-minimum policy.
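The difference between write-order codewords and codewords with an explicit write offset can be illustrated with a minimal sketch. The class and function names below (WriteOrderCmd, ExplicitCmd, make_offsets_explicit) are hypothetical and are not taken from HPDelta or IP-HPDelta; the sketch only assumes that commands are listed in write order and that the version file starts at offset 0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WriteOrderCmd:
    """Write-order codeword: add <l> when src is None, else copy <f, l>."""
    length: int
    src: Optional[int] = None  # "from" offset f in the reference file


@dataclass
class ExplicitCmd:
    """Explicit-offset codeword: add <t, l> when src is None, else copy <f, t, l>."""
    dst: int                   # "to" (write) offset t in the version file
    length: int
    src: Optional[int] = None


def make_offsets_explicit(cmds: List[WriteOrderCmd]) -> List[ExplicitCmd]:
    """Derive each write offset from the end offset of the previous command."""
    out = []
    t = 0  # assumed start of the version file
    for c in cmds:
        out.append(ExplicitCmd(dst=t, length=c.length, src=c.src))
        t += c.length  # end offset of this command is the next write offset
    return out


# Usage: a copy of 4 bytes, an add of 3 bytes, then a copy of 5 bytes.
delta = [WriteOrderCmd(4, src=10), WriteOrderCmd(3), WriteOrderCmd(5, src=0)]
explicit = make_offsets_explicit(delta)
```

Once every command carries its own write offset, the list can be permuted without changing which bytes each command writes; that freedom is what costs the extra field per codeword.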
Experimental results indicate the amount of compression lost due to in-place reconstruction. Over all files, IP-HPDelta compresses data to 31.1 percent of their original size (Figure 6(a)). This number does not include data compression, which can be performed after the difference is taken. Compared to IP-HPDelta, IP-Const output is 3.6 percent larger, 28.10 MB as compared to 27.14 MB. The loss is attributed to breaking cycles. In contrast, IP-LMin generates output only 0.5 percent larger, 27.26 MB versus 27.14 MB. The local-minimum policy performs excellently in practice: its compression losses are one seventh those of the constant-time policy.
Because files with cycles contain the majority of the data (Figure 7(b)), the results for files with cycles dominate the results for all files. In reorder and trivially in-place difference files, no cycles are present and no compression is lost. The class of files that are trivially in-place is incompressible using differencing. This class is dominated by a few large files with little similarity between versions.
In-place algorithms incur execution time overheads when performing additional I/O and when permuting the commands in a difference file. An in-place algorithm generates a difference file and then modifies the file to have the in-place property. In-place algorithms create an intermediate file that contains the output of the differential compression algorithm. This intermediate output serves as the input for the algorithm that modifies/permutes commands. We present execution-time results in Figure 7(a) for both in-place algorithms, IP-Const and IP-LMin. Figure 7(b) includes 95 percent confidence intervals, which are barely discernible. IP-LMin and IP-Const perform all of the steps of the base algorithm (IP-HPDelta) before manipulating the intermediate file. Results show that the extra work incurs an overhead of about 6 percent, i.e., the total run takes 20 seconds longer. Almost all of this overhead comes from additional I/O. We conclude that the tasks for in-place reconstruction are small when compared with the effort of compressing data: the algorithmic tasks take only two seconds of additional time over the whole experiment. Despite inferior worst-case runtime bounds, the local-minimum policy performs nearly identically to (and marginally better than) the constant-time policy in practice.
Examining run-time results in more detail continues to show that IP-LMin tracks the performance of IP-Const, even for the largest and most complex inputs. In Figure 8, we see how run-time performance varies with the size of the graph the algorithm creates (number of edges and vertices); these plots measure data rate, i.e., file size (bytes) divided by run time (seconds). Graph size is the complexity measure for which IP-Const and IP-LMin should vary, but no such variance can be seen. Results show that in-place conversion algorithms are I/O bound, as are differencing algorithms [1]. Reducing computational effort when breaking cycles benefits an algorithm very little, as computation is a small fraction of total performance; whereas minimizing the size of the output benefits an algorithm more, as I/O dictates overall performance.

In Figure 9, we look at some statistical measures of graphs constructed when creating in-place difference files. While graphs can be quite large, a maximum of 26,626 vertices and 40,950 edges, the number of edges scales linearly with the number of vertices and less than linearly with input file size. The constructed graphs do not exhibit edge relations that approach the $O(|V|^2)$ upper bound. Therefore, data rate performance should not degrade as the number of edges increases. To illustrate, consider two pairs of versions as inputs to the IP-LMin algorithm in which one pair of versions generates a graph that contains twice the edges of the other. Based on Figure 9, we expect the larger graph to have twice as many vertices and encode twice as much data. While the larger instance does twice the work breaking cycles, it benefits from reorganizing more than twice as much data.
The linear scaling of edges with vertices and file size matches our intuition about the nature of delta compressed data. Delta compression encodes multiple versions of the same data. Therefore, we expect matching regions between these files (encoded as edges in a CRWI graph) to have spatial locality; i.e., the same string often appears in the same portion of a file. These input data do not, in general, exhibit correlation between all regions of a file that results in dense edge relations. Additionally, delta compression algorithms localize matching between files, correlating or synchronizing regions of file data [1]. All of these factors result in the linear scaling that we observe.

Generalization to In-Place Delta Compression
As mentioned in the Introduction, delta compression permits data to be copied from the version file, as well as from the reference file. Parts of the version file that have already been materialized during the reconstruction may be copied to other parts of the version file. Although in-place delta compression is not a subject of this paper, we note that the conversion of an arbitrary delta encoding to an in-place reconstructible delta encoding fits within our framework. We assume that the input delta encoding is designed to materialize the version file in space that is separate from the space occupied by the reference file. Thus, the copy commands can be partitioned into copy-from-R commands that read from the reference file and copy-from-V commands that read from the version file. For in-place reconstruction, as before, no part of the read interval of a copy-from-R command may be overwritten before the command is performed. But, for a copy-from-V command, all of its read interval must be overwritten with that part of the version file before the command is performed.
An algorithm that converts an arbitrary delta encoding to an in-place reconstructible delta encoding proceeds as follows: First, apply the algorithm of Section 4 to the copy-from-R commands and the add commands in the input delta encoding. The output is a sequence of copy-from-R commands followed by add commands (including the add commands that were created by replacing a copy-from-R command by an equivalent add command). By the correctness of our algorithm, when this command sequence is applied in-place to the reference file, it materializes the version file except for those intervals that are write intervals of copy-from-V commands. The in-place reconstructible delta encoding is completed by placing the copy-from-V commands, in the same order that they appear in the input delta encoding, after the add commands.
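The ordering described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the tuple layout of the commands and the names convert_delta_in_place and copies_first_stub are hypothetical, and the Section 4 algorithm is passed in as a function. The stub supplied here only places copies before adds; it does not reorder conflicting copies or break cycles, so it stands in for the real algorithm only in this ordering demonstration.

```python
from typing import Callable, List, Tuple

# Assumed command layout: ("copy_r", f, t, l), ("copy_v", f, t, l) or ("add", t, data)
Command = Tuple


def convert_delta_in_place(
    commands: List[Command],
    section4_convert: Callable[[List[Command]], List[Command]],
) -> List[Command]:
    """Apply the Section 4 step to copy-from-R and add commands, then append
    the copy-from-V commands in their original relative order."""
    r_and_adds = [c for c in commands if c[0] in ("copy_r", "add")]
    v_copies = [c for c in commands if c[0] == "copy_v"]
    return section4_convert(r_and_adds) + v_copies


def copies_first_stub(cmds: List[Command]) -> List[Command]:
    """Stand-in for the Section 4 algorithm: copies before adds, with no
    conflict detection or cycle breaking (assumption for illustration only)."""
    copies = [c for c in cmds if c[0] == "copy_r"]
    adds = [c for c in cmds if c[0] == "add"]
    return copies + adds


delta = [("add", 0, b"ab"), ("copy_v", 0, 2, 2), ("copy_r", 5, 4, 3), ("copy_v", 4, 7, 1)]
ordered = convert_delta_in_place(delta, copies_first_stub)
```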

A Sufficient Condition for CRWI Digraphs
Sections 7, 8, and 9 contain our results on the graph theory and computational complexity of in-place differential encoding. All proofs are available in a companion technical report [22].
In this section we give a simple sufficient condition (Lemma 2) for a digraph to be a CRWI digraph. We use this result to prove the theorems in Sections 8 and 9. We begin by recalling the definition of a CRWI digraph: a digraph G is a CRWI digraph if there is an RWIS (R, W) such that G = graph(R, W). Furthermore, G is a disjoint-read CRWI digraph if in addition the intervals in R are pairwise disjoint. The motivation for this restriction is that if a version string V is obtained from a reference string R by moving, inserting and deleting substrings, then a delta encoding of V could have little or no need to copy data from the same region of R more than once. An NP-hardness result with the disjoint-read restriction tells us that the ability of a delta encoding to copy data from the same region more than once is not essential to the hardness of the problem. Let $\mathbb{N}^+$ denote the positive integers. A digraph G with cost function Cost : V → $\mathbb{N}^+$ is a length-cost CRWI digraph if there is an RWIS (R, W) such that G = graph(R, W) and |R(v)| = Cost(v) for all 1 ≤ v ≤ n. The motivation for the length-cost restriction is that replacing a copy of a long string s by an add of s causes the length of the delta encoding to increase by approximately the length of s. We let (G, Cost) denote the digraph G with cost function Cost.
For a digraph G and a vertex v of G, let indeg(v) (resp., outdeg(v)) denote the number of edges directed into (resp., out of) v. Define indeg(G) (resp., outdeg(G)) to be the maximum of indeg(v) (resp., outdeg(v)) over all vertices v of G. The digraph G has the 1-or-1 edge property if, for each edge $\overrightarrow{vw}$ of G, either outdeg(v) = 1 or indeg(w) = 1 (or both).
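The definitions above translate directly into a small check. The sketch below assumes a digraph given as a list of distinct (v, w) edge pairs; the function names are hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Return (indeg, outdeg) counters for a list of distinct (v, w) edges."""
    outdeg = Counter(v for v, _ in edges)
    indeg = Counter(w for _, w in edges)
    return indeg, outdeg


def has_one_or_one_property(edges):
    """True iff every edge (v, w) has outdeg(v) == 1 or indeg(w) == 1."""
    indeg, outdeg = degrees(edges)
    return all(outdeg[v] == 1 or indeg[w] == 1 for v, w in edges)


def max_degrees(edges):
    """Return (indeg(G), outdeg(G)), the maxima over all vertices."""
    indeg, outdeg = degrees(edges)
    return max(indeg.values(), default=0), max(outdeg.values(), default=0)


# A vertex fanning out to two vertices of in-degree 1 satisfies the property.
fan_out = [("a", "b"), ("a", "c")]
# Here edge (a, c) has outdeg(a) = 2 and indeg(c) = 2, so the property fails.
violating = [("a", "c"), ("b", "c"), ("a", "d")]
```

Note that the second example has in-degree and out-degree at most 2 but still violates the property; the degree bounds and the 1-or-1 edge property are separate conditions.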

Lemma 2 Let G be a digraph. If G has the 1-or-1 edge property then G is a CRWI digraph. If in addition indeg(G) ≤ 2, then G is a disjoint-read CRWI digraph.

Let G = (V, E ) be a digraph and let
If G has the 1-or-1 edge property and outdeg(G) ≤ 2, then (G, Cost) is a length-cost CRWI digraph. If in addition indeg(G) ≤ 2, then (G, Cost) is a disjoint-read length-cost CRWI digraph.
We give the formal proof of this lemma, which is somewhat tedious, in the Appendix. Here we briefly outline how the assumption that G has the 1-or-1 edge property is used in the proof. Suppose that indeg(w) ≥ 2 and let $v_1, v_2, \ldots, v_d$ be the vertices such that there is an edge $\overrightarrow{v_i w}$ (see Figure 11 in the Appendix). By the 1-or-1 edge property, $\mathrm{outdeg}(v_i) = 1$ for all $i$. Then we choose the read intervals $R(v_1), R(v_2), \ldots, R(v_d)$ consecutively and choose the write interval $W(w)$ so that it intersects all of these read intervals. Because $\mathrm{outdeg}(v_i) = 1$ for all $i$, there does not exist a vertex $w' \ne w$ such that $R(v_i)$ intersects $W(w')$. Therefore, the order of the intervals $R(v_1), R(v_2), \ldots, R(v_d)$ does not matter, and we are not forced to choose $W(w')$ so that it intersects $W(w)$. Similarly, suppose that outdeg(v) ≥ 2 and let $w_1, w_2, \ldots, w_d$ be the vertices such that there is an edge $\overrightarrow{v w_i}$ (see Figure 11). By the 1-or-1 edge property, $\mathrm{indeg}(w_i) = 1$ for all $i$, so there does not exist a $v' \ne v$ such that $R(v')$ intersects $W(w_i)$. Therefore, we can choose $W(w_1), W(w_2), \ldots, W(w_d)$ consecutively and their order does not matter.
Figure 10: (a) A disjoint-read length-cost CRWI digraph that does not have the 1-or-1 edge property. (b) A graph with outdeg ≤ 2 and indeg ≤ 2 that is not a CRWI digraph.
While Lemma 2 shows that the 1-or-1 edge property is a sufficient condition for a digraph to be a CRWI digraph, it is not necessary. This is shown by the graph in Figure 10(a), which does not have the 1-or-1 edge property but is a CRWI digraph, in fact, a disjoint-read length-cost CRWI digraph for any cost function with Cost(v) ≥ 2 for all v. On the other hand, the conditions outdeg(G) ≤ 2 and indeg(G) ≤ 2 alone are not sufficient. This is shown by the graph in Figure 10(b), which is not a CRWI digraph.

Optimal Cycle Breaking on CRWI Digraphs is NP-hard
In this section we prove the result mentioned in Section 4.3, that given a CRWI digraph G and a cost function on its vertices, finding a minimum-cost set of vertices whose removal breaks all cycles in G is an NP-hard problem. Moreover, NP-hardness holds even when the problem is restricted to the case that (G, Cost) is a disjoint-read length-cost CRWI digraph and all costs are the same.
For a digraph G = (V, E ), a feedback vertex set (FVS) is a set S ⊆ V such that the digraph obtained from G by deleting the vertices in S and their incident edges is acyclic. Define φ(G) to be the minimum size of an FVS for G. Karp [14] has shown that the following decision problem is NP-complete.

FEEDBACK VERTEX SET
Instance: A digraph G and a $K \in \mathbb{N}^+$. Question: Is φ(G) ≤ K?
His proof does not show that the problem is NP-complete when G is restricted to be a CRWI digraph. Because we are interested in the vertex-weighted version of this problem where G is a CRWI digraph, we define the following decision problem.

WEIGHTED CRWI FEEDBACK VERTEX SET
Instance: A CRWI digraph G = (V, E), a function Cost : V → $\mathbb{N}^+$, and a $K \in \mathbb{N}^+$. Question: Is there a feedback vertex set S for G such that $\sum_{v \in S} \mathrm{Cost}(v) \le K$?
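The following lemma is the basis for the proof of NP-completeness of this problem.

Lemma 3 There is a polynomial-time transformation that takes an arbitrary digraph G′ and produces a digraph G such that G has the 1-or-1 edge property, outdeg(G) ≤ 2, indeg(G) ≤ 2, and φ(G) = φ(G′).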
Proof. Let G′ = (V′, E′). The digraph G contains the directed subgraph $D_v$ for each v ∈ V′. The subgraph $D_v$ consists of the vertex ṽ, a directed binary in-tree $T_{\mathrm{in},v}$ with root ṽ and indeg(v) leaves (i.e., all edges are directed from the leaves toward the root ṽ), and a directed binary out-tree $T_{\mathrm{out},v}$ with root ṽ and outdeg(v) leaves (i.e., all edges are directed from the root ṽ toward the leaves). If indeg(v) = 0 (resp., outdeg(v) = 0) then $T_{\mathrm{in},v}$ (resp., $T_{\mathrm{out},v}$) is the single vertex ṽ. For each edge $\overrightarrow{xy}$ of G′, add to G an edge from a leaf of $T_{\mathrm{out},x}$ to a leaf of $T_{\mathrm{in},y}$, such that each leaf is an endpoint of exactly one such "added edge". By construction, outdeg(G) ≤ 2 and indeg(G) ≤ 2.
To see that the 1-or-1 edge property holds: Let $e = \overrightarrow{vw}$ be an arbitrary edge of G; if e is an edge of some in-tree, then outdeg(v) = 1; if e is an edge of some out-tree, then indeg(w) = 1; and if e is an added edge, then outdeg(v) = 1 and indeg(w) = 1.

It remains to show that φ(G) = φ(G′). Say first that S′ is a FVS for G′ with |S′| = φ(G′). It is clear that S = {ṽ | v ∈ S′} is a FVS for G with |S| = |S′|, because every path from a leaf of $T_{\mathrm{in},v}$ to a leaf of $T_{\mathrm{out},v}$ must pass through ṽ. Therefore, φ(G) ≤ |S| = |S′| = φ(G′). Say now that S is a FVS for G with |S| = φ(G). Define S′ ⊆ V′ by placing v in S′ iff at least one vertex of $D_v$ is in S. Obviously, |S′| ≤ |S|. It is easy to see that S′ is a FVS for G′, because if C′ is a cycle in G′ that passes through vertices $v_1, \ldots, v_m$ and none of these vertices belong to S′, then no vertex of $D_{v_i}$ for 1 ≤ i ≤ m can belong to S. So there is a cycle in G, obtained from C′, that passes through no vertex of S; this contradicts the assumption that S is a FVS for G. Therefore, φ(G′) ≤ |S′| ≤ |S| = φ(G). ■
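The construction in this proof can be sketched in a few lines. The function name lemma3_transform and the vertex labels ("hub", v) and ("aux", i) are hypothetical. One assumption is made beyond the text: when a vertex has in-degree (or out-degree) 0 or 1, the sketch lets ṽ itself serve as the single attachment point of the corresponding tree.

```python
from collections import Counter
from itertools import count


def lemma3_transform(vertices, edges):
    """Return the edge list of G built from G' = (vertices, edges).

    Each vertex v of G' becomes a hub ("hub", v) with a binary in-tree and a
    binary out-tree; each edge of G' joins an out-tree leaf to an in-tree leaf.
    """
    fresh = count()   # supplies ids for auxiliary tree vertices
    g_edges = []      # edges of the constructed digraph G

    def grow(root, k, inward):
        """Attach a binary tree with k leaves below root and return the leaves.
        inward=True directs edges toward root (in-tree), else away (out-tree).
        For k <= 1 the root itself is the only attachment point (assumption)."""
        if k <= 1:
            return [root]
        left, right = ("aux", next(fresh)), ("aux", next(fresh))
        for child in (left, right):
            g_edges.append((child, root) if inward else (root, child))
        half = k // 2
        return grow(left, k - half, inward) + grow(right, half, inward)

    indeg = Counter(y for _, y in edges)
    outdeg = Counter(x for x, _ in edges)
    in_leaves = {v: grow(("hub", v), indeg[v], True) for v in vertices}
    out_leaves = {v: grow(("hub", v), outdeg[v], False) for v in vertices}
    for x, y in edges:
        # Each leaf is consumed by exactly one added edge.
        g_edges.append((out_leaves[x].pop(), in_leaves[y].pop()))
    return g_edges


# Usage: the complete digraph on three vertices (every vertex has degree 2).
vs = [0, 1, 2]
g = lemma3_transform(vs, [(a, b) for a in vs for b in vs if a != b])
```

The degree bounds and the 1-or-1 edge property of the result can be confirmed by counting in-degrees and out-degrees over the returned edge list.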

Theorem 1 WEIGHTED CRWI FEEDBACK VERTEX SET is NP-complete. Moreover, for each constant C ≥ 2, it remains NP-complete when restricted to instances where (G, Cost) is a disjoint-read length-cost CRWI digraph,
Proof. The problem clearly belongs to NP. To prove NP-completeness we give a polynomial-time reduction from FEEDBACK VERTEX SET to WEIGHTED CRWI FEEDBACK VERTEX SET. Let G′ and K′ be an instance of FEEDBACK VERTEX SET, where G′ is an arbitrary digraph. Transform G′ to G using Lemma 3. Let Cost ≡ C. Because G has the 1-or-1 edge property, outdeg(G) ≤ 2, and indeg(G) ≤ 2, Lemma 2 says that (G, Cost) is a disjoint-read length-cost CRWI digraph. Clearly the minimum cost of an FVS for G is C · φ(G), and C · φ(G) = C · φ(G′) by Lemma 3. Therefore, the output of the reduction is (G, Cost) and C · K′. ■

Given an NP-hard optimization problem, it is natural to ask whether the problem can be approximately solved by a polynomial-time algorithm. The worst-case approximation performance is typically measured by the worst-case ratio of the cost of the solution found by the algorithm to the optimum cost; see, for example, [10]. The currently best known polynomial-time approximation algorithm for the min-cost FVS problem on general digraphs has ratio O(log n log log n) where n is the number of vertices in the input digraph; this is shown by Even et al. [8], building on work of Seymour [26]. An obvious question is whether this ratio can be improved, perhaps to a constant, by restricting G to CRWI digraphs. Unfortunately, the restriction to CRWI digraphs cannot help much, in the sense that an improvement in r(n) for CRWI G would give a related improvement in r(n) for general G. A modification to the proof of Theorem 1, again using Lemma 3, shows the following: If there is a polynomial-time approximation algorithm with ratio r(n) for the min-cost FVS problem where the input (G, Cost) is restricted to be a disjoint-read length-cost CRWI digraph, then there is a polynomial-time approximation algorithm with ratio $r'(n) = r(4n^2)$ for the min-cost FVS problem where (G, Cost) is arbitrary.
For example, if r is constant then r′ is constant, and if r(n) = O(log n log log n) and r is sufficiently smooth then r′(n) = O(r(n)).

Complexity of Finding Optimal In-Place Difference Files
The subject of the paper up to this point has been the problem of postprocessing a given differential encoding of a version file V so that V can be reconstructed in-place from the reference file R using the modified differential encoding. A more general problem is to find an in-place reconstructible differential encoding of a given version file V in terms of a given reference file R. Thus, this paper views the general problem as a two-step process and concentrates on methods for and complexity of the second step.

Two-Step In-Place Differential Encoding
Input: A reference file R and a version file V.
1. Using an existing differencing algorithm, find an encoding ∆ of V in terms of R.
2. Modify ∆ by permuting commands and possibly changing some copy commands to add commands so that the modified delta encoding is in-place reconstructible.
A practical advantage of the two-step process is that we can utilize existing differencing algorithms to perform Step 1. A potential disadvantage is the possibility that there is an efficient (in particular, a polynomial-time) algorithm that finds an optimally-compact in-place reconstructible encoding for any input V and R. Then, the general problem would be made more difficult by breaking it into two steps as above, because solving the second step optimally is NP-hard. However, we show that this possibility does not occur: Finding an optimally-compact in-place reconstructible encoding is itself an NP-hard problem. For this result, we define an in-place reconstructible encoding ∆ to be one that contains no WR conflict. It is interesting to compare the NP-hardness of minimum-cost in-place differential encoding with the fact that minimum-cost differential encoding (not necessarily in-place reconstructible) can be solved in polynomial time [20,23].
This NP-hardness result is proved using the following simple measure for the cost of a delta encoding. This measure simplifies the analysis while retaining the essence of the problem.
Simple Cost Measure: The cost of a copy command is 1, and the cost of an add command 〈t, l〉 is the length l of the added string.
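As a small illustration of this measure, the sketch below computes the simple cost of a list of commands. The tuple layout ("copy", f, t, l) and ("add", t, l) and the function name simple_cost are assumptions made for the example.

```python
def simple_cost(commands):
    """Simple cost: 1 per copy command, l per add command <t, l>.

    commands: iterable of ("copy", f, t, l) or ("add", t, l) tuples
    (an assumed layout for illustration).
    """
    total = 0
    for cmd in commands:
        if cmd[0] == "copy":
            total += 1           # a copy costs 1 regardless of its length
        elif cmd[0] == "add":
            total += cmd[2]      # an add costs the length of the added string
        else:
            raise ValueError(f"unknown command kind: {cmd[0]!r}")
    return total


# Replacing a 100-byte copy by an equivalent add raises the cost from 1 to 100.
with_copy = [("copy", 0, 0, 100), ("add", 100, 5)]
with_add = [("add", 0, 100), ("add", 100, 5)]
```

Under this measure, converting a copy of length l into an add increases the cost by l − 1, which is why breaking cycles by deleting vertices with long read intervals is expensive.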

BINARY IN-PLACE DELTA ENCODING
Instance: Two strings R and V of bits, and a $K \in \mathbb{N}^+$. Question: Is there a differential encoding ∆ of V in terms of R such that ∆ contains no WR conflict and the simple cost of ∆ is at most K?
Taking R and V to be strings of bits means that copy commands in ∆ can copy any binary substrings from R; in other words, the granularity of change is one bit. This makes our NP-completeness result stronger, as it easily implies NP-completeness of the problem for any larger (constant) granularity.

Theorem 2 BINARY IN-PLACE DELTA ENCODING is NP-complete.
Proof. In this proof, "cost" means "simple cost", and a "conflict-free" ∆ is one containing no WR conflict. It suffices to give a polynomial-time reduction from FEEDBACK VERTEX SET to BINARY IN-PLACE DELTA ENCODING. Let G′ and K′ be an instance of FEEDBACK VERTEX SET. We describe binary strings R and V and an integer K such that φ(G′) ≤ K′ iff there is a conflict-free delta encoding ∆ of V in terms of R such that the cost of ∆ is at most K.
The binary strings R and V are of the form $R = P_R\ \alpha_1 0\ \alpha_2 0 \ldots \alpha_n 0,\ 0^*\ \alpha_{\rho(1)} 1\ 0\ 0\ 0^*\ \alpha_{\rho(2)} 1\ 0\ 0\ 0^* \ldots \alpha_{\rho(n-1)} 1\ 0\ 0\ 0^*\ \alpha_{\rho(n)} 1$ where $0^*$ (resp., $1^*$) denotes a string of zero or more 0's (resp., 1's), and where these "rubber-length" strings are adjusted so that: (i) the prefix $P_R$ of R does not overlap the suffix $S_V$ of V, and (ii) for all v, w ∈ V, the substring $\alpha_v 1$ of R overlaps the substring $\alpha_w 1$ of V iff $\overrightarrow{vw}$ is an edge of G. That (ii) can be accomplished follows from the facts that G = graph(R, W), all read and write intervals have length L = 3l + 6 (which equals the length of $\alpha_v 1$ for all v), and the read intervals are at least distance 3 apart so we can insert at least two zeroes between $\alpha_{\rho(i)} 1$ and $\alpha_{\rho(i+1)} 1$ for 1 ≤ i < n.
Three properties of this R and V will be used: (P1) R contains no occurrence of the substring 11; (P2) for each v ∈ V, the string $\alpha_v 1$ appears exactly once as a substring of R; (P3) for each v ∈ V with v ≠ σ(n), the string $\alpha_v 1$ always appears in V in the context $\ldots 1 \alpha_v 1 1 \ldots$. Property P1 is obvious by inspection. Property P2 follows from the facts: (i) 101 appears as a substring of R only as the first three symbols of $\alpha_w$ for each w ∈ V; and (ii) if v ≠ w then $\alpha_v \ne \alpha_w$. Property P3 follows because, for each w ∈ V, the string $\alpha_w 1$ both begins and ends with 1, and there are only 1's between $\alpha_{\sigma(i)} 1$ and $\alpha_{\sigma(i+1)}$ for 1 ≤ i < n.
Let $L_V$ denote the length of V, and define $K = L_V - nL + n + K'$. We show that φ(G) ≤ K′ ⇔ there is a conflict-free delta encoding ∆ of V such that the cost of ∆ is at most K.
(⇒) Let φ(G) ≤ K′ and let S be a FVS for G with |S| ≤ K′. We first describe an encoding ∆′ of V that is not necessarily conflict-free. Each substring represented by $1^*$ is encoded by an add command; the total cost of these add commands is $L_V - nL$. If v ∈ V − S, then $\alpha_v 1$ is encoded by a copy of $\alpha_{\rho(i)} 1$ in R, where i is such that ρ(i) = v; the total cost of these copy commands is |V − S| = n − |S|. If v ∈ S, then $\alpha_v 1$ is encoded by a copy of $\alpha_v$ from $P_R$ followed by an add of "1"; the total cost of these commands is 2|S|. Therefore, the total cost of ∆′ is $L_V - nL + n + |S| \le L_V - nL + n + K' = K$. For each v ∈ S, the read interval of the copy command that copies $\alpha_v$ from $P_R$ does not intersect the write interval of any copy command in ∆′. Therefore, the CRWI digraph of ∆′ is a subgraph of the graph obtained from G by removing, for each v ∈ S, all edges directed out of v. Because S is an FVS for G, the CRWI digraph of ∆′ is acyclic. Therefore, a conflict-free delta encoding ∆ of the same cost can be obtained by permuting the copy commands of ∆′ and moving all add commands to the end.
(⇐) Let ∆ be a conflict-free delta encoding of V having cost at most $K = L_V - nL + n + K'$. By properties P1 and P3, it follows that no copy command in ∆ can encode a prefix (resp., suffix) of a substring $\alpha_v 1$ together with at least one of the 1's preceding it (resp., following it). Therefore, using property P1 again, the commands in ∆ that encode substrings denoted $1^*$ must have total cost equal to the total length of these substrings, that is, cost $L_V - nL$. The remaining commands can be partitioned into sets $C_1, C_2, \ldots, C_n$ such that the commands in $C_v$ encode $\alpha_v 1$ for each v ∈ V. Let S be the set of v ∈ V such that $C_v$ contains at least two commands. We first bound |S| and then argue that S is a FVS for G. By definition of S, the cost of ∆ is at least $L_V - nL + |V - S| + 2|S|$. Because the cost of ∆ is at most $L_V - nL + n + K'$ by assumption, we have |S| ≤ K′. To show that S is a FVS, assume for contradiction that there is a cycle in G that passes only through vertices in V − S. If v ∈ V − S then $C_v$ contains one command $\gamma_v$, so $\gamma_v$ must be a copy command that encodes $\alpha_v 1$. By property P2, the copy command $\gamma_v$ must copy the substring $\alpha_v 1$ from the unique location where it occurs in R as $\alpha_{\rho(i)} 1$, where i is such that v = ρ(i).
The strings R and V have been constructed such that, if $\overrightarrow{vw}$ is an edge of G (in particular, if $\overrightarrow{vw}$ is an edge on the assumed cycle through vertices in V − S), then the substring $\alpha_v 1$ of R overlaps the substring $\alpha_w 1$ of V. So the existence of this cycle contradicts the assumption that ∆ is conflict-free. ■

Conclusions
We have presented algorithms that modify difference files so that the encoded version may be reconstructed in the absence of scratch memory or storage space. Such an algorithm facilitates the distribution of software to network-attached devices over low bandwidth channels. Differential compression lessens the time required to transmit files over a network by encoding the data to be transmitted compactly. In-place reconstruction exchanges a small amount of compression in order to do so without scratch space.
Experimental results indicate that converting a differential encoding into an in-place reconstructible encoding has limited impact on compression. We also find that, for bottom line performance, keeping difference files small to reduce I/O matters more than execution time differences in cycle breaking heuristics because in-place reconstruction is I/O bound. The algorithm to convert a difference file to an in-place reconstructible difference file requires less time than generating the difference file in the first place.
Our results also add to the theoretical understanding of in-place reconstruction. We have given a simple sufficient condition, the 1-or-1 edge property, for a digraph to be a CRWI digraph. Two problems of maximizing the compression of an in-place reconstructible difference file have been shown NP-hard: first, when the input is a difference file and the objective is to modify it to be in-place reconstructible; and, second, when the input is a reference file and a version file and the objective is to find an in-place reconstructible difference file for them. The first result justifies our use of efficient, but not optimal, heuristics for cycle breaking.
In-place reconstructible differencing provides the benefits of differencing for data distribution to an important class of applications: devices with limited storage and memory. In the current network computing environment, this technology greatly decreases the time to distribute software without increasing the development cost or complexity of the receiving devices. Differential compression provides Internet-scale file sharing with improved version management and update propagation, and in-place reconstruction delivers the technology to the resource-constrained computers that need it most.

Future Directions
Detecting and breaking conflicts at a finer granularity can reduce lost compression when breaking cycles. In our current algorithms, we eliminate cycles by converting copy commands into add commands. However, typically only a portion of the offending copy command actually conflicts with another command; only the overlapping range of bytes. We propose, as a simple extension, to break a cycle by converting part of a copy command to an add command, eliminating the graph edge (rather than a whole vertex as we do today), and leaving the remaining portion of the copy command (and its vertex) in the graph. This extension does not fundamentally change any of our algorithms, only the cost function for cycle breaking.
As a more radical departure from our current model, we are exploring reconstructing difference files with bounded scratch space, as opposed to zero scratch space as with in-place reconstruction. This formulation, suggested by M. Abadi, allows an algorithm to avoid WR conflicts by moving regions of the reference file into a fixed size buffer, which preserves reference file data after that region has been written. The technique avoids compression loss by resolving data conflicts without eliminating copy commands.
Reconstruction in bounded space is logical, as target devices often have a small amount of available space that can be used advantageously. However, in-place reconstruction is more generally applicable. For bounded space reconstruction, the target device must contain enough space to rebuild the file. Equivalently, an algorithm constructs a difference for a specific space bound. Systems benefit from using the same difference file to update software on many devices, for example, when distributing an updated calendar program to many PDAs. In such cases, in-place reconstruction offers a lowest common denominator solution in exchange for a little lost compression.
Although departing from our current model could yield smaller difference files, the message of this paper remains that the compression loss due to in-place reconstructibility is modest even within this simple model.