Bounding the load capacitance at gate outputs is a standard element in today's electrical correctness methodologies for high-speed digital very large scale integration design. Bounds on load caps improve coupling-noise immunity, reduce degradation of signal transition edges, and reduce delay uncertainty due to coupling noise (Kahng et al. 1998). For clock and test distribution, an additional design requirement is bounding the buffer skew, i.e., the difference between the maximum and the minimum number of buffers over all of the source-to-sink paths in the routing tree, since buffer skew is one of the main factors affecting delay skew (Tellez and Sarrafzadeh 1997). In this paper, we consider algorithms for buffering a given tree with the minimum number of buffers under given load cap and buffer skew constraints. We show that the greedy algorithm proposed by Tellez and Sarrafzadeh is suboptimal for nonzero buffer-skew bounds and give examples showing that no bottom-up greedy algorithm can achieve optimality. The main contribution of the paper is an optimal dynamic programming algorithm for the problem. Experiments on test cases extracted from recent industrial designs show that the dynamic programming algorithm has practical running time and saves up to 37.5% of the buffers inserted by Tellez and Sarrafzadeh's algorithm.