The efficiency of modern video compression is largely a result of motion-compensated prediction (MCP), which exploits temporal correlation in the signal. Using multiple reference frames for MCP improves compression efficiency and error resilience, and requires new paradigms in (a) reference frame selection, (b) bit rate allocation among frames, (c) application to scalable video codecs, and (d) delay requirements of these codecs. We study all four aspects in this dissertation. A dual-frame video coder employs two past reference frames for MCP: one short-term frame and one long-term frame. Dual-frame video codecs benefit greatly from near-optimal intra/inter mode switching within a rate-distortion framework. We show that such a scheme improves the error resilience of the coder. We improve the mode-switching algorithm with the use of half-pel motion vectors. Furthermore, we investigate the effect of feedback in making more efficient mode-switching decisions. Previous work showed that uneven assignment of quality to frames, which creates high-quality (HQ) long-term reference frames, can enhance the performance of a dual-frame encoder. Here, we demonstrate the performance advantages of optimal mode selection among such HQ frames for video transmission over noisy channels. We investigate dual-frame prediction for both the base and enhancement layers of an SNR-scalable video coder, with pulsed quality allocation in the base layer. Furthermore, we introduce a per-pixel drift estimation algorithm, in which the encoder recursively estimates the potential drift at the enhancement layer and chooses coding modes accordingly. Real-time video applications require tight bounds on end-to-end delay. Hierarchical bi-directional prediction requires buffering at the encoder input and output. Dual-frame prediction with pulsed quality requires buffering at the encoder output. Both codecs involve uneven bit rate allocation that affects the encoder and decoder buffering requirements. We derive an efficient rate allocation for hierarchical B-pictures, and investigate the trade-off between delay and compression efficiency. Furthermore, we discuss the effect of the temporal prediction distance and prediction branch truncation.