The Discrete Cosine Transform (DCT) is used in the MPEG and JPEG compression standards, so the DCT component has stringent timing requirements. The required performance cannot be achieved by a sequential implementation of the algorithm. In this report, we explore different optimization techniques to improve the performance of the DCT. We discuss various pipelining options to further reduce the latency. We present a transformation of the algorithm that reduces the memory requirements and hence the cost of the implementation. We also describe register-transfer (RT) level implementations of the sequential, pipelined, and memory-optimized designs.