A New Direction in Tree Based Search Engine Architectures Using Balanced Single Port Memories
Skip to main content
Open Access Publications from the University of California

A New Direction in Tree Based Search Engine Architectures Using Balanced Single Port Memories


This paper examines the microarchitecture of a novel network search processor which provides both high execution throughput and balanced memory distribution. Pipelined forwarding engines are used in core routers to meet speed demands. Most algorithmic-based solutions for these engines use tree based search structures. The tree traversal is pipelined across a number of stages to achieve high throughput. Prior work has shown that the pipelining of these trees results in unevenly distributed memory. To address this imbalance, conventional approaches use either complex dynamic memory allocation schemes (which dramatically increase hardware complexity) or over provision each of the pipeline stages (which results in memory waste). This paper has three primary contributions: i) a novel logical pipeline architecture in which search operations can start execution at {\em any} stage, ii) a new allocation algorithm that leverages this degree of freedom to eliminate memory imbalance and thus memory waste, iii) a practical implementation of our logical pipeline which eliminates non-neighbor communication and guarantees in-order completion without using a dedicated task scheduler. The implementation also minimizes interconnect complexity by having searches enter and exit the pipeline through one location (while still allowing the search to begin at any stage). We validate our new scheme by implementing and simulating state of the art solutions for IPv4 lookup, VPN forwarding and packet classification. In our simulation we use both real life and synthetically generated routing tables and classifiers. We show that our new pipeline scheme and memory allocator can provide searches with a memory allocation efficiency that is within 1\% of non-pipelined schemes. This allows us to obtain a forwarding rate of 1 packet every 6 ns using memories with 2 ns cycle time, with a constant latency of 48 ns and near-perfect memory efficiency.

Pre-2018 CSE ID: CS2004-0799

Main Content
For improved accessibility of PDF content, download the file to your device.
Current View