Over the past decades, the memory system has increasingly become the bottleneck in general-purpose processors, due to a widening gap between applications' growing demand for data and the much slower scaling of conventional memory hierarchies.
As a result, conventional in-core computing suffers increasingly expensive overheads to bring data from memory to the computing cores, including excessive request messages, unnecessary data movement and coherence traffic, and limited off-chip bandwidth.
To continue scaling performance and energy efficiency, architects have proposed near-data computing (NDC), in which computation is offloaded to where the data resides. However, existing NDC techniques fall short of providing generality and flexibility across different application domains, programming paradigms, and computing substrates, which are crucial to the wide adoption of NDC.
Our key insight is that the critical missing cornerstone for general and flexible near-data computing is a novel rich-semantic memory abstraction. Unlike existing byte-grained load/store operations, the new interface should express a wide range of rich semantics, such as access patterns, reuse distances, and near-data computations. Such high-level information is essential for the system to promptly recognize the program's long-term behavior and adapt accordingly to reach an optimal state.
More importantly, the new interface should be as transparent as possible to programmers, backed by automatic compiler analysis and runtime library support. On this foundation, we can fundamentally revolutionize the memory interface and co-optimize computation and data together.
This dissertation explores a new ISA interface, streams, to precisely capture the program's long-term memory and compute activities.
Streams are incorporated into the program's functional semantics
and are exposed to the entire system stack to guide various policies.
Our evaluation and analysis suggest several key findings. First, a small set of prevalent stream patterns covers a wide range of program behaviors and can be embedded into the program in a lightweight way while still maintaining sequential ordering.
Second, streams naturally decouple address generation and computation from the core pipeline and can be offloaded as the basic unit of near-data computing. Third, by exposing high-level semantics to the system, we can unify different computing paradigms and co-design the software and data structures.
Overall, this dissertation aims to enable a general, end-to-end near-data computing system that eliminates the boundary between computation and data: computation is freely scheduled in the system near the data, and data is carefully mapped to memory resources to provide maximal locality and parallelism. Such computation-data orchestration is the key to continued performance and energy-efficiency scaling.