The Explicitly Parallel Instruction Computing (EPIC) architecture
has been put forth as a viable architecture for achieving the instruction level
parallelism (ILP) needed to keep increasing future processor performance. The
Itanium processor developed at Intel is an example of an EPIC architecture.
One of the new features of the EPIC architecture is its support for predicated
execution. Predicated execution can replace a branch with a statement that
defines two predicate registers (one true and one false) according to the
condition of the replaced branch. Subsequent statements are then guarded
by one of the predicates, depending upon whether they would have been on the
taken or fall-through path of the branch. All statements begin execution, but
an operation is committed only if the value of its guarding predicate is true.
An advantage of predicated execution is that it can eliminate hard-to-predict
branches by combining both paths of a branch into a single path. However, data
dependence analysis (for the purpose of maintaining definition-use information)
is significantly more complex for the resulting code. When the two paths of a
branch are combined, definitions of the same logical registers (originally from
different paths) are intermingled. This makes it difficult to determine which
definition a use is actually dependent on. This dissertation presents both
hardware (Disjoint Path Analysis) and compiler (Predicated Static Single
Assignment) solutions for improving the data dependence analysis for predicated
regions of code by collecting information on predicate relationships. Another
feature of the EPIC architecture is its reduced hardware complexity. The EPIC
philosophy is that the compiler, which has a broader view of the code, should
handle most of the dependence analysis and scheduling in order to simplify the
processor. However, the compiler cannot fully anticipate run-time events such
as cache misses. Consequently, it cannot always
create a static schedule to mitigate the effects of the increased latency that
might result. In this dissertation, we introduce Pending Functional Units
(PFUs), which allow a limited amount of dynamic scheduling with minimal additional
hardware overhead.
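
The predicated execution described above can be illustrated with a small, hypothetical simulation. The sketch below is not Itanium code and is not taken from the dissertation; the register names (r1, r2), the predicate names (p0, p1, p2), and the helper functions are assumptions chosen only for illustration. It models a compare that defines two complementary predicates, followed by guarded operations that all begin execution but commit only when their guarding predicate is true.

```python
# Toy model of predicated execution (if-conversion).
# All names here (r1, r2, p0, p1, p2) are hypothetical and illustrative.

def run_branching(a, b):
    """Original code: a branch selects one of two definitions of r1."""
    if a < b:
        r1 = a + 1       # definition on one path
    else:
        r1 = b * 2       # definition on the other path
    return r1 + 10       # use of r1


def run_predicated(a, b):
    """If-converted code: both paths merged into one instruction stream."""
    regs = {"r1": 0, "r2": 0}
    # The compare defines two complementary predicates; p0 is always true.
    preds = {"p0": True, "p1": a < b, "p2": not (a < b)}

    # Each operation: (guarding predicate, destination, value computation).
    program = [
        ("p1", "r1", lambda r: a + 1),         # was on the first path
        ("p2", "r1", lambda r: b * 2),         # was on the second path
        ("p0", "r2", lambda r: r["r1"] + 10),  # use of r1
    ]
    for guard, dest, compute in program:
        value = compute(regs)      # every operation begins execution
        if preds[guard]:           # but commits only if its guard is true
            regs[dest] = value
    return regs["r2"]
```

In this sketch the two definitions of r1 sit next to each other in a single stream, so the use of r1 could depend on either one. Because p1 and p2 can never both be true, exactly one definition commits, which is the kind of predicate relationship a dependence analysis would need to know about.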
Pre-2018 CSE ID: CS2002-0700