Parallelism, Patterns, and Performance in Iterative MRI Reconstruction
Magnetic Resonance Imaging (MRI) is a non-invasive and highly flexible
medical imaging modality that does not expose patients ionizing radiation.
MR Image acquisitions can be designed by varying a large number of
contrast-generation parameters, and many clinical diagnostic
However, imaging speed is a fundamental limitation to many potential
Traditionally, MRI data have been collected at Nyquist sampling rates to produce
However, many recent scan acceleration techniques produce sub-Nyquist
For example, Parallel Imaging is a well-established acceleration technique
that receives the MR signal simultaneously from multiple receive channels.
Compressed sensing leverages randomized undersampling and the compressibility
(e.g. via Wavelet transforms or Total-Variation) of medical images
to allow more aggressive undersampling.
Reconstruction of clinically viable images from these highly accelerated
acquisitions requires powerful, usually iterative algorithms.
Non-Cartesian pulse sequences that perform non-equispaced sampling of k-space
further increase computational intensity of reconstruction, as they
preclude direct use of the Fast Fourier Transform (FFT).
Most iterative algorithms can be understood by considering the MRI
reconstruction as an inverse problem, where measurements of un-observable
parameters are made via an observation function that models the acquisition
Traditional direct reconstruction methods attempt to invert this
observation function, whereas iterative methods require its repeated computation
and computation of its adjoint.
As a result, na\"ive sequential implementations of iterative reconstructions
produce unfeasibly long runtimes. Their computational intensity is a
substantial barrier to their adoption in clinical MRI practice.
A powerful new family of massively parallel microprocessor architectures has
emerged simultaneously with the development of these new reconstruction
Due to fundamental limitations in silicon fabrication technology,
sequential microprocessors reached the power-dissipation limits of
commodity cooling systems in the early 2000's.
The techniques used by processor architects to extract instruction-level
parallelism from sequential programs face ever-diminishing returns, and further
performance improvement of sequential processors via increasing clock-frequency
has become impractical.
However, circuit density and process feature sizes still improve at
Moore's Law rates. With every generation of silicon fabrication technology,
a larger number of transistors are available to system architects.
Consequently, all microprocessor vendors now exclusively produce multi-core
Additionally, the move towards on-chip parallelism has allowed processor
architects a larger degree of freedom in the design of multi-threaded pipelines
and memory hierarchies.
Many of the inefficiencies inherent in superscalar out-of-order design are
being replaced by the high efficiency afforded by throughput-oriented designs.
The move towards on-chip parallelism has resulted in a vast increase
in the amount of computational power available in commodity systems.
However, this move has also shifted the burden of computational performance
towards software developers.
In particular, the highly efficient implementation of MRI reconstructions
on these systems requires manual parallelization and optimization.
Thus, while ubiquitous parallelism provides a solution to the computational
intensity of iterative MRI reconstructions, it also
poses a substantial software productivity challenge.
In this thesis, we propose that a principled approach to the design and
implementation of reconstruction algorithms can ameliorate this software
We draw much inspiration from developments in the
field of computational science, which has faced similar parallelization
and software development challenges for several decades.
We propose a Software Architecture for the implementation of reconstruction
algorithms, which composes two Design Patterns that originated in
the domain of massively parallel scientific computing.
This architecture allows for the most computationally intense operations
performed by MRI reconstructions to be implemented as re-usable libraries.
Thus the software development effort required to produce highly efficient
and heavily optimized implementations of these operations can be amortized
over many different reconstruction systems.
Additionally, the architecture prescribes several different strategies for
mapping reconstruction algorithms onto parallel processors, easing the
burden of parallelization.
We describe the implementation of a complete reconstruction, $\ell_1$-SPIRiT,
according to these strategies.
$\ell_1$-SPIRiT is a general reconstruction framework that seamlessly integrates
all three of the scan acceleration techniques mentioned above.
Our implementation achieves substantial performance improvement over baseline,
and has enabled substantial clinical evaluation of its approach to combining
Parallel Imaging and Compressive Sensing.
Additionally, we include an in-depth description of the performance optimization
of the non-uniform Fast Fourier Transform (nuFFT), an operation used in all
This discussion complements well our description of $\ell_1$-SPIRiT, which we
have only implemented for Cartesian samplings.