Open Access Publications from the University of California

## Parallelism, Patterns, and Performance in Iterative MRI Reconstruction

• Author(s): Murphy, Mark
• Keutzer, Kurt
• et al.
Abstract

Magnetic Resonance Imaging (MRI) is a non-invasive and highly flexible

medical imaging modality that does not expose patients ionizing radiation.

MR Image acquisitions can be designed by varying a large number of

contrast-generation parameters, and many clinical diagnostic

applications exist.

However, imaging speed is a fundamental limitation to many potential

applications.

Traditionally, MRI data have been collected at Nyquist sampling rates to produce

alias-free images.

However, many recent scan acceleration techniques produce sub-Nyquist

samplings.

For example, Parallel Imaging is a well-established acceleration technique

Compressed sensing leverages randomized undersampling and the compressibility

(e.g. via Wavelet transforms or Total-Variation) of medical images

to allow more aggressive undersampling.

Reconstruction of clinically viable images from these highly accelerated

acquisitions requires powerful, usually iterative algorithms.

Non-Cartesian pulse sequences that perform non-equispaced sampling of k-space

further increase computational intensity of reconstruction, as they

preclude direct use of the Fast Fourier Transform (FFT).

Most iterative algorithms can be understood by considering the MRI

reconstruction as an inverse problem, where measurements of un-observable

parameters are made via an observation function that models the acquisition

process.

Traditional direct reconstruction methods attempt to invert this

observation function, whereas iterative methods require its repeated computation

As a result, na\"ive sequential implementations of iterative reconstructions

produce unfeasibly long runtimes. Their computational intensity is a

substantial barrier to their adoption in clinical MRI practice.

A powerful new family of massively parallel microprocessor architectures has

emerged simultaneously with the development of these new reconstruction

techniques.

Due to fundamental limitations in silicon fabrication technology,

sequential microprocessors reached the power-dissipation limits of

commodity cooling systems in the early 2000's.

The techniques used by processor architects to extract instruction-level

parallelism from sequential programs face ever-diminishing returns, and further

performance improvement of sequential processors via increasing clock-frequency

has become impractical.

However, circuit density and process feature sizes still improve at

Moore's Law rates. With every generation of silicon fabrication technology,

a larger number of transistors are available to system architects.

Consequently, all microprocessor vendors now exclusively produce multi-core

parallel processors.

Additionally, the move towards on-chip parallelism has allowed processor

architects a larger degree of freedom in the design of multi-threaded pipelines

and memory hierarchies.

Many of the inefficiencies inherent in superscalar out-of-order design are

being replaced by the high efficiency afforded by throughput-oriented designs.

The move towards on-chip parallelism has resulted in a vast increase

in the amount of computational power available in commodity systems.

However, this move has also shifted the burden of computational performance

towards software developers.

In particular, the highly efficient implementation of MRI reconstructions

on these systems requires manual parallelization and optimization.

Thus, while ubiquitous parallelism provides a solution to the computational

intensity of iterative MRI reconstructions, it also

poses a substantial software productivity challenge.

In this thesis, we propose that a principled approach to the design and

implementation of reconstruction algorithms can ameliorate this software

productivity issue.

We draw much inspiration from developments in the

field of computational science, which has faced similar parallelization

and software development challenges for several decades.

We propose a Software Architecture for the implementation of reconstruction

algorithms, which composes two Design Patterns that originated in

the domain of massively parallel scientific computing.

This architecture allows for the most computationally intense operations

performed by MRI reconstructions to be implemented as re-usable libraries.

Thus the software development effort required to produce highly efficient

and heavily optimized implementations of these operations can be amortized

over many different reconstruction systems.

Additionally, the architecture prescribes several different strategies for

mapping reconstruction algorithms onto parallel processors, easing the

burden of parallelization.

We describe the implementation of a complete reconstruction, $\ell_1$-SPIRiT,

according to these strategies.

$\ell_1$-SPIRiT is a general reconstruction framework that seamlessly integrates

all three of the scan acceleration techniques mentioned above.

Our implementation achieves substantial performance improvement over baseline,

and has enabled substantial clinical evaluation of its approach to combining

Parallel Imaging and Compressive Sensing.

Additionally, we include an in-depth description of the performance optimization

of the non-uniform Fast Fourier Transform (nuFFT), an operation used in all

non-Cartesian reconstructions.

This discussion complements well our description of $\ell_1$-SPIRiT, which we

have only implemented for Cartesian samplings.