We investigate the OpenMP parallelization and optimization
of two novel data classification algorithms. The new algorithms are
based on graph and PDE solution techniques and provide significant accuracy and performance advantages over traditional data classification
algorithms in serial mode. The methods leverage the Nystrom extension
to calculate eigenvalue/eigenvectors of the graph Laplacian and this
is a self-contained module that can be used in conjunction with other
graph-Laplacian based methods such as spectral clustering. We use performance
tools to collect the hotspots and memory access of the serial
codes and use OpenMP as the parallelization language to parallelize the
most time-consuming parts. Where possible, we also use library routines.
We then optimize the OpenMP implementations and detail the performance
on traditional supercomputer nodes (in our case a Cray XC30),
and test the optimization steps on emerging testbed systems based on Intel’s
Knights Corner and Landing processors. We show both performance
improvement and strong scaling behavior. A large number of optimization
techniques and analyses are necessary before the algorithm reaches
almost ideal scaling.