A parallel algorithm for pulsed laser infrared tomography

Ž . A three-dimensional Tomographic Reconstruction Algorithm 3D TRA has been developed and implemented on a Ž . parallel machine at the Maui High Performance Computing Center MHPCC . Given a time sequence of infrared images, this algorithm provides information to determine the location, physical dimensions, and initial space-dependent temperature increase of laser heated blood vessels. In this approach the serial computation of a 3D TRA is decomposed into multiple tasks which can be executed in parallel to reduce the computation time. q 1998 Elsevier Science B


Introduction
An important clinical problem for improved laser Ž .treatments of PWS Port Wine Stains is three-dimensional reconstruction of the initial temperature increase of laser heated discrete blood vessels in human skin recorded by infrared imaging radiometry Ž .Milner et al., 1996 .PWS is a congenital vascular malformation that leaves the affected skin areas with a red discoloration.The preferred method of treatment uses a laser to photocoagulate the blood vessels comprising a PWS.Heat generated by absorption of pulsed laser radiation can diffuse to adjacent skin structure to cause scarring or pigmentary changes.Knowledge of the location and size of blood vessels prior to institution of laser therapy would aid the ) Corresponding author.clinician to select the appropriate laser treatment parameters for controlled destruction of targeted PWS Ž .blood vessels Nelson et al., 1995 .A three-dimensional Tomographic Reconstruction Ž .Algorithm 3D TRA is a computational technique which determines position and size of blood vessels from the infrared emission at the skin surface following laser radiation.Heat generated due to light absorption by subsurface blood vessels diffuses to the surface and results in increased infrared emission levels.Measuring the emitted radiation at the skin surface with a high speed infrared focal plane array Ž .IR-FPA generates a time sequence of infrared emission.An infrared imaging radiometry record of skin in response to pulsed laser exposure is a time sequence of gray-level frames which are stored and accessed using a computer.This data is used as input data in the 3D TRA.0167-8655r98r$19.00q 1998 Elsevier Science B.V. All rights reserved.
Execution of the 3D TRA is computationally intensive and requires a large size main memory.To decrease the computation time, we developed a parallelized version of a 3D TRA and implemented the code on the massively parallel machine in Maui Ž .1 High Performance Computing Center MHPCC .In the serial implementation of the algorithm, one instruction is executed at a time using one processor, however, a parallelized version can be executed using multiple processors.Parallelism is done by decomposing a task into smaller units and assigning them to multiple coordinated processors working Ž .simultaneously Lester, 1993 .We used data parallelism in order to improve performance.This technique is useful in numerical algorithms that deal with Ž .large arrays and vectors i.e., 3D TRA where a large number of different data items are subjected to identical or similar processing.
In Section 2, the 3D TRA is described.The parallelized algorithm and experimental results are given in Section 3. Section 4 is the conclusion.

Model, 3D TRA
At time t s 0, a space-dependent temperature in-Ž .crease, DT x, y, z , is generated in skin by absorption of laser radiation which causes an increase in Ž .infrared emission at the surface, D M x, y,t .Infrared emission is calculated from a multi-dimensional convolution integral equation: where 1 ( x,j ( m, 1( y,h ( n, 1( t, z,z ( p and Ž .K xy j , y y h,z ,t represents the thermal point T spread function deduced from the bioheat equation.Ž .Eq. 1 represents a forward problem in which the measured time sequence of infrared emission Ž .frames, D M x, y,t , may be computed from the initial space-dependent temperature increase, Ž .DT x, y, z , and known biophysical properties of the laser irradiated tissue.
Solving for DT given D M will provide the necessary information to choose appropriate laser parameters.This constitutes an inverse, or adjoint, problem,

T
The 3D TRA is an iterative method which computes 3D array DT from given 3D array D M as the Ž .input Smithies et al., 1996 .For an input size of m = n = p the computational complexity for serial Ž .case is O m P log m P n P log n P p P g , where g represents the total number of iterations completed by the algorithm.
The 3D TRA is computationally intensive and requires a large size main memory.We implemented a parallelized version to reduce the computation time.

The parallel algorithm
Communication and synchronization delay limit the performance of a parallel program.Communication delay overhead occurs because processors interact through a message-passing communication mechanism.In a parallel machine, processors are synchronized; meaning one processor may be forced to wait for another.The resultant delay is synchronization delay.In our approach by using a data parallelism technique synchronization delay is minimized.
To perform the parallel algorithm, N processors, P , P , . . ., P , are allocated.One processor, P , con-  Shoari et al.r Pattern Recognition Letters 19 1998 521-526 525 the initial space-dependent temperature increase, Ž .DG iq 1 , and send the array back to P .k 1 6.A revised estimate of DT is calculated by P .If i 1 is not greater than a pre-specified value max_iteration, i is incremented by one, and algorithm continues from Step 1.
In MHPCC processors are IBM RSr6000 Scal-Ž able POWER Performance Optimized With En-.chanced RISC with clock speed of 66.5 MHz.They have their own local memory and data is shared across a communications network using message passing.Bi-directional, any-to-any intercode connection -allows all processors to send messages simultaneously.To minimize the overhead and latency, for 80% we used non-blocking communication to overlap communication and computation.For an input size of m = n = p the complexity of the parallel ŽŽ . .algorithm is O m P log m P n P log n P p P g rN q communication-overhead, where g represents the total number of iterations completed by the algorithm.
The parallel and serial code were tested on the Ž .Chick Chorioallantoic Membrane CAM .
Fig. 1 presents the input image which is a time sequence of recorded infrared emission images of laser-heated blood vessels in the CAM.The output image, a space-dependent temperature increase, is shown in Fig. 2.
To analyze the performance of the parallel algorithm, both parallel and serial 3D TRA were run on the parallel machine at MHPCC with an input and output size of 64 = 64 = 128.
The parallel algorithm was run on different num-Ž .ber of processors IBM RSr6000 SP .The runtime, the time from start to end of execution, of serial and parallel algorithms with different number of itera-Ž .Ž .tions are shown in Fig. 3 Karp and Flatt, 1990 , scaling speed-up by the number of processors used, are Ž .shown in Fig. 3 c .
By running the parallel algorithm on more processors, the data is decomposed into smaller parts.As the results demonstrate, the communication overhead between processors increases.As shown in Fig. 3, running the parallel algorithm on 15 processors gives the optimum speedup, for an input size of 64 = 64 = 128.
Large size of the problem restricts selection of computing platforms to implement the 3D TRA.Therefore, solving a problem with large data structure may not be feasible.By using multiple processors, data is distributed in local memory of individual processors, and nonlocal data may be accessed via communications.That is another advantage of using parallel programming which gives a solution of the problem with large data structures.

Conclusion
Given a time sequence of infrared images, the 3D TRA provides information to determine the location, physical dimensions, and initial space-dependent temperature increase of laser heated blood vessels.They aid the clinician to select the appropriate laser Ž .treatment parameters, optimal pulse duration t and p Ž .light dosage D , for controlled and irreversible destruction of targeted PWS blood vessels.
The 3D TRA requires intensive computations.One method of speeding up the algorithm is to implement a parallelized version on a high perfor-Ž .mance parallel machine i.e., MHPCC .The serial algorithm is formulated in a way that produces many parallel streams of operations to be executed by different processors.The parallel algorithm increased the computational speed as demonstrated in this paper.Also, implementing the 3D TRA on a parallel machine makes it possible to efficiently execute the algorithm for problems with large data structures which may not be feasible using serial machines.
of Health and Dermatology Foundation.Institutional support from the Office of Naval Research, Department of Energy, National Institutes of Health, and the Beckman Laser Institute and Medical Clinic Endowment is also gratefully acknowledged.
Through the use of the Maui High Performance Ž .Computing Center MHPCC , this research was sponsored in part by the Phillips Laboratory, Air Force Materiel Command, USAF, under cooperative agreement number F29601-93-2-0001.The views and conclusions contained in this document are those of Ž .the author s and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Phillips Laboratory, the U.S. Government, The University of New Mexico, or the MHPCC.
a .Fig. 3 b represents the speedup of serial algorithm over parallelized version with different number of processors running on MH-Ž .PCC.The efficiency