Face recognition in hyperspectral images

Hyperspectral cameras provide useful discriminants for human face recognition that cannot be obtained by other imaging methods. We examine the utility of using near-infrared hyperspectral images for the recognition of faces over a database of 200 subjects. The hyperspectral images were collected using a CCD camera equipped with a liquid crystal tunable filter. Spectral measurements over the near-infrared allow the sensing of subsurface tissue structure, which is significantly different from person to person but relatively stable over time. The local spectral properties of human tissue are nearly invariant to face orientation and expression, which allows hyperspectral discriminants to be used for recognition over a large range of poses and expressions. We describe a face recognition algorithm that exploits spectral measurements for multiple facial tissue types. We demonstrate experimentally that this algorithm can be used to recognize faces over time in the presence of changes in facial pose and expression.


INTRODUCTION
Spectroscopy is a valuable tool for a large number of applications. Spectral measurements from human tissue, for example, have been used for many years for characterization and monitoring applications in biomedicine. In remote sensing, researchers have shown that hyperspectral data are effective for material identification in scenes where other sensing modalities are ineffective [1]. The introduction of hyperspectral cameras has led to the development of techniques that combine spectral and spatial information. As hyperspectral cameras have become accessible, computational methods developed initially for remote sensing problems have been transferred to biomedical applications [2]. Considering the vast person-to-person spectral variability for different tissue types, hyperspectral imaging has the ability to improve the capability of automated systems for human identification.
Current face recognition systems primarily use spatial discriminants that are based on geometric facial features [3], [4], [5], [6], [7]. Many of these systems have performed well on databases acquired under controlled conditions [8], [9]. However, these approaches often exhibit significant performance degradation in the presence of changes in face orientation. The study in [10], for example, showed that there is significant degradation in recognition performance for images of faces that are rotated more than 32 degrees from a frontal image that is used to train the system. A more recent study in [11], which uses a light-fields model for pose-invariant face recognition, showed promising recognition results for probe faces rotated more than 60 degrees from a gallery face. That approach currently requires the manual determination of the 3D transformation required to register face images. Algorithms that use geometric features can also perform poorly when subjects are imaged at different times. For example, recognition performance can degrade by as much as 20 percent when imaging sessions are separated by a two week interval [10]. Partial face occlusion also often degrades performance. A method [12] that divides the face into regions for isolated analysis can tolerate up to 1/6 face occlusion without losing accuracy. Thermal infrared imaging provides an alternative imaging modality that has been used for face recognition [13], [14], [15]. However, techniques based on thermal images use spatial features and have difficulty recognizing faces after pose changes. A 3D morphable face model has been used for face identification across different poses [16]. This approach has provided promising performance on a 68 subject data set. At the current time, however, this system is computationally intensive and requires considerable manual intervention.
Several of the limitations of current face recognition systems can be overcome by using spectral information. The interaction of light with human tissue has been studied extensively by various researchers [17], [18], [19] and determines tissue spectral properties. The epidermal and dermal layers of human skin constitute a scattering medium that contains several pigments such as melanin, hemoglobin, bilirubin, and β-carotene. Small changes in the distribution of these pigments induce significant changes in the skin's spectral reflectance [20]. The effects are large enough, for example, to enable algorithms for the automated separation of melanin and hemoglobin from RGB images [21]. Recent research [22] has measured skin reflectance spectra over the visible wavelengths and proposed models for the spectra. Other researchers [23] have used a skin reflectance model over the 0.3 μm-0.8 μm range to propose a method for skin detection under varying lighting conditions. A skin reflectance model has also been used to synthesize face images after changes in lighting and viewpoint [24].
In the near-infrared (NIR), skin has a larger penetration depth than at visible wavelengths, enabling the imaging of subsurface characteristics that are difficult for a person to modify [25]. The optical penetration depth, defined as the tissue thickness that reduces the light intensity to 37 percent of the intensity at the surface, is given by δ = 1/√(3 μ_a μ'_s), where μ_a and μ'_s are the absorption coefficient and reduced scattering coefficient of the tissue, respectively. For a typical person, we have μ_a = 0.77 mm⁻¹ and μ'_s = 1.89 mm⁻¹ in the visible (550 nm) and μ_a = 0.02 mm⁻¹ and μ'_s = 1.31 mm⁻¹ in the NIR (850 nm) [26]. This leads to an optical penetration depth of 3.57 mm at 850 nm versus 0.48 mm at 550 nm. In addition, observed spectral signatures have little dependence on skin temperature over the NIR, whereas measured radiance in the thermal infrared (8 μm-12 μm) has a strong dependence on skin temperature [27].
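The penetration-depth numbers above follow directly from the stated formula. A minimal sketch (the function name is our own; coefficients in mm⁻¹ are the typical values quoted in the text):

```python
import math

def penetration_depth_mm(mu_a, mu_s_prime):
    """Optical penetration depth (mm): delta = 1 / sqrt(3 * mu_a * mu_s')."""
    return 1.0 / math.sqrt(3.0 * mu_a * mu_s_prime)

# Typical skin coefficients (mm^-1) from the text
print(round(penetration_depth_mm(0.77, 1.89), 2))  # visible, 550 nm -> 0.48
print(round(penetration_depth_mm(0.02, 1.31), 2))  # NIR, 850 nm -> 3.57
```

The roughly sevenfold increase in depth at 850 nm is what makes subsurface tissue structure accessible to the NIR camera.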
Fig. 1 presents an example of the spectral variability in human skin using measurements obtained at our laboratory. The reflectance spectra in the figure were measured from the right cheek of four subjects over the NIR (700 nm-1000 nm). In Fig. 2, four reflectance spectra were acquired from different facial locations for one subject in order to compare within-class and between-class variability. We see that there are significant differences in both the amplitude and spectral shape of the reflectance curves for the different subjects, while the spectral reflectance for one subject remains similar from trial to trial. Similar results were obtained for other facial skin samples.
Spectral variation for a single subject is also typically small over a range of poses. Fig. 3 plots spectral measurements derived from hyperspectral images. In Fig. 3a, NIR skin and hair reflectance spectra are plotted for two subjects as acquired in a front-view hyperspectral image. In Fig. 3b, reflectance spectra for the same subjects are plotted as acquired in a side-view (profile) image. While the subject was rotated, the camera and illumination configuration were the same for both images. We see that there is significant spectral variability from one subject to the other, while the spectral characteristics of the subjects remain stable over a large change in face orientation. The differences in the skin spectra between the two subjects are more pronounced, but the hair spectra also have discernible differences that are valuable for recognition.

In this paper, we consider the use of spectral information for face recognition. We present experimental results on recognizing 200 human subjects using hyperspectral face images. For each subject, several NIR images were acquired over a range of poses and expressions. Recognition is achieved by combining spectral measurements for different tissue types. Several of the subjects were imaged multiple times over several weeks to evaluate the stability of the hyperspectral measurements over time.

DATA COLLECTION AND CAMERA CALIBRATION
Our data collection utilizes a hyperspectral camera from Opto-Knowledge Systems, Inc. (OKSI) that is based on a liquid crystal tunable filter [28] made by Cambridge Research Instruments (CRI). The full-width at half-maximum (FWHM) of the filter is 10 nm when the center wavelength is 850 nm, and the FWHM is proportional to the center wavelength squared. All images were captured with 31 spectral bands with center wavelengths separated by 0.01 μm over the NIR (0.7 μm-1.0 μm) at 468 × 494 spatial resolution. A full 31-band hyperspectral image is acquired in about ten seconds. Fig. 4 shows the imaging setup with a subject and two light sources. Each source is a 750W halogen lamp with a white diffuser screen. The two sources provide approximately uniform illumination on the subject. Fig. 5 displays all 31 bands for one subject. The 31 bands are shown in ascending order from left to right and from top to bottom. All 31 bands are used by our face recognition algorithm. The spectral channels have unknown gains due to filter transmission and CCD response and unknown offsets due to dark current and stray light. These gains and offsets may change over time. Therefore, we devised a method to convert the raw images acquired by the hyperspectral camera to spectral reflectance images for analysis. Two spectralon panels were used during calibration. A panel with approximately 99 percent reflectance is referred to as white spectralon, and a panel with approximately 2 percent reflectance is referred to as black spectralon. Both spectralon panels have nearly constant reflectance over the 0.7 μm-1.0 μm spectral range. The calibration of spectralon is traceable to the US National Institute of Standards and Technology (NIST).
The raw measurement obtained by the hyperspectral imaging system at spatial coordinate (x, y) and wavelength λ_k is given by

I(x, y, λ_k) = L(x, y, λ_k) S(x, y, λ_k) R(x, y, λ_k) + O(x, y, λ_k),   (1)

where L(x, y, λ_k) is the illumination, S(x, y, λ_k) is the system spectral response, R(x, y, λ_k) is the reflectance of the viewed surface, and O(x, y, λ_k) is the offset, which includes dark current and stray light. For the image of white spectralon, we have

I_W(x, y, λ_k) = L(x, y, λ_k) S(x, y, λ_k) R_W(λ_k) + O(x, y, λ_k),   (2)

and for the image of black spectralon, we have

I_B(x, y, λ_k) = L(x, y, λ_k) S(x, y, λ_k) R_B(λ_k) + O(x, y, λ_k),   (3)

where R_W(λ_k) and R_B(λ_k) are the reflectance functions for the two spectralon panels, respectively. We average 10 images of the white and black spectralon panels to obtain estimates of I_W(x, y, λ_k) and I_B(x, y, λ_k). These estimates are used together with (2) and (3) to estimate L(x, y, λ_k) S(x, y, λ_k) according to

L(x, y, λ_k) S(x, y, λ_k) = [I_W(x, y, λ_k) - I_B(x, y, λ_k)] / [R_W(λ_k) - R_B(λ_k)],   (4)

and this estimate can be substituted into (3) to obtain an estimate for O(x, y, λ_k). With these estimates, (1) can be solved for reflectance to give

R(x, y, λ_k) = [I(x, y, λ_k) - O(x, y, λ_k)] / [L(x, y, λ_k) S(x, y, λ_k)].   (5)

We performed this calibration step at the beginning of each imaging session. The reflectance R(x, y, λ_k) is invariant to the illumination, and our experiments do not consider illumination variability. In Section 5, we suggest a method that can be used to extend this work to illumination-invariant recognition.

We collected hyperspectral face images for a total of 200 human subjects. All these subjects were studied under protocol HS# 2000-1449, which was approved by the Institutional Review Board (IRB) of UC Irvine. As shown in Fig. 6, the 200 subject database has a diverse composition in terms of gender, age, and ethnicity. Images of all human subjects were acquired in sets of seven images per subject. Fig. 7 shows the seven images for one subject. Two front-view images were taken with neutral expression (fg and fa).
Another front-view image, fb, was taken with a different expression. Four other images were taken with face orientations of -90 degrees, -45 degrees, 45 degrees, and 90 degrees, as shown in Fig. 7. These images are referred to as fr2, fr1, fl1, and fl2, respectively. Twenty of the 200 subjects were imaged at different times separated by up to five weeks from their initial imaging session. Fig. 8 shows the front-view images of one subject taken at four different visits.
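The two-panel calibration described above reduces to a per-pixel, per-band gain/offset correction. A minimal NumPy sketch, assuming image cubes of shape (rows, cols, bands) and the nominal panel reflectances of 0.99 and 0.02 (function and variable names are our own):

```python
import numpy as np

def calibrate_reflectance(raw, white, black, r_w=0.99, r_b=0.02):
    """Convert a raw hyperspectral cube to reflectance using white/black
    spectralon references, under the model I = L*S*R + O.
    raw, white, black: arrays of shape (rows, cols, bands)."""
    gain = (white - black) / (r_w - r_b)  # estimate of L*S, from (I_W - I_B)
    offset = black - gain * r_b           # estimate of O, from the black panel
    return (raw - offset) / gain          # solve I = gain*R + offset for R
```

In practice `white` and `black` would each be the average of the 10 reference images mentioned in the text, which suppresses shot noise in the gain and offset estimates.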

SPECTRAL METRIC FOR FACE RECOGNITION
In order to test the feasibility of hyperspectral face recognition, we represent each face image using spectral reflectance vectors that are extracted from small facial regions. Squares overlaid on the images in Fig. 7 indicate the size and location of the regions that are considered for each subject. The regions are selected manually, but we describe a method later in this section that is used to reduce dependence on the particular location of the region. For the frontal images (fg, fa, fb), five facial regions corresponding to the forehead, left cheek, right cheek, hair, and lips are used. For images acquired at other facial orientations, the subset of these facial regions that are visible is used, as shown in Fig. 7. The forehead, for example, is not visible for a facial orientation of 90 degrees.
For each facial region, the spectral reflectance vector T_t is estimated by averaging over the N-pixel squares shown in Fig. 7 according to

T_t(k) = (1/N) Σ_(x,y) R(x, y, λ_k),   k = 1, ..., B,   (6)

where the sum is over the N pixels in the square, B is the number of spectral bands, and t is one of the following tissue types: f (forehead), lc (left cheek), rc (right cheek), h (hair), or l (lip). The normalized spectral reflectance vector R_t is defined by

R_t = T_t / ||T_t||.   (7)

The distance between face image i and face image j for tissue type t is defined by the square of the Mahalanobis distance [29]

D'_t(i, j) = (R_t(i) - R_t(j))^T Σ_t^(-1) (R_t(i) - R_t(j)),   (8)
where Σ_t is the B × B covariance matrix for the distribution of the vector R_t for a subject. In our experiments, we use a single Σ_t to represent variability for tissue type t over the entire database. Since the amount of data available to estimate the covariance matrix is limited, we approximate Σ_t by a diagonal matrix Λ_t with elements that correspond to the variance at each λ_k. The matrix Λ_t(i) is estimated for each subject i using the vectors R_t(i) from each image of subject i that contains tissue type t. The overall matrix Λ_t, which is used to approximate Σ_t in (8), is obtained by averaging the Λ_t(i) matrices over all subjects. Fig. 9 plots the diagonal elements of Λ_t as a function of wavelength for the forehead tissue type. The corresponding functions for the left cheek and right cheek are similar, while the functions for the lips and hair have a similar shape but larger values of the variance at each wavelength. As seen in Fig. 9, the variance has larger values at the low and high ends of the 700-1000 nm wavelength range. This is due primarily to a lower signal-to-noise ratio for the sensing system at wavelengths near the ends of the spectral range.

Since tissue spectral reflectance can have spatial variability, the distance D'_t(i, j) will have some dependence on the locations of the squares used to compute R_t(i) and R_t(j). We address this issue by defining a set S_t(i) = {R_t^(1)(i), R_t^(2)(i), ..., R_t^(M)(i)} of normalized spectral reflectance vectors, where each R_t^(k)(i) is derived from a different N-pixel square region in the image of subject i for tissue type t. A similar set S_t(j) is defined for subject j. The distance D_t(i, j) is defined as the smallest squared Mahalanobis distance between an element of S_t(i) and an element of S_t(j):

D_t(i, j) = min_{k∈[1,M], l∈[1,M]} D'_t(R_t^(k)(i), R_t^(l)(j)).   (9)

In our experiments, we consider M = 5 adjacent square regions of size 17 × 17 pixels arranged in a cross pattern to define the sets S_t(i) for each tissue type except the lips. Smaller regions of size 9 × 9 pixels are used to represent the smaller spatial extent of the lips. Recognition performance can be enhanced by utilizing all visible tissue types. Thus, the distance between a frontal face image i and a test face image j is defined as

D(i, j) = Σ_t ω_t D_t(i, j),   (10)

where ω_t is 1 if tissue type t is visible in the test image, and 0 otherwise.
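The metric above — unit-norm reflectance vectors, a diagonal-covariance Mahalanobis distance, a minimum over region placements, and a sum over visible tissue types — can be sketched in a few lines. All function names, argument shapes, and the dictionary layout are our own assumptions:

```python
import numpy as np

def normalized_reflectance(cube, square):
    """Average reflectance over a square region of a (rows, cols, bands)
    cube, then scale the B-vector to unit norm."""
    r0, r1, c0, c1 = square
    t = cube[r0:r1, c0:c1, :].reshape(-1, cube.shape[-1]).mean(axis=0)
    return t / np.linalg.norm(t)

def tissue_distance(set_i, set_j, inv_var):
    """Smallest squared Mahalanobis distance (diagonal covariance, given as
    inverse variances per band) between any pair of vectors from the sets."""
    return min(float(np.sum((a - b) ** 2 * inv_var))
               for a in set_i for b in set_j)

def face_distance(sets_i, sets_j, inv_vars):
    """Sum the per-tissue distances over tissue types visible in both images.
    sets_*: dict mapping tissue type -> list of normalized vectors."""
    common = sets_i.keys() & sets_j.keys()
    return sum(tissue_distance(sets_i[t], sets_j[t], inv_vars[t])
               for t in common)
```

Restricting the sum to tissue types present in both images plays the role of the visibility indicator ω_t, so a 90 degree profile probe is compared using only the regions it actually shows.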

EXPERIMENTAL RESULTS
We conducted a series of recognition experiments using an image database consisting of C = 200 subjects. At each imaging session, seven images of each subject were acquired, as shown in Fig. 7. Image fg is used to represent the subject in the gallery set, which is the group of hyperspectral images of known identity [9]. The remaining images are used as probes to test the recognition algorithm. Thus, the experiments follow the closed universe model [9], where the subject in every image in the probe set is present in the gallery. Twenty of the 200 subjects participated in imaging sessions which occurred after their initial session. The images taken in the second and subsequent sessions are called duplicates. The results of the experiments will be presented using cumulative match scores [9]. For a probe image j, the image in the gallery which corresponds to the same subject is denoted by T_j. Given a probe image j, we can compute D(i, j) for each of the C images i in the gallery. Probe j is correctly recognized if D(T_j, j) is the smallest of the C distances. Given a set of probes, the total number of correctly recognized probes is denoted as M_1. Similarly, M_n is the number of probes for which D(T_j, j) is one of the n smallest of the C distances. Thus, M_n is a monotonically nondecreasing function of n, and we say that the algorithm correctly recognizes M_n of the probes at rank n. The cumulative match score (CMS) function for an experiment is defined by R_n = M_n / P, where P is the total number of probes used in the experiment and n denotes rank. Note that if all of the probes are in the gallery, then R_n equals 1 when n equals the size of the gallery.

We first consider the use of the frontal fa and fb probes to examine the utility of the various tissue types for hyperspectral face recognition. Fig. 10 presents the cumulative match scores R_n as a function of the rank n that are obtained when using D_t(i, j) for each of the tissue types individually and D(i, j) for the combination of all tissue types. We see that skin is the most useful tissue type for recognition, while the hair and lips are less useful. The top curve in Fig. 10 shows that the best performance is achieved by combining all of the tissue types. We see that, for this case, more than 90 percent of the 400 probes are correctly identified in the 200 subject database. Fig. 11 compares recognition performance when using probes fa and fb separately with the algorithm that considers all tissue types. The fa images have the same facial expression as the gallery images, while the fb images have different expressions. We see that accurate recognition is achieved in both cases, which suggests that recognition using hyperspectral discriminants is not impacted significantly by changes in facial expression. Nevertheless, probes with different facial expressions are somewhat harder to identify. Fig. 12 compares the performance for fa and fb probes for individual tissue types. There is little change in the forehead geometry for an expression change, and the degradation in performance from fa to fb probes is the smallest over the four facial tissue types for the forehead. As seen in Fig. 12, recognition performance degrades more significantly for the left cheek and right cheek for the fb probes, since an expression change can significantly change the local surface geometry in the cheek areas. We also see in Fig. 12 that the performance for the lips is significantly worse than for the other tissue types for both the fa and fb probes.

Fig. 13 examines the impact of changes in face orientation on recognition performance over the 200 subject database. Current face recognition systems experience significant difficulty in recognizing probes that differ from a frontal gallery image by more than 32 degrees [10]. As expected, hyperspectral images can be used to achieve accurate recognition results for larger rotations. In Fig. 13, we see that for probes that are rotated 45 degrees to the left or right from the frontal gallery image, 75 percent of the probes are recognized correctly and 94 percent of the probes have the correct match ranked in the top 5. For the difficult case of probes that are rotated 90 degrees, about 80 percent of the probes have the correct match ranked in the top 10. These results utilize the distance function defined in terms of all visible tissue types. This distance function assumes that tissue spectral reflectance does not depend on the photometric angles. This is an approximation and leads to degradation in performance as the probe image is rotated with respect to the gallery image.
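The cumulative match score computation described above is straightforward to implement. A sketch, assuming a precomputed probe-by-gallery distance matrix (names and shapes are our own):

```python
import numpy as np

def cumulative_match_scores(distances, true_ids):
    """distances: (P, C) array where distances[p, i] = D(i, p) between probe p
    and gallery image i. true_ids[p] is the gallery index of probe p's correct
    match. Returns [R_1, ..., R_C] with R_n = M_n / P."""
    P, C = distances.shape
    ranks = np.empty(P, dtype=int)
    for p in range(P):
        order = np.argsort(distances[p])                      # best match first
        ranks[p] = int(np.where(order == true_ids[p])[0][0]) + 1
    return np.array([(ranks <= n).mean() for n in range(1, C + 1)])
```

R_1 is the rank-1 recognition rate, and the full curve is monotonically nondecreasing, reaching 1 at n = C under the closed universe model.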
Fig. 14 plots CMS curves showing the effect of changes in face orientation on recognition for experiments where the probes are restricted to specific subsets of the database defined by age group, ethnicity, and gender. The full gallery of 200 subjects was used for these experiments. We see that recognition performance typically degrades as the size of the subset considered increases. One exception is that performance degrades from the 18-20 age group (86 subjects) to the 21-30 age group (67 subjects) for 45 degree and 90 degree face rotations. We speculate that additional facial 3D geometric structure may be starting to appear for the subjects in the 21-30 age group that leads to stronger bidirectional reflectance effects for the rotated face images. Tables 1, 2, 3, and 4 analyze the probes that are not identified correctly at rank 1 for the experiments described by Figs. 11 and 13. The first column in each table is the probe category according to gender, age group, or ethnicity, and the second column indicates the number of probes in that category that are not identified correctly. The remaining columns in each row describe the distribution of the best match in the gallery for the incorrectly identified probes. If we consider the first row of Table 1, for example, we see that four female probes were incorrectly identified in this experiment and that for three of these probes the top match in the gallery was female, while for the other probe the top match in the gallery was male. We see from Tables 1, 2, 3, and 4 that female probes tend to false match with female images in the gallery and that male probes tend to false match with male images in the gallery. We also see that Asian probes tend to false match with images of the same ethnicity in the gallery.
Fig. 15 shows the recognition performance for duplicate probes, i.e., probe images taken on different days than the gallery image of the same subject. This experiment considers 98 probes acquired from 20 subjects at times between three days and five weeks after the gallery image was acquired. The same 200 subject gallery is used as in the other experiments. We see that 92 percent of the probes have the correct match ranked in the top 10. Fig. 15 also compares the recognition performance for duplicate probes acquired over different time intervals. We see that performance for duplicates acquired within one week (40 probes) is similar to performance for duplicates acquired at an interval of over one week (58 probes). We note that there is a significant reduction in recognition accuracy for the duplicate probes considered in Fig. 15 compared to the results for images acquired on a single day, as shown in Figs. 10, 11, 12, and 13. This can be attributed to changes in subject condition, including variation in blood, water concentration, blood oxygenation, and melanin concentration. Drift in sensor characteristics and calibration accuracy is another possible source of day-to-day variation in spectral measurements. However, the experiments with duplicates indicate that hyperspectral imaging has potential for face recognition over time.

CONCLUSION

We have demonstrated the utility of hyperspectral imaging for face recognition over time in the presence of changes in facial pose and expression. The experiments consider a database of calibrated NIR (0.7 μm-1.0 μm) hyperspectral images for 200 subjects. A face recognition algorithm based on the spectral comparison of combinations of tissue types was applied to the images. The results showed that the algorithm performs significantly better than current face recognition systems for identifying rotated faces. Performance might be further improved by modeling spectral reflectance changes due to face orientation changes. The algorithm also provides accurate recognition performance for expression changes and for images acquired over several-week time intervals. Since the algorithm uses only local spectral information, we expect that additional performance gains can be achieved by incorporating spatial information into the recognition process. Previous work [1] has shown that the high dimensionality of hyperspectral data supports the use of subspace methods for illumination-invariant recognition. A similar method can be used for face recognition under unknown illumination.

Fig. 3. Skin and hair reflectance spectra for two subjects. (a) Front view images. (b) 90 degree side view images.


Fig. 8. Examples of images taken at different times.

TABLE 1
Analysis of Incorrectly Identified, Front View, Neutral Expression Probes

TABLE 2
Analysis of Incorrectly Identified, Front View, Changed Expression Probes

TABLE 3
Analysis of Incorrectly Identified, 45 Degree Rotation Probes

TABLE 4
Analysis of Incorrectly Identified, 90 Degree Rotation Probes