Evolving Attractive Faces Using Morphing Technology and a Genetic Algorithm: A New Approach to Determining Ideal Facial Aesthetics

Objectives: The objectives of this study were to: 1) determine if a genetic algorithm in combination with morphing software can be used to evolve more attractive faces; and 2) evaluate whether this approach can be used as a tool to define or identify the attributes of the ideal attractive face.


INTRODUCTION
In our culture, being beautiful has its advantages, as we are a society prone to judge a book by its cover. Beautiful people are invested by others with a plethora of desirable characteristics such as warmth, sensitivity, poise, and kindness. Attractive people receive preferential treatment and have intrinsic social, marital, and occupational success as a consequence of winning a genetic lottery. Despite the importance of beauty in our cultural, social, and economic fabric, rigorous definitions of beauty are lacking, and this is particularly true with respect to facial esthetics. 1 Defining beauty remains elusive, though operationally, to paraphrase Supreme Court Justice Potter Stewart: "You know it when you see it." Quantitative approaches to defining beauty are rooted in morphometric techniques largely aimed at identifying geometric relationships between facial features and subunits or defining specific linear and angular measurements. Da Vinci and Dürer independently developed the classical canons of facial beauty that have permeated art, science, fashion, and popular culture, and set forth the basis for the rules of thirds and fifths and other stratagems. Their work has stood the test of time and remains in good agreement with most modern studies on facial proportion. With the rise of mass media through the 20th century, art, popular culture, and fashion converged, and defining facial beauty became relevant to marketing and advertising. The economic impact spurred serious academic inquiry. 2 With the rise and increasing acceptance of cosmetic surgery during the 1990s, defining beauty has become even more relevant to surgeons. 3 Modern approaches combine anthropomorphic methods with focus group ratings of facial beauty. 4 Focus groups are formed using either expert or lay groups of evaluators who score, rank, or segregate faces based subjectively on appearance. 
These beauty scores are then tabulated and may be correlated with linear or angular measurements of either the face or photographs of the face taken from different vantage points. 5 Farkas has done the most detailed and comprehensive work using this basic methodology, 6 and has published more than 100 articles on this topic alone. His studies are extremely meticulous and involve the use of intricate and innovative devices and techniques for obtaining facial measurements. He has performed studies using widely divergent study subjects across ethnicities, racial groups, and genders, and he has used different types of focus groups as well. By necessity, these comprehensive studies are labor and time intensive, thus limiting the scope and extent of both study subjects and evaluators. Others have adopted his general approach and have teased out cultural influences, segregated focus groups, and further explored demographic influences. While rigorously quantitative, these measures are of limited practical value to the artist, esthetician, marketing executive, or surgeon.
In the 1990s, growing interest in the hypothesis that beauty is rooted in the genetic makeup of the individual and is an indirect measure of overall health, and perhaps more accurately, reproductive fitness, spurred biologists and experimental psychologists to explore this concept in greater detail. 7 The most celebrated examples of this hypothesis are the studies that examined cross-cultural preferences of men for specific hip-waist-bust ratios in women. The hip-waist-bust ratio is believed to be an indirect link to the subject's hormonal milieu, secondary sexual characteristics, and more broadly, to fertility. In the face, the appeal to men of women with full lips and small jawlines has been hypothesized to correlate with hormonal changes in postadolescent females (again a fertility cue), while in men, a strong jaw and prominent brow ridge are characteristics associated with testosterone surges at maturity. 8-12 Thus, what we may consider "attractive or beautiful" may be related to structural or functional consequences that are rooted in evolution. 7,13 These hypotheses rooted in evolutionary biology are speculative, but have been the intriguing subjects of intense academic and popular cultural debate. 14 More recently, digital image processing techniques have been used to alter images and refine which features are found appealing to study populations. 15 The pioneering work of Johnston incorporated custom software with an on-line voting system used to rate faces, and marked a novel approach to identifying the specific appearance of a beautiful face while avoiding the labor-intensive nature of traditional morphometric approaches. 13,16,17 Johnston's landmark body of work identified the "most beautiful face," which was evolved from an expansive on-line voting scheme. Johnston's software "drew" faces based essentially on the outcome of on-line voting, created new faces, collected new votes, and reposted the faces again using an iterative process. 
While the results of this study are compelling, the software drew faces that provided considerable detail on facial shape and specific features such as eyes and lips, but fell short of producing a realistic digital simulacrum. It was also limited because the software made changes in discrete rather than continuous increments. In contrast, others have focused on generating images using advanced image processing or morphing technology, and have examined the impact of specific changes in facial features such as facial shape, and deviation from classical canons in terms of facial proportions, using photographs of real subjects. These studies have required investigators to alter physical features in an ad hoc manner and allowed identification of whether focus groups prefer specific features such as a larger or smaller jaw.
Digital image manipulation in this arena has not been fully exploited in defining facial beauty, particularly now with the availability of low-cost software and high-powered computing. 18 Currently, digital photographs can be morphed with one another using consumer-level software to produce extremely realistic synthetic faces. Johnston's work is the closest to what may be described as an evolutionary biology approach toward identifying the features of an attractive face, but does not incorporate the randomness inherent in natural selection.
In this study, we used morphing software to create realistic appearing synthetic faces from digital photographs of volunteers. Selection of morphing pairs was accomplished using a genetic algorithm with facial beauty as the only selection pressure. The digital "breeding process" aimed to evolve progressively more attractive facial cohorts with each iteration of the algorithm. The objectives of this study were to: 1) determine if a genetic algorithm in combination with morphing software can be used to evolve more attractive faces; and 2) evaluate whether this approach can be used as a tool to define or identify the attributes of the ideal attractive face.

Photography and Subject Population
Digital portraits were taken of women between ages 18 and 25 with the approval of the Institutional Review Board at the University of California Irvine. This study is exclusively focused on female faces; companion studies to follow will examine male faces. No candidates were rejected on the basis of ethnicity or race. Volunteers were rejected if they had obvious craniofacial abnormalities such as cleft lip and cleft palate deformities. Volunteers were solicited from various courses, student associations, sororities, medical student associations, and also from placement of a booth within the University of California Irvine Student Center. A total of 250 volunteers were photographed. Participants were photographed under standard conditions with the face oriented along the Frankfort plane against a neutral blue background. The hair was pulled back with a headband to fully expose the entire face, including the ears and trichial line. A black barber's cape encircled the neck at the level of the sternal notch. At most, only scant natural makeup was permitted, and most subjects were asked to remove their cosmetics and appear clean scrubbed. Only photographs with neutral facial expression (repose) were used. Volunteers were asked to remove earrings and other facial piercings. Digital cameras (Rebel XT, 100 mm Macro Lens, Canon USA, Lake Success, NY) were used to obtain all images, and faces were photographed at a distance of approximately 6 feet with either flash or ambient artificial lighting.

Morphing Approach
Morphing is the process of digitally transforming one image into another. Morphing algorithms work by marking prominent features or registry points, such as tips and corners, on each of the images. Algorithms are then used to map the movements of these points from one object to the other. The morphing process can be stopped at any point to obtain different proportions of the first and second image. In this study, we selected Morphman 2000 (STOIK Imaging, LTD, Moscow, Russia) because of its low cost, ease of use, and capability to use polygonal regions of interest to outline detailed structures such as the eyes, nose, and lips. The software also provided dynamic visualization of both parent images during the registration process. Each synthetic image was a 50:50 morph of two other images.
Through trial and error, we determined that to create highly realistic faces, the pupils, iris, lid crease, eyelashes, vermilion border, eyebrows, alar crease, nasal tip, and ala needed to be identified and outlined with extreme precision on both parent images (Fig. 1). Further polygonal regions of interest over broad featureless regions such as the cheeks, forehead, and chin needed to be encircled, as did the melolabial folds and mental crease. Research assistants initially constructed preliminary morphing templates for each pair of faces. The authors then optimized the templates to improve registration around key features such as the eyes, eyelids, brows, and ears.
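The registry-point idea described above can be illustrated with a minimal sketch. It is not the Morphman 2000 implementation, whose internals are not described here; it only shows the general principle of moving matched points a fraction t of the way from one face to the other and blending aligned pixel values, with t = 0.5 corresponding to a 50:50 morph. The function names and coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def interpolate_landmarks(a: List[Point], b: List[Point], t: float = 0.5) -> List[Point]:
    """Move each registry point a fraction t of the way from face a to face b."""
    if len(a) != len(b):
        raise ValueError("both faces need the same number of registry points")
    return [((1 - t) * ax + t * bx, (1 - t) * ay + t * by)
            for (ax, ay), (bx, by) in zip(a, b)]


def cross_dissolve(pixel_a: float, pixel_b: float, t: float = 0.5) -> float:
    """Blend two already-aligned pixel intensities, with weight t on the second image."""
    return (1 - t) * pixel_a + t * pixel_b


# Hypothetical (x, y) registry points: left pupil, right pupil, nasal tip.
parent_a = [(100.0, 120.0), (160.0, 120.0), (130.0, 170.0)]
parent_b = [(104.0, 124.0), (168.0, 122.0), (134.0, 180.0)]

# t = 0.5 places every point midway between the two parents.
child_points = interpolate_landmarks(parent_a, parent_b, t=0.5)
```

A full morph would also warp each parent image so that its features land on the interpolated points before blending; that step is omitted here.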

Construction of the Parent Generation
Development of the parent generation (P) for morphing presented a logistical challenge. The facial photographs used in this work are part of a larger photograph database managed by the lead author under approval of the Institutional Review Board at the University of California Irvine. This database is being used for several facial analysis projects involving hundreds of subjects.
The presentation of actual subject photographs in public venues such as conferences or in publications would require execution of a lengthy written informed consent document. The time required for informed consent would severely limit the accrual of subjects and decrease the number of photographs within our overall database. However, the Institutional Review Board at our institution permitted the use of photographs that have been digitally altered to produce synthetic images such as those created during the morphing process. Hence, we opted to use synthetic faces for the original parent generation of faces in this study.
The parent generation of faces was produced by first segregating faces into four ethnic groups: 1) white, 2) Asian, 3) Latino, and 4) Middle Eastern. (There were few African-American student volunteers in the study, as they make up less than 2% of our county's population.) Photographs within a specific ethnic group were then randomly selected to form pairs and morphed. Thirty pairs of faces were used to generate the parent generation.

Initial Focus Group Evaluations
The parent generation morphs were evaluated and scored from 1 (unattractive) to 10 (attractive) by undergraduate students (n = 17) during a one-semester esthetic surgery seminar taught by the lead author over a 12-week period. This small focus group was used to provide facial attractiveness scores because the same students would be available over the full 12-week term. The demographics of this evaluator group reflect the socioeconomic and ethnic composition of undergraduates at our institution and mirror the demographics of our geographic region. Two-thirds of the students were women. Prior to scoring faces, each student evaluator spent a week developing a visual analogue scale for facial beauty with a face (culled from the Internet) representing each score from 1 (unattractive) to 10 (attractive). The use of the visual analogue scale was aimed at encouraging a more consistent approach to scoring faces by each evaluator. The scoring of each face was performed using a classical focus group approach. Images of each face were presented one at a time onto a projection screen using a liquid crystal display (LCD) projector for approximately 45 seconds. Only 30 faces were presented on any given day. Scores for each face were tabulated and averaged, thus providing an average facial attractiveness score for each face in the parent generation. Images for each new generation of evolved morphs were later presented on three additional occasions approximately 3 weeks apart. Of note, this focus group did not evaluate the fourth generation of morphed faces.

Genetic Algorithm
Natural selection is the foundation of biology. It is the process by which favorable traits that are heritable become more common in successive generations, and unfavorable traits become less common. Natural selection acts on the observable characteristics of an organism, favoring individuals with the traits that favor survival and reproduction in a given environment. Over time, this process can result in adaptations that optimize organisms for specific environmental conditions; in humans, evidence of this can be seen in the evolution of different racial groups. Evolving more attractive faces in this study requires the adoption of a heuristic that emulates the process of natural selection. The trait we seek to amplify is facial attractiveness. This cannot be achieved by simply morphing images randomly together as there is no selection pressure. The absence of selection pressure in any combinatorial schema would result in an image with average features. Therefore, we introduced a selection pressure into our algorithm that biased the digital "breeding" process toward selecting more attractive faces.
The basic algorithm is illustrated in Figure 2. First, faces are randomly selected from the parent generation of faces (P). Each face has an attractiveness score associated with it determined by the initial focus group evaluation (see above). Each generation of new faces has a mean, maximum, and minimum attractiveness score, which were produced by the initial evaluation group. In P, the initial focus group produced mean values trending toward a value of 5, and scores close to 1 (profoundly unattractive) or 10 (profoundly attractive) were nonexistent. Second, a random number generator (continuous uniform distribution) returns a value that lies between the minimum and maximum attractiveness score for the parent generation of faces. Thus, each P face has an attractiveness score and a random number associated with it. Third, each face's attractiveness score is compared to its paired random number. If the attractiveness score exceeds the value produced by the random number generator, then the face can be morphed with another face that also satisfies this condition (i.e., the face is fit to morph). Faces selected where the attractiveness score is less than the value produced by the random number generator do not go on to morph, though they may still be selected again later on.
It must be emphasized that "fitness" to morph or digitally breed is a function of the attractiveness score and the probability that this score exceeds a random number. Hence, unattractive faces can be selected for morphing if paired with a very low random number, and attractive faces may be rejected if paired with a random number higher than their attractiveness score. Nevertheless, the bias is toward the best looking faces. The initial P generation consisted of 30 faces (see above). The algorithm executes until 30 new pairs of faces are generated. Notably, some faces in the original P generation may be represented more than once in this new parent breeding generation (Pb). Likewise, some faces may not be included within any of the 30 pairs in Pb.
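The selection step described above can be sketched as follows. This is an illustrative reading of the text and not the code used in the study: the function names, the face labels, the example scores, and the rule that a face is not paired with itself are all assumptions. Because the random number is drawn uniformly between the minimum and maximum score, a face with score s passes with probability (s - min) / (max - min), so under this strict reading the lowest-scoring face never passes.

```python
import random
from typing import Dict, List, Tuple


def acceptance_probability(score: float, lo: float, hi: float) -> float:
    """Chance that a uniform draw on [lo, hi] falls below the given score."""
    return min(1.0, max(0.0, (score - lo) / (hi - lo)))


def fit_to_morph(score: float, lo: float, hi: float, rng: random.Random) -> bool:
    """A face is fit to morph if its score exceeds its paired random number."""
    return score > rng.uniform(lo, hi)


def select_breeding_pairs(scores: Dict[str, float], n_pairs: int,
                          seed: int = 0) -> List[Tuple[str, str]]:
    """Draw faces at random, keep those that pass, and pair them up."""
    rng = random.Random(seed)
    lo, hi = min(scores.values()), max(scores.values())
    if sum(s > lo for s in scores.values()) < 2:
        raise ValueError("need at least two faces scoring above the minimum")
    faces = sorted(scores)
    pairs: List[Tuple[str, str]] = []
    while len(pairs) < n_pairs:
        fit: List[str] = []
        while len(fit) < 2:
            face = rng.choice(faces)  # rejected faces stay in the pool
            if fit and face == fit[0]:
                continue  # assumption: a face is not morphed with itself
            if fit_to_morph(scores[face], lo, hi, rng):
                fit.append(face)
        pairs.append((fit[0], fit[1]))
    return pairs


# Hypothetical average attractiveness scores for six parent faces.
scores = {"P01": 3.2, "P02": 4.1, "P03": 5.0, "P04": 5.6, "P05": 6.3, "P06": 4.8}
pairs = select_breeding_pairs(scores, n_pairs=30, seed=1)
```

Faces are sampled with replacement, so a high-scoring face can appear in several pairs while a low-scoring face may appear in none, which matches the behavior described for the breeding cohort.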
The thirty facial pairs in Pb were then morphed to produce the first generation of synthetic faces (F1). F1 faces were then evaluated by the same initial focus group of evaluators, and attractiveness scores were obtained for these new synthetic faces. The genetic algorithm was run, and a subset of F1 faces was selected, namely the breeding cohort F1b. The 60 F1b faces were then morphed using the genetic algorithm to produce a new second generation of synthetic faces (F2). This process was repeated, producing a third (F3) and fourth (F4) generation. It must be emphasized that while breeding pairs are randomly selected, each face is subject to a selection pressure. The approach mimics the concept of a predator and prey inasmuch as the survival of the prey depends on the fitness of the predator as much as its own. Notably, as in nature, faces are not eliminated from potential "breeding"/morphing after selection and return to the facial "gene pool." Average attractiveness scores for each new generation (F1-F3) were calculated via evaluation by the initial focus group. As noted above, 2 to 3 weeks elapsed between the evaluation of each new generation, as that was the time required to produce high quality facial morphs.

Morphometric Measurements
All faces (P-F4) were scaled to the same size using software (PowerPoint, Microsoft, Redmond, WA) with the constraint that the distance from the trichial line to the lowest point on the chin (menton) was identical on each image when ported into a PowerPoint slide. This served as a normalization factor. Each slide of the PowerPoint file was then printed using a color laser printer. Thirty-three linear facial features (Table I) were measured on each face. The location of these measurements is noted in the set of diagrams in Figure 3. To increase clarity, some symmetric measurements were not labeled (i.e., only left or right side features were labeled). The measured features are rudimentary and are derived from basic facial proportions described in most plastic surgery textbooks. A particular emphasis has been placed on the eyes and lips as qualitative trends were observed with each generation (see results section). Measurements were obtained using a digital micrometer (Mitutoyo-USA, Aurora, IL), and tabulated for each face.
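The normalization described above was performed by rescaling the images themselves before printing and measuring. A numerically equivalent sketch, assuming raw measurements and a raw trichial-line-to-menton distance are available for each face, is shown below; the function name, the reference height of 100 arbitrary units, and the example values are hypothetical.

```python
from typing import Dict


def normalize(measurements: Dict[str, float], face_height: float,
              reference_height: float = 100.0) -> Dict[str, float]:
    """Rescale linear measurements so that facial height equals reference_height."""
    if face_height <= 0:
        raise ValueError("face_height must be positive")
    scale = reference_height / face_height
    return {name: value * scale for name, value in measurements.items()}


# Hypothetical raw measurements from one face whose facial height is 200 units.
raw = {"nose_width": 30.0, "lip_height": 12.0}
scaled = normalize(raw, face_height=200.0)
```

After this step every measurement is expressed relative to facial height, so values are comparable across faces but no longer carry absolute units.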

Evaluation of All Generations Using Additional Focus Groups (Final Focus Groups)
All 150 images for the five generations of morphs were presented in random order to three distinct focus groups for evaluation. Focus groups consisted of: 1) undergraduate student volunteers (n = 44); 2) attending surgeons, fellows, and residents in the Department of Otolaryngology-Head and Neck Surgery at The University of California Irvine (n = 12); and 3) cosmetology school students at a local beauty school (n = 44). The undergraduate students were selected because their age distribution was similar to that of the study subjects, and they were readily accessed through an experimental psychology research participation pool offered by the School of Social Sciences at our institution. The latter two groups were selected as they were thought to have some formal expertise with respect to facial analysis. The undergraduates and the cosmetology students did not know that all of the images were synthetic. The surgeons were aware of image processing, but unaware of the precise details, algorithms, software, or intent of the study. For each of the three groups, images were presented on a projection screen using an LCD projector. In an effort to reduce arbitrary assignment of attractiveness scores, a visual analogue scale was presented before the actual scoring commenced. The visual analogue scale was produced by constructing a montage of faces for each point on the scale from 1 to 10. Source images for the scale were taken from each of the visual analogue scales developed by the original evaluating group described above. The morphed images were then presented, and evaluators recorded their attractiveness scores on a score sheet. There were no incomplete score sheets as each evaluator scored all 150 faces.

[Figure 2, panels B-D. Faces are randomly selected from the available pool. Each face has its own intrinsic facial attractiveness score determined by focus group evaluations. A random number is generated and paired with each selected face; it introduces a selection pressure into the morphing process and biases selection toward more attractive faces. If the attractiveness score exceeds the selected random number, the face is used for morphing in the next generation. If the attractiveness score is less than the selected random number, it is rejected. In general, attractive faces have a higher probability of success in terms of morphing. Less attractive faces may still morph, though the probability of this occurring is much lower.]

Statistical Methods
Univariate analysis was performed for data collected from each of the secondary rater groups (undergraduates, surgeons, and cosmetologists). For each rater group, the average beauty score for each of the 150 faces was computed, the distribution of average beauty scores was examined, and descriptive statistics were computed (median, mean, standard deviation, minimum, and maximum). Similarly, descriptive statistics were computed to examine the distribution across the 150 faces of quantitative measurements for each of 32 quantitative characteristics.
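The per-face averaging and descriptive statistics described above, together with the pairwise Pearson correlation used to compare rater groups, can be sketched with the Python standard library. This is an illustration only and not the statistical software used in the study; the function names and the small example arrays are hypothetical.

```python
import math
import statistics
from typing import Dict, List


def average_per_face(ratings: List[List[float]]) -> List[float]:
    """ratings[r][f] is rater r's score for face f; return each face's mean score."""
    return [statistics.mean(column) for column in zip(*ratings)]


def describe(values: List[float]) -> Dict[str, float]:
    """Median, mean, sample standard deviation, minimum, and maximum."""
    return {
        "median": statistics.median(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores: two raters per group, three faces.
group_a = [[4.0, 6.0, 8.0], [6.0, 6.0, 10.0]]
group_b = [[3.0, 5.0, 9.0], [5.0, 7.0, 9.0]]

avg_a = average_per_face(group_a)  # per-face means for the first group
avg_b = average_per_face(group_b)  # per-face means for the second group
summary_a = describe(avg_a)
r_ab = pearson(avg_a, avg_b)       # agreement between the two groups
```

A coefficient near 1 indicates that two groups rank the faces in nearly the same way, even if their absolute scores differ.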
Pairwise correlations of average beauty scores between the pairs of secondary rater groups were assessed using Pearson's correlation coefficient. Within each rater group, for each facial characteristic the correlation between the average beauty score and quantitative measurement was assessed. The objective was to find the measurements that have the highest or lowest correlations with average beauty score. The multivariate method, stepwise linear regression, was used to select the set of quantitative characteristics most predictive of average beauty score. Because of the high correlation between beauty scores for rater groups, the average scores from 100 raters were analyzed. Criteria for variable selection included assessment of the multiple correlation coefficient and application of a significance level of .05 for variable entry and retention. The objective was to choose the model with the highest multiple correlation coefficient and statistically significant coefficients for all predictors in the model.

RESULTS
Figure 4, A-E, shows montages of the 30 faces created for each generation. Notably, in the parent generation, the faces are heterogeneous, in distinct contrast to the later generations where there is profound convergence of features. The P and F1 generations demonstrate diversity with respect to most facial features. Faces are asymmetric and there is a wide variation in facial shape and proportion. With each successive generation, symmetry becomes more prevalent and clarity of skin increases, which is a product of image averaging. In the F3 and F4 generations (Fig. 4, D-E), oval faces clearly predominate; lips are fuller, and eyebrows more distinct and arched. There is significant similarity in terms of the size and shapes of the lips, nose, and eyes. All faces are symmetrical, and the brow shape is arched and nearly identical to the "ideal" brow shape described in most plastic surgery and cosmetology texts. Notably, in the P (Fig. 4A) and F1 (Fig. 4B) images, some semblance of ethnic diversity is maintained, but the repetitive morphing process eliminates this with successive generations.
Figure 5 depicts the average attractiveness scores for each generation, P-F3 (white bars), and the average attractiveness scores of the subset of faces that were selected by the algorithm for morphing, Pb-F3b (darker, shaded bars), as determined by the original initial focus group of 17 trained student evaluators. For each generation and its corresponding breeding cohort, attractiveness scores increase through the F3 generation. Notably, the standard deviation (SD) bars narrow slightly, thus further underscoring the convergence of features observed in Figure 4, D-E, above. The initial focus group did not evaluate the F4 generation, as there was no intent to morph/breed an F5 generation.
Figure 6 depicts the average attractiveness scores for P-F4 (white bars) and the average attractiveness scores of the subset of faces that were selected by the algorithm for morphing, Pb-F4b (darker, shaded bars), as determined by the final focus group, which did evaluate the final generation (F4). The data represent the average attractiveness scores of all three final evaluation focus groups (undergraduates, surgeons, and cosmetology students), whose results were pooled for this analysis. Attractiveness scores increased with each generation. Notably, the average score for the P generation in Figure 6 was significantly lower than that of the initial student evaluator group illustrated in Figure 5. Histograms for each generation (Fig. 7, A-E) show the distribution of attractiveness scores and demonstrate the dramatic shift in terms of average score, but also show movement of the median and alteration in the skew. Each histogram shows the frequency of each score 1-10 for a specific generation (total of approximately 3,000 votes per generation). The subset of faces that went on to morph or digitally "breed" had a slightly higher beauty score in each generation, demonstrating the effect of introducing a selection pressure into the genetic algorithm. The most and least attractive face for each generation is depicted along with the corresponding score in Figure 8.
The facial measurements are listed in Table I (not all symmetric measurements are labeled in Figure 3, to preserve clarity). It must be emphasized that these are relative measurements in arbitrary units and are measured from images that have all been scaled so that the distance from the trichial line to the lowest point of the chin is the same in each image. Since it is generally acknowledged that there is at least a loose relationship between facial proportions or distances and attractiveness, statistical analysis was performed to determine whether any relationships existed between any of these measured values and attractiveness score. The final focus group evaluations were used for this analysis. Within rater groups, statistically significant correlations between average beauty score and quantitative measurements were identified only for three facial characteristics (nose width, right eyebrow peak, and upper left Cupid's bow), and these were notable for weak correlations (see Table II). Surprisingly, no correlations were identified for the more germane measurements such as nasal height, facial thirds, facial fifths, etc. On the basis of 149 faces, pairwise Pearson correlation coefficients for average beauty score varied from 0.964 to 0.972 when pairs of the three rater groups were compared, strongly indicating that these groups define and evaluate beauty similarly despite different training and professional background.
Characteristics most predictive of average facial attractiveness score were selected using stepwise linear regression analysis. The model with three characteristics, the height of the upper left Cupid's bow (P = .002), the height of the right eyebrow arch (P = .032), and the height of the right eyebrow at its most medial point (P = .031), was significant (overall model F test, P = .0003). The squared multiple correlation coefficient for this model was 0.12, indicating that 12% of the variability in average facial attractiveness score was explained by the regression on these predictors.

DISCUSSION
In this study, realistic appearing synthetic facial images were created using morphing software across all generations. The key facial elements (eyes, lips, etc.) were distinctly preserved through each generation, though normal features of human skin such as blemishes, nevi, and acne were averaged out. Overall, with each successive generation, faces became more symmetric, and the overall appearance of the faces assumed a more multi-racial appearance with honey-colored skin and intermediate facial features, in contrast to previously reported studies, which usually focused on subjects of only European extraction. Likewise, vestiges of frank ethnicity drop out with the F2 generation. This investigation did not seek to examine or eliminate the impact of ethnicity in either the subject population that was photographed and used to produce morphs or in the focus groups used to score each synthetic image. Ongoing work in the lead researcher's group is currently focused on examining these factors. Parallel studies will examine only morphs derived from individuals of European heritage.
The montages (Fig. 4) demonstrate that the similarity in facial features increased with each successive generation. This suggests that the algorithm iterates to a solution or stable point, even when using the small sample size (n = 30) employed in this study. The observed general trends include a greater prevalence of oval shaped faces, fuller lips with distinct Cupid's bows, and defined and arched eyebrows. The nose does not form a prominent feature on any of the later generation faces. The improvement in overall attractiveness is supported by both the increase in average attractiveness score with each generation (Fig. 5 and Fig. 6) and the shift and change in median values and skew for the corresponding histograms (Fig. 7). The increase in the average value of more than one attractiveness point is a profound shift and indicative of the impact of using a genetic algorithm focused on cultivating beauty. The reduction in the SD of each average beauty score with each successive generation also underscores how this algorithm produces a modest degree of convergence as well.
On an individual basis, each F3 and F4 face is distinct and unique, but when examined collectively as in the montages, patterns and trends do emerge. The faces look eerily similar as they share virtually identical facial shape, lip fullness, nasal contour, and brow shape. This effect can, in part, be attributed to the innate averaging process that occurs with morphing 19 in combination with the selection pressure exerted on the population using the genetic algorithm. Likewise, the use of a small parent population of only 30 faces can limit diversity and bias results. In such a small population, one or two extremely attractive faces may have a significant impact. For example, the F3 cohort was the product of three generations of algorithm execution. One very high-scoring F3 morph had the same "great-grandmother" on three separate branches of its family tree. In nature, classic examples of this effect are Darwin's finches, where unique selection pressures, very small populations, isolation, and time led to very distinct species occupying unique niches.
The statistical analysis reveals some interesting quantitative trends. The attractiveness scores produced by the final focus groups (undergraduates, surgeons, and cosmetology students) correlated very well with one another, indicating general agreement in terms of what each group defines as facial attractiveness. Experts (surgeons and cosmetologists) rated the morphs the same as lay persons (undergraduates). The significant correlations between attractiveness score and the three facial measurements listed in Table II are intriguing. Nasal width, the height of the right eyebrow arch, and the height of the left lower lip within the sagittal plane of the Cupid's bow peak all negatively correlated with facial attractiveness score. The identification of nasal width as a key factor agrees with what one would intuitively believe, namely that narrower noses are more attractive. The identification of laterality with respect to brow and lip dimensions is perplexing. In this setting, asymmetry of the brow and lip in a morphed face may be attractive over a face in symmetric repose because of some subtle cue related to a suggestive facial expression. 20 However, with the small sample size, this finding may be spurious and related to asymmetry in extremely attractive faces in the P generation, and the effect of this finding propagating with each successive generation. Surprisingly, facial attractiveness did not correlate with more traditional measures such as facial thirds or fifths.
The low magnitude of these correlations may be a consequence of several factors, including: 1) the small number of faces used in each generation; 2) the tendency of images to converge in appearance with each successive generation; 3) the fact that only linear measurements were recorded; and 4) the fact that images were scaled to the same relative dimensions based on facial height, rather than focusing on absolute measurements. The first two points are important because, with each successive generation, faces look more and more similar and attractiveness scores are uniformly high. There simply is less spread in the data than there would be in a study of 150 randomly selected faces.
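The effect of reduced spread on a correlation coefficient can be illustrated with a small simulation. The sketch below is not the study's analysis and uses no study data: it generates synthetic values for a hypothetical facial measurement and a hypothetical attractiveness score that are linearly related, then compares the Pearson correlation in the full sample with the correlation among only the highest-scoring cases. The function names, the sample size, the noise level, and the choice of the Pearson coefficient are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import mean
from typing import List, Tuple


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate_range_restriction(
    n: int = 5000, top_fraction: float = 0.1, seed: int = 0
) -> Tuple[float, float]:
    """Return (r in full sample, r among the top-scoring fraction).

    Synthetic data only: 'feature' and 'score' are hypothetical stand-ins
    for a facial measurement and an attractiveness score.
    """
    rng = random.Random(seed)
    feature = [rng.gauss(0.0, 1.0) for _ in range(n)]
    score = [f + rng.gauss(0.0, 1.0) for f in feature]

    r_full = pearson_r(feature, score)

    # Keep only the highest-scoring cases, loosely mimicking a population
    # in which selection has pushed every score toward the top of the scale.
    ranked = sorted(zip(feature, score), key=lambda pair: pair[1], reverse=True)
    kept = ranked[: max(2, int(n * top_fraction))]
    r_top = pearson_r([f for f, _ in kept], [s for _, s in kept])
    return r_full, r_top


if __name__ == "__main__":
    full, top = simulate_range_restriction()
    print(f"full sample r = {full:.2f}, top-scoring subset r = {top:.2f}")
```

In this toy setting the underlying relationship is identical in both samples; only the spread of scores differs, which is enough to shrink the observed correlation in the restricted subset.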
The morphing process at this time remains a labor-intensive endeavor requiring 30 to 60 minutes for each pair of images, and there is a substantial learning curve. Significant diligence is required around the eyes, eyelids, brows, and lips to achieve realistic morphs. Optimal morph construction requires attention to detail when constructing templates, and templates are best drawn using a very large monitor in combination with a digital graphics tablet, which affords finer control than either a mouse or a touchpad. The morphing process itself introduces artifacts, in that averaging features increases the clarity of skin, removes blemishes, alters color, and increases symmetry. Also, in later generation morphs (i.e., F3 and F4), the features of the face are less sharp and distinct, as if they were photographed using soft lighting and a diffusion filter. This effect alone may introduce a small bias in these later generations.
The similarity in appearance observed in the F3 and F4 generations may be a consequence of the algorithm iterating toward what might be considered the ideal face as determined by the initial focus group of evaluators. These 17 evaluators scored each of the faces, and these scores are the basis for the selection pressure within the genetic algorithm. It was critical to use the same 17 evaluators for each successive generation. However, the small sample size of faces (n = 30) used in this study may introduce a bias to these results. Premature convergence to a nonoptimal phenotype or, in mathematical parlance, convergence to a local maximum, may occur because of this sample size. Expanding the population to a larger number such as 300 subjects might clarify whether this has indeed occurred and is the focus of our ongoing investigations.
The genetic algorithm is limited not only by the size of the sample populations, but also by the biases intrinsic to the focus group used to assign attractiveness scores. In this study, the initial focus group determined "genetic" fitness. Ideally, focus group size should be very large to provide better reliability of scores. In this study, the initial focus group consisted of students enrolled in a seminar taught by the lead author. The disadvantages were: 1) the size of this group (n = 17); 2) the preponderance of women in the group; and 3) the fact that the ethnic composition of the evaluator group was not identical to that of the subject populations. The advantages were: 1) the same 17 evaluators saw each new generation of faces; 2) the evaluators each had an individual visual analogue scale to aid in maintaining reproducibility of their scores; 3) to some degree, evaluators attempted to spread their scores across the scale in a logical rather than arbitrary manner; and 4) in theory, as the students were in a seminar focused on beauty and esthetic surgery, they had a more informed approach to gauging attractiveness. Keeping the same group of evaluators was critical because 30 morphs take numerous man-hours to generate, and a delay of 1 to 2 weeks was needed to morph each new generation. The second or final focus group, which evaluated all 150 images in one sitting, was likely more prone to arbitrary scoring of faces and to being randomly more or less charitable in attractiveness assessment (e.g., calling a modestly unattractive face a "1" and an attractive face a "10"). We currently have ongoing investigations that use a novel Web-based approach to compensate for focus group size and composition effects.
Planned investigations will focus on increasing the sample population by a factor of 10 and on normalizing all facial dimensions with respect to the interpupillary distance. Regardless, there are numerous methods used to measure the face, many of which are more complex than the simple measurements used in this study. Despite the relative paucity of statistical data, in general, symmetry, oval-shaped faces, defined and arched brows, full lips, and small non-prominent noses remained consistent features in the highest rated faces, and at least on inspection, the rules of thirds and fifths are generally preserved.
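Normalizing with respect to the interpupillary distance amounts to expressing each linear measurement as a dimensionless ratio of that distance. A minimal sketch follows; the function name, the dictionary keys, and the numeric values in the usage example are hypothetical and are not measurements from this study.

```python
from typing import Dict


def normalize_by_ipd(
    measurements: Dict[str, float], ipd_key: str = "interpupillary_distance"
) -> Dict[str, float]:
    """Divide every linear measurement by the interpupillary distance.

    All values must share one unit (e.g., pixels or millimeters); the
    returned ratios are unitless. The key names are illustrative only.
    """
    ipd = measurements[ipd_key]
    if ipd <= 0:
        raise ValueError("interpupillary distance must be positive")
    return {
        name: value / ipd
        for name, value in measurements.items()
        if name != ipd_key
    }


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the calculation.
    face = {"interpupillary_distance": 60.0, "nasal_width": 30.0}
    print(normalize_by_ipd(face))
```

Because the ratios do not depend on image scale, faces photographed or scaled at different sizes can be compared directly on the same measurement.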
The genetic algorithm in this study used facial attractiveness as the fitness function. The genetic representation is the appearance of each facial image. The selection process is a fitness-proportional selection model (also known as roulette-wheel selection) and is stochastic, in that a small proportion of less attractive faces reproduce/morph in each round of algorithm execution. This approach enhances the diversity of the breeding populations and presumably reduces the chance of premature convergence to a local maximum. Regardless, small sample sizes may still result in convergence to a local maximum rather than iteration toward a universal/global solution. There are obvious practical limitations in our approach, in that only a microscopic subset of the U.S. female population is used, and time and manpower requirements reduce the number of morphs that can be created and the number of selection rounds in which to execute the algorithm. Perhaps one advantage of using a diverse multi-ethnic population in this study is that the facial features are quite diverse, enabling creation of "larger mutations" by morphing faces with very different features.
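Fitness-proportional (roulette-wheel) selection can be sketched in a few lines. The code below is a generic illustration of the technique, not the software used in this study: each face is chosen with probability proportional to its attractiveness score, so high-scoring faces are picked most often while lower-scoring faces still reproduce occasionally. The function names, the rule that a face is never paired with itself, and the example scores are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def roulette_wheel_pick(scores: Sequence[float], rng: random.Random) -> int:
    """Return one index, chosen with probability proportional to its score."""
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    spin = rng.uniform(0.0, total)  # where the wheel stops
    running = 0.0
    for index, score in enumerate(scores):
        running += score
        if spin < running:
            return index
    # Floating-point round-off can leave spin at the very end of the wheel;
    # fall back to the last index that has a positive score.
    return max(i for i, s in enumerate(scores) if s > 0)


def select_parent_pairs(
    scores: Sequence[float], n_pairs: int, rng: random.Random
) -> List[Tuple[int, int]]:
    """Pick n_pairs pairs of distinct parents for morphing.

    Assumption for this sketch: a face is never morphed with itself.
    """
    if sum(1 for s in scores if s > 0) < 2:
        raise ValueError("need at least two faces with positive scores")
    pairs: List[Tuple[int, int]] = []
    while len(pairs) < n_pairs:
        a = roulette_wheel_pick(scores, rng)
        b = roulette_wheel_pick(scores, rng)
        if a != b:
            pairs.append((a, b))
    return pairs


if __name__ == "__main__":
    rng = random.Random(42)
    hypothetical_scores = [2.0, 5.5, 8.0, 3.5, 9.0]  # illustrative only
    print(select_parent_pairs(hypothetical_scores, n_pairs=5, rng=rng))
```

Each selected pair would then be morphed to produce one member of the next generation, which is scored in turn before the next round of selection.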
Traditional approaches to identifying "the perfect face" or defining facial beauty using quantitative methods, by contrast, have relied primarily on correlating focus groups' facial attractiveness scores with morphometric measurements. We propose that, using our algorithm, a population of synthetic faces evolves and iterates toward at least a local maximum, providing a glimpse of the elusive perfect face.

CONCLUSIONS
A genetic algorithm in combination with morphing software and traditional focus group-derived attractiveness scores can be used to evolve attractive synthetic faces. We have demonstrated that the evolution of attractive faces can be mimicked in software. The approach creates a virtual "Galapagos," with beauty acting as the selection pressure. Genetic algorithms and morphing differ substantially from traditional methods, which rely heavily on correlating attractiveness scores with a series of morphometric measurements and, in the end, do not produce an ideal composite attractive face. Clearly, to fully exploit the potential advantages of this approach, research will require the development of automated software algorithms to increase generation throughput, examination of lateral images, employment of larger basis populations, and examination of the impact of various demographic factors.