Label-Free Virtual HER2 Immunohistochemical Staining of Breast Tissue using Deep Learning

The immunohistochemical (IHC) staining of the human epidermal growth factor receptor 2 (HER2) biomarker is widely practiced in breast tissue analysis, preclinical studies, and diagnostic decisions, guiding cancer treatment and the investigation of pathogenesis. HER2 staining demands laborious tissue treatment and chemical processing performed by a histotechnologist, typically taking one day to prepare in a laboratory, which increases analysis time and associated costs. Here, we describe a deep learning-based virtual HER2 IHC staining method using a conditional generative adversarial network that is trained to rapidly transform autofluorescence microscopic images of unlabeled/label-free breast tissue sections into bright-field equivalent microscopic images, matching the standard HER2 IHC staining that is chemically performed on the same tissue sections. The efficacy of this virtual HER2 staining framework was demonstrated by quantitative analysis, in which three board-certified breast pathologists blindly graded the HER2 scores of virtually stained and immunohistochemically stained HER2 whole slide images (WSIs), revealing that the HER2 scores determined by inspecting the virtual IHC images are as accurate as those determined from their immunohistochemically stained counterparts. A second quantitative blinded study performed by the same diagnosticians further revealed that the virtually stained HER2 images exhibit comparable staining quality in the level of nuclear detail, membrane clearness, and absence of staining artifacts with respect to their immunohistochemically stained counterparts. This virtual HER2 staining framework bypasses the costly, laborious, and time-consuming IHC staining procedures in the laboratory, and can be extended to other types of biomarkers to accelerate IHC tissue staining in life sciences and biomedical workflows.


Introduction
The immunohistochemical (IHC) staining of tissue sections plays a pivotal role in the evaluation process of a broad range of diseases. Since its first implementation in 1941 1, a great variety of IHC biomarkers have been validated and employed in clinical and research laboratories for the characterization of specific cellular events 2, e.g., the nuclear protein Ki-67 associated with cell proliferation 3, the cellular tumor antigen P53 associated with tumor formation 4, and the human epidermal growth factor receptor 2 (HER2) associated with aggressive breast tumor development 5. Due to its capability of selectively identifying targeted biomarkers, IHC staining of tissue has been established as one of the gold standards for tissue analysis and diagnostic decisions, guiding disease treatment and the investigation of pathogenesis 6-8.
Though widely used, the IHC staining of tissue still requires a dedicated laboratory infrastructure and skilled operators (histotechnologists) to perform laborious tissue preparation steps, and is therefore time-consuming and costly. Recent years have seen rapid advances in deep learning-based virtual staining techniques, which provide promising alternatives to the traditional histochemical staining workflow by computationally staining the microscopic images captured from label-free thin tissue sections, bypassing the laborious and costly chemical staining process. Such label-free virtual staining techniques have been demonstrated using autofluorescence imaging 9,10, quantitative phase imaging 11, and light scattering imaging 12, among others 13-15, and have successfully created multiple types of histochemical stains, e.g., hematoxylin and eosin (H&E) 9-14, Masson's trichrome 9-11, and Jones silver stains 9-11. These previous works did not perform any virtual IHC staining and mainly focused on the generation of structural tissue stains, which enhance the contrast of specific morphological features in tissue sections. In a related line of research, deep learning has also enabled the prediction of biomarker status (e.g., Ki-67 quantification 16) and tumor prognosis from H&E-stained microphotographs of various malignancies, including hepatocellular carcinoma 17, breast cancer 18-22, bladder cancer 23, thyroid cancer 24,25, and melanoma 26. These studies highlight a possible correlation between the presence of specific biomarkers and morphological microscopic changes in the tissue; however, they do not provide an alternative to IHC stained tissue images, which reveal sub-cellular biomarker information for pathologists' diagnostic inspection of, e.g., inter- and intra-cellular signatures such as cytoplasmic and nuclear details 27.
Here, we present a deep learning-based label-free virtual IHC staining method (Fig. 1), which transforms autofluorescence microscopic images of unlabeled tissue sections into bright-field equivalent images, matching the standard IHC stained images of the same tissue samples. In this study, we specifically focused on the IHC staining of HER2, an important cell surface receptor protein that is involved in regulating cell growth and differentiation 28,29. Assessing the level of HER2 expression in breast tissue, i.e., the HER2 status, is routinely practiced based on the HER2 IHC staining of formalin-fixed, paraffin-embedded (FFPE) tissue sections, and helps predict the prognosis of breast cancer and its response to HER2-directed immunotherapies 5,29-33. For example, intracellular and extracellular studies of HER2 have led to the development of pharmacological anti-HER2 agents that benefit the treatment of HER2-positive tumors 34-38. Further efforts are being made to develop new pharmacological solutions that can counter HER2-directed-drug resistance and improve treatment outcomes in clinical trials 39-42. With numerous animal models established for preclinical studies and life sciences research, a deeper understanding of the oncogene, biological functionality, and drug resistance mechanisms of HER2 is being developed 43-47. In addition, the HER2 biomarker has also served as an essential tool in the development and testing of novel biomedical imaging 48,49, statistics 50, and spatial transcriptomics 51 methods.
The presented virtual HER2 staining method is based on a deep learning-enabled image-to-image transformation, using a conditional generative adversarial network (GAN), as shown in Fig. 2. Once the training phase was completed, two blinded quantitative studies were performed using new breast tissue sections with different HER2 scores to demonstrate the efficacy of our virtual HER2 staining framework.
For this purpose, we used the semi-quantitative Dako HercepTest scoring system 52, which involves assessing the percentage of tumor cells that exhibit membranous staining for HER2, along with the intensity of that staining. The results are reported as 0 (negative), 1+ (negative), 2+ (weakly positive/equivocal), and 3+ (positive). In the first study, three board-certified breast pathologists blindly graded the HER2 scores of virtually stained HER2 whole slide images (WSIs) as well as their IHC stained standard counterparts. Our results and the statistical analysis revealed that determining the HER2 status based on our virtual HER2 WSIs is as accurate as the standard analysis based on the chemically prepared IHC HER2 slides. In the second study, the same pathologists rated the staining quality of both virtual HER2 and standard IHC HER2 images using different metrics, i.e., nuclear detail, membrane clearness, background staining, and staining artifacts. This study revealed that at least two of the three pathologists agreed that there is no statistically significant difference between the virtual HER2 staining image quality and the standard IHC HER2 staining image quality in the level of nuclear detail, membrane clearness, and absence of staining artifacts.
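To make the scoring rubric concrete, the short Python sketch below illustrates the HercepTest-style decision logic described above. The function and its inputs (estimated percentages of tumor cells in each membranous staining category) are hypothetical illustrations, not part of the study's software; in practice, scoring is performed by pathologists rather than by such a rule.

```python
def hercep_test_score(pct_strong_complete: float,
                      pct_weak_moderate_complete: float,
                      pct_faint_partial: float) -> str:
    """Illustrative HercepTest-style HER2 scoring.

    Each argument is the estimated percentage (0-100) of tumor cells
    exhibiting that membranous staining pattern.
    """
    if pct_strong_complete > 10:
        return "3+"   # positive: strong complete membranous staining in >10% of tumor cells
    if pct_weak_moderate_complete > 10:
        return "2+"   # weakly positive/equivocal: weak-to-moderate complete staining in >10%
    if pct_faint_partial > 10:
        return "1+"   # negative: faint membranous staining in >10%
    return "0"        # negative: no staining, or staining in <=10% of tumor cells
```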
The presented framework achieved the first demonstration of label-free virtual IHC staining, and bypasses the costly, laborious, and time-consuming IHC staining procedures that involve toxic chemical compounds. This virtual HER2 staining technique has the potential to be extended to the virtual staining of other biomarkers and may accelerate the IHC-based tissue analysis workflow in life sciences and biomedical applications, while also enhancing the repeatability and standardization of IHC staining.

Label-free virtual HER2 staining of breast tissue
We demonstrated our virtual HER2 staining method by training deep neural network (DNN) models with a dataset of 25 breast tissue sections collected from 19 unique patients, constituting in total 20,910 image patches, each with 1024×1024 pixels. Once a DNN model was trained, it virtually stained the unlabeled tissue sections using their autofluorescence microscopic images captured with DAPI, FITC, TxRed, and Cy5 filter cubes (see Methods section), matching the corresponding bright-field images of the same fields of view, captured after standard IHC HER2 staining. In the network training and evaluation process, we employed a cross-validation approach. Separate network models were trained with different dataset divisions to generate 12 virtual HER2 WSIs for blind testing, i.e., 3 WSIs at each of the 4 HER2 scores (0, 1+, 2+, and 3+). Each virtual HER2 WSI corresponds to a unique patient that was not used during the network training phase. Note that all the tissue sections were obtained from existing tissue blocks, where the HER2 reference (ground truth) scores were provided by the UCLA Translational Pathology Core Laboratory (TPCL) under UCLA IRB 18-001029. Fig. 3 summarizes the comparison of the virtual HER2 images inferred by our DNN models against their corresponding IHC HER2 images captured from the same tissue sections after standard IHC staining.
Both the WSIs and the zoomed-in regions show a high degree of agreement between virtual staining and standard IHC staining. These results indicate that a well-trained virtual staining network can reliably transform the autofluorescence images of unlabeled breast tissue sections into bright-field equivalent, virtual HER2 images that match their IHC HER2 stained counterparts across all the HER2 statuses: 0, 1+, 2+, and 3+. Upon close examination, our board-certified pathologists confirmed that the comparison between the IHC and virtual HER2 images showed equivalent staining with no significant perceptible differences in intracellular features such as membrane clarity or nuclear details. In particular, the virtual staining network clearly produced the expected intensity and distribution of membranous HER2 staining (or lack thereof) in tumor cells. In HER2 positive (3+, Figs. 3a-e) breast cancers, both virtually stained and IHC stained images showed strong complete membranous staining in >10% of tumor cells, as well as dim cytoplasmic staining in tumor cells. None of the stromal and inflammatory cells showed false positive staining, and the nuclear details of the tumor cells were comparable in both panels. In equivocal (2+, Figs. 3f-j) tumors, the virtual images showed weak to moderate membranous staining in >10% of tumor cells, exhibiting the same amount of membranous staining of tumor cells in corresponding areas as the IHC images. HER2 negative (1+, Figs. 3k-o) tumors showed faint membranous staining in 10% or more of tumor cells, and none of the stromal and inflammatory cells showed faint staining. The HER2 negative (0, Figs. 3p-t) tumor showed no staining in the tumor cells.

Blind evaluation and quantification of virtual HER2 staining
Next, we evaluated the efficacy of the presented virtual HER2 staining framework with a quantitative blinded study in which the 12 virtual HER2 WSIs and their corresponding standard IHC HER2 WSIs were mixed and presented to three board-certified breast pathologists, who graded the HER2 score (i.e., 3+, 2+, 1+, or 0) for each WSI without knowing whether the image was from a virtual stain or a standard IHC stain. Random image shuffling, rotation, and flipping were applied to the WSIs to promote blindness in the evaluations. The HER2 scores of the virtual and the standard IHC WSIs that were blindly graded by the three pathologists are summarized in Fig. 4 and compared to their reference, ground truth scores provided by UCLA TPCL. The confusion matrices of virtual HER2 WSIs (Fig. 4a) and IHC HER2 WSIs (Fig. 4b), each corresponding to N=36 evaluations, reveal that our virtual HER2 staining approach achieved a similar level of accuracy for HER2 status assessment as the standard IHC staining. Close examination of these confusion matrices reveals that the sum of the diagonal elements of the virtual HER2-based evaluations (22) is higher than that of the IHC HER2 (19), showing that more cases were correctly scored based on virtual HER2 WSIs than based on standard IHC HER2 WSIs. Furthermore, the sum of the absolute off-diagonal errors of the virtual HER2-based evaluations (14) is smaller than that of the standard IHC HER2 (18). Based on the same confusion matrices shown in Fig. 4, a chi-square test was performed to compare the degree of agreement between the virtual staining and standard IHC staining methods in HER2 scoring. The test results indicate that there is no statistically significant difference between the two methods (P = 0.4752; see Supplementary Table 1).
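As a sketch of how such a test can be computed, the snippet below applies SciPy's chi-square test to a 2×2 correct/incorrect contingency table assembled from the confusion-matrix diagonals quoted above; this construction is our assumption, but with Yates' continuity correction disabled it reproduces the reported P value.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Correct vs. incorrect HER2 scores per staining method (N=36 evaluations each),
# taken from the confusion-matrix diagonals reported in the text.
table = np.array([
    [22, 36 - 22],   # virtual HER2: correct, incorrect
    [19, 36 - 19],   # standard IHC HER2: correct, incorrect
])
chi2, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi2 = {chi2:.3f}, P = {p:.4f}, dof = {dof}")  # P = 0.4752
```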
In addition to evaluating the efficacy of virtual staining in HER2 scoring, we also quantitatively evaluated the staining quality of the virtual HER2 images and compared them to the standard IHC HER2 images. In this blinded study, we randomly extracted 10 regions of interest (ROIs) from each of the 12 virtual HER2 WSIs and 10 ROIs at the same locations from each of their corresponding IHC HER2 WSIs, building a test set of 240 image patches. Each image patch has 8000×8000 pixels (1.3×1.3 mm²) and was also randomly shuffled, rotated, and flipped before being reviewed by the same three pathologists. These pathologists were asked to grade the image quality of each ROI based on four pre-designated feature metrics for HER2 staining: membrane clearness, nuclear detail, absence of excessive background staining, and absence of staining artifacts (Fig. 5). The grade scale for each metric is from 1 to 4, with 4 representing perfect, 3 representing very good, 2 representing acceptable, and 1 representing unacceptable. Fig. 5a summarizes the staining quality scores of virtual HER2 and standard IHC HER2 images based on our pre-defined feature metrics, averaged over all image patches and pathologists. Figs. 5b-e further compare the average quality scores at each of the 4 HER2 statuses under each feature metric. In Fig. 5b, the membrane clearness scores of HER2 negative ROIs are noted as "not applicable" since there is no staining of the cell membrane in HER2 negative samples. It is important to emphasize that the standard IHC HER2 images had an advantage in these comparisons because they were pre-selected: a significant percentage of the standard IHC HER2 tissue slides suffered from unacceptable staining quality issues (see Discussion and Supplementary Fig. 1), and were therefore excluded from our comparative studies in the first place. Nevertheless, the quality scores of virtual and standard IHC HER2 staining are very close to each other and fall within their standard deviations (dashed lines in Fig. 5). We also performed one-sided t-tests on each feature metric evaluated by the board-certified pathologists to determine whether the standard IHC HER2 images are statistically significantly better than the virtual HER2 images in staining quality. The t-test results showed that only for the metric of 'absence of excessive background staining' did two of the three pathologists report a statistically significant improvement in the quality of the standard IHC staining compared to the virtual staining. For the rest of the feature metrics (i.e., nuclear details, membrane clearness, and staining artifacts), at least two of the three pathologists reported that the staining quality of the IHC HER2 images is not statistically significantly better than that of their virtual HER2 counterparts (Supplementary Table 2). Also note that the virtually stained HER2 images did not mislead the diagnosis at the whole slide level, as also analyzed using the confusion matrices shown in Fig. 4 and the chi-square test reported in Supplementary Table 1.
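For reference, a one-sided t-test of the kind described here can be sketched as follows. The score arrays are hypothetical placeholders, and since the text does not specify whether paired or independent samples were used, the independent-sample form below is an assumption.

```python
from scipy.stats import ttest_ind

# Hypothetical per-ROI quality scores (1-4 scale) from one pathologist for one metric.
ihc_scores = [4, 3, 4, 3, 3, 4, 2, 3]       # standard IHC HER2 ROIs
virtual_scores = [3, 3, 4, 3, 2, 4, 3, 3]   # virtual HER2 ROIs

# One-sided alternative: IHC staining quality is greater than virtual staining quality.
t_stat, p_value = ttest_ind(ihc_scores, virtual_scores, alternative="greater")
print(f"t = {t_stat:.3f}, one-sided P = {p_value:.4f}")
```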
Besides rating the staining quality of each ROI, the pathologists also graded a HER2 score for each ROI, the results of which are reported in Supplementary Fig. 2. Each histogram in Supplementary Fig. 2a summarizes the HER2 scores of the 10 ROIs extracted from each WSI, evaluated by 3 pathologists (i.e., N=30 evaluations). The reference (ground truth) HER2 scores of the corresponding WSIs are plotted as gray dashed lines. This analysis reveals that, for the majority of the patients, there is no discrepancy between the HER2 scores evaluated from virtually generated ROIs and from standard IHC stained ROIs. For the cases where there is a disagreement (e.g., Patients #5 and #11), the histograms of the virtual HER2 scores were centered closer to the reference HER2 scores (dashed lines) compared to the histograms of the standard IHC-based HER2 scores. It is important to also note that grading the HER2 scores from subsampled ROIs vs. from the WSI can yield different results due to the inhomogeneous nature of the tissue sections.

Discussion
We demonstrated a deep learning-enabled label-free virtual IHC staining method. By training a DNN model, our method generated virtual HER2 images from the autofluorescence images of unlabeled tissue sections, matching the bright-field images captured after standard IHC staining. Compared to chemically performing the IHC staining, our virtual HER2 staining method is rapid and simple to operate. The conventional IHC HER2 staining involves laborious sample treatment steps demanding a histotechnologist's periodic monitoring (see Supplementary Note 1), and this whole process typically takes one day before the slides can be reviewed by diagnosticians. In contrast, the presented virtual HER2 staining method bypasses these laborious and costly steps and generates the bright-field equivalent HER2 images computationally, using the autofluorescence images captured from label-free tissue sections. After the training is complete (which is a one-time effort), the entire inference process using a virtual staining network only takes ~12 seconds for 1 mm² of tissue using a consumer-grade computer, which can be further improved by using faster hardware acceleration units.
Another advantage of the presented method is its capability of generating highly consistent and repeatable staining results, minimizing the staining variations that are commonly observed in standard IHC staining.
The IHC HER2 staining procedure is delicate and laborious, as it requires accurate control of the time, temperature, and concentrations of the reagents at each tissue treatment step; in fact, it often fails to generate satisfactory stains. In our study, ~30% of the sample slides were discarded because of unsuccessful standard IHC staining and/or severe tissue damage, even though the IHC staining was performed by accredited pathology labs. Supplementary Fig. 1 shows two examples of the standard IHC staining failures we experienced, including complete tissue damage and false negative staining that failed to reflect the correct HER2 score. In contrast, our computational virtual staining approach does not rely on the chemical processing of the tissue and generates reproducible results, which is important for the standardization of HER2 interpretation by eliminating commonly experienced staining variations and artifacts.
Since the autofluorescence input images of tissue slices were captured with standard filter sets installed on a conventional fluorescence microscope, the presented approach is ready to be implemented on existing fluorescence microscopes without hardware modifications or customized optical components.
Our results showed that the combination of the four commonly used fluorescence filters (DAPI, FITC, TxRed, and Cy5) provided a very good baseline for the virtual HER2 staining performance. As an ablation study, we also quantitatively compared virtual staining networks trained with different autofluorescence input channels by calculating the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM) 53 between the network output and ground truth images (see Supplementary Fig. 3). Since the staining of the cell membrane is an important assessment factor in HER2 status evaluation, we also performed color deconvolution 54 to separate out the membrane stain channel (i.e., the diaminobenzidine, DAB, stain), followed by calculating and comparing the SSIM scores (Supplementary Fig. 4). These analyses revealed that the performance of the virtual staining network partially degraded with a decreasing number of input autofluorescence channels, motivating the use of DAPI, FITC, TxRed, and Cy5 altogether (Supplementary Fig. 3b).
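A minimal sketch of this evaluation pipeline, using scikit-image, is given below. The helper function is our own illustration, and we assume that the Ruifrok-Johnston-style color deconvolution implemented by skimage.color.rgb2hed (whose third channel is the DAB stain) is an acceptable stand-in for the color deconvolution method cited above 54.

```python
import numpy as np
from skimage.color import rgb2hed
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(output_rgb: np.ndarray, target_rgb: np.ndarray):
    """Compare a virtual HER2 image against its IHC ground truth.

    Both inputs are HxWx3 float arrays scaled to [0, 1].
    """
    psnr = peak_signal_noise_ratio(target_rgb, output_rgb, data_range=1.0)
    ssim = structural_similarity(target_rgb, output_rgb,
                                 channel_axis=-1, data_range=1.0)

    # Color-deconvolve into hematoxylin/eosin/DAB; channel 2 is the DAB
    # (membrane) stain, compared separately as in Supplementary Fig. 4.
    dab_out = rgb2hed(output_rgb)[..., 2]
    dab_gt = rgb2hed(target_rgb)[..., 2]
    rng = float(dab_gt.max() - dab_gt.min()) or 1.0
    ssim_dab = structural_similarity(dab_gt, dab_out, data_range=rng)
    return psnr, ssim, ssim_dab
```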
The success of our virtual HER2 staining method relies on the processing of the complex spatial-spectral information encoded in the autofluorescence images of label-free tissue using convolutional neural networks. The presented virtual staining method can potentially be expanded to a wide range of other IHC stains. Though our virtual HER2 staining framework was demonstrated based on autofluorescence imaging of unlabeled tissue sections, other label-free microscopy modalities may also be utilized for this task, such as holography 11, fluorescence lifetime imaging 55,56, and Raman microscopy 57. In addition to generalizing to other types of IHC stains in the assessment of various biomarkers, this method can be further adapted to non-fixed fresh tissue samples or frozen sections, which can potentially provide real-time virtual IHC images for intraoperative consultation during surgical operations.
To the best of our knowledge, this is the first demonstration of label-free virtual IHC staining, and we believe that it opens up new avenues for various applications in life sciences and biomedical diagnostics and can potentially transform the traditional IHC staining workflow.

Sample preparation and standard IHC staining
The unlabeled breast tissue blocks were provided by the UCLA TPCL under UCLA IRB 18-001029 and were cut into 4 μm thin sections. The FFPE thin sections were then deparaffinized and covered with glass coverslips. After acquiring the autofluorescence microscopic images, the unlabeled tissue sections were sent to accredited pathology labs for standard IHC HER2 staining, which was performed by UCLA TPCL and the Department of Anatomic Pathology of Cedars-Sinai Medical Center in Los Angeles, USA. The IHC HER2 staining protocol provided by UCLA TPCL is described in Supplementary Note 1.

Image data acquisition
The autofluorescence images of the unlabeled tissue sections were captured using a standard fluorescence microscope (IX-83, Olympus) with a ×40/0.95NA objective lens (UPLSAPO, Olympus). Four fluorescent filter cubes, DAPI (Semrock DAPI-5060C-OFX, EX 377/50 nm, EM 447/60 nm), FITC (Semrock FITC-2024B-OFX, EX 485/20 nm, EM 522/24 nm), TxRed (Semrock TXRED-4040C-OFX, EX 562/40 nm, EM 624/40 nm), and Cy5 (Semrock CY5-4040C-OFX, EX 628/40 nm, EM 692/40 nm), were used to capture the autofluorescence images at different excitation-emission wavelengths. Each autofluorescence image was captured with a scientific complementary metal-oxide-semiconductor (sCMOS) camera.

Virtual staining network and training

In this work, a GAN-based network model 62 was employed to perform the transformation from the 4-channel label-free autofluorescence images (DAPI, FITC, TxRed, and Cy5) to the corresponding bright-field virtual HER2 images, as shown in Fig. 2. This GAN framework includes (1) a generator network that creates virtually stained HER2 images by learning the statistical transformation between the input autofluorescence images and the corresponding bright-field IHC stained HER2 images (ground truth), and (2) a discriminator network that learns to discriminate the virtual HER2 images created by the generator from the actual IHC stained HER2 images. The generator and the discriminator were alternately optimized and simultaneously improved through this competitive training process. Specifically, the generator (G) and discriminator (D) networks were optimized to minimize the following loss functions:

$$\ell_{\mathrm{generator}} = \alpha \cdot L_1^{\mathrm{smooth}}\left(z_{\mathrm{target}}, G(x_{\mathrm{input}})\right) + \beta \cdot \left[1 - \mathrm{SSIM}\left(z_{\mathrm{target}}, G(x_{\mathrm{input}})\right)\right] + \gamma \cdot \mathrm{BCE}\left(D\left(G(x_{\mathrm{input}})\right), 1\right)$$

$$\ell_{\mathrm{discriminator}} = \mathrm{BCE}\left(D\left(G(x_{\mathrm{input}})\right), 0\right) + \mathrm{BCE}\left(D(z_{\mathrm{target}}), 1\right)$$

where $G(\cdot)$ represents the generator inference, $D(\cdot)$ represents the probability of being a real, actually stained IHC image as predicted by the discriminator, $x_{\mathrm{input}}$ denotes the input label-free autofluorescence images, and $z_{\mathrm{target}}$ denotes the ground truth, standard IHC stained image. The coefficients $(\alpha, \beta, \gamma)$ in $\ell_{\mathrm{generator}}$ were empirically set as (10, 0.2, 0.5) to balance the pixel-wise smooth L1 error 63 of the generator output with respect to its ground truth, the SSIM loss 53 of the generator output, and the binary cross-entropy (BCE) loss of the discriminator predictions of the output image. Compared to the mean squared error (MSE) loss, the smooth L1 loss is a robust estimator that prevents exploding gradients by using MSE around zero and mean absolute error (MAE) elsewhere 64. Specifically, the smooth L1 loss between two images $x$ and $y$ is defined as:

$$L_1^{\mathrm{smooth}}(x, y) = \frac{1}{M \times N} \sum_{i,j} \begin{cases} 0.5\,(x_{i,j} - y_{i,j})^2 / \delta, & \text{if } |x_{i,j} - y_{i,j}| < \delta \\ |x_{i,j} - y_{i,j}| - 0.5\,\delta, & \text{otherwise} \end{cases}$$

where $i$ and $j$ are the pixel indices, $M \times N$ represents the total number of pixels in each image, and $\delta$ was set to 1 in our case.
The SSIM of two images is defined as 53:

$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$

where $\mu_x$ and $\mu_y$ are the mean values of the images $x$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the variances of images $x$ and $y$, and $\sigma_{xy}$ is the covariance between images $x$ and $y$. $C_1$ and $C_2$ were set to $0.01^2$ and $0.03^2$, respectively 53.
The BCE with logits loss used in our network is defined as:

$$\mathrm{BCE}(p, q) = -\left[q \cdot \log\left(\sigma(p)\right) + (1 - q) \cdot \log\left(1 - \sigma(p)\right)\right]$$

where $p$ represents the discriminator prediction (logit), $\sigma(\cdot)$ is the sigmoid function, and $q$ represents the actual label (0 or 1).
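Putting the three terms together, a minimal PyTorch sketch of the composite losses could look as follows. Here ssim_fn is assumed to be supplied externally (e.g., by a package such as pytorch-msssim) and to return a mean SSIM value; the study's exact implementation may differ.

```python
import torch
import torch.nn.functional as F

ALPHA, BETA, GAMMA = 10.0, 0.2, 0.5  # empirically set coefficients from the text

def generator_loss(g_output, target, d_logits_on_output, ssim_fn):
    """Composite generator loss: smooth L1 + SSIM + adversarial BCE terms."""
    l1 = F.smooth_l1_loss(g_output, target, beta=1.0)  # delta = 1, as in the text
    ssim_term = 1.0 - ssim_fn(g_output, target)        # SSIM loss
    adv = F.binary_cross_entropy_with_logits(          # fool the discriminator -> label 1
        d_logits_on_output, torch.ones_like(d_logits_on_output))
    return ALPHA * l1 + BETA * ssim_term + GAMMA * adv

def discriminator_loss(d_logits_on_output, d_logits_on_target):
    """Discriminator BCE loss: generated images -> 0, real IHC images -> 1."""
    fake = F.binary_cross_entropy_with_logits(
        d_logits_on_output, torch.zeros_like(d_logits_on_output))
    real = F.binary_cross_entropy_with_logits(
        d_logits_on_target, torch.ones_like(d_logits_on_target))
    return fake + real
```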
As shown in Fig. 2a, the generator network was built following the attention U-Net architecture 65 with 4 resolution levels, which maps the label-free autofluorescence images into the HER2 stained images by learning the transformations of spatial features at different spatial scales, capturing both the high-resolution local features at shallower levels and the larger-scale global context at deeper levels. Our attention U-Net structure is composed of a down-sampling path and an up-sampling path that are symmetric to each other. The down-sampling path contains four down-sampling convolutional blocks, each consisting of a two-convolutional-layer residual block, followed by a leaky rectified linear unit 66 (Leaky ReLU) with a slope of 0.1 and a 2×2 max-pooling operation with a stride size of 2 for down-sampling. The two-convolutional-layer residual blocks contain two consecutive convolutional layers with a kernel size of 3×3 and a convolutional residual path 67 connecting the input and output tensors of the two convolutional layers.
The numbers of the input channels and the output channels at each level of the down-sampling path were set to 4, 64, 128, 256, and 64, 128, 256, 512, respectively.
Symmetrically, the up-sampling path contains four up-sampling convolutional blocks with the same design as the down-sampling convolutional blocks, except that the 2× down-sampling operation was replaced by a 2× bilinear up-sampling operation. The input of each up-sampling block is the concatenation of the output tensor from the previous block with the corresponding feature maps at the matched level of the down-sampling path, passed through an attention-gated connection. An attention gate consists of three convolutional layers and a sigmoid operation, which outputs an activation weight map highlighting the salient spatial features 65. The numbers of the input channels and the output channels at each level of the up-sampling path were 1024, 1024, 512, 256, and 1024, 512, 256, 128, respectively.
Following the up-sampling path, a two-convolutional-layer residual block together with another single convolutional layer reduces the number of channels to 3, matching that of our ground truth images (i.e., 3-channel RGB images). Additionally, a two-convolutional-layer center block was utilized to connect and match the dimensions of the down-sampling path and the up-sampling path.
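The building blocks described above can be sketched in PyTorch as follows. This is a simplified illustration of the two-convolutional-layer residual block and the attention gate, not the study's exact code; normalization details and the precise placement of activations are our assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two consecutive 3x3 conv layers with a convolutional residual path."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, kernel_size=1)  # convolutional residual path
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        return self.act(self.body(x) + self.skip(x))

class AttentionGate(nn.Module):
    """Three conv layers plus a sigmoid, producing a spatial attention map
    that re-weights the encoder (skip) features by their salience."""
    def __init__(self, skip_ch: int, gate_ch: int, inter_ch: int):
        super().__init__()
        self.theta = nn.Conv2d(skip_ch, inter_ch, kernel_size=1)
        self.phi = nn.Conv2d(gate_ch, inter_ch, kernel_size=1)
        self.psi = nn.Conv2d(inter_ch, 1, kernel_size=1)

    def forward(self, skip, gate):
        # gate is assumed to be upsampled to skip's spatial size by the caller
        attn = torch.sigmoid(self.psi(torch.relu(self.theta(skip) + self.phi(gate))))
        return skip * attn
```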
The structure of the discriminator network is illustrated in Fig. 2b. An initial block containing one convolutional layer followed by a Leaky ReLU operation first transforms the 3-channel generator output or ground truth image into a 64-channel tensor. Then, five successive two-convolutional-layer residual blocks were added to perform 2× down-sampling and expand the channel number of each input tensor.
The 2× down-sampling was enabled by setting the stride size of the second convolutional layer in each block to 2. After passing through the five blocks, the output tensor was averaged and flattened to a one-dimensional vector, which was then fed into two fully connected layers to obtain the probability of the input image being a standard IHC-stained image.
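A simplified PyTorch sketch of this discriminator topology is given below; the residual paths inside the down-sampling blocks are omitted for brevity, and the hidden width of the fully connected layers is our assumption. The network returns a logit, converted to a probability by the sigmoid inside the BCE-with-logits loss.

```python
import torch.nn as nn

class Discriminator(nn.Module):
    """Initial conv + Leaky ReLU, five stride-2 down-sampling blocks,
    global average pooling, and two fully connected layers."""
    def __init__(self, in_ch: int = 3, base_ch: int = 64):
        super().__init__()
        self.init = nn.Sequential(
            nn.Conv2d(in_ch, base_ch, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1))
        blocks, ch = [], base_ch
        for _ in range(5):  # each block doubles the channels and halves H and W
            blocks += [
                nn.Conv2d(ch, 2 * ch, kernel_size=3, padding=1), nn.LeakyReLU(0.1),
                nn.Conv2d(2 * ch, 2 * ch, kernel_size=3, stride=2, padding=1),  # 2x down-sampling
                nn.LeakyReLU(0.1)]
            ch *= 2
        self.blocks = nn.Sequential(*blocks)
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // 2), nn.LeakyReLU(0.1),
            nn.Linear(ch // 2, 1))  # outputs a single logit

    def forward(self, x):
        h = self.blocks(self.init(x))
        h = h.mean(dim=(2, 3))  # spatial averaging, flattened to one vector per sample
        return self.fc(h)
```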
The full image dataset contains 25 WSIs from 19 unique patients, making a set of 20,910 image patches, each with a size of 1024×1024 pixels. For the training of each virtual staining model used in our cross-validation studies, the dataset was divided as follows: (1) Test set: images from the WSIs of 1-2 unique patients (~10%, not overlapping with the training or validation patients); after splitting out the test set, the remaining WSIs were further divided into (2) Validation set: images from 2 of the WSIs (~10%), and (3) Training set: images from the remaining WSIs (~80%). The network models were optimized using image patches of 256×256 pixels, randomly cropped from the 1024×1024-pixel images in the training dataset. An Adam optimizer with weight decay 68 was used to update the learnable parameters, with a learning rate of 1×10⁻⁴ for the generator network and 1×10⁻⁵ for the discriminator network, and a batch size of 28. The generator/discriminator update frequency was set to 2:1. Finally, the best model was selected based on the lowest MSE loss, assisted by visual assessment of the validation images. The networks converged after ~120 hours of training.
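Combining the pieces above, the alternating 2:1 generator/discriminator update schedule could be sketched as follows. The generator, discriminator, train_loader, and ssim_fn objects and the loss functions from the earlier sketches are assumed to exist, random 256×256 cropping is assumed to happen inside the data loader, and AdamW is our reading of the cited "Adam optimizer with weight decay" 68.

```python
import torch

g_opt = torch.optim.AdamW(generator.parameters(), lr=1e-4)       # generator learning rate
d_opt = torch.optim.AdamW(discriminator.parameters(), lr=1e-5)   # discriminator learning rate

for step, (x, z) in enumerate(train_loader):  # x: 4-ch autofluorescence, z: RGB IHC target
    # Generator update (runs every step, i.e., twice per discriminator update).
    g_opt.zero_grad()
    y = generator(x)
    generator_loss(y, z, discriminator(y), ssim_fn).backward()
    g_opt.step()

    if step % 2 == 0:  # discriminator updated every other step (2:1 ratio)
        d_opt.zero_grad()
        discriminator_loss(discriminator(generator(x).detach()),
                           discriminator(z)).backward()
        d_opt.step()
```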

Implementation details
The image preprocessing was implemented in MATLAB version R2018b (MathWorks). The virtual staining network was implemented using Python version 3.9.0 and PyTorch version 1.9.0. The training was performed on a desktop computer with an Intel Xeon W-2265 central processing unit (CPU), 64 GB of random-access memory (RAM), and an Nvidia GeForce RTX 3090 graphics processing unit (GPU).

Blind evaluation of HER2 images
For the evaluation of WSIs, the 24 high-resolution WSIs were randomly shuffled, rotated, and flipped, and uploaded to an online image viewing platform that was shared with three board-certified pathologists, who blindly evaluated and scored the HER2 status of each WSI using the Dako HercepTest scoring system 52. For the evaluation of sub-ROI images, the 240 image patches were randomly shuffled, rotated, and flipped, and uploaded to the online image sharing platform GIGAmacro (https://www.gigamacro.com/). The pathologists' blinded assessments are provided in Supplementary Data 1.


Figure 1. Virtual HER2 staining of unlabeled tissue sections via deep learning. a, The standard immunohistochemical (IHC) HER2 staining (top) relies on tedious and costly tissue processing performed by histotechnologists, which typically takes ~1 day. A pre-trained deep neural network enables virtual HER2 staining of unlabeled tissue sections (bottom). b, Virtual HER2 staining transforms autofluorescence images of unlabeled tissue sections into bright-field equivalent images that match the images of standard IHC HER2 staining.

Figure 2. Virtual HER2 staining network. A GAN framework consisting of a generator model and a discriminator model was used to train the virtual HER2 staining network. a, The generator uses an attention-gated U-Net structure to map the label-free autofluorescence images into bright-field equivalent HER2 images. b, The discriminator is a CNN composed of five successive two-convolutional-layer residual blocks and two fully connected layers (see Methods). Once the network models converge, only the generator model is used to infer the virtual HER2 images, which takes ~12 seconds for 1 mm² of tissue area.

Figure 3. Comparison of virtual and standard IHC HER2 staining of breast tissue sections at different HER2 scores. a, f, k, p, Bright-field WSIs of standard IHC HER2 stained samples at a HER2 3+, f HER2 2+, k HER2 1+, and p HER2 0. b, g, l, q, Bright-field WSIs generated by virtual staining, corresponding to the same samples as a, f, k, p, respectively. c1-e1, c2-e2, Zoomed-in regions of interest from a, b at a HER2 score of 3+. h1-j1, h2-j2, Zoomed-in regions of interest from f, g at a HER2 score of 2+. m1-o1, m2-o2, Zoomed-in regions of interest from k, l at a HER2 score of 1+. r1-t1, r2-t2, Zoomed-in regions of interest from p, q at a HER2 score of 0.

Figure 4. Confusion matrices of HER2 scores. Each element in the matrices represents the number of WSIs with their HER2 scores evaluated by the board-certified pathologists (rows) based on: a, virtual HER2 staining or b, standard IHC HER2 staining, compared to the reference (ground truth) HER2 scores provided by UCLA TPCL (columns).

Figure 5. Comparisons of the image quality of virtual HER2 and standard IHC HER2 staining. a, Quality scores of virtual HER2 and standard IHC HER2 images calculated based on 4 different feature metrics: nuclear details, absence of staining artifacts, absence of excessive background staining, and membrane clearness. Each value was averaged over all the image patches and pathologists. b-e, Detailed comparisons of quality scores under each feature metric at different HER2 scores. The grade scale applied for each metric is 1 to 4: 4 for perfect, 3 for very good, 2 for acceptable, and 1 for unacceptable. The standard deviations are plotted as dashed lines.