Automatic Segmentation of the Right Ventricle from Cardiac MRI Using a Learning-Based Approach

Purpose: This study aims to accurately segment the right ventricle (RV) from cardiac MRI using a fully automatic learning-based method. Methods: The proposed method uses deep learning algorithms, i.e., convolutional neural networks and stacked autoen-coders, for automatic detection and initial segmentation of the RV chamber. The initial segmentation is then combined with the deformable models to improve the accuracy and robustness of the process. We trained our algorithm using 16 cardiac MRI datasets of the MICCAI 2012 RV Segmentation Challenge database and validated our technique using the rest of the dataset (32 subjects). Results: An average Dice metric of 82.5% along with an average Hausdorff distance of 7.85mm were achieved for all the studied subjects. Furthermore, a high correlation and level of agreement with the ground truth contours for end-diastolic volume (0.98), end-systolic volume (0.99), and ejection fraction (0.93) were observed. Conclusion: Our results show that deep learning algorithms can be effectively used for automatic segmentation of the RV. Computed quantitative metrics of our method outperformed that of the existing techniques participated in the MICCAI 2012 challenge, as reported by the challenge organizers. Magn Reson Med 000:000–000, 2017. V C 2017 International Society for Magnetic Resonance in Medicine.


INTRODUCTION
Compared with left ventricle (LV), the study of the right ventricle (RV) is a relatively young field.Although many recent studies have confirmed the prognostic value of RV function in cardiovascular disease, for several years its significance has been underestimated (1,2).Understanding the role of RV in the pathophysiology of heart failure traditionally has dawdled behind that of the LV.The RV used to be considered a relatively passive chamber relating the systemic and pulmonary circulation until more recent studies revealed its importance in sustaining the hemodynamic stability and cardiac performance (3)(4)(5).
Cardiac MRI is the preferred method for clinical assessment of the RV (6)(7)(8)(9)(10)(11)(12).Currently RV segmentation is manually performed by clinical experts, which is lengthy, tiresome and sensitive to intra and interoperator variability (6,13,14).Therefore, automating the RV segmentation process turns this tedious practice into a fast procedure.This ensures the results are more accurate and not vulnerable to operator-related variabilities, and eventually accelerates and facilitates the process of diagnosis and follow-up.
There are many challenges for RV segmentation that are mainly attributed to RV anatomy.These can be summarized as: presence of RV trabeculations with signal intensities similar to that of the myocardium, complex crescent shape of the RV, which varies from base to apex, along with inhomogeneity reflected in the apical image slices, and considerable variability in shape and intensity of the RV chamber among subjects, notably in pathological cases (6).
Currently, only limited studies have focused on RV segmentation (6).Atlas-based methods have been considered in some studies (15)(16)(17), which require large training datasets and long computational times while their final segmentation may not preserve the mostly regular boundaries of the ground-truth masks (16).Nevertheless, it is challenging to build a model general enough to cover all possible RV shapes and dynamics (18).Alternatively, graph-cut-based methods combined with distribution matching (19), shape-prior (20) and region-merging (21) were studied for RV segmentation.Overall, these methods suffer from a low robustness and accuracy, and require extensive user interaction.Image-based methods have been considered by Ringenberg et al (22) and Wang et al (23).They showed notable accuracy and processing time improvement over other methods while deformed RV shape and patient movement during the scan are the limitations of their method (22).Current learning-based approaches, such as probabilistic boosting trees and random forests, depend on the quality and extent of annotated training data and are computationally expensive (24)(25)(26)(27).
Motivated by these limitations, we developed an accurate, fast, robust and fully automated segmentation framework for cardiac MRI.A convolutional neural network (28)(29)(30)(31) is used to automatically detect the location of RV in the thoracic cavity and provide a region of interest (ROI).Afterward, a stacked autoencoder (stacked-AE) (32-37) is developed to automatically segment the RV and provide an initial contour.Finally, a method is introduced that incorporates the initial contour into classical deformable models to provide an accurate and robust RV contour.The algorithm is successfully validated on the MICCAI 2012 RV database (6).
The developed deep learning algorithm is based on the supervised learning paradigm.In supervised learning, some example data and corresponding labels are required to train and develop the algorithm.In other words, the algorithm artificially mimics the function of a human annotator.As a result, the algorithm can perform as good as the human annotator.Therefore, to obtain good results, it is important to provide the algorithm with clean and accurate data and labels (38).
Our major contributions include: (i) designing a fully automatic RV segmentation method for MRI datasets; (ii) using deep learning algorithms, trained with limited data, for automatic RV localization and initial segmentation; and (iii) incorporating the deep learning output into deformable models to address the shrinkage/leakage problems and reduce the sensitivity to initialization.Finally, a better performance in terms of multiple evaluation metrics and clinical indices was achieved.

METHODS
We used the MICCAI 2012 RV segmentation challenge (RVSC) database (6) provided by the LITIS Lab, at the University of Rouen, France.The algorithms were developed in our research centers at UC Irvine.Then, the results were submitted to the LITIS lab for independent evaluations.The cardiac MRI datasets were acquired by a 1.5 Tesla Siemens scanner that includes 48 short-axis images of patients with known diagnoses.The database is grouped into three datasets namely: TrainingSet, Test1Set, and Test2Set.Each dataset contains 16 image sequences corresponding to 16 patients.Manual delineations of RV at the end-diastole (ED) and end-systole (ES) are included in TrainingSet only.A typical dataset contains nine images at ED and seven images at ES from base to apex.Image parameters are summarized as: slice thickness ¼ 7 mm, image size ¼ 256 Â 216 (or 216 Â 256) pixels with 20 images per cardiac cycle.
Our method requires square inputs; therefore, patches of 216 Â 216 were cropped out of the original images and used during the training and testing procedures.We used images in TrainingSet to train our algorithm.After completion of training, the algorithm was deployed for RV segmentation in Test1Set and Test2Set.The ground truth delineations of Test1Set and Test2Set are not publicly available and the LITIS Lab provided the independent assessment results upon receiving the automatic segmentations.

Algorithm Description
The method is carried out over three stages as shown in Figure 1.The algorithm receives a short-axis cardiac MR image as the input (Fig. 1).First, in Step 1, the ROI containing the RV is determined in the image using a convolutional network trained to locate the RV.Then, in Step 2, the RV is initially segmented using a stacked-AE trained to delineate the RV.The obtained contour is used for initialization and incorporated into deformable models for segmentation in Step 3.Each stage of the block diagram is individually trained during an offline training process to obtain its optimum values of parameters, as described in our previous publication on LV segmentation (39).After training, the system is deployed for automatic segmentation.Here, we have used our developed localization and segmentation algorithms jointly; however, the two can be applied independently.In other words, our segmentation algorithm can work in conjunction with other automatic localization techniques or even without localization.Each step is further explained as follows for completeness of the presentation.

Automatic Localization (Step 1)
The original images in the database have a large field of view, covering the RV chamber as well as parts of the other surrounding organs.In addition, direct handling of the images is not computationally feasible because of the large image size.As such, we first localize the RV and crop out a ROI from the original images such that the RV chamber is positioned approximately within the center of the images.
Figure 2 shows a block diagram of the automatic RV localization using convolutional networks.We use a down-sampled m Â m image as the input to reduce complexity.Let us represent the pixel intensity at coordinate [i,j] by I [i,j].Throughout the study, we represent the i-th element of vector x by x[i] and the element at the i-th row and the j-th column of matrix X by X [i,j].
Then, the filters ðF l 2 R aÂa ; b 0 2 R k ; l ¼ 1; . . .; kÞ are convolved with the input image to obtain k convolved feature maps of size As shown in Figure 2, the next step in automatic localization is average pooling.The goal of average pooling is to down-sample the convolved feature maps by averaging over p Â p nonoverlapping regions in the convolved feature maps.This is done by calculating for 1 i 1 , j 1 m 2 .This results in k reduced-resolution features P l ʦ R m2 Â m2 for l ¼ 1,ÁÁÁ, k, where m 2 ¼ m 1 /p and p is chosen such that m 2 is an integer value.We set The pooled features are finally converted to vector p ʦ R n2 , where n 2 ¼ km 2  2 , and fully connected to a linear regression layer with two outputs.We train the network to find matrices W 1 ʦ R 2 Â n2 and b 1 ʦ R 2 and compute y c ¼ W 1 p þ b 1 at the output layer.Centered at the obtained output, a ROI with size M roi is cropped from the original image to be used for the next stage.The image slices near the RV base require a larger region to cover the whole RV with respect to image slices at the apex.We group the contours into large and small, and set M roi ¼ 171,91 for those, respectively.To optimize the performance of the automatic RV localization, the convolutional network is trained using the back-propagation algorithm (40) to obtain the parameter values F l ,l ¼ 1,ÁÁÁ,k, b 0 ,W 1 and b 1 .

Automatic Initialization (Step 2)
We use a stacked-AE to obtain an initial RV segmentation.As shown in Figure 3, in addition to the input and output layers, we have two hidden layers in the stacked-AE.The input vector, x s ʦ R n1 , is constructed by downsampling and unrolling the sub-image obtained from the to produce a binary mask.The binary mask is black (zero) everywhere except at the RV borders, and can also be converted to a contour as shown in Figure 3. Here, are trainable matrices and vectors that are obtained during the training process.We set the parameters as n 1 ¼ 3249, n 2 ¼ 300, n 3 ¼ 300, n 4 ¼ 3249.The output of this stage has a dual functionality; it is used as the initial contour for the segmentation step as well as a preliminary RV shape.

Training Stacked-AE
Although, an end-to-end supervised training can be used to train the stacked-AE, it does not lead to a good generalization due to the small size of the training data.For better generalization, we use an unsupervised layer-wise pretraining followed by an end-to-end supervised finetuning.Four typical examples of input images and labels are shown in Figure 4.The details can be found in Avendi et al (39).

RV Segmentation (Step 3)
As shown in Figure 1, the initial segmentation derived from the previous step is used as a preliminary contour in a deformable model.Deformable models are dynamic contours that eventually lie on the boundary of the object of interest.The evolution of the deformable models is aimed at minimizing an energy function.In conventional deformable methods, contours tend to shrink inward or leak outward because of the fuzziness of the cavity borders and presence of RV trabeculations.These issues are resolved by integrating the preliminary RV shape obtained from the previous stage into the deformable models.
We define the level-set function u(x,y) with negative and positive values for the pixels inside and outside a contour, respectively.We also define the following energy function which is a combination of the length-based, regionbased, and prior shape energy terms.The weights a 1 , a 2 , and a 3 are the combining parameters, empirically determined as a 1 ¼ 1, a 2 ¼ 0.5, and a 3 ¼ 0.25.The deformable method minimizes the energy function in Equation [3] to find the following unique contour: The solution u * will lie on the boundary of the object of interest.The optimization problem in Equation [4] can be solved using the gradient descent algorithm.

Implementation Details
Images of all cases in TrainingSet were collected and divided into the large-contour and small-contour groups.
As such, there are approximately 128 and 75 images of size 256 Â 216 or 216 Â 256 in each group, respectively.We also artificially enlarged the training dataset by translation, rotation and changing the pixel intensities of our images based on the standard principal component analysis (PCA) technique explained by Koikkalainen et al (41).Accordingly, we augmented the training dataset by a factor of 10.Afterward, we built and trained two networks, one for the large-contour and one for the smallcontour dataset.Over-fitting is a main issue in deep learning networks as many parameters should be learned.A poor performance on the test data is possible despite a well-fitted network to the training data.To overcome this issue, we adopted multiple techniques including artificial enlargement of the training dataset, performing layer-wise pretraining, l 2 regularization and sparsity constraints.The use of layer-wise pretraining greatly helped to mitigate the challenge of limited training data.To keep track of the number of parameters, the inputs were downsampled and only two hidden layers were used in the networks.To monitor the overfitting problem during training, a four-fold cross-validation was performed.Accordingly, the original training data (16 subjects) was divided into four partitions, with three partitions in training and one partition for validation, in each fold.The average of the four-fold cross-validations is typically considered the final outcome of the model.
Our method was developed in MATLAB 2014a, performed on a Dell Precision T7610 workstation, with Intel(R) Xeon(R) CPU 2.6 GHz, 32 GB RAM, on a 64-bit Windows 7 platform.

RESULTS
The performance of our methodology was assessed based on comparing the accuracy of the automated segmentation with the ground truth (i.e., manual annotations by experts).TrainingSet of the RVSC database (6) was used for training only, and Test1Set and Test2Set were used for validation.Because the reference contours of Test1Set and Test2Set are not publicly available, we submitted our automatic segmentation results to the LITIS Lab for independent evaluation.
Dice metric (DM) and Hausdorff distance (HD) were computed (6).DM is a measure of contour overlap, with a range between zero and one.A higher DM indicates a better match between automated and manual segmentations.Data augmentation improved the results related to DM for approximately 2.5%.HD measures the maximum perpendicular distance between the automatic and manual contours.Table 1 presents   Step 2 and Step 3, respectively.The refinement in Step 3 leads to overall improvement in the average DM and volume calculations.
For clinical validation, end-diastolic volume (EDV), end-systolic volume (ESV), and ejection fraction (EF) were computed.Correlation and Bland-Altman plots were obtained to assess their agreement to the ground truth.Correlation plots are shown in Figures 7 and 8 and the remaining plots can be found in Supporting Figure S1.
The range of EDV, ESV and EF was (40 mL to 232 mL), (17 mL to 173 mL) and (21% to 67%), in Test1Set and (61 mL to 242 mL), (18 mL to 196 mL), and (19% to 71%) in Test2Set, respectively.The correlation with the ground truth contours of R ¼ 0.99, 0.99, 0.96 and R ¼ 0.98, 0.99, 0.93 for EDV, ESV, and EF were achieved, for Test1Set and Test2Set, respectively.No statistically significant difference in global EDV (Test1Set P ¼ 0.96, Test2Set P ¼ 0.25), ESV (Test1Set P ¼ 0.12, Test2Set P ¼ 0.54) and EF (Test1Set P ¼ 0.1, Test2Set P ¼ 0.22), was observed.The DM shows the average overlap between the manual delineations and the automatic results.However, R 2 values correspond to the EDV, ESV, and EF.Obviously, a higher DM leads to a better volume estimation and higher R 2 values.This can be seen in Table 1, where both DM values and R 2 improve from Step 2 to Step 3.
Approximate elapsed times for training and test processes were obtained using the tic-toc command in

DISCUSSION AND CONCLUSIONS
Most of the challenges for RV segmentation are due to the complex anatomy of the RV chamber.These include RV trabeculations with signal intensities similar to the myocardium's, complex crescent shape of the RV that varies from base to apex, as well as significant variation of RV shape and intensity among the subjects (6).Due to these challenges, only limited studies have focused on RV segmentation.Among those, the state-of-the-art methods for RV segmentation suffer from several limitations such as leakage and shrinkage of contours due to the fuzziness of the RV borders and presence of trabeculations.Our learning-based method overcame these shortcomings and minimized shrinkage/leakage by integrating the inferred shape into the deformable model.Figures 5  and 6 show that the RV was properly segmented from base to apex.Like many other methods in the literature (6), the large contours can be more accurately segmented compared with the small contours, and working with image slices in vicinity of the apex particularly at ES can be challenging due to the small size and irregular shape.
Computed metrics in Table 1 show that the contours at ED were more accurately segmented in terms of DM compared with the contours at the ES, with an overall accuracy of 86% in both testing sets.This is because the contours at ES are larger and easier to segment.Again, this is also a characteristic of other segmentation methods as reported in Petitjean et al (6).
Table 2 summarizes the computed quantitative metrics averaged over ED and ES.As can be seen from the table, our method outperforms the state-of-the-art methods.Mean DM improvements compared with the other methods range from 0 to 0.28 on Test1Set and 0 to 0.22 on Test2Set.Ringenberg et al ( 22) demonstrated a mean improvement of 0.01 on Test2Set.Our mean HD improvements range from 1.38-20.77mm on Test1Set and 0.7-14.18mm on Test2Set.The closest results to our method is the work by Ringenberg et al (22) with similar DM values.However, our method provides better HD values, i.e., 7.67 mm and 8.03 mm for Test1Set and Test2Set, respectively, compared with 9.05 mm and 8.73 mm reported by Ringenberg et al (22).The smaller HD values of our method indicates superiority of our method over Ringenberg et al.
Figures 7 and 8 show a high correlation for ESV, EDV, and EF (greater than 0.98 for RV volumes), denoting excellent match between the automatic and manual contours.The Bland-Altman analysis revealed negligible biases and a better level of agreement compared with that of the other methods.For example, the Bland-Altman diagrams related to EF showed a bias close to zero with the 95% limits of agreement ( 6 1.96 SD) close to 6 0.10.This performance is similar to what reported by Caudron et al (42) for intraoperator variability values.A nonzero bias with the 95% limits closer to 6 0.2 exist for the other methods (6).Compared with the work by Ringenberg et al (22), that provides the closest results to ours in Table 2, our method provides a better R-value (correlation coefficient) for EDV, ESV and EF.For  Automatic Segmentation Using a Learning-Based Approach example, for EF, our method provides R ¼ 0.96 and 0.93 for Test1Set and Test2Set, respectively, compared with R 5 0.78 and 0.91 reported by Ringenberg et al (22).The higher R-values in our statistical evaluation demonstrates a better performance.These observations show the potential clinical applicability of our framework for automatic RV segmentation.
The measured elapsed times show that the method can be trained within a relatively short time and off-line.The first stage, i.e., convolutional network, requires the longest computational time among the three stages.This is because the most time-consuming operation needed is the convolution of the filters with images.Nevertheless, these computational times can be reduced by developing the algorithms into GPU-accelerated computing platforms.
During the test, the average time to perform RV segmentation, in a typical image, was less than 0.5 s.Most of the computational time was spent by the convolution network and the integrated deformable model.Yet, the integrated deformable model converges faster than classical deformable models because of the initialization and integration with the inferred shape.Overall, our method needs 5 s per patient for the processing.Unfortunately, a fair comparison between computational-time related to different methods was not possible because the other methods have been developed over different platforms.Their reported computational times range from 19 s to 30 min per patient (6,16,17,(19)(20)(21)(22)(23)43).In particular, the reported computational time reported by Ringenberg et al ( 22) on a similar workstation with Xeon processor is 19 s per patient, which is approximately four times more than the 5 s needed by our method.
As a limitation, the developed method may not perform as efficiently in patients with irregular RV shape, such as congenital heart defects.This is due to the fact that learning-based approaches are as good as their training data.A rich and diverse dataset for training will ensure the performance for various cases.In other words, to efficiently perform on patients with irregular shape RV, the training dataset should include some of those examples.
As discussed in our previous publication (39), a difficulty in applying deep learning approaches for cardiac MRI segmentation is the lack of enough data for training and eventually validation.For this work, we used a portion of the MICCAI 2012 RVSC dataset (6) and artificially enlarged that for training.Similar to LV segmentation, currently, no analytic approach exists to design hyperparameters in deep learning networks and they should be obtained empirically (39).Nevertheless, the results indicate that our automated method is accurate.Another limitation of this study is that the validation was performed on a dataset with a rather limited number of subjects and abnormalities.Also, because there is only one manual segmentation available from the MICCAI 2012 RVSC dataset, it was not possible to evaluate the intraand interobserver variability.Testing our method on a larger set of clinical data with multiple manual segmentation, that currently we do not have access to, is subject of future research.
In prospect, manual segmentation is time-consuming and requires dedicated operator training that makes it inefficient due to the extent of information in CMR images (46,47).Furthermore, because the traditional practice of manual segmentation is subjective, less reproducible and time-consuming, fully automatic 3D segmentation methods are highly desirable for computing functional parameters in patients, such as ejection fraction, cardiac output, peak ejection rate, filling rates among the other.
Our learning approach has the potential to be performed across the whole cardiac cycle.The method can also be extended to RV myocardial segmentation to provide additional clinical details.The current RV endocardial segmentation can be used as a preprocessing step to more accurately consider RV trabeculations.Comparison of RV segmentation results with that of LV segmentation, DM (94%), and HD (3.45 mm) (39), confirmed the difficulty of RV segmentation because of its complex shape variability.Nevertheless, further improvements of these metrics for RV segmentation to reach an accuracy similar to that of LV segmentation should be considered.Furthermore, the method can be considered for simultaneous multiple chamber segmentation by providing training labels that include multiple chambers.
In conclusion, we have developed a novel method for fully automatic RV segmentation from cardiac MRI shortaxes.The method uses deep learning algorithms combined with deformable models.It brings more robustness and accuracy, particularly for challenging images with fuzzy borders.In contrast to the other existing automated approaches, our method is based on learning several levels of representations, corresponding to a hierarchy of features and does not assume any model or assumption about the image or heart.The method is simple to implement, and potentially more robust against anatomical variability and image contrast variations.The feasibility and performance of this segmentation method was successfully demonstrated through computing standard metrics and clinical indices with respect to the ground truth on the MICCAI 2012 RVSC dataset (6).Results indicate improvements in terms of accuracy and computational time compared with the existing RV segmentation methods.

FIG. 1 .
FIG. 1. Block diagram of the integrated deep learning and deformable model algorithm for RV segmentation.

FIG. 2 .
FIG. 2. Block diagram of the convolutional network for automatic localization.

FIG. 4 .
FIG. 4. Four examples of the training data for the stacked-AE, input (upper row) and labels (lower row).
FIG. 5. Endocardial contours of RV at ED from base to apex (Patient #33 from Test2Set).

Fig. S3 .
Fig. S3.Endocardial contours of RV at ES from base to apex (Patient #21 from Test1Set).Our method resulted in best DM (0.93) for this case.Red and yellow correspond to Step 2 and Step 3 segmentation results, respectively.Fig. S4.Endocardial contours of RV at ED from base to apex (Patient #29 from Test1Set).Our method resulted in worst DM (0.74) for this case.Red and yellow correspond to Step 2 and Step 3 segmentation results, respectively.Fig. S5.Endocardial contours of RV at ES from base to apex (Patient #29 from Test1Set).Our method resulted in worst DM (0.74) for this case.Red and yellow correspond to Step 2 and Step 3 segmentation results, respectively.

Table 1
Quantitative Metrics and Mean Values (SDs) of DM and HD and Correlation Coefficient R 2 for RV Volumes

Table 2
Quantitative Metrics and Mean Values (SDs) of DM and HD Average over ED and ES, for Our Method Compared to Other Techniques A ¼ automatic, sA ¼ semiautomatic.