Purpose
Fast and accurate auto-segmentation on daily images is essential for magnetic resonance imaging (MRI)–guided adaptive radiation therapy (ART). However, the state-of-the-art auto-segmentation based on deep learning still has limited success, particularly for complex structures in the abdomen. This study aimed to develop an automatic contour refinement (ACR) process to quickly correct for unacceptable auto-segmented contours.
Methods and Materials
An improved level set–based active contour model (ACM) was implemented for the ACR process and was tested on the deep learning–based auto-segmentation of 80 abdominal MRI sets along with their ground truth contours. The performance of the ACR process was evaluated using 4 contour accuracy metrics: the Dice similarity coefficient (DSC), mean distance to agreement (MDA), surface DSC, and added path length (APL) on the auto-segmented contours of the small bowel, large bowel, combined bowels, pancreas, duodenum, and stomach.
Results
A portion (3%-39%) of the corrected contours became practically acceptable per the American Association of Physicists in Medicine Task Group 132 (TG-132) recommendation (DSC ≥0.8 and MDA ≤3 mm). The best correction performance was seen in the combined bowels: for the contours with major errors (initial DSC <0.5 or MDA >8 mm), the mean DSC increased from 0.34 to 0.59, the mean MDA decreased from 7.02 mm to 5.23 mm, and the APL was reduced by almost 20 mm, whereas for the contours with minor errors, the mean DSC increased from 0.72 to 0.79, the mean MDA decreased from 3.35 mm to 3.29 mm, and more than one-third (39%) of the ACR contours became clinically acceptable. The execution time for the ACR process on one subregion was less than 2 seconds using an NVIDIA GTX 1060 GPU.
Conclusions
The ACR process implemented based on the ACM was able to quickly correct for some inaccurate contours produced from MRI-based deep learning auto-segmentation of complex abdominal anatomy. The ACR method may be integrated into the auto-segmentation step to accelerate the process of MRI-guided ART.
Introduction
Magnetic resonance imaging (MRI)–guided adaptive radiation therapy (ART) is currently being introduced into clinics by taking advantage of the emerging MR-Linac technology.
The MR-Linac is a hybrid system that integrates an MRI scanner and a linear accelerator (Linac), enabling superior soft-tissue contrast, functional information, and real-time MRI-guided radiation therapy (RT) delivery. In particular, the MR-Linac enables online adaptive replanning to account for patient interfraction changes at each fraction, substantially improving RT delivery accuracy. However, one of the bottleneck problems in the clinical practice of MRI-guided online adaptive replanning with current technology is the impractically long time required to segment the patient's anatomy of the day, which can exceed 30 minutes by conventional manual segmentation in a tumor site with complex anatomy (eg, abdomen).
Although it has been well documented that deep learning (DL)–based auto-segmentation can be more efficient than time-consuming and labor-intensive manual delineation, available MRI-based auto-segmentation methods still have limited success, particularly for complex structures such as those in the abdomen. Fu et al reported that their DL auto-segmentation on abdominal MRI achieved average Dice similarity coefficients (DSCs) of 0.953, 0.931, 0.850, 0.866, and 0.655 for the liver, kidneys, stomach, bowels, and duodenum, respectively. Bobo et al reported similar results from their DL auto-segmentation solutions. These previous studies indicate that DL auto-segmentation on MRI for complex organs (eg, bowels) is generally unacceptable for clinical use. Although improvements in the auto-segmentation of complex structures can be anticipated with more robust DL algorithms and/or larger training data sets, they cannot be guaranteed. Unacceptable auto-segmented contours must be examined and edited manually before their clinical use.
Manual contour editing is generally labor-intensive and time-consuming and can be subjective, inevitably introducing inter- and intra-observer errors. A desirable solution to reduce or even replace the manual process is to automate the contour editing. Several techniques have been proposed for automatic contour refinement (ACR). One of these techniques is the dense conditional random field (CRF) model. By formulating the final segmentation results using the soft label probability maps computed from a convolutional neural network (CNN) as a maximum a posteriori inference problem, this method is capable of incorporating the contextual information coming from both local and global relationships between the image voxels. Dense CRF has also been implemented for the final postprocessing of CNN models for brain lesion segmentation on MRI, with a minimal (less than 0.01) increase in DSC. Although these studies demonstrated that the dense CRF model was useful for smoothing contour boundaries between different structures and removing small, isolated erroneous contours from the CNN predictions, the slight improvements in DSC were insufficient to correct the inaccurate contours of the complex abdominal organs.
Another ACR technique is active contour models (ACMs), which include edge-based ACMs and region-based ACMs. The edge-based ACMs use the image gradient to identify the boundaries, whereas the region-based ACMs use the image statistical information inside and outside the contour to guide the evolution.
However, the segmentation accuracy of the ACM methods can strongly rely on the parameters used in the models, which are usually chosen empirically. For the complex organs in the abdomen, especially the bowels with large size and shape variations, it is impractical to determine the appropriate parameters by manual adjustment for each specific bowel loop using the conventional methods.
To minimize or eliminate manual contour editing, and thus to accelerate the segmentation process for MRI-guided ART, this study aimed to develop an ACR process based on an ACM method that does not require manual parameter adjustment to quickly refine unacceptable DL auto-segmented contours of the complex abdominal organs, including the small and large bowels, pancreas, duodenum, and stomach. Clinical MRI data were used to test the ACM, and the performance of the ACR process was evaluated based on clinical criteria.
Methods and Materials
This study was approved by the Institutional Review Board of Medical College of Wisconsin.
MRI data set
A total of 80 abdominal MRI scans acquired during routine RT for 71 patients with abdominal cancers were used for this study, including 65 scans from RT simulation (MR-SIM) and 15 from MRI-guided ART with an MR-Linac (MRL). The MR-SIM images were acquired on a 3T scanner (Verio, Siemens, Germany) using an axial T2-weighted half-Fourier single-shot turbo spin-echo (HASTE) sequence with the following parameters: repetition time (TR) = 2000 ms; echo time (TE) = 95 to 98 ms; flip angle = 150° to 160°; matrix size = 320 × 212 to 320 × 288; pixel size = 1.0625 × 1.0625 to 1.3125 × 1.3125 mm²; and slice thickness = 3 to 5 mm. The MRL images were the motion-average MRIs derived from 4-dimensional (4D) MRIs acquired on a 1.5T MR-Linac (Unity, Elekta Inc, Sweden) using either turbo field echo or balanced turbo field echo sequences.
to smooth out noise while preserving edge features; and (3) image intensity normalization to the range 0 to 255. For the ACR application described below, a contrast-limited adaptive histogram equalization method
to generate auto-segmented contours of the small bowel, large bowel, combined bowels, pancreas, duodenum, and stomach. To assess contour accuracy, the auto-segmented contours were compared with the ground truth contours delineated manually by an experienced researcher and independently verified by 2 radiation oncologists.
Each slice of the MRI was cropped into multiple 2D subregions based on the dilated initial auto-segmented contours, which were generated by convolving the contours with a 2D square kernel of size 25 × 25. For the bowels, each cropped subregion on a slice included at least a complete loop of the initial contour; for the pancreas, duodenum, and stomach, only 1 cropped subregion was obtained on each slice. Based on the American Association of Physicists in Medicine Task Group 132 (TG-132) report, contours with a DSC <0.8 or a mean distance to agreement (MDA) >3 mm are considered inaccurate. Only the subregions with inaccurate contours were used as the initial contours to test the ACR process described in the next section. In addition to the initial contour, the ACM method described in the next section used the probability map (ie, the probability of each pixel belonging to the segmented organ) generated from the DL algorithm. By trial and error, we also tested whether adding a fudge factor to enhance low-probability regions in the probability maps would improve the ACR performance, because those low-probability regions are often associated with contour inaccuracy. Fudge factors ranging from 0.1 to 0.5 were tested. The best ACR performance was obtained by adding 0.4 to the probability of each voxel, with the maximum probability capped at 1.0. This fudge factor of 0.4 was included as a part of the input data for the ACR.
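The cropping and probability enhancement described above can be sketched as follows. This is a minimal illustration in pure Python, not the implementation used in the study: the function names (`dilate`, `bounding_box`, `crop`, `enhance_probability`) are hypothetical, masks are assumed to be nested lists of 0/1 values, and the sketch crops a single bounding box rather than one subregion per bowel loop.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]
ProbMap = List[List[float]]


def dilate(mask: Mask, kernel_size: int = 25) -> Mask:
    """Binary dilation with a square kernel of odd size.

    Equivalent to convolving the mask with a kernel of ones and
    keeping every pixel whose response is greater than zero.
    """
    half = kernel_size // 2
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - half), min(rows, r + half + 1)):
                    for cc in range(max(0, c - half), min(cols, c + half + 1)):
                        out[rr][cc] = 1
    return out


def bounding_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Return (row_min, row_max, col_min, col_max), inclusive, of nonzero pixels."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask is empty")
    rs = [r for r, _ in coords]
    cs = [c for _, c in coords]
    return min(rs), max(rs), min(cs), max(cs)


def crop(image: Sequence[Sequence[float]], box: Tuple[int, int, int, int]) -> List[list]:
    """Crop a 2D image to an inclusive bounding box."""
    r0, r1, c0, c1 = box
    return [list(row[c0:c1 + 1]) for row in image[r0:r1 + 1]]


def enhance_probability(prob: ProbMap, fudge: float = 0.4) -> ProbMap:
    """Add the fudge factor to every pixel and cap the result at 1.0."""
    return [[min(p + fudge, 1.0) for p in row] for row in prob]


if __name__ == "__main__":
    # Toy 7 x 7 slice with a one-pixel initial contour; a 3 x 3 kernel
    # is used here only to keep the example small.
    initial = [[0] * 7 for _ in range(7)]
    initial[3][3] = 1
    prob = [[0.1] * 7 for _ in range(7)]
    prob[3][3] = 0.9
    box = bounding_box(dilate(initial, kernel_size=3))
    subregion_prob = enhance_probability(crop(prob, box))
    print(box, subregion_prob)
```

Capping at 1.0 keeps the enhanced map a valid probability map, so pixels that were already confident are unchanged in rank while low-probability pixels gain relative weight.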
The ACM used for the ACR process is a region-based level set algorithm in which the contour evolves by minimizing an energy functional. Because it is generally not practical to manually adjust the various ACM parameters for a wide variety of complex inaccurate contours, we followed the approach proposed by Hatamizadeh et al to generalize the scalar parameters in the ACM to 2D parameter maps using the probability maps obtained from the DL models. In addition, we converted the probability maps to signed distance maps to initialize the level set in the ACM. The contours to be corrected were then evolved iteratively to minimize the energy functional and better match the desired boundary. For the ACR application, the number of iterations was set to 600 because large changes were expected for the initial inaccurate contours. More details on the ACM method are provided in the Supplementary materials and the relevant publications.
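For orientation, a region-based level set energy with spatially varying weights can be written in the following Chan-Vese-type form. This is a sketch of the general idea under stated assumptions, not the exact functional used in this study (which is given in the Supplementary materials). The notation is assumed for illustration: $\phi$ is the level set function (taken as positive inside the contour), $I$ is the image, $H$ is the Heaviside function, $c_1$ and $c_2$ are the mean intensities inside and outside the contour, $\mu$ weights the contour length, and $\lambda_1(x,y)$ and $\lambda_2(x,y)$ are the 2D parameter maps.

```latex
E(\phi) = \mu \int_{\Omega} \lvert \nabla H(\phi) \rvert \, dx\, dy
        + \int_{\Omega} \lambda_1(x,y)\, \bigl(I(x,y) - c_1\bigr)^2 \, H(\phi) \, dx\, dy
        + \int_{\Omega} \lambda_2(x,y)\, \bigl(I(x,y) - c_2\bigr)^2 \, \bigl(1 - H(\phi)\bigr) \, dx\, dy
```

In a conventional ACM, $\lambda_1$ and $\lambda_2$ are scalars chosen empirically. Replacing them with maps derived from the DL probability map lets the balance between the inside and outside terms vary from pixel to pixel, which is what removes the need for manual tuning.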
Figure 1 shows the workflow of the proposed ACR process based on the ACM algorithm. The process includes the following 3 steps: (1) inputting the test image, the DL auto-segmented contours, and the DL probability map; (2) preprocessing the input data by cropping each image slice into multiple 2D subregions and enhancing the probability map with the determined fudge factor; and (3) executing the ACM to correct or refine the contours and postprocessing the obtained contours if any obvious geometric inaccuracy is detected, eg, contour smoothing of imperfect contours, removal of isolated regions with low probability, and filling of empty holes.
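Two of the postprocessing operations named in step 3, hole filling and removal of isolated low-probability regions, can be sketched as below. This is a pure-Python illustration under assumptions that are not taken from the study: 4-connectivity, a hypothetical mean-probability threshold `min_mean_prob`, and the choice to always keep the largest connected component. Contour smoothing is not covered.

```python
from collections import deque
from typing import Iterator, List, Tuple

Mask = List[List[int]]
ProbMap = List[List[float]]
NEIGHBORS = ((1, 0), (-1, 0), (0, 1), (0, -1))  # 4-connectivity (assumption)


def _components(mask: Mask, value: int) -> Iterator[List[Tuple[int, int]]]:
    """Yield 4-connected components of pixels equal to `value`."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != value or seen[r][c]:
                continue
            seen[r][c] = True
            queue, comp = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for dy, dx in NEIGHBORS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and mask[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield comp


def fill_holes(mask: Mask) -> Mask:
    """Fill background regions that do not touch the image border."""
    rows, cols = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    for comp in _components(mask, 0):
        on_border = any(r in (0, rows - 1) or c in (0, cols - 1) for r, c in comp)
        if not on_border:
            for r, c in comp:
                out[r][c] = 1
    return out


def remove_low_probability_islands(mask: Mask, prob: ProbMap,
                                   min_mean_prob: float = 0.5) -> Mask:
    """Drop non-largest components whose mean probability is below a threshold."""
    out = [row[:] for row in mask]
    comps = list(_components(mask, 1))
    if len(comps) <= 1:
        return out
    largest = max(comps, key=len)
    for comp in comps:
        if comp is largest:
            continue
        mean_p = sum(prob[r][c] for r, c in comp) / len(comp)
        if mean_p < min_mean_prob:
            for r, c in comp:
                out[r][c] = 0
    return out
```

Both functions return new masks and leave their inputs untouched, so they can be chained in either order after the ACM step.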
The performance of the ACR process was evaluated based on the DSC, MDA, and 2 recently introduced metrics, surface DSC (sDSC) and added path length (APL).
The sDSC measures the surface overlap of 2 contours instead of the volumetric overlap as measured by DSC. The APL calculates the surface length of the ground truth contour that is not captured by the initial (auto-segmented) contour, which is the distance that the cursor needs to travel when correcting the inaccurate contour. Compared with traditional metrics (eg, DSC, MDA), the sDSC and APL have been shown to be more clinically relevant because they are better correlated with the contour editing time.
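A simplified 2D version of these two metrics is sketched below to make the definitions concrete. It is an approximation under stated assumptions, not the evaluation code used in the study: contours are represented by the boundary pixels of binary masks, pixel spacing is isotropic (`spacing_mm`, hypothetical parameter), distances are brute-force Euclidean, and the APL is approximated as the number of missed ground truth boundary pixels times the pixel spacing.

```python
import math
from typing import List, Set, Tuple

Mask = List[List[int]]
Point = Tuple[int, int]


def boundary_pixels(mask: Mask) -> Set[Point]:
    """Mask pixels with at least one 4-neighbor outside the mask or the image."""
    rows, cols = len(mask), len(mask[0])
    edge: Set[Point] = set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if not (0 <= rr < rows and 0 <= cc < cols) or not mask[rr][cc]:
                    edge.add((r, c))
                    break
    return edge


def _min_distance_mm(p: Point, others: Set[Point], spacing_mm: float) -> float:
    """Distance in mm from p to the nearest point in a non-empty set."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in others) * spacing_mm


def surface_dsc(test: Mask, truth: Mask, tol_mm: float = 2.0,
                spacing_mm: float = 1.0) -> float:
    """Fraction of both boundaries lying within tol_mm of the other boundary."""
    a, b = boundary_pixels(test), boundary_pixels(truth)
    if not a or not b:
        return 0.0  # undefined for an empty contour; 0.0 by convention here
    close_a = sum(1 for p in a if _min_distance_mm(p, b, spacing_mm) <= tol_mm)
    close_b = sum(1 for q in b if _min_distance_mm(q, a, spacing_mm) <= tol_mm)
    return (close_a + close_b) / (len(a) + len(b))


def added_path_length(test: Mask, truth: Mask, tol_mm: float = 2.0,
                      spacing_mm: float = 1.0) -> float:
    """Approximate length (mm) of ground truth boundary not captured by `test`."""
    a, b = boundary_pixels(test), boundary_pixels(truth)
    if not a:
        return len(b) * spacing_mm
    missed = sum(1 for q in b if _min_distance_mm(q, a, spacing_mm) > tol_mm)
    return missed * spacing_mm
```

Note that the APL is asymmetric: it only counts ground truth boundary that an editor would still have to draw, which is why it tracks editing effort more closely than an overlap measure.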
The values of the DSC, MDA, sDSC, and APL were calculated for the initial and ACR (corrected) contours with respect to the ground truth contours. Because a distance difference of 2 mm between the ground truth contour and the initial or ACR contour is generally considered to be practically acceptable, a 2-mm tolerance was assumed in the calculation of the sDSC and APL. To measure the performance of the ACR at the different levels of contour inaccuracy, the inaccurate subregions were further divided into 2 groups, a major error group with subregions of initial DSC <0.5 or MDA >8 mm and a minor error group with remaining subregions of 0.5 ≤ DSC < 0.8 or 3 mm < MDA ≤ 8 mm. To assess whether the improvements from the ACR were statistically significant, changes of the 4 accuracy metrics for the contours obtained before and after the ACR were analyzed using the paired t test.
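The grouping rule and the paired comparison described above can be sketched as follows. The thresholds are those stated in the text; the function names are hypothetical, and only the t statistic and degrees of freedom are computed here. Obtaining a P value additionally requires the t-distribution, for which a statistics package (for example, SciPy's `scipy.stats.ttest_rel`) would normally be used.

```python
import math
from typing import Sequence, Tuple


def error_group(dsc: float, mda_mm: float) -> str:
    """Assign a subregion to an error group from its initial DSC and MDA."""
    if dsc < 0.5 or mda_mm > 8.0:
        return "major"
    if dsc < 0.8 or mda_mm > 3.0:
        return "minor"
    return "acceptable"  # meets DSC >= 0.8 and MDA <= 3 mm


def paired_t_statistic(before: Sequence[float],
                       after: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired before/after measurements."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equally sized samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("differences have zero variance")
    return mean / math.sqrt(var / n), n - 1
```

The major-error test is applied first, so a subregion with a high DSC but an MDA above 8 mm is still placed in the major error group, matching the "or" in the stated criteria.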
Results
The execution time to correct each inaccurate subregion with the developed ACR was less than 2 seconds on an NVIDIA GTX 1060 GPU. Table 1 compares the DSC, MDA, sDSC, and APL values obtained before and after the ACR for both the major and the minor error groups of the small bowel, large bowel, combined bowels, pancreas, duodenum, and stomach, along with the P values from the paired t test. The accuracies of the refined contours in the major error group were significantly improved for all organs (P < .001), with the best improvement for the combined bowels and pancreas. For the minor error group, the improvements as measured by the APL were significant for all the organs (P < .001 for decreases of the APL). However, the improvements as measured by the other metrics (DSC, MDA, and sDSC) were mixed. A portion (3%-39%) of the corrected contours became practically acceptable after the ACR process per the TG-132 recommendation (DSC ≥ 0.8 and MDA ≤ 3 mm). The best correction performance was seen for the combined bowels: (1) for the major error group, the mean DSC increased from 0.34 to 0.59, the mean MDA decreased from 7.02 mm to 5.23 mm, and more notably, the APL was reduced by almost 20 mm and the improvement was observed in 84% of the subregions; (2) for the minor error group, the mean DSC increased from 0.72 to 0.79, the mean MDA decreased from 3.35 mm to 3.29 mm, and more than one-third (39%) of the corrected contours became clinically acceptable after the ACR. Among all the structures, the largest improvement was seen in the major error group of the pancreas, where the APL was reduced by approximately one-third after the ACR. For the duodenum and stomach, although the mean DSC, MDA, and sDSC changed only slightly after the ACR, particularly for contours with minor errors, the APLs decreased.
The decreases in APLs for all the contours in both major and minor error groups indicate that the use of the ACR reduces the manual editing times required to correct for all the unacceptable contours of all the organs.
Table 1. Performance of the automatic contour refinement process. Reported per organ: error groups of initial DL auto-segmentation based on DSC and MDA; percentage of subregions with improved DSC and MDA after ACR; percentage of subregions with DSC ≥0.8 and MDA ≤3 mm after ACR.
To demonstrate the contour quality improvements with the ACR, the scatter plots of the 4 contour accuracy metrics for each organ and each error group obtained before and after the ACR process are shown in Figure 2. For most contours, except for the duodenum and stomach contours with minor errors, the contour accuracy was improved after the ACR; as shown in the figure, after the ACR, more data points shifted toward the origin of each plot (eg, a higher DSC or sDSC and a lower MDA or APL). Figures 3 and 4 compare the contours of the initial DL auto-segmentation, ACR, and ground truth for representative MR-SIM and MRL cases, respectively. As is shown, the ACR process improved contour accuracy even for the very irregularly shaped contours of the complex organs.
Additional details on the performance of the ACR for MR-SIM images and MRL images, respectively, are provided in Supplementary Tables S1 and S2. In general, the performance of the ACR for the MR-SIM scans was better than that for the MRL scans. Comparisons of the accuracy metrics for these 2 data sets before and after the ACR are shown in Supplementary Figures S1 and S2.
Discussion
In this study, an automatic contour refinement process was developed based on an improved ACM algorithm to quickly correct for inaccurate contours generated by DL auto-segmentation. The effectiveness of the ACR process was demonstrated for complex abdominal structures including the bowels, pancreas, duodenum, and stomach. The ACR process can be implemented as a step after DL auto-segmentation to minimize subsequent manual editing effort, substantially accelerating the recontouring during MRI-guided ART, particularly for tumor sites with complex anatomy (eg, the abdomen).
Although DL-based segmentation methods have enabled organ auto-contouring and achieved great success in many clinical applications, the current auto-segmented contours of challenging organs, such as the bowels, can still be clinically unacceptable. Inevitably, manual editing needs to be performed subsequently to make the contours acceptable. Manual editing is generally time-consuming and labor-intensive, especially for inaccurate contours with irregular shapes (eg, complex abdominal organs). The presented ACR process can efficiently improve contour accuracy by reducing contour errors (eg, converting contours with major errors to those with minor errors) or even making a portion of the inaccurate auto-segmented contours practically acceptable. Thus, the adoption of the ACR will reduce the workload for manual editing, as indicated by the reduction of the APL. For the contours with major errors, average DSC increases of more than 0.2 and APL reductions of 20 mm after the ACR were seen for the bowels and pancreas. For the contours with minor errors, more than 25% of the unacceptable bowel and pancreas contours became acceptable after the ACR. Although there were no obvious improvements for the duodenum and stomach contours as measured by the other metrics, their APLs were reduced after the ACR, as seen in Table 1.
The ACR workflow was fast, with the execution time less than 2 seconds for each subregion and less than 4 minutes for each MRI set. By decreasing the number of iterations from 600 to 400 during the ACR execution, the processing time for each MRI set can be reduced to less than 2 minutes, with minimal effect on the ACR performance (eg, the average DSC and sDSC decreased <0.01, the average MDA increased <0.1 mm, and the average APL increased <1 mm). The time may be further shortened by applying smaller iteration numbers.
The ACM method, which uses probability maps to define the per-pixel parameters and to initialize the contour evolution, eliminates the need for manual parameter adjustment, which is one of the major limitations of traditional ACM methods.
For our purpose of correcting a large variety of contour inaccuracies for the complex abdominal organs, it would be impractical to determine and adjust ACM parameters for each specific case during the correction. Instead of fixing scalar parameters, the ACM method implemented in the ACR process establishes 2D parameter maps using the probability maps produced from the CNN models, which provide clues for the contours to be adjusted based on the actual organ boundaries. Therefore, the performance of the ACR depends on the quality of the probability maps from the DL auto-segmentation. This may explain why a better ACR performance was achieved for the combined bowels compared with the small bowel and large bowel separately, because the adopted CNN models had some difficulty in differentiating the small and large bowels.
In contrast, the MR-SIM sets were acquired with a respiration trigger. In addition, the quality of the probability maps for the MRL sets was poorer compared with those for the MR-SIM sets, because a smaller number of MRL sets was used in the training of the DL auto-segmentation model. The poor probability maps affected the ACR performance for the MRL sets. Nonetheless, the ACR still achieved very promising results, particularly for the contours with major errors (as shown in Table S2).
There are 2 major limitations in this study. First, the accuracy achievable by the ACR process is limited by the accuracy of the probability maps from the DL auto-segmentation. For example, if a probability map mislabels a background region as a part of an organ or wrongly includes a region of another organ, the ACR process would be unlikely to correct the contour. As an example, Figure 5A shows that the probability map inaccurately labeled some background pixels as small bowel (the values of these pixels on the probability map were equal or close to 1); the contour evolution would still be initialized and performed based on these incorrect regions, making it hard to correct these mistaken pixels. This could be a problem in a region with complicated anatomy (eg, the bowels and duodenum) where it is difficult to distinguish the organ from its surrounding background. Second, the ACR process relies on a region-based ACM, which has an intrinsic limitation in handling contours with inhomogeneous intensities. Figure 5B illustrates an example of a stomach contour where the ACR method failed to encompass the bright region, even though the probability map gave some hints. These 2 limitations explain why the ACR method was ineffective in correcting some inaccurate contours of complex organs.
Clearly, more robust methods for automatic contour refinement are needed. It is anticipated that as more advanced DL auto-segmentation algorithms and/or larger training data sets become available, the accuracy of the auto-segmentation will be continually increased. Demand for correcting the auto-segmented contours may be primarily for complex structures such as abdominal organs. The presently developed ACR process may still be applicable.
This study on ACR is a part of our effort to develop a 4-step segmentation pipeline for MRI-guided ART, including (1) auto-segmentation of MRI based on DL; (2) automatic detection of inaccurate auto-segmented contours; (3) auto-refinement of the detected inaccurate contours; and (4) manual editing using robust tools for the uncorrectable contours. Such a segmentation pipeline would effectively address the current slowness in the recontouring process, making MRI-guided daily online adaptive replanning more practical.
Conclusions
This work demonstrates the feasibility of using an improved ACM method to automatically refine inaccurate contours from the DL auto-segmentation of the complex abdominal organs, including the bowels, pancreas, duodenum, and stomach. This automatic contour refinement process is fast and efficient without the need for manual parameter adjustment. The developed ACR method can be integrated into the recontouring process to improve segmentation accuracy, minimize the subsequent tedious manual editing, and accelerate the execution of MRI-guided ART.
Input from Ergun Ahunbay, PhD, Haidy Nasief, PhD, William Hall, MD, Beth Erickson, MD, and Virgil Willcut, MS, is appreciated.
Sources of support: This research was partially supported by the Medical College of Wisconsin (MCW) Cancer Center and Froedtert Hospital Foundation, the MCW Meinerz and Fotsch Foundations, and the National Cancer Institute of the National Institutes of Health under award number R01CA247960. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Disclosures: The Medical College of Wisconsin received institutional research support from Elekta AB. Dr Xu and Mr Thill are employees of Elekta AB. All other authors have no disclosures to declare.
Research data are stored in an institutional repository and will be shared upon request to the corresponding author.