Development and Clinical Implementation of an Automated Virtual Integrative Planner for Radiation Therapy of Head and Neck Cancer

Open Access. Published: July 16, 2022. DOI: https://doi.org/10.1016/j.adro.2022.101029

      Abstract

      Purpose

      Head and neck (HN) radiation (RT) treatment planning is complex and resource intensive. Deviations and inconsistent plan quality significantly impact clinical outcomes. We sought to develop a novel automated virtual integrative (AVI) knowledge-based planning application to reduce planning time, increase consistency, and improve baseline quality.

      Materials and Methods

      An in-house write-enabled script was developed from a library of 668 previously treated HN RT plans. Prospective hazard analysis was performed, and mitigation strategies were implemented before clinical release. The AVI-planner software was retrospectively validated in a cohort of 52 recent HN cases. A physician panel evaluated planning limitations during initial deployment, and feedback was enacted via software refinements. A final second set of plans was generated and evaluated. Kolmogorov-Smirnov (KS) test in addition to Generalized Evaluation Metric (GEM) and Weighted Experience Score (WES) were used to compare normal tissue sparing between final AVI-planner versus respective clinically treated and historically accepted plans. T-test was used to compare the interactive time, complexity, and monitor units for AVI-planner versus manual optimization.

      Results

      Initially, 86% of plans were acceptable to treat with 10% minor and 4% major revisions or rejection recommended. Variability was noted in plan quality among HN subsites, with high initial quality for oropharynx and oral cavity plans. Plans needing revisions comprised sinonasal, nasopharynx, p16-negative SCC Unknown Primary or cutaneous primary sites. Normal tissue sparing varied within subsites, but AVI-planner significantly lowered mean larynx dose (median 18.5 Gy vs 19.7 Gy, p<0.01) compared to clinical plans. AVI-planner significantly reduced interactive optimization time (mean 2 vs 85 minutes, p<0.01).

      Conclusions

      AVI-planner reliably generated clinically acceptable RT plans for oral cavity, salivary, oropharynx, larynx and hypopharynx cancers. Physician driven iterative learning processes resulted in favorable evolution in HN RT plan quality with significant time savings, and improved consistency using AVI-planner.
      Introduction
      Radiation therapy (RT) is a cornerstone of HN cancer treatment. Intensity-modulated radiation therapy (IMRT) has improved treatment accuracy and reduced RT-associated morbidity [1-10]. HN IMRT manual optimization is resource-intensive and variable, with heavy reliance upon physician and facility expertise [11-16]. HN IMRT implementation has been met with frequent treatment planning and quality assurance (QA) deviations, which are associated with worse outcomes [17-19]. Furthermore, the time required for HN IMRT planning must be considered in the context of survival advantages associated with minimizing total treatment time and time interval between consultation and starting treatment [20, 21]. HN RT delivered at high-accruing centers is associated with improved outcomes, though factors including travel burden and patients’ resources influence access to these centers [22-24].
      Automated planning has been developed to standardize treatment planning, maximize efficiency, improve plan quality, and mitigate geographic disparities by increasing access to high quality RT plans [13, 25]. Knowledge-based planning (KBP) models rely upon dosimetric and geometric experience from dose-volume histograms (DVH) of previously treated acceptable plans [25]. KBP benefits have been documented in various disease sites, including HN [26-34]. Iterative learning, a process incorporating manually driven feedback into model training, improves automated HN plan quality [35]. However, commercially available KBP algorithms are limited by smaller training datasets, lack of standardized inputs, challenging user-interface for plan revision, and limited ability to customize commercial algorithms to fit specific clinical needs. Script-based approaches like ours enable clinic-specific customization. Prior studies have characterized plan quality in cohorts of HN patients without regard for primary site, while others report achievements in only one subsite (e.g. oropharynx [36] or nasopharynx [31]). There is a paucity of data regarding automated planning algorithm performance among different HN sites.
      Herein, we report the development of an automated virtual integrative (AVI) planning algorithm. The algorithm is not a machine learning approach. This algorithm was designed using the same treatment planning system tools applied by dosimetrists during the manual process and integrates historical optimization norms from prior plans. The AVI-planner algorithm uniquely generates optimization parameters based upon statistical analyses of DVH metrics from previously treated HN RT plans. We sought to create preliminary automated HN RT plans for “warm start optimization” where dosimetrists continue optimization from the automated plan instead of starting each plan with a new manual process [37]. We describe the iterative learning process to address planning deficiencies noted for select primary sites. To our knowledge, this is the first investigation of an HN-specific automated planning algorithm whereby the identification of site-specific clinically-significant deficiencies drives autoplanner script refinements to improve overall RT plan quality.
      Methods
      Script Development and Hazard Analysis
      Our script release process is shown in Figure 1. The write-enabled script was developed to incorporate practice norms defined by a library of 668 previously treated HN RT plans collected at our institution between 2014-2019. This library was comprised of 31.3% oropharynx (n=209), 19.3% oral cavity (n=129), 14.7% larynx (n=98), 7.9% cutaneous (n=53), 6.6% salivary (n=44), 4.2% sinonasal (n=28), 3.7% nasopharynx (n=25), 3% Unknown Primary (n=20), 2.7% hypopharynx (n=18), 1.8% thyroid (n=12), 0.6% orbital or lacrimal (n=4), 4.2% “other” (n=28). Software inputs were standardized including nomenclature and complete sets of contoured organs at risk/planning target volumes (OAR/PTVs) with explicitly defined planning priorities and objectives. Within the foundational library, >90% of plans contained spinal cord, brainstem, bilateral cochlea, parotids, superior and inferior pharyngeal constrictors, oral cavity, esophagus, mandible, lips. When surgically present and clinically relevant, bilateral submandibular glands (SMG) were included in 75%, larynx in 81%, bilateral optic nerves, chiasm, eyes, and lenses were included in 18-25%, while only 11% included lacrimal glands (data not shown).
      Figure 1. Flow chart depicting write-enabled script development and release process.
      During development, the AVI-planner algorithm statistically evaluated DVH parameters from the 668 plan library, which then informed optimization parameters. Optimization constraints were set at the lower 30% of historic values. A team of physicists, dosimetrists and software developers then used the algorithm to iteratively optimize a subset of 20 HN patients. None of the 20 HN plans were included in the physician Round 1 evaluation. Prior to Round 1 evaluation (see below), all planning parameters in the algorithm were finalized for physician evaluation. Based upon standardized input targets and OARs, the algorithm created a full set of optimization structures using typical margin and Boolean operations. Optimization structures included sub-volumes of overlapping OAR and target structures, as well as high dose PTV subvolumes segmented from lower PTV volumes. Dose sculpting rings were used by the normal tissue objective to conform prescription isodose lines. The AVI-planner software automatically placed an isocenter, segmented optimization structures, and generated beams and plan setup with full calculation. All plans were VMAT, calculated in Eclipse version 15.6, with the analytical anisotropic algorithm (AAA), using 0.25 cm grid size. Eclipse Scripting Application Programming Interface (ESAPI) enabled the integration of AVI-planner software with Eclipse (Varian Medical Systems, Palo Alto, CA). The non-clinical, research version of ESAPI mimicked manual optimization and allowed interaction with the optimizer during optimization. However, the clinical ESAPI version did not allow this interaction. Since our objective was designing software compatible with the Food and Drug Administration (FDA) approved ESAPI versions, our interface and algorithm generated HN plans which could be sequentially, manually modified after optimization. Optimization with our AVI-planner algorithm did not allow for dynamic real-time interaction with the optimizer.
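The quantile-based derivation of constraints from the plan library can be illustrated with a minimal sketch. It assumes the constraint for each DVH metric is simply the 30th percentile of the historic values for that metric. The function name, the dictionary layout, and all dose values are hypothetical and are not taken from the AVI-planner source code.

```python
from statistics import quantiles


def lower_quantile_constraint(historic_values, percent=30):
    """Return the value below which roughly `percent`% of historic values fall."""
    if not 1 <= percent <= 99:
        raise ValueError("percent must be between 1 and 99")
    if len(historic_values) < 2:
        raise ValueError("need at least two historic values")
    # n=100 returns the 99 cut points for the 1st..99th percentiles.
    cuts = quantiles(historic_values, n=100, method="inclusive")
    return cuts[percent - 1]


# Hypothetical historic values (Gy) for one OAR, keyed by DVH metric.
# These numbers are illustrative only, not data from the 668 plan library.
historic = {
    "D90%[Gy]": [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0],
    "D50%[Gy]": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 28.0, 30.0],
    "D10%[Gy]": [30.0, 33.0, 36.0, 39.0, 42.0, 45.0, 48.0, 51.0, 54.0, 57.0, 60.0],
}

# One constraint per metric, each at the lower 30% of historic experience.
template = {metric: lower_quantile_constraint(values)
            for metric, values in historic.items()}
print(template)
```

A constraint chosen this way asks the optimizer to do at least as well as the best-sparing 30% of prior plans for that metric, which is one plausible reading of the approach described in the text.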
      Routine physics quality plan check was employed for the automated plans, which then proceeded onto a second phase of clinical evaluation. Before clinical use, a prospective hazard analysis was performed using a streamlined failure mode and effects analysis described by Paradis et al. [38]. A process map for clinical use of the script was generated with associated hazards (failure modes) from multidisciplinary feedback. The priority score for each failure mode (a version of the relative risk priority number from TG-100) was assigned as high, medium, or low [39]. All failure modes with high or medium priority scores were mitigated before proceeding to plan evaluation and clinical deployment.
      Patient Selection
      This study was IRB exempt (HUM 00126332) for quality improvement. AVI-planner in Round 1 optimization was retrospectively validated within a cohort of 52 HN cancer patients treated between 2019-2020. None of these 52 plans were included within the foundational 668 plan library. We included oral cavity, oropharynx, larynx, hypopharynx, cutaneous, sinonasal, and salivary primaries to account for anatomy and OARs, adjuvant vs definitive RT, target dose, and fractionation. Institutional dose-escalation or de-escalation protocol patients were included. We excluded hypofractionated and palliative patients. Simulation CT scans were performed on a Philips Brilliance big-bore 16-slice scanner (Koninklijke Philips N.V., Amsterdam, Netherlands) using 3 mm slices. Patients were scanned head-first, supine with IV contrast and immobilized in 5-point thermoplastic masks. Intact and postoperative boost and elective CTV contours were delineated referencing published guidelines [40, 41] with a 3 mm PTV margin. Dosimetrists manually optimized clinical plans using Eclipse (Varian Medical Systems, Palo Alto, CA), which were delivered on Varian TrueBeam or Clinac linear accelerators with 120 leaf MLC using 6-MV photons with 2-4 VMAT arcs.
      Plan Evaluation
      Clinical plans underwent peer-review by a subspecialty panel of attending radiation oncologists. Institutional protocols specified prioritization of target coverage and objectives for OAR sparing (Supplementary Table 1). To identify AVI-planner limitations consistently requiring additional manual input for “warm start optimization,” the physician panel evaluated clinical acceptability of “Round 1” AVI-planner cases. HN subsites were grouped by treatment paradigm and anatomic proximity. These plans were “rejected” if the plan was unsafe and unsalvageable despite reoptimization. “Major revisions” indicated a high perceived risk of either 1) a clinically relevant toxicity due to exceeded OAR constraints or 2) risk of recurrence from target under-coverage. Plans with “minor revisions” were safe with room for improvement in conformality, heterogeneity, or target coverage. The highest quality plans were deemed “treat as is.” Physician feedback from Round 1 was addressed per “Write-enabled Script Refinement.” All 52 cases were then replanned with the AVI-planner script without manual modifications and labeled “Round 2.” The same physician panel re-evaluated all 52 plans.
      Beyond stand-alone clinical acceptability, Round 2 AVI-planner quality was compared to 1) clinically treated plans, 2) historically accepted plans, and 3) literature-based thresholds [42]. Clinically treated plan denotes the patient-specific RT plan, which was delivered during the patient's treatment course. Within this context, evaluating Round 2 versus the clinically treated plan provides an individual, patient-level comparison of plan quality. Comparisons to historically accepted plans were based on summarized metrics captured from the entire 668 HN foundational library. Thus, Round 2 plan quality was assessed in the context of aggregate institutional experience with all 668 considered high-quality HN plans. Evaluating Round 2 plans in both situations more fully characterizes plan quality at both the patient level and the institutional experience level.
      To compare AVI-planner to historic plans, constraint metrics within the algorithm were derived from 668 previously treated plans using the Generalized Evaluation Metric (GEM) and Weighted Experience Score (WES) described by Mayo et al. [43]. GEM compares DVH metrics to constraints and historical values, which are cast onto a sigmoidal curve with scale of 0 to 1, where GEM = 0.5 if the constraint was met and 0.95 when 95% of historical values were lower than the current plan's value. WES ranks the DVH curves with respect to historical values, on a 0 to 1 scale, with values weighted according to historic variability. WES correlates with NTCP but rises sooner with respect to dose, correlating with physician preferences to drive doses below NTCP thresholds.
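To make the two anchor points of GEM concrete, the sketch below builds a sigmoid that equals 0.5 at the constraint and 0.95 at the historic 95th percentile. It uses a logistic curve purely as an assumption for illustration; the exact functional form published by Mayo et al. [43] may differ, and the function name and dose values here are hypothetical.

```python
import math


def gem_logistic(value, constraint, historic_q95):
    """Illustrative GEM-like score on a 0 to 1 scale.

    Returns 0.5 when `value` equals the constraint and 0.95 when `value`
    equals the historic 95th percentile. The logistic shape is an assumption,
    not the published GEM formula.
    """
    if historic_q95 <= constraint:
        raise ValueError("this sketch assumes the 95th percentile exceeds the constraint")
    # Solve 1 / (1 + exp(-k * (q95 - c))) = 0.95 for the steepness k.
    k = math.log(19.0) / (historic_q95 - constraint)
    return 1.0 / (1.0 + math.exp(-k * (value - constraint)))


# Hypothetical example: 26 Gy constraint, historic 95th percentile of 35 Gy.
for dose in (20.0, 26.0, 30.0, 35.0):
    print(dose, round(gem_logistic(dose, 26.0, 35.0), 3))
```

Under this assumed form, scores below 0.5 indicate the constraint was met with margin, and scores approaching 1 indicate a value worse than nearly all historic experience.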
      VRxGy[%] was used to assess coverage at the prescribed dose for each dose level. The ICRU Conformality index (CI) [26, 44] was calculated for PTV_High, PTV_Low and PTV_Mid00 volume:
      $$\mathrm{CI_{ICRU}} = \frac{\text{Body: } V_{Rx}[\text{cc}]}{\text{PTV: Volume}[\text{cc}]}$$


      Dose heterogeneity within PTV volumes was assessed using ICRU 83 HI1 [45].
      $$\mathrm{HI_{1}} = \frac{D_{2\%}[\text{Gy}] - D_{98\%}[\text{Gy}]}{D_{50\%}[\text{Gy}]}$$


      These were calculated for the PTV subvolumes not overlapping with volumes at prescribed doses, designated PTV_High, PTV!_Low and PTV!_Mid00 in TG-263 nomenclature.
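As a worked illustration of the two indexes, the sketch below computes a conformality index as the body volume receiving the prescription dose divided by the PTV volume, and a heterogeneity index following the ICRU 83 definition, (D2% - D98%) / D50%. The function names and input values are hypothetical, and the heterogeneity formula is assumed from the ICRU 83 citation rather than confirmed against the authors' code.

```python
def conformality_index_icru(body_vrx_cc, ptv_volume_cc):
    """Body volume receiving the prescription dose divided by PTV volume (both in cc)."""
    if ptv_volume_cc <= 0:
        raise ValueError("PTV volume must be positive")
    return body_vrx_cc / ptv_volume_cc


def heterogeneity_index_icru83(d2_gy, d98_gy, d50_gy):
    """ICRU 83 heterogeneity index: (D2% - D98%) / D50%, all doses in Gy."""
    if d50_gy <= 0:
        raise ValueError("D50% must be positive")
    return (d2_gy - d98_gy) / d50_gy


# Hypothetical PTV_High values, for illustration only.
ci = conformality_index_icru(body_vrx_cc=120.0, ptv_volume_cc=100.0)
hi = heterogeneity_index_icru83(d2_gy=74.0, d98_gy=67.0, d50_gy=70.0)
print(f"CI = {ci:.2f}, HI = {hi:.2f}")
```

A conformality index near 1 means the prescription isodose volume closely matches the PTV volume, and a heterogeneity index near 0 means the dose across the PTV is nearly uniform.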
      We collected total monitor units (MU) per plan for the 52 patient cohort as well as calculated complexity described by Younge et al [46] as below.
      $$M = \frac{1}{MU} \sum_{i=1}^{N} MU_i \times \frac{y_i}{A_i}$$
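The complexity metric can be computed directly once per-aperture data are available. The sketch below assumes, following Younge et al. [46], that MU_i is the monitor units delivered through aperture i, y_i is the aperture perimeter, A_i is the aperture area, and MU is the plan total. The tuple layout and the aperture values are hypothetical.

```python
def aperture_complexity(control_points):
    """MU-weighted mean of perimeter/area over all apertures.

    `control_points` is a list of (mu_i, perimeter_mm, area_mm2) tuples.
    The result has units of 1/mm when perimeter is in mm and area in mm^2.
    """
    points = list(control_points)
    total_mu = sum(mu for mu, _, _ in points)
    if total_mu <= 0:
        raise ValueError("total MU must be positive")
    weighted = sum(mu * perimeter / area for mu, perimeter, area in points)
    return weighted / total_mu


# Hypothetical apertures: a 100 x 100 mm square and a 50 x 50 mm square,
# each delivering 50 MU. Values are illustrative only.
plan = [
    (50.0, 400.0, 10000.0),  # perimeter/area = 0.04 per mm
    (50.0, 200.0, 2500.0),   # perimeter/area = 0.08 per mm
]
print(aperture_complexity(plan))
```

Small or irregular apertures have a large perimeter relative to their area, so plans that rely on them score higher on this metric.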


      Write-enabled Script Refinement and Clinical Deployment
      Iterative learning occurred by a two-step process, which used physician feedback to refine the optimization algorithm. The first iteration of plans reflected the explicitly stated prescription planning objectives using statistical data gained from the 668 HN plan library (“Round 1”). In Round 1, a template of optimization constraints was defined by quantile analysis of DVH metrics within our plan library of 668 previously treated patients. For structures evaluated by Mean[Gy], the constraint corresponded with the lower 30% of historic values for D90%[Gy], D50%[Gy] and D10%[Gy]. This enabled prioritization of portions receiving lower dose further away from PTV (D90%[Gy]) as compared to portions receiving high dose in close proximity to PTV volumes (D10%[Gy]). “Round 1” planning used the validation cohort described under Patient Selection, and “Round 1” indicates optimization with the initially released AVI-planner script and minor manual edits. After Round 1 evaluation, we discovered additional implicit physician preferences and expectations which were not stated in the prescription planning documentation. The algorithm was modified in several ways to incorporate physician feedback. This modified, refined algorithm was subsequently used to generate refined plans (“Round 2”). To shorten development time required to refine algorithm performance, these modified parameters were placed in an external configuration file. This limited the scope of changes that required recompiling the code, and also facilitates more rapid customization in the future when releasing this script to other clinics. Script modifications were as follows:
      1) Normal tissue constraints and priorities: Instead of limiting the level of priority to 1, 2, or 3, we included additional more granular priority levels (i.e., priority 1, 1.5, 2, 2.5, 3, etc.) to better align with physicians’ intent. Instead of a fixed constraint value, the algorithm was modified to allow increasing or decreasing a given constraint.
      2) Dose-sculpting structures: Automatically generated rings and buffers were added to increase conformality and minimize dose in non-target and non-OAR normal tissues (i.e., minimize low and intermediate dose within the base of the neck).
      3) Segmented structures: Subvolumes of PTV and OAR overlap were transformed into standardized segmented structures, and new constraints and priorities were added to enhance the algorithm's ability to more precisely control dose. For example, areas of parotid and PTV overlap were segmented out, and a higher priority for sparing was placed upon the non-overlapping ipsilateral parotid.
      4) Isocenter placement: In response to major revision or rejected plans, the ability to detect unilateral or atypical target location and subsequently adapt isocenter placement was added to the algorithm. This maximized use of the central 0.5 cm MLC leaves.
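The external configuration file and the more granular priorities might look something like the following sketch. The JSON layout, key names, structure names, and dose values are all hypothetical, since the source does not describe the file format. The sketch only illustrates how half-step priorities and an adjustable constraint scale could be read without recompiling code.

```python
import json

# Hypothetical configuration text; in practice this would live in a separate file.
CONFIG_TEXT = """
{
  "oar_objectives": [
    {"structure": "OralCavity", "metric": "Mean[Gy]", "constraint_gy": 30.0, "priority": 2.5},
    {"structure": "SpinalCord", "metric": "D0.1cc[Gy]", "constraint_gy": 45.0, "priority": 1, "scale": 1.0},
    {"structure": "Parotid_Contra", "metric": "Mean[Gy]", "constraint_gy": 24.0, "priority": 1.5, "scale": 0.9}
  ]
}
"""


def load_objectives(config_text):
    """Parse objectives, apply per-objective scaling, and sort by priority."""
    config = json.loads(config_text)
    objectives = []
    for entry in config["oar_objectives"]:
        priority = float(entry["priority"])
        # Allow priorities in half steps only (1, 1.5, 2, 2.5, 3, ...).
        if priority < 1 or not (priority * 2).is_integer():
            raise ValueError(f"unsupported priority: {priority}")
        # `scale` tightens (<1) or relaxes (>1) the template constraint.
        adjusted = entry["constraint_gy"] * entry.get("scale", 1.0)
        objectives.append({
            "structure": entry["structure"],
            "metric": entry["metric"],
            "constraint_gy": adjusted,
            "priority": priority,
        })
    return sorted(objectives, key=lambda obj: obj["priority"])


for obj in load_objectives(CONFIG_TEXT):
    print(obj)
```

Keeping these values in a data file means a clinic could retune priorities or constraint scaling by editing text, which matches the stated motivation of limiting the changes that require recompilation.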
      After the evaluation process described, AVI-planner was deployed to the clinic as a staged process. The initial use of the application was carefully monitored by physician and physicist stakeholders in the initial limited clinical release for proper functioning and introduction of hazards. Following validation, the script was then deployed without changes. The script is routinely used in clinic, though dosimetrists regularly manually modify these automated plans with physician input. Requests for additional improvements and features are monitored and incorporated into future development cycles.
      Statistical Analysis
      We used Python 3.8 for this analysis. A one-sided Kolmogorov-Smirnov (KS) test was used to determine if the distribution of AVI-planner OAR mean or D0.1cc values was higher or lower than clinically treated plans of the validation cohort. The distribution of per plan differences was analyzed. AVI-planner values were compared to literature-based thresholds [42] with a t-test for difference in means, using as the reference a normal distribution of matched cardinality centered on each threshold with a 0.5 Gy standard deviation. Descriptive statistics were utilized to evaluate dosimetric endpoints and time required for treatment planning. One-tailed T-test was used to compare the interactive time required for AVI-planning versus manual planning. Two-tailed T-test was used for comparing total monitor units (MUs) and complexity between AVI-planner and clinical plans.
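The one-sided two-sample KS comparison can be sketched with the standard library alone. The version below computes the directional statistic D+ (the largest amount by which the first sample's empirical CDF exceeds the second's) and uses the large-sample approximation p ≈ exp(-2 D² nm / (n + m)). The source does not state which routine was used. A library function such as `scipy.stats.ks_2samp` would be the usual choice in practice and may compute exact p-values for small samples, so this is an approximation for illustration. All dose values are hypothetical.

```python
import math
from bisect import bisect_right


def ks_one_sided(sample_a, sample_b):
    """Test whether values in sample_a tend to be LOWER than in sample_b.

    Returns (d_plus, approx_p), where d_plus = max over x of F_a(x) - F_b(x).
    A large d_plus means the CDF of sample_a rises sooner (lower values).
    The p-value uses the asymptotic one-sided approximation.
    """
    n, m = len(sample_a), len(sample_b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d_plus = 0.0
    for x in sorted(set(a_sorted) | set(b_sorted)):
        f_a = bisect_right(a_sorted, x) / n  # empirical CDF of sample_a at x
        f_b = bisect_right(b_sorted, x) / m  # empirical CDF of sample_b at x
        d_plus = max(d_plus, f_a - f_b)
    approx_p = math.exp(-2.0 * d_plus ** 2 * n * m / (n + m))
    return d_plus, min(1.0, approx_p)


# Hypothetical mean larynx doses (Gy); illustrative values only.
avi_planner = [17.0, 18.0, 18.5, 19.0, 19.5, 20.0]
clinical = [19.0, 19.7, 20.5, 21.0, 22.0, 23.0]
print(ks_one_sided(avi_planner, clinical))
```

Swapping the argument order tests the opposite direction, which mirrors the "higher or lower" framing in the text.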
      Results
      Failure Mode and Effects Analysis (FMEA)
      Before clinical deployment, 12 failure modes were identified relating to contour generation (7), plan creation (1), treatment field generation (2), plan optimization (1) and plan approval (1) (Supplementary Table 2). None of these failure modes were higher relative risk compared to the manual treatment planning process. A detailed summary of the failure modes and associated mitigations is shown in Supplementary Table 2. Several code modifications were prompted by FMEA. These included detailed analysis of structure volumes at the beginning and end of the algorithm to identify changes made; checks at entry of the script algorithm that PTV and organ at risk volumes are approved and cannot be edited; and enforcement of naming conventions for structure sets, course and plans to minimize risk of unintentional use of an automated plan.
      Validation and Clinical Implementation: Round 1
      We retrospectively validated AVI-planner in 52 patients, which consisted of mostly men (69%) with locally advanced oropharynx (40%) or oral cavity and salivary (31%) cancers (Figure 2, Supplementary Table 3). Definitive intent organ-preservation RT comprised the majority of plans (58%) with a median of 70 Gy (range 54-80 Gy) in 35 fractions (range 27-35 fx). 62% received concurrent chemotherapy.
      Figure 2. Highlighting the importance of disease subsite-specific AVI-planner algorithm performance. We show an integrated pie and doughnut chart demonstrating clinical acceptability of AVI-planned cases among H&N subsites. Central chart (blue) shows plan frequency by subsite (n=52). Innermost doughnut chart shows “Round 1” clinical acceptability. Outermost doughnut chart “Round 2” shows evolution in acceptability for 7 plans initially requiring revisions.
      Overall, 86% of Round 1 plans were safe to treat; however, we identified variability in plan quality among different HN subsites (Figure 2). All oropharynx and p16+ SCC Unknown Primary (21/21 plans), larynx and hypopharynx (7/7 plans) were “treat as is.” Similarly, most oral cavity and salivary cases (14/16 plans; 87.5%) required no revisions. This contrasts with the frequency of major revisions or rejections recommended for sinonasal, nasopharynx and p16-negative SCC Unknown Primary (1/6 plans; 17%) and cutaneous (1/2 plans; 50%; Figure 2). Minor revisions involved increasing conformality and reducing heterogeneity. Major revisions involved limiting hot spots outside PTV, restricting hot spots within PTV to 105-110%, and improving target coverage. Sample Round 1 isodose distributions are shown for both a definitive early stage p16+ base of tongue cancer considered “treat as is” (Figure 3B), compared to “major revisions” for an adjuvantly treated malar cheek Merkel cell carcinoma (Figure 3E). Software refinements were made to the script in response to the Round 1 evaluation which included normal tissue constraints and priorities, dose-sculpting structures, segmented structures, and isocenter placement as discussed above in Methods section Write-enabled Script Refinement and Clinical Deployment.
      Figure 3. Identifying the strengths and limitations of automated planning among typical versus atypical HN cases. We show sample plans for early-stage oropharynx and adjuvant Merkel cell carcinoma. Top panel shows a stage II cT2N2M0 p16+ squamous cell carcinoma of the right base of tongue treated with definitive chemoradiation to 70 Gy in 35 fx (A-C), which is considered a typical case and well-represented within the model. The bottom panel shows a stage III pT1 pN1a(sn) Merkel Cell carcinoma treated adjuvantly to 54 Gy in 30 fx (D-F), which is considered atypical and underrepresented within the model. Left panels (A,D) Clinically Treated Plan. Middle panels (B) Round 1 AVI-planner “treat as is;” (E) Round 1 AVI-planner “major revision” due to conformality and 119% hotspot outside PTV. Right panels (C, F) Round 2 following AVI-planner upgrades. Isodose lines (absolute dose, Gy) show 75 (light green), 70 (white), 65 (pink), 60 (red), 54 (yellow), 51 (green), 45 (orange), 40 (purple), 30 (cyan), 25 (green), 20 (dark blue).
      Clinical Reassessment: Round 2
      During Round 2 evaluation of all 52 plans, there were no rejections nor major revisions (Figure 2). Minor revisions were recommended for 1 oral cavity (6.3%) and 3 sinonasal or nasopharynx or p16-negative SCC Unknown Primary plans (50%). The remaining 48 plans were “treat as is.” Minor revisions in Round 2 focused on improving conformality, or more aggressive sparing of spinal cord, optics and contralateral orbit or salivary structures. This evolution in quality is evident for the adjuvantly treated Merkel cell carcinoma (Figure 3D-F).
      Conformality and dose heterogeneity for all PTV levels were similar between Round 2 AVI-planner and clinical cases (Table 1). The number of MU per plan was significantly lower for AVI-planner Round 2 (mean 619.7 ± 69.7 MU) compared to the clinically treated plan (693.5 ± 219.4 MU; p=0.03). Similarly, AVI-planner generated less complex plans as compared to clinically treated plans (mean complexity score 0.13 ± 0.02 vs 0.14 ± 0.03; p<0.01). To evaluate patterns of OAR sparing in Round 2 AVI-planner, we compared the entire distribution of mean or D0.1cc dose values between AVI-planner cases versus clinically treated or historically accepted plan values (Figure 4A). Given that oropharynx was the most prevalent HN subsite within both the foundational library and the validation cohort, we also selected a representative DVH from a locally advanced oropharynx cancer treated with definitive chemoradiation. This demonstrates typical DVH metrics from a case which was well represented in the model (Figure 5).
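      Table 1. Conformality and heterogeneity (ICRU 83) indexes of clinical and Round 2 AVI-planner for high, intermediate, and low PTV.

      | PTV level | Index               | Clinical Plan | AVI Planner | p-value |
      |-----------|---------------------|---------------|-------------|---------|
      | PTV_High  | Conformality Index  | 1.1 ± 0.7     | 1.2 ± 0.7   | 0.7     |
      | PTV_High  | Heterogeneity Index | 1.1 ± 0.04    | 1.1 ± 0.03  | 0.5     |
      | PTV_Mid   | Conformality Index  | 1.3 ± 0.4     | 1.4 ± 0.5   | 0.7     |
      | PTV_Mid   | Heterogeneity Index | 1.1 ± 0.04    | 1.1 ± 0.04  | 0.4     |
      | PTV_Low   | Conformality Index  | 1.3 ± 0.2     | 1.3 ± 0.2   | 0.8     |
      | PTV_Low   | Heterogeneity Index | 1.2 ± 0.1     | 1.2 ± 0.1   | 0.4     |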
      Figure 4. Differential normal tissue sparing by clinical versus AVI-planner. Comparisons for all HN subsites (A); oropharynx, p16+ SCC Unknown Primary, Oral Cavity, Salivary, Larynx, Hypopharynx (B); Sinonasal, Nasopharynx, and p16-negative SCC Unknown Primary (C). OARs are listed on y-axis with corresponding constraint. Dose (Gy) on x-axis. Box plots of clinical (blue) or Round 2 AVI-planner (yellow) provide median, IQR, minimum, and maximum doses. Red “x” denotes consensus thresholds [42]. The difference in normal tissue sparing between AVI-planner and clinical plan is typically small compared to the difference in relation to established thresholds. Statistical significance was defined as p<0.05 on one-sided Kolmogorov-Smirnov test. Symbols along y-axis indicate statistically significant difference in OAR sparing between AVI-planner versus clinical plans: total cohort (panel A-circles), Oropharynx and p16+ SCC Unknown Primary (panel B-stars), Oral Cavity and Salivary (panel B-diamonds). Filled shapes indicate AVI-planner significantly improved sparing whereas unshaded symbols indicate clinical plan achieved significantly better sparing.
      Figure 5. Representative typical dose-volume histogram for a cT4N1M0 p16+ squamous cell carcinoma of the left tonsil treated definitively to 70 Gy, comparing manual plan (squares) versus AVI-planner (triangles). X-axis is Dose (Gy), Y-axis volume (%).
      Considering all 52 plans, the contralateral parotid dose distribution was higher with AVI-planner compared to clinically treated plans (median 25 vs 23 Gy, p<0.01), with higher doses compared to historic plans (WES 0.50 vs 0.42, p<0.01). Inferior pharyngeal constrictor muscles had higher distribution of mean dose in AVI-planner versus clinical plans (median 21 Gy vs 19 Gy, p<0.01), with higher doses compared to historic plans (WES 0.53 vs 0.35, p<0.01), and narrowly exceeded our constraint (GEM 0.52). Conversely, AVI-planner lowered the dose to ipsilateral SMG (62 Gy vs 65 Gy, p=0.04) though this was not clinically relevant (GEM >0.90) [43]. AVI-planner lowered the distribution of mean dose to the larynx compared to clinical (median 19 Gy vs 20 Gy, p<0.01) and historical plans (WES 0.29 vs 0.44, p<0.01). Brainstem D0.1cc from AVI-planner was lower than clinical plans (median 28 Gy vs 32 Gy, p<0.01). Spinal cord distribution of D0.1cc was lower in AVI-planner as compared to historic plans (WES 0.44 vs 0.63, p<0.01), but similar to clinical plans (median 36.4 Gy vs 36.7 Gy, p=0.9). Distribution of dose to optic nerves, chiasm, eyes, contralateral SMG, superior pharyngeal constrictors, oral cavity, mandible, and esophagus were similar among AVI-planner cases, clinical and historic plans (Figure 4A).
      For oropharynx or p16+ SCC Unknown Primary (n=21; Figure 4B), the distribution of mean larynx dose was significantly lower with AVI-planner versus clinical plans (median 18 vs 20 Gy, p<0.01) or historical plans (WES 0.28 vs 0.44, p<0.01), which was clinically relevant (GEM 0.46). Esophagus and ipsilateral parotid were spared equally among AVI-planner, clinical and historic plans. Distribution of mean dose to contralateral SMG was higher for AVI-planner compared to clinical plans (median 36 Gy vs 33 Gy, p=0.02) and historical plans (WES 0.44 vs 0.41, p=0.02). Neither clinical nor AVI-planner met constraints for relevant sparing (GEM 0.62 and 0.57). Contralateral parotid distribution of mean dose was higher for AVI-planner compared to clinical plans (median 25 vs 23 Gy, p=0.047), but similar to historically accepted plans (WES 0.50 vs 0.43, p=0.1). Superior and inferior pharyngeal constrictors received higher dose with AVI-planner and exceeded constraints compared to historical controls (p<0.01). Oral cavity distribution of mean dose was higher with AVI-planner versus clinical (37 vs 33 Gy, p=0.04) and historic plans (WES 0.64 vs 0.53, p=0.04; Figure 4B).
      OAR doses were similar between clinical and Round 2 AVI-planner for the remaining HN subsites (Figure 4B and 4C). Of note, the oral cavity/salivary contralateral parotid distribution of mean dose was significantly higher for AVI-planner compared to clinical plans (median 26 Gy vs 23 Gy, p<0.01), and historic plans (WES 0.53 vs 0.42, p<0.01) and did not meet constraints (GEM 0.56) (Figure 4B). Two cutaneous plans did not reach the 3 plan threshold required for formal comparison.
      Time Study
      We compared Eclipse optimizer interactive time for 10 recent manual plans versus interactive time with AVI-planner software for 51 of the validation cohort patients. This interactive time included all steps of the manual optimization, such as segmentation structures, isocenter setting, ring and buffer dose-sculpting structures, normal tissue optimization limits, target and OAR prioritization, setting the number of arcs, and optimization time. Of the 10 manually optimized plans, 70% were oropharynx (n=7), while 30% were comprised of oral cavity (n=1), thyroid (n=1), and Unknown Primary (n=1). Mean interactive time was shorter for AVI-planner than for manually optimized plans (2 vs 85 minutes, respectively; p<0.01).
      Discussion
      We developed and implemented a knowledge-based automated virtual integrative software to facilitate HN treatment planning. Initially, we identified inconsistent plan quality among different HN subsites. Following iterative software adaptations, we noted favorable evolution in target coverage, heterogeneity and OAR sparing. This software exceeded our “warm start optimization” goal and rapidly created clinically acceptable plans without manual adjustments for many HN subsites. We have published the source code for AVI-Planner at a GitHub repository (http://xxxx.xxxx.xxxx) to promote use and development of automated planning.
      Regarding clinical acceptability of automated HN plans, we found 86% of Round 1 plans were treat as is, which is comparable to 88% by Radiation Planning Assistant [47]. Our script was developed from a diverse training dataset, capturing unique nuances and planning considerations. The inconsistent site-specific plan quality likely resulted from limited experience within the foundational library. Our heterogeneous library accrued over 5 years, but there were fewer cutaneous (7.9%), sinonasal (4.2%) and nasopharynx (3.7%) plans. Improvements in both conformality and heterogeneity were shown for prostate and cervix cancer after refining Varian's RapidPlan default settings [48], but studies detailing specific software refinements and evolution in plan quality among multiple subsites are limited for HN [35, 47].
      Physicians frequently emphasized higher OAR prioritization. For instance, the clinical plan aggressively spared the contralateral parotid further below the planning objective in a cT1N1 p16+ tonsil cancer, whereas AVI-planner less aggressively spared the contralateral parotid to meet constraints. Given OAR constraint heterogeneity, we compared the Round 2 AVI-planner results to consensus thresholds [42]. Structure laterality is reported; however, the distinction of ipsilateral or contralateral relative to the target is less readily available. AVI-planner achieved lower contralateral parotid doses (median 25 Gy) than the 26 Gy threshold (p<0.01). AVI-planner lowered dose to optic structures, eyes, brainstem, spinal cord, esophagus, inferior constrictors and mandible compared to accepted thresholds. Larynx doses achieved in AVI-planner cases were lower than thresholds (35 Gy, p<0.01) and lower than values reported with automation approaches by Fogliata et al. (24.8 ± 5.2 Gy) or Ouyang et al. (28.4 Gy) [26, 27]. This highlights the importance of benchmarking automation achievements against literature values, historic norms, and the validation subset.
      Secondarily, the failure modes addressed during development can be found in manual planning, suggesting these hazards already exist and may be more likely to happen without the software. Thus, automated planning does not obviate standard clinical and physics QA. Similar to Wang et al. [49], inconsistencies in standardized OAR prioritization affected the performance of our model. In line with time-savings noted by other groups for autoplanned nasopharynx [31, 50, 51] and oropharynx cancers [33, 36], we confirmed time savings with AVI-planner. AVI-planner generated less complex plans (p<0.01) with fewer MU (p=0.03) compared to the clinically treated plans.
      Limitations of this work include that the software is integrated only with Eclipse and only for 30-35 fraction plans. Given the revisions required for sinonasal, nasopharynx, and cutaneous sites, this software should be used cautiously near the skull base. Three target dose levels are currently supported, but additional dose levels require manual editing. Dosimetrists must also ensure the relevance of automated decisions. For instance, planners must remain vigilant about modifying the isocenter location or number of arcs for a unilateral target. Physicians must explicitly address planning preferences in the planning directive. For example, in a locally advanced maxillary sinus cancer requiring adjuvant RT following an orbital exenteration, aggressively sparing the remaining contralateral orbit and lacrimal gland may take precedence over PTV coverage. The user-friendly interface contains the same tools used in manual optimization, allowing real-time modification by dosimetrists, in contrast to fully automated optimization, which must run to completion before permitting revision. However, these standardized optimization parameters likely differ from the personalized approaches of experienced dosimetrists. Therefore, additional time may be required for revisions. Inter-institutional heterogeneity in delineated OARs, inconsistent OAR contouring, and variability in constraints and prioritization are barriers to widespread adoption of automated planning.
      To our knowledge, this is the first report identifying HN primary site-specific variability in automated plan quality, which favorably evolved with physician input. Our work cautions against assuming that automated planning achievements are universal among HN subsites. This is relevant for clinics that would ideally employ one planning algorithm for all HN cases, instead of separate optimization algorithms for each HN subsite. We are not advocating this software in lieu of skilled dosimetrists or treatment at high-accruing centers. However, in settings of limited resources, increased demand, urgent starts or fewer subspecialized dosimetrists, AVI-planner software can be easily integrated into workflows to increase availability of high-quality HN RT plans.
      Furthermore, our institutional adoption of AVI-planner into routine practice has expanded the number of dosimetrists able to rapidly generate high-quality HN plans. We plan to release AVI-planner to our affiliate sites to improve plan quality and uniformity. We are also extending this standardized approach to other disease sites, including lung and prostate. In the future, automated planning will facilitate adaptive RT planning. Future software upgrades may incorporate gEUD, update the foundational library, and focus on well-lateralized cases near the skin surface. Application programming interfaces that enable clinics to programmatically automate all parts of the treatment planning process give clinics the tools they need to increase efficiency and consistency in plan quality in their workflows. To promote these clinical improvements, it is highly desirable for manufacturers to provide application programming interfaces that give users, at minimum, the same capabilities to algorithmically interact with optimizers, dose calculation engines, reference points, and scheduling that they have in manual operation.
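      For readers unfamiliar with the gEUD (generalized equivalent uniform dose) mentioned above as a possible future upgrade, it is commonly defined as gEUD = (Σ v_i · D_i^a)^(1/a), where v_i is the fractional volume receiving dose D_i and a is a tissue-specific parameter. The sketch below is an illustration under stated assumptions, not part of AVI-planner: the function name `geud`, the paired-list differential DVH input, and the example values are all hypothetical.

```python
def geud(doses_gy, volumes, a):
    """Generalized equivalent uniform dose from a differential DVH.

    doses_gy: dose of each DVH bin (Gy)
    volumes:  volume of each bin (any unit; normalized internally)
    a:        tissue-specific parameter (nonzero); a = 1 gives the mean
              dose, large positive a approaches the maximum dose.
    Note: negative a with a zero-dose bin raises ZeroDivisionError.
    """
    if a == 0:
        raise ValueError("a must be nonzero")
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    weighted = sum(
        (v / total_volume) * d ** a for d, v in zip(doses_gy, volumes)
    )
    return weighted ** (1.0 / a)


if __name__ == "__main__":
    # Hypothetical three-bin DVH (illustrative values only).
    bin_doses = [10.0, 20.0, 30.0]
    bin_volumes = [1.0, 1.0, 1.0]
    print(geud(bin_doses, bin_volumes, a=1))  # mean dose of the bins
    print(geud(bin_doses, bin_volumes, a=8))  # weighted toward hot bins
```

      Because a single parameter moves the metric between mean-like and maximum-like behavior, the same function can describe both parallel-type and serial-type organs at risk.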
      Acknowledgements
      The authors would also like to acknowledge Steven Kronenberg for his assistance with the creation of the figures for this manuscript.
      Declaration of interests
      The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
      References
      • 1
        Hawkins, P.G., et al., Organ-Sparing in Radiotherapy for Head-and-Neck Cancer: Improving Quality of Life. Semin Radiat Oncol, 2018. 28(1): p. 46-52.
      • 2
        Nutting, C.M., et al., Parotid-sparing intensity modulated versus conventional radiotherapy in head and neck cancer (PARSPORT): a phase 3 multicentre randomised controlled trial. Lancet Oncol, 2011. 12(2): p. 127-36.
      • 3
        Pow, E.H., et al., Xerostomia and quality of life after intensity-modulated radiotherapy vs. conventional radiotherapy for early-stage nasopharyngeal carcinoma: initial report on a randomized controlled clinical trial. Int J Radiat Oncol Biol Phys, 2006. 66(4): p. 981-91.
      • 4
        Kam, M.K., et al., Prospective randomized study of intensity-modulated radiotherapy on salivary gland function in early-stage nasopharyngeal carcinoma patients. J Clin Oncol, 2007. 25(31): p. 4873-9.
      • 5
        Dirix, P. and S. Nuyts, Evidence-based organ-sparing radiotherapy in head and neck cancer. Lancet Oncol, 2010. 11(1): p. 85-91.
      • 6
        Lee, N., et al., Intensity-modulated radiation therapy in head and neck cancers: an update. Head Neck, 2007. 29(4): p. 387-400.
      • 7
        Eisbruch, A., et al., Parotid gland sparing in patients undergoing bilateral head and neck irradiation: techniques and early results. Int J Radiat Oncol Biol Phys, 1996. 36(2): p. 469-80.
      • 8
        Murdoch-Kinch, C.A., et al., Dose-effect relationships for the submandibular salivary glands and implications for their sparing by intensity modulated radiotherapy. Int J Radiat Oncol Biol Phys, 2008. 72(2): p. 373-82.
      • 9
        Feng, F.Y., et al., Intensity-modulated chemoradiotherapy aiming to reduce dysphagia in patients with oropharyngeal cancer: clinical and functional results. J Clin Oncol, 2010. 28(16): p. 2732-8.
      • 10
        Beadle, B.M., et al., Improved survival using intensity-modulated radiation therapy in head and neck cancers: a SEER-Medicare analysis. Cancer, 2014. 120(5): p. 702-10.
      • 11
        Boero, I.J., et al., Importance of Radiation Oncologist Experience Among Patients With Head-and-Neck Cancer Treated With Intensity-Modulated Radiation Therapy. J Clin Oncol, 2016. 34(7): p. 684-90.
      • 12
        Lee, C.C., et al., Survival rate in nasopharyngeal carcinoma improved by high caseload volume: a nationwide population-based study in Taiwan. Radiat Oncol, 2011. 6: p. 92.
      • 13
        Cilla, S., et al., Template-based automation of treatment planning in advanced radiotherapy: a comprehensive dosimetric and clinical evaluation. Sci Rep, 2020. 10(1): p. 423.
      • 14
        Batumalai, V., et al., How important is dosimetrist experience for intensity modulated radiation therapy? A comparative analysis of a head and neck case. Pract Radiat Oncol, 2013. 3(3): p. e99-e106.
      • 15
        Moore, K.L., et al., Experience-based quality control of clinical intensity-modulated radiotherapy planning. Int J Radiat Oncol Biol Phys, 2011. 81(2): p. 545-51.
      • 16
        Nelms, B.E., et al., Variation in external beam treatment plan quality: An inter-institutional study of planners and planning systems. Pract Radiat Oncol, 2012. 2(4): p. 296-305.
      • 17
        Eisbruch, A., et al., Multi-institutional trial of accelerated hypofractionated intensity-modulated radiation therapy for early-stage oropharyngeal cancer (RTOG 00-22). Int J Radiat Oncol Biol Phys, 2010. 76(5): p. 1333-8.
      • 18
        Zhong, H., et al., The Impact of Clinical Trial Quality Assurance on Outcome in Head and Neck Radiotherapy Treatment. Front Oncol, 2019. 9: p. 792.
      • 19
        Peters, L.J., et al., Critical impact of radiotherapy protocol compliance and quality in the treatment of advanced head and neck cancer: results from TROG 02.02. J Clin Oncol, 2010. 28(18): p. 2996-3001.
      • 20
        Graboyes, E.M., et al., Association of Treatment Delays With Survival for Patients With Head and Neck Cancer: A Systematic Review. JAMA Otolaryngol Head Neck Surg, 2019. 145(2): p. 166-177.
      • 21
        Rosenthal, D.I., et al., Importance of the treatment package time in surgery and postoperative radiation therapy for squamous carcinoma of the head and neck. Head Neck, 2002. 24(2): p. 115-26.
      • 22
        Wuthrick, E.J., et al., Institutional clinical trial accrual volume and survival of patients with head and neck cancer. J Clin Oncol, 2015. 33(2): p. 156-64.
      • 23
        Naghavi, A.O., et al., Patient choice for high-volume center radiation impacts head and neck cancer outcome. Cancer Med, 2018. 7(10): p. 4964-4979.
      • 24
        George, J.R., S.S. Yom, and S.J. Wang, Combined modality treatment outcomes for head and neck cancer: comparison of postoperative radiation therapy at academic vs nonacademic medical centers. JAMA Otolaryngol Head Neck Surg, 2013. 139(11): p. 1118-26.
      • 25
        Hussein, M., et al., Automation in intensity modulated radiotherapy treatment planning-a review of recent innovations. Br J Radiol, 2018. 91(1092): p. 20180270.
      • 26
        Ouyang, Z., et al., Evaluation of auto-planning in IMRT and VMAT for head and neck cancer. J Appl Clin Med Phys, 2019. 20(7): p. 39-47.
      • 27
        Fogliata, A., et al., RapidPlan head and neck model: the objectives and possible clinical benefit. Radiat Oncol, 2017. 12(1): p. 73.
      • 28
        Krayenbuehl, J., et al., Evaluation of an automated knowledge based treatment planning system for head and neck. Radiat Oncol, 2015. 10: p. 226.
      • 29
        Tol, J.P., et al., Evaluation of a knowledge-based planning solution for head and neck cancer. Int J Radiat Oncol Biol Phys, 2015. 91(3): p. 612-20.
      • 30
        Gintz, D., et al., Initial evaluation of automated treatment planning software. J Appl Clin Med Phys, 2016. 17(3): p. 331-346.
      • 31
        Giaddui, T., et al., Offline Quality Assurance for Intensity Modulated Radiation Therapy Treatment Plans for NRG-HN001 Head and Neck Clinical Trial Using Knowledge-Based Planning. Adv Radiat Oncol, 2020. 5(6): p. 1342-1349.
      • 32
        Hansen, C.R., et al., Automatic treatment planning improves the clinical quality of head and neck cancer treatment plans. Clin Transl Radiat Oncol, 2016. 1: p. 2-8.
      • 33
        Kusters, J., et al., Automated IMRT planning in Pinnacle: A study in head-and-neck cancer. Strahlenther Onkol, 2017. 193(12): p. 1031-1038.
      • 34
        Krayenbuehl, J., et al., Planning comparison of five automated treatment planning solutions for locally advanced head and neck cancer. Radiat Oncol, 2018. 13(1): p. 170.
      • 35
        Fogliata, A., et al., RapidPlan knowledge based planning: iterative learning process and model ability to steer planning strategies. Radiat Oncol, 2019. 14(1): p. 187.
      • 36
        Kamima, T., et al., Multi-institutional evaluation of knowledge-based planning performance of volumetric modulated arc therapy (VMAT) for head and neck cancer. Phys Med, 2019. 64: p. 174-181.
      • 37
        Ahunbay, E.E., O. Ates, and X.A. Li, An online replanning method using warm start optimization and aperture morphing for flattening-filter-free beams. Med Phys, 2016. 43(8): p. 4575.
      • 38
        Paradis, K.C., et al., The Fusion of Incident Learning and Failure Mode and Effects Analysis for Data-Driven Patient Safety Improvements. Pract Radiat Oncol, 2020.
      • 39
        Huq, M.S., et al., The report of Task Group 100 of the AAPM: Application of risk analysis methods to radiation therapy quality management. Med Phys, 2016. 43(7): p. 4209.
      • 40
        Biau, J., et al., Selection of lymph node target volumes for definitive head and neck radiation therapy: a 2019 Update. Radiother Oncol, 2019. 134: p. 1-9.
      • 41
        Gregoire, V., et al., Delineation of the primary tumour Clinical Target Volumes (CTV-P) in laryngeal, hypopharyngeal, oropharyngeal and oral cavity squamous cell carcinoma: AIRO, CACA, DAHANCA, EORTC, GEORCC, GORTEC, HKNPCSG, HNCIG, IAG-KHT, LPRHHT, NCIC CTG, NCRI, NRG Oncology, PHNS, SBRT, SOMERA, SRO, SSHNO, TROG consensus guidelines. Radiother Oncol, 2018. 126(1): p. 3-24.
      • 42
        Lee, A.W., et al., International Guideline on Dose Prioritization and Acceptance Criteria in Radiation Therapy Planning for Nasopharyngeal Carcinoma. Int J Radiat Oncol Biol Phys, 2019. 105(3): p. 567-580.
      • 43
        Mayo, C.S., et al., Incorporating big data into treatment plan evaluation: Development of statistical DVH metrics and visualization dashboards. Adv Radiat Oncol, 2017. 2(3): p. 503-514.
      • 44
        Baltas, D., et al., A conformal index (COIN) to evaluate implant quality and dose specification in brachytherapy. Int J Radiat Oncol Biol Phys, 1998. 40(2): p. 515-24.
      • 45
        ICRU report 83 Prescribing, recording, and reporting photon-beam intensity-modulated radiation therapy (IMRT). J ICRU 10:35–36, 2010.
      • 46
        Younge, K.C., et al., Predicting deliverability of volumetric-modulated arc therapy (VMAT) plans using aperture complexity analysis. J Appl Clin Med Phys, 2016. 17(4): p. 124-131.
      • 47
        Olanrewaju, A., et al., Clinical Acceptability of Automated Radiation Treatment Planning for Head and Neck Cancer Using the Radiation Planning Assistant. Pract Radiat Oncol, 2021. 11(3): p. 177-184.
      • 48
        Hussein, M., et al., Clinical validation and benchmarking of knowledge-based IMRT and VMAT treatment planning in pelvic anatomy. Radiother Oncol, 2016. 120(3): p. 473-479.
      • 49
        Wang, Y., B.J.M. Heijmen, and S.F. Petit, Knowledge-based dose prediction models for head and neck cancer are strongly affected by interorgan dependency and dataset inconsistency. Med Phys, 2019. 46(2): p. 934-943.
      • 50
        Chang, A.T.Y., et al., Comparison of Planning Quality and Efficiency Between Conventional and Knowledge-based Algorithms in Nasopharyngeal Cancer Patients Using Intensity Modulated Radiation Therapy. Int J Radiat Oncol Biol Phys, 2016. 95(3): p. 981-990.
      • 51
        Hu, J., et al., Quantitative Comparison of Knowledge-Based and Manual Intensity Modulated Radiation Therapy Planning for Nasopharyngeal Carcinoma. Front Oncol, 2020. 10: p. 551763.

      Appendix. Supplementary materials