Purpose
Pretreatment quality assurance (QA) of treatment plans often requires a high cognitive workload and considerable time. This study explores the use of machine learning to classify pretreatment chart check QA for a given radiation plan as difficult or less difficult, thereby alerting physicists to apply increased scrutiny to difficult plans.
Methods and Materials
Pretreatment QA data were collected for 973 cases between July 2018 and October 2020. The outcome variable, degree of difficulty, was collected as a subjective rating by the physicists who performed the pretreatment chart checks. Potential features were identified based on clinical relevance, contribution to plan complexity, and QA metrics. Five machine learning models were developed: support vector machine, random forest classifier, AdaBoost classifier, decision tree classifier, and neural network. These were incorporated into a voting classifier, in which at least 2 algorithms needed to predict a case as difficult for it to be classified as such. Sensitivity analyses were conducted to evaluate feature importance.
Results
The voting classifier achieved an overall accuracy of 77.4% on the test set, with 76.5% accuracy on difficult cases and 78.4% accuracy on less difficult cases. Sensitivity analysis showed that features associated with plan complexity (number of fractions, dose per monitor unit, number of planning structures, and number of image sets) and clinical relevance (patient age) were sensitive features across at least 3 of the algorithms.
Conclusions
This approach can be used to allocate plans to physicists equitably rather than randomly, potentially improving pretreatment chart check effectiveness by reducing the number of errors that propagate downstream.
Introduction
Radiation treatment plans are created using a complex, iterative process. The patient's clinical history, cancer diagnosis, and normal and malignant tissue anatomy are all considered to create a personalized plan. A multidisciplinary team of professionals, including physicians, physicists, and dosimetrists, is involved, and every step of the process must be verified and checked multiple times. According to current quality assurance (QA) recommendations, the physicist or dosimetrist manually performs “chart checks” (ie, plan QA) of multiple metrics for each patient's radiation plan before it can be made deliverable [1].
This plan evaluation also involves checking many data elements and documents, including (but not limited to) the treatment plan images, fused images, contours, simulation documents, treatment prescription, treatment plan parameters/reports, and prior radiation records. This can be a time-intensive process involving a high cognitive workload, and, although it is one of the most important safety barriers, it is estimated to be only 60% effective in detecting high-severity incidents [1].
Recent American Association of Physicists in Medicine task groups have suggested methods for risk analysis, but there are currently very few concrete guidelines outlining the plan quality control process [1-4].
Instead, most departments have developed institution-specific chart check standards of practice [5].
However, the standards of practice vary widely among institutions [6,7].
Although the majority use written procedures or checklists, the details of what is reviewed or checked are heterogeneous [4,8].
Automation of the plan evaluation process has been implemented successfully before, typically with rules- or atlas-based approaches [9,10].
This type of approach, however, is limited in capability and is not easily adaptable to the changes in the treatment planning process that will inevitably occur over time as technology improves [11].
Presently, at our institution, these types of in-house QA tools augment the chart checking process by automating standardized second checks, thus improving efficiency and reducing cognitive workload [3].
Artificial intelligence (AI) and machine learning (ML) are increasingly used to improve QA processes in 4 broad areas: machine QA, patient-specific QA, treatment plan review, and QA of contours. In a recent review, Luk et al [12] reported that, despite the importance of treatment plan review, comparatively few studies have explored the application of AI and ML models to assist treatment plan review. Azmandian et al [13]
developed an outlier detection model that clustered a large set of treatment plans; when a new plan was checked, its parameters were tested against the established clusters, and any that did not belong were flagged as “outliers” and brought to the attention of human chart checkers. Although this k-means clustering approach helps identify problematic plans, it does not provide any information on the factors contributing to treatment plan complexity. Kalet et al [14] and Luk et al [12]
developed an error detection Bayesian network that mimics human reasoning processes to improve the detection of errors during treatment plan review. Triaging radiation treatment plans as difficult or less difficult before treatment plan review is likely to optimize physicists’ cognitive workload and reduce potential errors during plan review [15,16].
To the best of our knowledge, no previous ML studies have examined a comprehensive array of factors to determine the degree of difficulty of checking radiation treatment plans [12].
The research objective herein is to use machine learning to identify and flag difficult cases that require additional scrutiny by the physicist, potentially leading to fewer errors evading this check and propagating downstream in the clinical workflow to affect patient treatment. We framed our study using the clinical research framework presented by Park et al [17], who described clinical research in 5 phases; our study illustrates phase 0 (ie, user needs and workflow assessment, data quality check, algorithm development and performance evaluation, and prototype design) and phase 1 (ie, in silico algorithm performance optimization). The analysis demonstrates a classification algorithm that categorizes radiation treatment plans as difficult or less difficult to QA via the physics pretreatment chart check.
Discussion
Machine learning methods are increasingly being adopted in radiation oncology, particularly within the QA space [11].
The results presented herein add to the growing body of literature demonstrating the utility of machine learning algorithms for QA tasks. Specifically, we have shown that various machine learning algorithms operating within a voting structure can be used to classify radiation treatment plans and flag the plans that may be more difficult for the physics pretreatment chart check.
The first key aspect of our approach is that it enables the use of multiple algorithms. This allows us to make the most of the strengths of each algorithm while mitigating the effects of individual weaknesses. For instance, the AdaBoost algorithm was tuned in such a way that it did not perform as well on the difficult cases but outperformed the other algorithms on the less difficult cases. Although this raises concerns about overfitting, this weakness is reduced within the voting structure. Conversely, the neural network was tuned in such a way that it did not perform as well on the less difficult cases but outperformed the other algorithms on the difficult cases (Table 3). This resulted in lower overall accuracy, the effect of which is mitigated within the voting structure. The neural network's above-average accuracy on the difficult cases plays a pivotal role in boosting the overall accuracy on the difficult cases.
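For illustration, the 5 base models can be assembled with scikit-learn as in the minimal sketch below; the hyperparameters shown are placeholder assumptions, not the tuned values used in this study.

# Minimal sketch of the 5 base classifiers; hyperparameters are illustrative assumptions.
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier

def build_base_models(random_state=42):
    """Return the 5 classifiers that feed the voting scheme."""
    return {
        "svm": SVC(kernel="rbf", probability=True, random_state=random_state),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=random_state),
        "adaboost": AdaBoostClassifier(n_estimators=100, random_state=random_state),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=random_state),
        "neural_network": MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=1000,
                                        random_state=random_state),
    }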
Another key aspect of this approach is that it allows us to select a voting schema that maximizes the combined strengths of the algorithms, tailored to the solution we are seeking, which is to maximize performance in classifying difficult cases. We recognize that mislabeling a less difficult case as difficult (false positive) is a more acceptable error than mislabeling a difficult case as less difficult (false negative). Therefore, we are particularly interested in maximizing the sensitivity of our algorithm. By using a development set, we were able to evaluate different voting schemas. We found that by labeling a case as difficult with a vote of 2 or more out of 5, we would achieve the best possible accuracy on the difficult cases in the development set. Difficult cases comprise a minority of the total cases, and for the purposes of this model, we defined difficult cases to be the top 30% most difficult cases. Classification algorithms commonly struggle to predict cases that fall within a minority group. This can be seen clearly with our model, as each algorithm performs better on the less difficult cases than on the difficult cases. Oversampling the training set brought the distribution of difficult and less difficult cases closer to 50/50 for training purposes, but this inherent difficulty was still apparent at the time of testing. Choosing a voting schema in which 2 out of 5 votes classified a case as difficult helped our overall model perform better on difficult cases than any individual algorithm could alone. This does come at the cost of reduced accuracy on the less difficult cases and, as a result, reduced overall accuracy. The tradeoff we make is increased sensitivity (ie, recall) at the price of decreased specificity. In short, the adaptability of the voting structure allowed us to pick the best schema for our task.
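A minimal sketch of this voting logic is shown below, assuming the difficult class is encoded as 1 and that the random oversampling described above is performed with imbalanced-learn's RandomOverSampler (one possible implementation, not necessarily the one used in this study).

# Sketch of training on an oversampled set and applying the 2-of-5 voting schema.
import numpy as np
from imblearn.over_sampling import RandomOverSampler

def fit_voting_ensemble(models, X_train, y_train, random_state=42):
    """Balance the training set by random oversampling, then fit each base model."""
    X_bal, y_bal = RandomOverSampler(random_state=random_state).fit_resample(X_train, y_train)
    for model in models.values():
        model.fit(X_bal, y_bal)
    return models

def predict_difficult(models, X, votes_needed=2):
    """Label a case difficult (1) when at least `votes_needed` of the models vote difficult."""
    votes = np.sum([model.predict(X) for model in models.values()], axis=0)
    return (votes >= votes_needed).astype(int)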
The capabilities and scope of machine learning solutions are often misunderstood. It is easy to identify a problem or area that needs assistance and seek a solution through machine learning methods. These algorithms, however, can fall short, and in these situations it is easy to blame the machine learning method, the lack of data, or the quality of the data when the project design and intent may be at fault. Often, the most reliable solutions do not actually solve the problem but reduce its scope. Such is the case with our voting algorithm. It does not fully automate the chart checking process (which would be a massive undertaking), but it does assist in a meaningful way by flagging the difficult cases that may require more cognitive scrutiny.
Finally, our approach to classifying radiation treatment plans and flagging difficult plans has practical applications at the departmental and individual physicist levels. At the departmental level, this machine learning approach can enable directors and administrators of medical physics to equitably allocate difficult and less difficult cases among multiple physicists in a large academic medical center, with a view to optimizing cognitive load. At the individual level, we attempted to create a classification algorithm that identifies difficult cases and alerts a physicist to devote more time and attentional resources to difficult plans. The intended effect of the tool is for a physicist to take the suggested difficulty level of a plan and plan their time accordingly, but there are potential risks and benefits to using the tool. If a physicist is shown a false positive, the risk is that they may think a plan requires more scrutiny and therefore more time to check. If a physicist is shown a false negative, the risk is that the physicist may have planned to check more plans in a given time, and a plan labeled less difficult that is in fact difficult would disrupt that schedule. We do not expect physicists to overspend or underspend effort based on model predictions; we expect them to plan their time according to the predictions. To mitigate these risks, we may design the interface so that physicists can see other important information (ie, the most important features, such as patient age and site name) along with the model prediction to help them plan their time. We expect that triaging treatment plans as difficult and less difficult is likely to increase physicists’ situational awareness, improve their overall performance, and reduce potential errors during chart review.
The next step for this project is to implement the solution in our department. It will be used to guide our physicists in prioritizing their activities to reduce cognitive workload and improve the effectiveness of pretreatment chart checks. We plan to implement this by creating a back-end script that extracts the plan features once the dosimetrist is ready for a physics pretreatment check. Once the features have been extracted, another program, using the voting library of trained algorithms, will process the plan features and assign a difficulty rating that will be available to the physicist in the quality checklist. Before deployment, we plan to conduct usability testing, and because introducing the tool carries a possible risk of more undetected errors, we will test it with a controlled evaluation of algorithm performance with physicists (phase 2 [17]). After implementation, we intend to study its effectiveness by evaluating changes in the near errors/errors reported to our department's incident learning system, which catches downstream errors (phase 3 [17]). Future directions may also include validating the model at our community sites, where we will evaluate generalizability.
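The sketch below illustrates how such a back-end step could be structured; the feature extraction and checklist hooks are hypothetical, and only the load-and-vote logic mirrors the approach described in this study.

# Illustrative back-end sketch: load the trained voting library and rate one plan.
import joblib
import numpy as np

def rate_plan_difficulty(plan_features, model_paths, votes_needed=2):
    """Apply the trained base classifiers to one plan's feature vector."""
    x = np.asarray(plan_features, dtype=float).reshape(1, -1)
    models = [joblib.load(path) for path in model_paths]       # previously trained classifiers
    votes = sum(int(model.predict(x)[0]) for model in models)  # count "difficult" votes
    return "difficult" if votes >= votes_needed else "less difficult"

# Hypothetical usage once upstream feature extraction has run:
# label = rate_plan_difficulty(extracted_features,
#                              ["svm.joblib", "rf.joblib", "ada.joblib", "tree.joblib", "mlp.joblib"])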
Limitations
Although the use of multiple algorithms has its strengths, it introduces a fair amount of complexity, which increases the time and computational power required during training. Using random oversampling to balance the training set also increased the chance of overfitting and limited generalization to the test set. Using a subjective rating by physicists rather than departmental criteria may be considered a limitation, but we highlight that the physicists have years of experience evaluating the difficulty of plans, making them the subject matter experts. The study was conducted in a single academic medical center with data from the institutional database, so we were unable to evaluate generalizability. Future testing at community sites will mitigate this limitation.
References
1. Ford EC, Terezakis S, Souranis A, Harris K, Gay H, Mutic S. Quality control quantification (QCQ): A tool to measure the value of quality control checks in radiation oncology.
2. de los Santos EF, Evans S, Ford EC, et al. Medical Physics Practice Guideline 4.a: Development, implementation, use and maintenance of safety checklists.
3. Tracton GS, Mazur LM, Mosaly P, Marks LB, Das S. Developing and assessing electronic checklists for safety mindfulness, workload, and performance.
4. Younge KC, Naheedy KW, Wilkinson J, et al. Improving patient safety and workflow efficiency with standardized pretreatment radiation therapist chart reviews.
5. Hoopes DJ, Dicker AP, Eads NL, et al. RO-ILS: Radiation Oncology Incident Learning System: A report from the first year of experience.
6. Kisling KD, Ger RN, Netherton TJ, et al. A snapshot of medical physics practice patterns.
7. Potters L, Ford E, Evans S, Pawlicki T, Mutic S. A systems approach using big data to improve safety and quality in radiation oncology.
8. Fong de los Santos L, Dong L, Greener A, et al. TU-D-201-02: Medical physics practices for plan and chart review: Results of AAPM task group 275 survey.
9. Furhang EE, Dolan J, Sillanpaa JK, Harrison LB. Automating the initial physics chart-checking process.
10. Pillai M, Adapa K, Das SK, et al. Using artificial intelligence to improve the quality and safety of radiation therapy.
11. Kalet AM, Luk SM, Phillips MH. Radiation therapy quality assurance tasks and tools: the many roles of machine learning.
12. Luk SMH, Ford EC, Phillips MH, Kalet AM. Improving the quality of care in radiation oncology using artificial intelligence.
13. Azmandian F, Kaeli D, Dy JG, et al. Towards the development of an error checker for radiotherapy treatment plans: a preliminary study.
14. Kalet AM, Gennari JH, Ford EC, Phillips MH. Bayesian network models for error detection in radiotherapy plans.
15. Campbell AM, Mattoni M, Yefimov MN, Adapa K, Mazur LM. Improving cognitive workload in radiation therapists: A pilot EEG neurofeedback study.
16. Mazur LM, Mosaly PR, Hoyle LM, et al. Relating physician's workload with errors during radiation therapy planning.
17. Park Y, Jackson GP, Foreman MA, et al. Evaluating artificial intelligence in medicine: phases of clinical research.
Article info
Publication history
Published online: April 6, 2023
Accepted: March 26, 2023
Received: August 2, 2022
Footnotes
Sources of support: Dr Pillai was supported by the National Institutes of Health-National Library of Medicine T15 Biomedical Informatics and Data Science Training grant (T15-LM012500).
Disclosures: Dr Chera is a coinventor on a patent application regarding a method for measuring tumor-derived viral nucleic acids in blood samples, which is owned by the University of North Carolina at Chapel Hill and licensed to Naveris. No other disclosures were reported.
Research data are not available at this time.
Copyright
© 2023 The Authors. Published by Elsevier Inc. on behalf of American Society for Radiation Oncology.