Ⅰ. Introduction
Pre-treatment patient-specific delivery quality assurance (DQA) is important for accurate dose delivery in advanced radiotherapy techniques, such as intensity-modulated radiation therapy (IMRT), volumetric modulated arc therapy, helical tomotherapy (HT), and stereotactic body radiation therapy [1-3]. Therefore, quality control (QC) is important to ensure that the mechanical and dosimetric performance of the treatment machine is within acceptable tolerance levels for delivering doses to patients [4,5].
HT is an IMRT delivery system that delivers a radiation beam with a high degree of conformity and image guidance using megavoltage computed tomography (CT) [6,7]. It uses a 6 megavoltage (MV) linear accelerator mounted on a gantry, and the beam is modulated using a 64-leaf binary multileaf collimator (MLC) [7]. Various treatment planning parameters can influence the DQA and treatment plan quality for HT, such as pitch, field width (FW), leaf open time (LOT), and the planning and actual modulation factors (MF) [6].
In the DQA process, QC quantifies the difference between the measured and calculated doses and dose distributions. Statistical process control (SPC) is a statistical method of QC for monitoring and controlling a process to ensure that it operates according to its recommended tolerance levels. It evaluates the variability and stability of the process [8], and involves the application of statistical and graphical tools to document, correct, and improve the process performance [9]. The most well-known SPC tools are the individual (X) and moving range (MR) charts, which are used to monitor individual values and variation of a process based on samples taken from a process over time. The individual chart represents individual measurements and the MR chart shows variability between two consecutive measurements [10]. These are designed to reduce variation in the process over time. Process capability analysis is used to measure the ability of a process to operate within its specified range [11].
Recently, several studies have presented the use of SPC to analyze treatment machine performance, IMRT QA, and tomotherapy quality assurance (TQA) in radiation therapy [12-18]. However, many institutions still perform DQA with EBT film and an ionization chamber (IC). To the best of our knowledge, the current literature reports no studies that use comprehensive statistical analyses, such as control charts, capability analysis, and the classification and regression tree (CART) algorithm, to evaluate the impact of various treatment planning parameters on DQA failure in HT.
The purpose of this study was to evaluate the upper and lower control limits (UCL and LCL) of treatment planning parameters using EBT film-based DQA results, and to assess the applicability of SPC to HT patients.
Ⅱ. Materials and methods
1. Patient characteristics
To analyze the DQA results, 152 patients with passed and failed DQA measurements were randomly selected for this study (Table 1). Prostate (n=66), rectal (n=51), and large-field cancer patients, including lymph nodes (n=35), were included in the study (Tables 2-4). All selected patients were treated with tomotherapy (TomoHDA; Accuray Inc., Sunnyvale, CA, USA).
2. Patient-specific DQA
Treatment planning for all patients was performed using the HT planning station (Accuray Inc., Sunnyvale, CA, USA). A cylindrical solid water phantom (“Cheese phantom”; Accuray Inc., Sunnyvale, CA, USA) was selected for all the treatment plans to create the DQA plan. The center of the IC (Exradin A1SL, Standard Imaging, Middleton, WI, USA) was positioned in the cylindrical solid water phantom and was moved to the low-dose gradient region in the target volume. The cheese phantom with an IC and Gafchromic EBT3 film (International Specialty Products, Wayne, NJ, USA) was used to measure the absolute dose and gamma values for all the HT plans [18]. The differences between the calculated and measured point doses and dose distributions were computed using Tomotherapy DQA software (Accuray Inc., Sunnyvale, CA, USA). The absolute point dose difference (DD) and global gamma passing rate (GPR) were analyzed for all patients. A threshold of 10% of the global maximum dose was applied in the analysis of all measurements. The DD was within the tolerance range of ±5% for all measurements. The GPR with 3%/3 mm criteria was analyzed for all of the measurements. If one criterion failed, the DQA was considered to have failed [19].
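The two-criterion decision rule described above can be sketched as follows. This is a minimal illustration, not the DQA software's logic: the function name is hypothetical, the ±5% DD tolerance is taken from the text, and the 90% GPR threshold is an assumption (the passing GPR is not stated in this subsection; 90% is borrowed from the lower specification limit used in the capability analysis).

```python
def dqa_passes(dd_percent, gpr_percent, dd_tol=5.0, gpr_min=90.0):
    """Return True only if both DQA criteria are met.

    dd_percent  : point dose difference (measured vs. calculated), in %
    gpr_percent : global gamma passing rate (3%/3 mm), in %
    dd_tol      : DD tolerance in % (the text quotes +/-5%)
    gpr_min     : minimum acceptable GPR in % (ASSUMED, not from the text)
    """
    dd_ok = abs(dd_percent) <= dd_tol
    gpr_ok = gpr_percent >= gpr_min
    # If either criterion fails, the whole DQA is considered failed.
    return dd_ok and gpr_ok


if __name__ == "__main__":
    # Group averages quoted in the Results, used only as illustrative inputs.
    print(dqa_passes(0.80, 94.58))
    print(dqa_passes(1.16, 79.28))
```

Keeping both thresholds as parameters makes it easy to substitute an institution's own action levels.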
3. Control charts
Control charts were used to evaluate the UCL and LCL for all the assessed treatment planning parameters. The center line (CL) was defined as the average (X̄) of each treatment planning parameter. The UCL and LCL values were set at three standard deviations from the CL, implying that 99.7% of the data points fall within the control limits for a normally distributed dataset, as shown in Equations 1-3 [11-13].
In the above equations, X̄ is the average, R is defined as the range for each parameter, mR is the average of the moving range, that is, the absolute difference between two consecutive measurements, and d2 is a constant that depends on the number of consecutive measurements used for each range. The dataset was considered as one group for each analysis; therefore, the subgroup size was n = 1, each moving range was taken over two consecutive measurements, and the constant d2 = 1.128 [11-13].
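The equations themselves are not reproduced in this text. For an individuals chart with three-sigma limits, the standard definitions, which are assumed here to be the ones intended (the assignment of equation numbers is likewise an assumption), are:

```latex
\begin{align}
\mathrm{UCL} &= \bar{X} + 3\,\frac{\overline{mR}}{d_2} \tag{1}\\
\mathrm{CL}  &= \bar{X} \tag{2}\\
\mathrm{LCL} &= \bar{X} - 3\,\frac{\overline{mR}}{d_2} \tag{3}
\end{align}
```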
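As a minimal sketch of how these limits can be computed, assuming the standard individuals-chart definitions (average moving range divided by d2 = 1.128 as the sigma estimate), the following uses only the Python standard library. The function names and the sample values are hypothetical, and this is not the Minitab procedure used in the study.

```python
from statistics import mean

D2 = 1.128  # d2 constant for moving ranges of two consecutive measurements


def individuals_chart_limits(values):
    """Return (LCL, CL, UCL) for an individuals (X) chart."""
    if len(values) < 2:
        raise ValueError("at least two measurements are needed")
    # Moving range: absolute difference between consecutive measurements.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    center = mean(values)
    sigma_hat = mean(moving_ranges) / D2  # estimated process sigma
    return center - 3 * sigma_hat, center, center + 3 * sigma_hat


def out_of_control(values):
    """Return indices of measurements outside the control limits."""
    lcl, _, ucl = individuals_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


if __name__ == "__main__":
    gpr = [94.1, 95.3, 93.8, 96.0, 94.7]  # hypothetical GPR values (%)
    print(individuals_chart_limits(gpr))
    print(out_of_control(gpr))
```

In practice the limits would usually be derived from a baseline period and then applied to later measurements, rather than recomputed from the data being judged.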
4. Normal distribution test
The Anderson-Darling test was used to test the hypothesized distribution F*(xi) for normality, according to the following equation:
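The equation is not reproduced in this text. The standard form of the Anderson-Darling statistic, written in the notation of the surrounding definitions and assumed here to correspond to Equation 4, is:

```latex
\begin{equation}
A^2 = -n - \frac{1}{n}\sum_{i=1}^{n} (2i-1)
      \left[\ln F^{*}(x_i) + \ln\!\left(1 - F^{*}(x_{n+1-i})\right)\right]
\tag{4}
\end{equation}
```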
where F*(xi) is the cumulative distribution function of the normal distribution, xi is the ordered data, and n is the sample size [20]. A non-parametric statistical analysis method was used when the data were not normally distributed. The null hypothesis (H0) of normality was rejected when the p-value was below the chosen significance level and was otherwise retained. In this study, the significance level was set at 5%, and the analysis was performed using the Minitab program (Minitab Inc., State College, PA, USA).
5. Process capability analysis
Process capability analysis was used to measure the ability of a process to operate within a specified range, which is defined as the ratio of the specification level to the operating range of the process [12]. The process was analyzed using the capability (Cp) and acceptability (Cpk) ratios to evaluate the system processes against nominal upper and lower user-specified limits (USL and LSL). The USL and LSL used for the calculation of Cp and Cpk were 100% and 90% for GPR, and +3% and -3% for DD, respectively. Cp was used to measure the potential of the process to produce results within the specification level, using Equation 5. Additionally, Cpk was used to show how close the process center was to the specification limits, using Equation 6. The two capability indices were defined, assuming normally distributed data, using the following equations:
where X̄ represents the average process value, and σ is the standard deviation. A Cp value of 1.0 indicates that the process is within the action limits, and a Cp value greater than 1.0 indicates that the process is well within the specification limits. A Cp value less than 1.0 indicates that the process is outside the tolerance range for a given action limit [14]. If the process is centered about the baseline, the Cp and Cpk are equal. These indices were calculated to identify whether the existing dataset was within user-specified tolerances [14]. The Minitab program (Minitab Inc., State College, PA, USA) was used to calculate the normality, capability, and acceptability values from the measurement data.
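The equations are not reproduced in this text. The standard definitions of the two indices, which are assumed here to be the ones intended, are:

```latex
\begin{align}
C_p    &= \frac{\mathrm{USL} - \mathrm{LSL}}{6\sigma} \tag{5}\\
C_{pk} &= \min\!\left(\frac{\mathrm{USL} - \bar{X}}{3\sigma},\;
                      \frac{\bar{X} - \mathrm{LSL}}{3\sigma}\right) \tag{6}
\end{align}
```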
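A short sketch of the capability calculation, under the standard definitions Cp = (USL - LSL)/(6σ) and Cpk = min(USL - mean, mean - LSL)/(3σ), is given below. The function name and sample data are hypothetical, and the use of the sample standard deviation for σ is an assumption; Minitab may estimate σ differently (for example, from the average moving range), so the values need not match its output.

```python
from statistics import mean, stdev


def process_capability(values, lsl, usl):
    """Return (Cp, Cpk) for measurements and specification limits.

    sigma is estimated with the sample standard deviation (ASSUMPTION).
    """
    x_bar = mean(values)
    sigma = stdev(values)
    cp = (usl - lsl) / (6 * sigma)
    # Cpk uses the distance from the mean to the nearer specification limit.
    cpk = min(usl - x_bar, x_bar - lsl) / (3 * sigma)
    return cp, cpk


if __name__ == "__main__":
    dd = [0.4, -0.9, 1.2, 0.1, -0.3, 0.8]  # hypothetical DD values (%)
    cp, cpk = process_capability(dd, lsl=-3.0, usl=3.0)
    print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

For a process centered midway between the limits the two indices coincide, and Cpk drops below Cp as the mean drifts toward either limit.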
Ⅲ. Results
1. Analysis of treatment planning parameters and DQA results
The treatment planning parameters for the 152 patients in the passing and failing groups are summarized in Table 1. There were 111 (73.0%) and 41 (27.0%) patients who passed and failed DQA, respectively. The average percentages of LOT below 100 ms in the passing and failing groups were 23.34% and 30.81%, respectively. In the passing group, the average DD and GPR were 0.80% and 94.58%, respectively; in the failing group, they were 1.16% and 79.28%, respectively (Table 1).
Tables 2-4 summarize the various treatment planning parameters for the 152 patients and show the patient characteristics for each case and the characteristics of the planning parameters such as prescription dose, FW, pitch, MF, LOT, and treatment time.
2. Statistical process control and process analysis
Fig. 1 shows the Cp and Cpk results for DD and GPR for prostate, rectal, and large-field patients. For prostate patients, the Cp of GPR and the Cp and Cpk of DD were above 1, indicating a stable DQA process. For rectal and large-field patients, the Cpk and Cp values were below 1, except for the Cp of DD. For anatomical sites with lower Cp and Cpk values, the treatment planning parameters were considered as factors influencing the DQA pass.
Fig. 2 shows the individuals control charts for each anatomical site. The top row shows the control charts of LOT, DD, and GPR for prostate patients. The middle row shows the control charts of LOT, DD, and GPR for rectal patients. The bottom row shows the control charts of LOT, DD, and GPR for large-field patients. For all patients, the DQA results (DD and GPR) were within the UCL and LCL. In the LOT control charts, almost all measurements were within the UCL and LCL, although some fell outside these limits.
3. Analysis of the significant variables for DQA results in clinical patients
1) Prostate patients
Table 2 shows the treatment planning parameters and DQA results for prostate patients who passed and failed DQA. For passed patients, the DD and GPR were 1.42% and 93.33%, respectively, while for failed patients, they were 1.62% and 78.69%, respectively. Fig. 3 shows the variable importance of the treatment planning parameters influencing the DQA result for prostate cancer. Couch travel, target volume, and LOT were the three most important parameters for DQA failure, as shown in Fig. 3. The couch travels were 15.67 cm and 21.99 cm, and the target volumes were 459.20 cc and 721.76 cc, for passed and failed patients, respectively. The vendor (Accuray) recommends that the percentage of LOT below 100 ms be maintained at less than 30% because of the risk of increased MLC errors [22]. For passed and failed patients, the percentages of LOT below 100 ms were 26.86% and 30.46%, respectively (Table 2).
2) Rectal patients
For passed patients, the DD and GPR were 0.22% and 95.28%, respectively, while for failed patients, they were 0.26% and 91.22%, respectively (Table 3). Gantry period (GP), couch travel, and LOT were the three most important parameters for DQA failure, as shown in Fig. 4. The GPs were 19.89 s and 21.99 s, the couch travels were 16.14 cm and 16.89 cm, and the percentages of LOT below 100 ms were 18.95% and 20.71%, for passed and failed patients, respectively (Table 3).
3) Large-field patients
For passed patients, the DD and GPR were 0.70% and 94.76%, respectively, while for failed patients, they were 1.28% and 85.68%, respectively (Table 4). LOT, actual MF, and fractional dose were the three most important parameters for DQA failure, as shown in Fig. 5. The percentages of LOT below 100 ms were 24.57% and 44.98%, the actual MFs were 1.77 and 2.03, and the fractional doses were 239.17 cGy and 182.00 cGy for passed and failed patients, respectively (Table 4).
Ⅳ. Discussion
In this study, we used SPC to provide upper and lower control limits for planning parameters based on DQA results, and used CART to present the variable importance of the treatment planning parameters influencing HT DQA. The patients who passed DQA were analyzed for each clinical case, and the range of each planning parameter is summarized in Tables 2-4. A range of treatment planning parameters based on the passing DQA results is presented for each anatomical site. Additionally, we confirmed that the probability of passing DQA was higher when the percentage of LOT below 100 ms was lower.
In the present study, we investigated the usefulness of the SPC method for detecting out-of-control measurements based on the DQA results for the HT system, as shown in Figs. 1 and 2. A previous study by Breen et al. only investigated the results of SPC to develop action levels for head and neck treatment plans [12], and another study reported that there were no data for patients failing DQA [14]. Mezzenga et al. evaluated the results of the HT output using SPC [15]. Our study results demonstrated the ranges of acceptable DQA for prostate, rectal, and large-field patients [12,14,15]. Although the number of patients included in this study was smaller than that in previous research, we confirmed the possibility of reducing DQA failures by providing an acceptable DQA range for each treatment site using the SPC method. Therefore, we collected DQA data to provide a more accurate range of acceptable DQA for each treatment site.
The vendor (Accuray) recommends that the percentage of LOT below 100 ms be maintained below 30% because of the risk of increased MLC errors and DQA failures [22]. The percentages of LOT below 100 ms in all patients are shown in Table 1. The average percentages of LOT below 100 ms in the passing and failing groups were 23.34% and 30.81%, respectively (Table 1), which is consistent with previous studies [14,18]. In rectal patients, the corresponding values for the passing and failing groups were 18.95% and 20.71%, respectively (Table 3). Although these results are not consistent with those of previous studies [14,18], our study confirmed that the percentage of LOT below 100 ms in rectal patients was higher in the failing group than in the passing group [Table 3 and Fig. 4(a)]. Conversely, a previous study reported that the LOT value for the passing group was higher than that of the failing group [18]. This was likely because the number of failing patients in that study was small (n = 3), whereas the number of failing patients in this study was 11. Therefore, statistical accuracy could be improved by increasing the number of included patients.
Binny et al. investigated the guidelines of seven planning parameters (%LOT below 100 ms, number of projections, MF, GP, couch travel, sinogram segments, and durations) using SPC in brain, head & neck, and pelvic patients [14]. They found that the %LOT below 100 ms values in the brain, head & neck, and pelvic patients were 27 ± 41%, 19 ± 44%, and 37 ± 70%, respectively. In addition, they showed that DD and GPR variations versus LOT for the three sites were not correlated [14]. Another study, although it included only six patients, reported that a small leaf open time increases the probability of DQA failure [23]. However, the previous papers contained no data for failing patients based on EBT film DQA results for prostate, rectal, and large-field patients. Therefore, we could not compare the DQA passing and failing patterns found in this study with earlier results. Nevertheless, we confirmed the applicability of the SPC method used by previous researchers to the HT system [14-16].
Further, we confirmed that couch travel was a significant variable for DQA failure in prostate and rectal patients, as shown in Figs. 3 and 4. These results are consistent with a previous study [18]. Although treatment time, total MUs, and target volume were not ranked high as relatively important variables for DQA failure, their values in the failing groups were higher than those in the passing groups (Tables 2-4). We believe that if the target is long and large in the axial direction, the couch would move significantly in the longitudinal direction, and the DQA result may be relatively low because of the uncertainty of movement or the influence of scatter. In addition, because the MU generally increases as the dose increases, it is assumed that the uncertainty may increase because the movement of the MLC increases [18,23].
The average GPs were 18.17 s and 17.64 s, for passed and failed patients, respectively (Table 1). Westerly et al. suggested that the optimal pitch should be considered so that the GP is at least 15 s to reduce the error risk of the MLC [23]. We found that the GP value was a significant variable that affected DQA failure in large-field patients (Fig. 4), and the average GPs for the passing and failing groups were 18.02 s and 15.49 s (Table 4), whereas, for rectal patients, they were 19.89 s and 22.66 s, respectively (Table 3). The reason for the opposite results, as described above, is that other treatment planning parameters, such as treatment time or target volume, may have affected the DQA failure in the HT system. HT plans are quite complicated owing to the many treatment planning parameters, and it is difficult to analyze the influence of each treatment parameter [18]. Therefore, it is necessary for medical physicists and dosimetrists to clearly define and understand the effects of various treatment planning parameters when establishing a treatment plan [18].
The values of the treatment planning parameters can be modified prior to treatment, guided by SPC, to improve the probability of passing DQA. Because the HT planning system has a variety of treatment planning parameters, re-planning and repeating patient-specific QA after modifying these parameters is time-consuming and labor-intensive. Therefore, we believe that dosimetrists can anticipate the DQA results in advance by keeping each treatment planning parameter within the acceptable DQA range for the anatomical site.
This study has several limitations due to its retrospective design. The number of patients in this study (152 patients) was smaller than that in previously published papers that evaluated SPC over a long-term period [8,12,15]. Thus, we are collecting DQA data and developing a model that can predict DQA results. The present findings require further validation in a study with a larger number of patients. In addition, all DQA measurements were performed with EBT3 film, and scanner uncertainties such as warming of the scanner lamp, film homogeneity, scan-to-scan stability, long-term stability of the scanner, and light scattering, together with film calibration, phantom setup, measurement position, and human errors, may have decreased the DQA accuracy in the present study [18,24].
Ⅴ. Conclusion
We investigated the usefulness of the SPC method for detecting out-of-control measurements based on DQA results for the HT system. The SPC method is suggested as a means of reducing the probability of DQA failure. The acceptable DQA ranges reported here provide reference values for the planning parameters that can be considered prior to treatment to reduce DQA failure. Although it is difficult to use statistical analyses in routine clinical practice, this study may contribute to the verification of DQA patterns in HT systems.