INTRODUCTION
Dysphagia is a frequent result of a stroke, brain tumor, or neurodegenerative disease. Many authors have tried to detect swallowing abnormalities (particularly aspiration) using non-radiographic observations, yet these methods demonstrate poor sensitivity and specificity.1,2 The Videofluoroscopic Swallowing Study (VFSS) has been the gold standard for evaluating patients with swallowing disorders for many years.3,4 VFSS can detect oral, pharyngeal, and esophageal dysphagia; however, it has a limited ability to predict the prognosis of dysphagia. Among many recent attempts to quantify and predict the prognosis of dysphagia, the Functional Dysphagia Scale (FDS), reported by Han et al.5 in 2001, is a useful tool that correlates well with the ASHA-NOMS (American Speech-Language-Hearing Association National Outcomes Measurement System).6 However, despite its value in describing the severity of dysphagia, FDS does not predict the long-term prognosis of dysphagia, which is important because of the close relationship between prolonged dysphagia, lower respiratory tract infection, and high mortality.7,8
The Videofluoroscopic Dysphagia Scale (VDS) can be used to predict the long-term prognosis of dysphagia patients following stroke. Han et al.9 define the long-term prognosis of dysphagia based on the occurrence of any aspiration/penetration event after 6 months from the onset of dysphagia. VDS consists of 14 items with weighted values and shows good correlation with aspiration/penetration occurring 6 months after the initial onset of dysphagia. The 14 items in VDS (Appendix 1) represent oral (lip closure, bolus formation, mastication, apraxia, premature bolus loss, and oral transit time) and pharyngeal (pharyngeal triggering, vallecular and pyriform sinus residues, laryngeal elevation and epiglottic closure, pharyngeal coating, pharyngeal transit time, and aspiration) functions that can be observed by VFSS. VDS can also express the severity of dysphagia as a quantifiable score; however, previous studies have noted limitations regarding the subjectivity of its results. Stoeckli et al.10 report high interobserver reliability for some of the parameters used to evaluate aspiration and penetration, but low reliability for other oral and pharyngeal phase parameters. Although their study did not evaluate VDS, it suggests that VFSS results can be subjective for several parameters. Since VDS is scored from VFSS findings, its results may also depend on the observer; however, no study has yet examined the inter-rater reliability of VDS. Therefore, in this study, we investigate the inter-rater reliability of VDS.
RESULTS
One hundred patients (59 males and 41 females) with dysphagia were enrolled, including 64 stroke patients, 13 patients with traumatic brain injury, 12 patients with head and neck cancer, 6 patients with brain tumors, and 5 patients with other diseases. The average age of the enrolled patients was 64.4±14.8 years. All of the recruited patients underwent VFSS. The inter-rater reliability of the oral phase parameters is shown in Table 1. All of the oral phase parameters demonstrated low reliability (κ<0.4). Among the oral phase parameters, lip closure showed the highest reliability (κ=0.325), whereas premature bolus loss and oral apraxia demonstrated the lowest reliabilities (κ=0.060 and κ=0.099, respectively).
Table 1 also presents data on pharyngeal phase reliability. The pharyngeal phase parameters demonstrated higher reliability than the oral phase parameters, but their κ values were still below 0.4. Aspiration showed the highest reliability of all of the tested parameters (κ=0.393). The reliability of the total score, in terms of the ICC, was 0.556.
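The text does not state which κ statistic was computed. As a minimal sketch, assuming Fleiss' kappa (a common choice when more than two interpreters rate each patient on a categorical item), the agreement for a single VDS item could be computed as follows. The function name `fleiss_kappa` and the example labels are illustrative assumptions, not the authors' analysis code.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for one categorical item.

    ratings: list of patients; each entry is a list with one category
    label per interpreter (the same number of interpreters per patient).
    """
    n_subjects = len(ratings)
    if n_subjects == 0:
        raise ValueError("at least one rated patient is required")
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every patient needs the same number (>= 2) of ratings")

    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})

    # Observed agreement: proportion of agreeing rater pairs, averaged over patients.
    per_subject = [
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of each category.
    shares = [sum(c[k] for c in counts) / (n_subjects * n_raters) for k in categories]
    p_expected = sum(s * s for s in shares)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is ever used")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical ratings of lip closure for two patients by three interpreters.
example = [["intact", "intact", "intact"], ["none", "none", "none"]]
kappa = fleiss_kappa(example)
```

With complete agreement, as in this hypothetical example, the statistic equals 1; values near 0 indicate agreement no better than chance.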
DISCUSSION
Over the past two decades, knowledge about dysphagia and its treatment has expanded considerably.13 The most valuable and frequently used diagnostic tool for the evaluation of dysphagia is VFSS. Although the VFSS protocol has been standardized for use in many research projects,11 it has a limited ability to predict the prognosis of dysphagia and to provide a quantitative evaluation of dysphagia. Many physicians have tried to predict the long-term prognosis of dysphagia, and as a result, there are several studies on the long-term prognosis of dysphagia after a stroke. Delayed oral transit time, penetration, age over 70 years, a poor Barthel index, and the presence of frontal and insular cortex lesions have been suggested to indicate poor prognosis.7,14,15 However, these risk factors alone cannot express the probability of a poor prognosis quantitatively; in that case, the VDS can be used to quantify the severity of dysphagia and to predict its prognosis 6 months after the onset of a stroke.9
Overall, the VDS score demonstrated low to moderate reliability in our study (ICC=0.556). However, the 14 individual parameters, particularly the oral phase parameters, showed low reliability. A previous study conducted by Stoeckli et al.10 reported low oral phase reliability (κ=0.15-0.56); the highest value was for lip closure (κ=0.56). Lip closure also demonstrated the highest reliability among the oral phase parameters in our study (κ=0.325). Stoeckli et al.10 reported higher values than those of our study because lip closure was classified as a binary value ("yes" or "no") in their study, without any intermediate values. Lip closure on VDS has 3 categorical values ("intact", "inadequate", and "none"); however, "inadequate" lip closure lacks a precise definition and can be defined arbitrarily by the interpreter depending on which food material is used as the standard for evaluation. For example, if the lip closure of a patient was very good for a pureed diet but poor for a liquid diet, it might be classified as "inadequate" or "none" depending on which food material the interpreter chose to use as the standard.
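The effect of an undefined intermediate category on κ can be illustrated with a small sketch. It assumes two interpreters and Cohen's kappa for simplicity; the ratings are hypothetical, and the mapping of "intact" to "yes" and everything else to "no" is an assumption made for illustration, not the classification used by Stoeckli et al.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of patients")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's own category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return (observed - expected) / (1.0 - expected)


def collapse(labels):
    """Hypothetical binary recoding: 'intact' -> 'yes', anything else -> 'no'."""
    return ["yes" if label == "intact" else "no" for label in labels]


# Hypothetical three-level lip closure ratings for eight patients.
rater_a = ["intact", "intact", "inadequate", "inadequate",
           "none", "none", "intact", "none"]
rater_b = ["intact", "inadequate", "inadequate", "none",
           "none", "none", "intact", "inadequate"]

kappa_three = cohen_kappa(rater_a, rater_b)
kappa_binary = cohen_kappa(collapse(rater_a), collapse(rater_b))
```

In this constructed example most disagreements involve the "inadequate" category, so the binary recoding yields the higher κ. That outcome depends on the data and does not hold for every set of ratings.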
Regarding the pharyngeal phase, the overall reliability was higher than that of the oral phase (κ=0.165-0.393 vs. κ=0.060-0.325, respectively), similar to other studies that reported higher reliability for pharyngeal phase parameters than for oral phase parameters.10 This is because many pharyngeal phase parameters have only two categorical values (e.g., the triggering of pharyngeal swallowing, laryngeal elevation, the coating of the pharyngeal wall, pharyngeal transit time). Also, the pharyngeal phase parameters can be clearly seen on VFSS. Penetration was defined as the passage of material into the larynx, but not through the vocal folds, and aspiration was defined as the passage of material through the vocal folds.16 These pharyngeal phase findings are relatively easier to differentiate than the oral phase findings.
The total VDS score demonstrated higher reliability than the individual parameters (ICC=0.556). This is likely due to a dilution effect: when the scores that each interpreter gives to the individual parameters are summed, a disagreement on any single parameter carries less weight in the total.
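The form of the ICC used for the total score is not specified in the text. As a sketch under the assumption of a two-way random-effects, absolute-agreement, single-rater ICC (often written ICC(2,1)), the coefficient could be computed from a patients-by-interpreters table of total scores as follows. The function name `icc_2_1` and the example scores are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    scores: list of patients; each entry is a list of total scores,
    one per interpreter, in the same interpreter order for every patient.
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 patients and >= 2 interpreters, complete data")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(scores[i][j] for i in range(n)) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between patients
    ms_cols = ss_cols / (k - 1)              # between interpreters
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical total scores: the second interpreter scores every patient 2 points higher.
offset_scores = [[10, 12], [20, 22], [30, 32]]
icc_offset = icc_2_1(offset_scores)
```

Because this form measures absolute agreement, a constant offset between interpreters lowers the coefficient slightly even when the interpreters rank the patients identically.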
The overall reliability was not particularly high in our study, and we believe there are two reasons for this. First, no clear definitions exist for the intermediate values of VDS, even though 9 of the 14 parameters have at least 3 categorical values. For example, "intact" mastication is given 0 points and "inadequate" mastication is given 4 points according to the VDS; however, depending on how each interpreter classifies the patient's mastication function, the same patient can be given either 0 or 4 points. Therefore, the evaluation of patients with partially impaired function may lack consistency from interpreter to interpreter. Second, there are no guidelines specifying the type of food to be used as the standard for evaluation. In our study, various types of food material were tested on each patient. Depending on which type of material was used as the standard for evaluation, VFSS findings may be classified differently for each patient. For example, patients demonstrating good swallowing of solid foods but poor swallowing of liquid foods may be interpreted differently depending on whether solid or liquid foods were used for evaluation. For future studies, there should be guidelines regarding which food materials should be used as the standard for evaluating the findings related to each parameter.
This study has an obvious limitation. The interpretation was based only on VFSS video recordings, as it was not logistically possible to have all 10 interpreters examine each patient. If the interpreters had been allowed to examine the patients clinically, the accuracy of their interpretations might have improved. However, the objective of this study was to evaluate the inter-rater reliability of VDS based on VFSS findings. If the interpreters had anticipated the findings from a clinical examination, this would have introduced bias.
This is the first study to evaluate the inter-rater reliability of VDS. For future studies, a more precise and widely accepted study protocol will be needed. The development of such a protocol can be supported by standardized education programs, such as interactive lecture videos or formal guidelines for interpreters. These education programs may contribute to achieving higher levels of accuracy in interpretation and, subsequently, to improving the ability to predict the long-term prognosis of dysphagia.