A Retrospective Analysis of the Effectiveness of MRI and Ultrasound in Identifying Rotator Cuff Tears
1. Introduction
Rotator cuff pathology impairs the day-to-day activities of many individuals and becomes increasingly prevalent with age. Damage to the rotator cuff spans a broad spectrum of pathology, ranging from inflammation, impingement, and tendinosis to massive, retracted full tears of the rotator cuff. The prevalence and range of this damage necessitate cost-effective and appropriate imaging techniques to determine the most suitable course of treatment [1] . Multiple studies have examined the accuracy of ultrasound in comparison to MRI for diagnosing rotator cuff pathology [2] [3] [4] [5] . A recent meta-analysis in this field conducted by Smith et al. consisted of 44 studies examining MRI of the shoulder, with image data sets ranging from 2 to 275 per study, and 62 studies of ultrasound evaluation, with sensitivity and specificity analysis conducted on data sets ranging from 2 to 400 images [6] [7] . Although some studies have looked at the effectiveness of imaging in a clinic setting [8] [9] [10] [11] , many of the currently documented studies consist of a limited number of patients who fit pre-established criteria. We wanted to present an accurate representation of all patients who underwent shoulder surgery and to compare the accuracy of ultrasound performed in the clinic and read by a surgeon with that of MRI across this large patient group. While eliminating exclusionary criteria introduces threats to validity and confounding variables, it also portrays an environment that more closely resembles in-clinic patient care than studies conducted on tightly controlled groups do.
Sensitivity and specificity are limited in that they evaluate only the ability to determine the presence or absence of pathology, not the extent of damage. Because rotator cuff tears fall into multiple categories, we chose to conduct further statistical analysis of the strength of agreement between the image reports and the surgical operative reports. The weighted kappa statistic, which evaluates both the presence and the extent of disagreement, was therefore used to give a more complete picture of agreement between image interpretation and surgical findings. The value of the weighted kappa coefficient lies in its ability to assign partial credit to imaging diagnoses that are close to the actual surgical finding while discrediting those that are completely inaccurate [12] . This allows all three categories (no tear, partial tear, full tear) seen in the continuum of rotator cuff pathology to be evaluated without collapsing the data into a 2 × 2 analysis.
2. Materials and Methods
Patient Data
A data set was assembled from 1000 patients who had recently undergone arthroscopic shoulder surgery. To preserve patient confidentiality and anonymity, identifying data regarding the patients were not collected. The data set was composed of 500 individuals who had been evaluated by MRI and 500 individuals who had been evaluated by ultrasound. Using the arthroscopic shoulder surgery findings as the benchmark of accuracy (gold standard), the accuracy of the diagnostic imaging was evaluated by determining the sensitivity, specificity, and weighted kappa of both the MRI and ultrasound subgroups.
Consecutive electronic patient records were reviewed for the presence of both an arthroscopic surgery operative report and an image report from either MRI or ultrasound. The surgeries were performed by one of 12 surgeons from a single orthopedic group over a 3-year period. MRI reports came from multiple sources, including on-site, intra-organizational radiology, and regional referral sources. Ultrasound imaging was performed on site by a single physician assistant trained in musculoskeletal ultrasonography and interpreted by a single orthopedic surgeon. Charts were reviewed consecutively until a data set of 500 patients evaluated by MRI and 500 patients evaluated by ultrasound had been documented.
The imaging and surgical results were initially sorted into three categories: no tear, partial tear, and full tear. To categorize the wide range of qualitative descriptions in the image and surgical reports, the following criteria were established. Report descriptions up to and including tendinosis and inflammation were placed in the no tear category. Reports ranging from scuffing or fraying up to near full thickness tears were classified as partial tears. Only reports indicating a full thickness tear were considered a full tear. Combining the image and surgical findings therefore yielded nine possible categories. The image findings and surgical findings were organized into 3 × 3 tables, and weighted kappa values were determined to measure not only the amount of agreement but also the strength of the agreement between the two findings (Figure 1). For the sensitivity and specificity analysis the data were examined in two separate ways. First, the accuracy of diagnosing full tears was appraised by grouping the partial tears with the no tears (NT/PT vs. FT). Then the strength of the imaging tests for diagnosing generalized damage to the rotator cuff was evaluated by grouping the partial tears with the full tears (NT vs. PT/FT). This allowed us to collapse the data from three possible categories into the binary (2 × 2) format needed to calculate sensitivity and specificity.
![]()
Figure 1. Sensitivity and specificity table (NT = No Tear, PT = Partial Tear, FT = Full Tear).
The 2 × 2 tables needed for sensitivity and specificity analysis were assembled as follows. For cuff damage, true positives were images diagnosing either a partial or full tear followed by a partial or full tear found in surgery. False positives were images diagnosing either a partial or full tear followed by no tear found in surgery. True negatives were no tear images followed by no tear found in surgery. False negatives were no tear images followed by either a partial or full tear found in surgery. For full tears of the rotator cuff, true positives were full tears in the image interpretation confirmed by a full tear in surgery. False positives were full tear images followed by either a partial tear or no tear in surgery. True negatives were images showing either a partial tear or no tear followed by a partial tear or no tear in surgery. False negatives were no tear or partial tear image interpretations followed by a full tear surgical finding.
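The collapsing rules above can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the study. The 3 × 3 MRI table is an assumption: it is not printed in the text and was reconstructed here from the true/false positive and negative counts and the over- and under-diagnosis totals reported in the Results. The names `MRI`, `collapse`, and `first_positive` are hypothetical.

```python
# Rows: imaging result; columns: surgical finding; both ordered NT, PT, FT.
# Assumption: table reconstructed from the counts reported in the Results.
MRI = [
    [132, 37, 5],    # image NT
    [33, 95, 20],    # image PT
    [8, 21, 149],    # image FT
]


def collapse(table, first_positive):
    """Collapse a 3x3 agreement table into 2x2 counts.

    Categories with index >= first_positive are treated as positive:
    first_positive=1 gives NT vs. PT/FT (generalized cuff damage),
    first_positive=2 gives NT/PT vs. FT (full tears).
    """
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            image_pos = i >= first_positive
            surgery_pos = j >= first_positive
            if image_pos and surgery_pos:
                counts["TP"] += n
            elif image_pos:
                counts["FP"] += n
            elif surgery_pos:
                counts["FN"] += n
            else:
                counts["TN"] += n
    return counts


cuff_damage = collapse(MRI, first_positive=1)
full_tear = collapse(MRI, first_positive=2)
print("NT vs. PT/FT:", cuff_damage)
print("NT/PT vs. FT:", full_tear)
```

Under this reconstruction, both collapsed tables agree with the MRI counts given in Section 3.2.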
3. Results
3.1. Data Tables and Weighted Kappa Values
The errors observed in the interpretation of MRI scans were evenly spread, with 62 instances of underdiagnosis and 62 instances of overdiagnosis for a total of 124 instances of disagreement. The ultrasound interpretations had only 25 instances of overdiagnosis but 101 instances of underdiagnosis, for a total of 126 instances of disagreement. The majority of errors in ultrasound interpretation were diagnoses of no tear followed by surgical findings of a partial tear, which occurred 72 times. Weighted kappa allows for a more accurate description of the data by counting a full tear image with no tear in surgery, or the inverse, as a higher order error than readings closer to the surgical finding. When analyzed by weighted kappa, the MRI group had a value of 0.699 (95% CI of 0.65 - 0.75) and the ultrasound group had a value of 0.668 (95% CI of 0.62 - 0.72). Weighted kappa values of 0.61 - 0.80 indicate a substantial strength of agreement between observations [12] .
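As a rough check on these values, weighted kappa can be computed directly from a 3 × 3 table. The sketch below makes two assumptions that the text does not state. First, it uses linear disagreement weights (0 for agreement, 0.5 for adjacent categories, 1 for no tear versus full tear). Second, the two tables are reconstructions from the counts reported in this section and in Section 3.2, not tables printed in the article; the ultrasound table takes 18 full tear false positives, as implied by the specificity ratio 203/221. The function name `weighted_kappa` is hypothetical.

```python
def weighted_kappa(table):
    """Linearly weighted Cohen's kappa for a square table of counts.

    Rows are imaging categories, columns are surgical categories.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    observed = 0.0  # weighted observed disagreement
    expected = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = abs(i - j) / (k - 1)  # linear disagreement weight (assumed)
            observed += w * table[i][j] / n
            expected += w * row_tot[i] * col_tot[j] / n**2
    return 1.0 - observed / expected


# Assumption: tables reconstructed from the reported counts (order NT, PT, FT).
MRI = [[132, 37, 5], [33, 95, 20], [8, 21, 149]]
US = [[67, 72, 13], [7, 57, 16], [4, 14, 250]]

print(f"MRI weighted kappa: {weighted_kappa(MRI):.3f}")
print(f"US weighted kappa:  {weighted_kappa(US):.3f}")
```

With linear weights the reconstructed tables give values that round to the reported 0.699 and 0.668, which suggests, but does not confirm, that linear weighting was used.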
3.2. Sensitivity and Specificity
MRI evaluation of generalized cuff damage demonstrated 285 true positives, 41 false positives, 132 true negatives, and 42 false negatives. Ultrasound evaluation of generalized cuff damage yielded 337 true positives, 11 false positives, 67 true negatives, and 85 false negatives. When diagnosing full tears, MRI observations consisted of 149 true positives, 29 false positives, 297 true negatives, and 25 false negatives. Ultrasound observations for full tears consisted of 250 true positives, 18 false positives, 203 true negatives, and 29 false negatives.
Using the preceding counts we calculated sensitivity and specificity for both ultrasound and MRI evaluation of full tears and generalized cuff damage (Figure 2). The sensitivity and specificity of ultrasound for detecting full tears were 0.90 (95% CI of 0.85 - 0.93) and 0.92 (95% CI of 0.87 - 0.95), respectively. MRI sensitivity was 0.86 (95% CI of 0.79 - 0.90) and specificity was 0.91 (95% CI of 0.87 - 0.94) for full tears. There was a more notable difference for the diagnosis of all cuff tearing. MRI had a sensitivity and specificity of 0.87 (95% CI of 0.83 - 0.90) and 0.76 (95% CI of 0.69 - 0.82), respectively. Ultrasound demonstrated a sensitivity and specificity of 0.80 (95% CI of 0.76 - 0.84) and 0.86 (95% CI of 0.76 - 0.92), respectively.
![]()
Figure 2. Sensitivity and specificity table.
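The point estimates above follow directly from the 2 × 2 counts. A minimal Python sketch is given below; the dictionary layout and function names are illustrative choices, not part of the study. The ultrasound full tear false positive count is taken as 18, which is an assumption based on the denominator of the reported specificity ratio (203/221) and on the four cells summing to 500 patients. Confidence intervals are omitted because the text does not state which interval method was used.

```python
def sensitivity(tp, fn):
    """Proportion of surgically confirmed tears detected by imaging."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Proportion of surgically confirmed non-tears read as negative."""
    return tn / (tn + fp)


# Counts from Section 3.2; US full tear FP = 18 is inferred (see lead-in).
counts = {
    ("MRI", "full tear"): {"tp": 149, "fp": 29, "tn": 297, "fn": 25},
    ("US", "full tear"): {"tp": 250, "fp": 18, "tn": 203, "fn": 29},
    ("MRI", "cuff damage"): {"tp": 285, "fp": 41, "tn": 132, "fn": 42},
    ("US", "cuff damage"): {"tp": 337, "fp": 11, "tn": 67, "fn": 85},
}

results = {}
for key, c in counts.items():
    results[key] = (
        round(sensitivity(c["tp"], c["fn"]), 2),
        round(specificity(c["tn"], c["fp"]), 2),
    )
    modality, target = key
    print(f"{modality:3s} {target:11s} sens={results[key][0]:.2f} "
          f"spec={results[key][1]:.2f}")
```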
To test the statistical significance of the findings, ratios of sensitivity and specificity were calculated with 95% confidence intervals, and significance was assessed at an α of 0.05; a ratio whose interval includes 1 indicates no significant difference. In the analysis of full tears the ratio of sensitivities was MRI/US = (149/174)/(250/279) = 0.86/0.90 = 0.96 with a 95% CI of 0.89 - 1.03. Based on an α of 0.05, there is not a significant difference between the MRI and US sensitivity values. The ratio of specificities was MRI/US = (297/326)/(203/221) = 0.91/0.92 = 0.99 with a 95% CI of 0.94 - 1.04. Based on an α of 0.05, there is not a significant difference between the MRI and US specificity values. When analyzing cuff damage including both partial tears and full tears, the ratio of sensitivities was MRI/US = (285/327)/(337/422) = 0.87/0.80 = 1.09 with a 95% CI of 1.02 - 1.16. Based on an α of 0.05, there is a significant difference between the MRI and US sensitivity values: the sensitivity of MRI is greater than that of US. The ratio of specificities was MRI/US = (132/173)/(67/78) = 0.76/0.86 = 0.89 with a 95% CI of 0.79 - 1.004. We carried this confidence interval out to three decimal places to note that one more true negative ultrasound would have produced a statistically significant difference. However, based on an α of 0.05, there is not a significant difference between the MRI and US specificity values.
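The text does not state how the confidence intervals for these ratios were obtained. One common approach, assumed here, is the log-transformed Wald interval for a ratio of two independent proportions, in which the standard error of the log ratio is sqrt(1/x1 - 1/n1 + 1/x2 - 1/n2). The Python sketch below applies it to the counts quoted above; the function name `ratio_ci` and the use of z = 1.96 are illustrative choices.

```python
import math


def ratio_ci(x1, n1, x2, n2, z=1.96):
    """Ratio of two independent proportions (x1/n1)/(x2/n2) with a
    log-transformed Wald confidence interval (assumed method)."""
    ratio = (x1 / n1) / (x2 / n2)
    se_log = math.sqrt(1 / x1 - 1 / n1 + 1 / x2 - 1 / n2)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# MRI numerator/denominator first, then US, as in the ratios quoted above.
comparisons = {
    "full tear sensitivity": (149, 174, 250, 279),
    "full tear specificity": (297, 326, 203, 221),
    "cuff damage sensitivity": (285, 327, 337, 422),
    "cuff damage specificity": (132, 173, 67, 78),
}

for name, args in comparisons.items():
    ratio, lower, upper = ratio_ci(*args)
    significant = not (lower <= 1.0 <= upper)
    print(f"{name}: {ratio:.2f} (95% CI {lower:.3f} - {upper:.3f}), "
          f"significant at 0.05: {significant}")
```

Under this assumption the intervals agree with the reported values to the precision given, including the upper bound of 1.004 for cuff damage specificity, and only the cuff damage sensitivity interval excludes 1.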
4. Discussion
Much research has examined the effectiveness of musculoskeletal ultrasound in diagnosing rotator cuff tears, and ultrasound has been shown to be of similar effectiveness to MRI in diagnosing rotator cuff pathology. However, these studies often examine its use in tightly controlled formats that do not fully reflect the environment in which many patients receive care. The purpose of our study was to use a large sample and limit the exclusionary criteria in order to portray an accurate representation of the complete population receiving treatment for shoulder pathology. We included surgeries done to repair other shoulder injury and damage in order to capture rotator cuff tears found during surgery that had been missed on imaging (potential false negatives). Our results confirm the findings of other studies showing that ultrasound is effective as either an alternative or a complement to MRI.
Threats to the validity of our data include the fact that, while our ultrasound data set consisted of imaging performed by a single individual and interpreted by one surgeon, our MRI data set consisted of images acquired and interpreted by a wide variety of sources. It is also important to consider other variables that could change the accuracy of the image interpretation. While examining the data, we noted several variables that appeared to affect the agreement between the image interpretation and the surgical observations and that warrant further analysis. These variables included age of the patient, time elapsed between imaging and surgery, and body mass index. When considering false negative results in the clinical application of both ultrasound and MRI, it is important to consider the possible progression of damage between imaging and surgery, whether the patient is waiting for an upcoming surgery date or attempting conservative management of partial tears or tendinosis. Further statistical evaluation will be done to probe for significant correlation between these variables and the accuracy of image interpretation.