The Validity and Reliability of an Arabic Version of the STOP-Bang Questionnaire for Identifying Obstructive Sleep Apnea
Ahmed S. BaHammam*, Alaa M. Al-Aqeel, Alanoud A. Alhedyani, Ghaida I. Al-Obaid, Mashail M. Al-Owais, Awad H. Olaish
Identifiers and Pagination:
Year: 2015
First Page: 22
Last Page: 29
Publisher ID: TORMJ-9-22
Article History:
Received Date: 13/12/2014
Revision Received Date: 24/2/2015
Acceptance Date: 24/2/2015
Electronic publication date: 27/2/2015
Collection year: 2015
open-access license: This is an open access article licensed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted, non-commercial use, distribution and reproduction in any medium, provided the work is properly cited.
Obstructive sleep apnea (OSA) is a common, serious, under-recognized and under-diagnosed medical disorder. Polysomnography (PSG) is the gold standard diagnostic test for OSA; however, the cost of testing and the shortage of sleep disorders laboratories limit access to this tool. Therefore, there is a need for a simple and reliable diagnostic tool to screen patients at risk of OSA.
This study was conducted to evaluate the validity and reliability of an Arabic version of the STOP-Bang questionnaire (SBQ) as a screening tool for OSA.
This study was conducted in three steps, as follows: Step 1: the SBQ was translated from English to Arabic (examining both forward and backward translations); Step 2: the test-retest reliability of the questionnaire was investigated; and Step 3: the questionnaire was validated against PSG data prospectively on 100 patients attending a sleep disorders clinic who were subjected to a full-night PSG study after completing the translated version of the SBQ. The validity of the test was tested against the apnea-hypopnea index (AHI).
The study group had a mean age of 46.6 ± 14.0 years and a mean AHI of 50.0 ± 37.0/hour. The study demonstrated a high degree of internal consistency and stability over time for the translated SBQ. The Cronbach’s alpha coefficient for the 8-item tool was 0.7. Validation of the SBQ against the AHI at a cut-off of 5 revealed a sensitivity of 98% and positive and negative predictive values of 86% and 67%, respectively.
The Arabic version of the SBQ is an easy-to-administer, simple, reliable and valid tool for the identification of OSA in the sleep disorders clinic setting.
Obstructive sleep apnea (OSA) is a serious, relatively common sleep disorder characterized by recurrent episodes of cessation of breathing during sleep due to upper airway narrowing and closure. The Sleep Heart Health Study, a prospective study of adults aged over 40 years, found that approximately 17% of the subjects studied had clear evidence of OSA. The National Sleep Foundation poll in 2005 reported that as many as 25% of American adults are at high risk of OSA, and a survey of middle-aged Saudi men found that approximately 30% of this population is at high risk of OSA [1, 4]. Recent studies indicate that OSA is associated with a significant increase in cardiovascular and cerebrovascular morbidity and mortality [5, 6]. The number of referrals for OSA evaluation has increased as awareness among the public and health care providers has grown. As a result, the waiting time for polysomnography (PSG) has increased significantly. Studies have suggested a more than 10-year delay between OSA symptom onset and the performance of a diagnostic overnight sleep study in Saudi women with OSA. One of the causes of diagnostic delay is the lack of availability of simple and easily accessible diagnostic tools for primary care providers. Because PSG is an expensive, labor-intensive and time-consuming procedure, patients often face long waiting times before studies can be performed. The availability of a simple, validated and reliable screening tool that can stratify patients by their risk of having OSA will allow practitioners to prioritize the referral of patients at high risk of OSA to sleep disorders laboratories. Several clinical scoring systems have been designed and tested; however, many of these scoring systems involve complicated mathematical calculations and are not designed to be readily accessible to physicians outside the sleep medicine field. Thus, the use of such screening tools has been limited.
The STOP-Bang questionnaire (SBQ) is a self-administered, simple and validated questionnaire that detects OSA with high sensitivity. The SBQ has been shown to have superior predictive value compared with other commonly used questionnaires, such as the Epworth Sleepiness Scale (ESS) and the Berlin questionnaire (BQ). However, no Arabic version of the SBQ currently exists. Therefore, we sought to produce an equivalent Arabic version of the SBQ and to evaluate its reliability and validity in detecting patients at risk for OSA.
The study sample consisted of 100 consecutive patients referred for any reason to the sleep disorders clinic at the University Sleep Disorders Center (USDC) at King Saud University during the period from October 2013 to April 2014. The USDC receives patients with a variety of sleep disorders (e.g., insomnia, hypersomnia, parasomnias and sleep-disordered breathing). Consecutive patients who agreed to participate were included regardless of the reason for referral or of whether the patient was clinically suspected to have a specific sleep disorder. Subjects of either sex between the ages of 18 and 75 years were eligible to participate in the study. The exclusion criteria included the following: illiteracy; chronic anxiolytic or sedative drug use; a history of renal, hepatic, pulmonary, cardiovascular or neuromuscular disease; and upper respiratory tract infection within the past three weeks. The study was approved by the College of Medicine Institutional Review Board (IRB), and informed consent was obtained from all participants.
Study Design and Data Collection
Demographic data, including name, age, gender, height (cm), weight (kg), body mass index (BMI) and neck circumference, were collected for each study subject. The ESS, which is a specialized, validated sleep questionnaire containing eight items that ask for self-reported disclosure of the expectation of dozing in a variety of situations, was used to assess sleepiness. ESS scores of ≥ 10 were considered to indicate sleepiness.
STOP-Bang Questionnaire (SBQ)
The SBQ is a self-administered questionnaire that consists of eight questions scored based on Yes/No answers (scored as 1/0). The eight items enquire about the presence of snoring, tiredness, witnessed pauses in breathing while asleep, a diagnosis of high blood pressure, a BMI greater than 35 kg/m², a neck circumference > 40 cm, and the subject’s age and gender. Thus, the score ranges from a value of 0 to 8. A score of 5-8 indicates a high risk of OSA, a score of 3 or 4 indicates an intermediate risk of OSA, and a score of 0-2 indicates a low risk of OSA.
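The scoring rule described above can be illustrated with a short Python sketch. It is not part of the study materials: the class and function names are hypothetical, and the age item is taken as "greater than 50" from the item wording listed in the test-retest table.

```python
from dataclasses import dataclass


@dataclass
class StopBangAnswers:
    """Yes/No answers to the eight SBQ items (True = Yes). Hypothetical container."""
    snoring: bool
    tired: bool
    observed_apnea: bool
    high_blood_pressure: bool
    bmi_over_35: bool
    age_over_50: bool
    neck_over_40_cm: bool
    male: bool


def sbq_score(answers: StopBangAnswers) -> int:
    """Sum the eight items, scoring Yes as 1 and No as 0 (range 0-8)."""
    return sum(int(value) for value in vars(answers).values())


def sbq_risk(score: int) -> str:
    """Map a total score to the risk bands given in the text."""
    if not 0 <= score <= 8:
        raise ValueError("SBQ score must be between 0 and 8")
    if score >= 5:
        return "high"
    if score >= 3:
        return "intermediate"
    return "low"


# Illustrative respondent (not a study participant): four Yes answers.
example = StopBangAnswers(True, True, False, True, False, False, False, True)
print(sbq_score(example), sbq_risk(sbq_score(example)))
```

In the validation analyses, the intermediate- and high-risk bands are pooled into a single "at risk" group (score ≥ 3).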
Step I: STOP-Bang Questionnaire Translation
To develop the Arabic version of the SBQ, the following steps were taken: translation into Arabic; translation back into English; and finally, comparison of the back translation with the original English version by a committee of bilingual individuals. Two independent translators translated the original questionnaire into Arabic, and two other independent certified English translators who were blinded to the original documents performed the back translations. Then, the two back translations were compared with the original SBQ. A committee of bilingual experts (staff members of the College of Medicine, King Saud University) made the necessary adjustments and approved a final Arabic version after verifying the consistency of the forward and backward translations. A pilot study of a sample of 10 subjects (not included in the final analysis) was conducted to ensure that the final draft was clear, understandable and acceptable. After completing the translation process, reliability and validity testing were initiated.
Step II: Test-Retest Reliability
The Arabic version of the SBQ was self-administered by the patients. The overall score for each patient was based on the patient’s responses to each of the eight items of the SBQ. Evaluation of each patient’s BMI was performed in the clinic. A retest session was arranged after 4-5 weeks if the participant was in stable clinical condition.
Step III: Validity
After analyzing the results of the Arabic version of the SBQ, we administered PSG tests to validate our results. All patients who completed the questionnaire underwent an overnight in-laboratory level I attended diagnostic sleep study (PSG), regardless of their score on the SBQ. The following physiological parameters were monitored during the sleep study: electroencephalogram (EEG; C3A2, C4A1, O1A2, O2A1), electrooculogram (EOG), electromyogram (EMG) of the chin and lower limbs, respiratory efforts (thoracic and abdominal belts), airflow through the mouth and nose (thermistor and nasal prong pressure transducer), sleep position (body-position sensor), snoring (microphone) and oxygen saturation. The PSG recording was performed using Alice® 6 diagnostic equipment (Philips, Respironics Inc., Murrysville, PA, USA). Manual scoring of the electronic raw data was performed by experienced, certified sleep technologists in accordance with the American Academy of Sleep Medicine (AASM) Task Force recommendations. Those who performed the PSG and those who interpreted the results were blinded to the results of the questionnaire. Moreover, the interpreters of the PSG were blinded to the patients’ clinical histories. Apnea was defined as a drop in the peak thermal sensor excursion greater than or equal to 90% of baseline for at least 10 seconds. The event was scored as obstructive apnea if continued respiratory effort was present or as central apnea if inspiratory effort was absent throughout the entire period during which airflow was absent. Hypopnea was defined as a reduction in airflow of ≥ 30% of baseline that lasted for at least 10 seconds and resulted in either a ≥ 3% decrease in oxygen saturation from the pre-event baseline or an arousal. The apnea-hypopnea index (AHI), defined as the number of apneas and hypopneas per hour of sleep, is a measure of the severity of OSA.
The severity of OSA, as measured with laboratory PSG, was classified based on AHI values as follows: 5-15, mild OSA; 15-30, moderate OSA; and > 30, severe OSA.
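As a minimal sketch of the AHI definition and the severity bands above, the following Python functions may help. The function names are hypothetical. Because the stated ranges share their endpoints, the sketch assumes that an AHI of exactly 15 is scored as moderate and an AHI of exactly 30 remains moderate (severe requires > 30); that boundary handling is an assumption and is not stated in the text.

```python
def ahi(apneas: int, hypopneas: int, total_sleep_hours: float) -> float:
    """Apnea-hypopnea index: scored events per hour of sleep."""
    if total_sleep_hours <= 0:
        raise ValueError("total sleep time must be positive")
    return (apneas + hypopneas) / total_sleep_hours


def osa_severity(ahi_value: float) -> str:
    """Classify OSA severity from the AHI (boundary handling is assumed)."""
    if ahi_value < 0:
        raise ValueError("AHI cannot be negative")
    if ahi_value < 5:
        return "no OSA"
    if ahi_value < 15:
        return "mild"
    if ahi_value <= 30:
        return "moderate"
    return "severe"


# Illustrative numbers only: 120 apneas and 180 hypopneas over 6 hours of sleep.
value = ahi(120, 180, 6.0)
print(value, osa_severity(value))
```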
Data were expressed as means ± SD or number (n; %). Cronbach’s alpha coefficient was calculated and used to measure the internal consistency of the Arabic version of the SBQ. Intraclass correlation coefficients together with either Pearson’s correlation coefficients or Spearman’s rank-order coefficients were used to evaluate the test-retest reliability. Coefficients with values > 0.7 were considered acceptable. The receiver-operating characteristic (ROC) curves between the SBQ scores and the PSG AHI scores were assessed at AHI cut-off values of 5, 15 and 30. To assess the extent of the rise of the ROC curve to the upper left-hand corner, the area under the curve (AUC) was measured. In general, a steeper rise of the curve corresponds to better test performance. An area of 1 represents perfect discrimination, and an area of 0.5 represents discrimination no better than chance. In this study, we adopted the classification of AUC values used by Erman et al.; in this classification, 0.9 to 1 is considered excellent, 0.8 to 0.9 is considered very good, and 0.7 to 0.8 is considered good. The sensitivity and specificity of the test, positive predictive values (PPV), negative predictive values (NPV) and positive and negative likelihood ratios (LR) were calculated for the same cut-off values of the AHI. In general, a likelihood ratio of < 1 indicates that the test result is associated with the absence of disease, whereas a likelihood ratio > 1 indicates that the test result is associated with the presence of disease. Likelihood ratios below 0.1 and above 10 are considered to provide strong evidence to rule out or rule in diagnoses, respectively. Values of p < 0.05 were considered statistically significant. Standard statistical software (IBM SPSS, version 21.0, Armonk, New York, USA, and Systat SigmaPlot, version 13, San Jose, CA, USA) was used for data management and analysis.
A total of 100 patients (61% males) prospectively completed the SBQ and then underwent in-laboratory PSG. All SBQs were fully completed by the participants. Table 1 presents the general and demographic characteristics, PSG diagnoses and SBQ test and retest output for the study participants. The recruited group had a mean AHI score of 50.0 ± 37.0/hour. Patients were categorized as follows, based on the SBQ results: at low risk for OSA (n = 17), at intermediate risk (n = 30) and at high risk (n = 53). Patients in the high-risk group were older and had a higher BMI. There were no differences in the ESS score between high-risk and low-risk groups (Table 2). Moreover, there was no difference in the ESS score between patients with AHI < 5 and AHI ≥ 5/hour (11.17 ± 8.40 vs 11.23 ± 6.53, respectively).
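The accuracy measures named above follow standard definitions, sketched here in Python for a 2 × 2 table of screening result against PSG diagnosis. This is an illustration only: the study used SPSS and SigmaPlot, the function names are hypothetical, and the counts in the usage lines are invented for the example and are not study data. The AUC is computed through its rank interpretation (the probability that a randomly chosen diseased subject scores higher than a randomly chosen non-diseased subject, with ties counted as one half).

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy measures from a 2 x 2 table.

    tp/fn: diseased subjects screened positive/negative.
    fp/tn: non-diseased subjects screened positive/negative.
    Assumes specificity is neither 0 nor 1, so both ratios are defined.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "plr": sensitivity / (1 - specificity),   # positive likelihood ratio
        "nlr": (1 - sensitivity) / specificity,   # negative likelihood ratio
    }


def auc(diseased_scores, healthy_scores) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased_scores) * len(healthy_scores))


# Hypothetical counts and scores, for illustration only.
print(diagnostic_metrics(tp=40, fp=10, fn=5, tn=45))
print(auc([3, 5, 6, 7], [1, 2, 3, 4]))
```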
Reliability of the Arabic version of the STOP-Bang Questionnaire
Table 3 presents the test-retest intraclass correlation. The mean scores of the SBQ for the test and retest sessions were 4.4 ± 1.7 and 4.5 ± 1.7, respectively. There was minimal variability between items, and the intraclass correlation of the total score was strong, with a value of 0.96 (p < 0.001).
Table 1. General and demographic characteristics, PSG diagnosis and STOP-Bang questionnaire test and retest output.
|Characteristics||n = 100 (%)|
|Age (year)||46.60 ± 14.00|
|BMI (kg/m2)||34.40 ± 7.80|
|Neck (cm)||38.00 ± 3.81|
|AHI||50.00 ± 37.00|
|ESS||11.23 ± 6.60|
|Test Scores||4.33 ± 1.70|
|Retest Scores||4.50 ± 1.70|
|Gender (male)||61 (61.0)|
|No OSA||6 (6)|
|Mild OSA||9 (9)|
|Moderate OSA||22 (22)|
|Severe OSA||63 (63)|
|SBQ test output|
|High risk||53 (53)|
|Intermediate risk||30 (30)|
|Low risk||17 (17)|
|SBQ retest output|
|High risk||58 (58)|
|Intermediate risk||28 (28)|
|Low risk||14 (14)|
PSG: polysomnography; BMI: body mass index; AHI: apnea-hypopnea index; OSA: obstructive sleep apnea.
Table 2. Demographic characteristics of the groups at low risk of OSA and at risk for OSA, as classified by the Arabic STOP-Bang questionnaire.
|Variable||SBQ low risk, n = 17 (%)||SBQ at risk, n = 83 (%)||p-Value|
|Age||36.53 ± 14.20||48.70 ± 13.01||0.001|
|Gender (male)||9 (52.9)||52 (62.7)||0.455|
|BMI||27.43 ± 5.10||35.90 ± 7.50||< 0.001|
|BMI ≥ 25||29.94 ± 2.91||36.70 ± 7.00||0.001|
|Neck||13.52 ± 1.4||15.30 ± 1.34||< 0.001|
|ESS||9.71 ± 7.10||11.54 ± 6.50||0.299|
OSA: obstructive sleep apnea; BMI: body mass index; ESS: Epworth sleepiness scale.
Cronbach’s alpha coefficient was 0.7 for the 8 items of the Arabic SBQ. This was within an accepted range of internal consistency. Table 4 presents the inter-item correlation matrix between the test and retest items. With the exception of tiredness, fatigability and sleepiness during the daytime, the correlation coefficients for the test and retest values for all items were ≥ 0.80, with a correlation coefficient for the total score of 0.92.
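For readers unfamiliar with the statistic, Cronbach's alpha for a k-item scale is k/(k − 1) × (1 − Σ item variances / variance of the total score). The Python sketch below applies that standard formula; it is not the SPSS procedure used in the study, the function name is hypothetical, and the response matrix is invented for illustration.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix of item scores.

    Uses sample variances throughout; requires at least two respondents,
    at least two items, and a non-zero variance of the total score.
    """
    k = len(rows[0])
    columns = list(zip(*rows))                      # one tuple per item
    item_variance_sum = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical 0/1 answers from five respondents to four items.
responses = [
    [1, 1, 1, 0],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(responses), 2))
```

When every item gives identical answers across respondents, the formula returns 1, the upper limit of internal consistency.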
Table 3. Test-retest intraclass correlations.
|Item||Test, Mean ± SD||Retest, Mean ± SD||Cronbach's Alpha||Intraclass Correlations||p-Value|
|Do you snore loudly (louder than talking or loud enough to be heard through closed doors)?||0.70 ± 0.50||0.72 ± 0.50||0.90||0.90||< 0.001|
|Do you often feel tired, fatigued, or sleepy during the daytime?||0.92 ± 0.30||0.92 ± 0.30||0.74||0.80||< 0.001|
|Has anyone observed you stop breathing during your sleep?||0.60 ± 0.50||0.63 ± 0.50||0.90||0.90||< 0.001|
|Do you have or are you being treated for high blood pressure?||0.50 ± 0.50||0.50 ± 0.50||0.92||0.93||< 0.001|
|Is your body mass index greater than 35?||0.41 ± 0.50||0.9 ± 1.00||0.96||0.96||< 0.001|
|Is your age greater than 50?||0.50 ± 0.50||0.50 ± 0.50||0.97||0.97||< 0.001|
|Does your neck measure more than 16 in / 40 cm around?||0.23 ± 0.42||0.20 ± 0.40||0.92||0.92||< 0.001|
|Gender male?||0.61 ± 0.50||0.61 ± 0.5||1.00||1.00||< 0.001|
|Total test scores||4.40 ± 1.70||4.50 ± 1.70||0.96||0.96||< 0.001|
Validation Against the AHI
Table 4. Inter-item correlation matrix between test and retest items.
|Do you snore loudly (louder than talking or loud enough to be heard through closed doors)?||0.80|
|Do you often feel tired, fatigued, or sleepy during the daytime?||0.60|
|Has anyone observed you stop breathing during your sleep?||0.82|
|Do you have or are you being treated for high blood pressure?||0.90|
|Is your body mass index greater than 35?||0.92|
|Is your age greater than 50?||0.94|
|Does your neck measure more than 16 in / 40 cm around?||0.90|
The correlation is significant at the 0.05 level (2-tailed).
The prevalence of AHI ≥ 5 among patients attending the sleep disorders clinic was 94%; the prevalence of mild OSA was 9%, the prevalence of moderate OSA was 22%, and the prevalence of severe OSA was 63%. Approximately 97.6% of the patients who were classified by the SBQ as at risk for OSA had AHI ≥ 5 during PSG (Table 5). In addition, a good, positive and highly significant correlation was found between SBQ and AHI (Fig. 1). The Arabic SBQ classified 53% of the study participants as at high risk for OSA, which correlates well with the PSG findings that categorized 63% of cases as severe OSA (Table 1). Table 6 presents the sensitivity, specificity, PPV, NPV and positive and negative LR for the SBQ at different cut-off values of the PSG AHI. Validation of the SBQ against the AHI at a cut-off of 5 revealed a sensitivity of 98% and a PPV and NPV of 86% and 67%, respectively. In addition, a clear rise in the ROC curve to the upper left-hand corner was observed (Fig. 2).
The Arabic version of the SBQ was found to be easy to administer and reliable and exhibited a strong intraclass correlation, reflecting stability both over time and across the items among patients referred to the sleep disorders clinic. This study shows that, among patients referred to a sleep disorders clinic, an SBQ score ≥ 3 has high sensitivity (98.0%) and PPV (86.6%) for the detection of OSA (AHI > 5). Moreover, the AUC was consistently high for the diagnostic ability of the SBQ for all OSA severities. To the best of our knowledge, this is the first study that validates an Arabic version of the SBQ. In this study, the performers and the scorers of the PSG were blinded to the SBQ scores to avoid the risk of bias that has occurred in some previously published papers that validated screening questionnaires. For widespread use, a screening tool for OSA must be simple and easy to use and must have high sensitivity and PPV, such that practitioners would be able to stratify patients, make quick, reasonable decisions about the likelihood that patients have OSA, and plan further diagnostic tests or treatment. This study shows that the Arabic SBQ can be a very useful tool for screening patients for the risk of OSA in the sleep disorders clinic setting.
In general, the performance of a screening test differs among different populations due to the severity of disease. The SBQ was originally developed and validated among surgical patients attending preoperative clinics, and it achieved good sensitivity and specificity. Chung et al., who validated the SBQ primarily among preoperative patients, reported sensitivities of 72.1%, 78.6% and 87.2% for AHIs of > 5, > 15 and > 30/hour, respectively, with corresponding specificities of 38.2%, 37.4% and 36.2%. Subsequent studies have validated this questionnaire in patients attending sleep disorders clinics. Vana et al. demonstrated that the SBQ has high sensitivity (93.8%) and low specificity (33.3%) for detecting OSA in a sleep clinic setting; our results are in agreement with these results. Another recent study, by Reis et al., evaluated a Portuguese version of the SBQ in a sleep disorders clinic setting where patients completed the SBQ and underwent a sleep study. The sensitivity and PPV for OSA were 93.4% and 86.6%, respectively, whereas the specificity was 48.9%. Farney et al., examining patients in a sleep disorders clinic setting, reported an 85.1% probability of having an AHI ≥ 5/hour if the SBQ score is > 3. With the same SBQ score cut-off, our study achieved a sensitivity of 98.0% and a PPV of 86.6%.
Fig. 1. Scatter plot for the correlation between the Arabic version of the STOP-Bang questionnaire and the apnea-hypopnea index (AHI).
Table 6. Sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, and negative likelihood ratio for Arabic STOP-Bang questionnaire scores (low risk, high risk) for different cut-off values of AHI.
|AHI cut-off||AUC||Sensitivity||Specificity||PPV||NPV||PLR||NLR|
|AHI ≥ 5||1.00||0.98||0.24||0.86||0.67||1.28||0.10|
|AHI ≥ 15||0.63||0.95||0.65||0.93||0.73||2.70||0.07|
|AHI ≥ 30||0.78||0.71||0.76||0.94||0.35||3.02||0.38|
AHI: apnea-hypopnea index; AUC: area under the curve; PPV: positive predictive value; NPV: negative predictive value; PLR: positive likelihood ratio; NLR: negative likelihood ratio.
Table 5. AHI by Arabic STOP-Bang questionnaire risk group.
|AHI||Low Risk (n=17; %)||At Risk (n=83; %)||p-Value|
|AHI||16.80 ± 16.30||56.70 ± 36.43||< 0.001|
|AHI < 5||4 (23.5)||2 (2.4)||0.001|
|AHI ≥ 5||13 (76.5)||81 (97.6)|
AHI: apnea-hypopnea index.
Fig. 2. Receiver-operating characteristic (ROC) curve with PSG AHI cut-off values of: (A) AHI=5, (B) AHI=15, and (C) AHI=30 for the Arabic version of the STOP-Bang questionnaire, with a corresponding area under the curve (AUC).
Several questionnaires and clinical screening tests have been used to detect patients at risk for OSA. A recent study comparing the ESS, BQ and SBQ as screening tools for OSA in a sleep-disordered breathing clinic demonstrated the superiority of the SBQ in screening for the presence of OSA with a high sensitivity and a high AUC. On the other hand, the BQ exhibited the greatest specificity. In another retrospective study, Silva et al. analyzed data from the Sleep Heart Health Study population (n = 4770) to evaluate the abilities of the 4-Variable screening tool, STOP, SBQ, and ESS questionnaires to identify subjects at risk for OSA. The SBQ exhibited the best sensitivity in predicting moderate to severe OSA; the sensitivity of the SBQ was 87% and 70% for detecting moderate and severe OSA, respectively. In our study, the corresponding sensitivities were 95.0% and 71.0% for moderate-to-severe and severe OSA, respectively. A meta-analysis examining several questionnaires that evaluated the risk of OSA included the American Society of Anesthesiologists (ASA) checklist, the BQ, the sleep questionnaire, the sleep disorders questionnaire (SDQ), the STOP questionnaire and the SBQ. The authors identified the SBQ clinical scale as an excellent tool for predicting severe OSA due to its simplicity and relative ease of use; however, the BQ and the SDQ were the most accurate questionnaires for screening OSA. A more recent meta-analysis of screening questionnaires showed that the SBQ exhibits consistently high sensitivity for detecting OSA at different severity levels. However, for predicting moderate or severe OSA, the authors concluded that the SBQ and the BQ had the highest sensitivity and specificity, respectively.
Our study has some advantages and limitations compared with previous studies conducted in the sleep disorders clinic setting. In the studies by Farney and Silva, the SBQ responses were collected retrospectively from answers to other questionnaires that the authors described as being similar to the SBQ [19, 20]. Our study was designed prospectively, specifically to assess the utility of the SBQ in identifying OSA. However, the sample in our study is smaller than those of the two previous studies. In a recent study by Reis et al., who validated a Portuguese version of the SBQ in the context of a sleep disorders clinic, some patients underwent portable (level III) sleep studies. Level III sleep studies may underestimate the AHI, as total sleep time is not measured. In our study, all patients underwent in-laboratory level I attended sleep studies to obtain accurate measurements of the AHI. A limitation of some previous studies is that the performers and interpreters of the sleep studies were not blinded to the SBQ score and/or to patients’ clinical histories. To avoid this bias in our current study, the performers and the scorers of the PSG were blinded to the SBQ scores.
The Arabic version of the SBQ is an easy-to-administer, simple, reliable and valid tool for identifying OSA among Arabic-speaking patients.
CONFLICT OF INTEREST
The authors confirm that the content of this article involves no conflict of interest.
This study was supported by a grant from the National Plan for Science and Technology Program by the King Saud University Project in Saudi Arabia.