The Shirom-Melamed burnout questionnaire
The Shirom-Melamed Burnout Questionnaire (SMBQ) contains 22 items in four subscales: "Physical Fatigue (PF)", "Cognitive Weariness (CW)" [9], "Tension", and "Listlessness" [10]. The Physical Fatigue domain consists of 8 items, examples of which are "I feel tired" and "My batteries are dead." Six items measure Cognitive Weariness, examples of which are "I feel I am not thinking clearly" and "I have difficulty thinking about complex things." Four items measure Tension, including "I feel tensed" and "I feel relaxed". Items measuring Listlessness include "I feel full of vitality" and "I feel alert". Each item is rated on a seven-point scale ranging from 1 'Never or almost never' to 7 'Always or almost always'. Five of the items are reverse-scored: one in the Tension domain, three in the Listlessness domain and one in the Physical Fatigue domain. For each subscale, and for the scale as a whole, the score is the mean of the item responses, i.e. the summed score divided by the number of items.
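The scoring rule described above can be sketched in Python as follows. This is a minimal illustration only: the item keys, the choice of which item is reverse-scored, and the function names are hypothetical placeholders, since the text gives only the number of items and reversed items per domain.

```python
# Hypothetical sketch of SMBQ subscale scoring. Item keys and the set of
# reversed items are placeholders, not the published item numbering.

SCALE_MIN, SCALE_MAX = 1, 7  # seven-point response scale


def reverse(score):
    """Reverse-score a response on the 1-7 scale (1 <-> 7, 2 <-> 6, ...)."""
    return SCALE_MIN + SCALE_MAX - score


def mean_score(responses, items, reversed_items=frozenset()):
    """Mean of the responses to `items`, reversing those in `reversed_items`."""
    values = [
        reverse(responses[i]) if i in reversed_items else responses[i]
        for i in items
    ]
    return sum(values) / len(values)


# Example: a four-item domain in which one item is reverse-scored.
tension_items = ["t1", "t2", "t3", "t4"]   # hypothetical item keys
tension_reversed = {"t4"}                  # assumed reversed item
answers = {"t1": 6, "t2": 5, "t3": 7, "t4": 2}

tension_score = mean_score(answers, tension_items, tension_reversed)
print(tension_score)  # (6 + 5 + 7 + 6) / 4 = 6.0
```

The same function gives the whole-scale score when it is passed all 22 item keys and the full set of five reversed items.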
Subjects and setting
Data from both a clinical sample and a general population sample were included in the study. The clinical population consisted of patients seeking medical care at a specialized outpatient stress clinic, the Institute of Stress Medicine (ISM), located in Gothenburg, Sweden. All patients were ambulatory at the time of the study and none had received inpatient care for their illness. They were referred from primary care units or occupational health care centres in the western part of Sweden; the referral criteria were stress-related exhaustion and a maximum duration of sick leave of six months. The patients included in this study were recruited between 2004 and 2009. All patients fulfilled the ICD-10 criteria for "other reaction to severe stress" (F.43.8A), which in Sweden has been further defined by diagnostic criteria for exhaustion that require the presence of one or several clearly identifiable strain factors for at least six months [12]. During this period 354 patients were referred to the clinic, met these criteria, entered the treatment program and were thus followed up. To ensure that the exhaustion experienced by the patients was not due to other known causes, patients with known systemic or psychiatric disease (except depression, anxiety and exhaustion), present infection, body mass index below 18.5 or over 30 kg/m2, vitamin B12 deficiency, thyroid disorder or over-consumption of alcohol were excluded. Pregnant or breast-feeding patients were also excluded.
Subjects from the general population were obtained from a survey study whose general aim was to investigate different aspects of the psychosocial work environment, stress, and stress-related health. This study population comprised a random sample (N = 5,300) of the 48,600 employees of Region Västra Götaland, a provider of public health care, and a random sample (N = 700) of the 2,200 social insurance office workers in the same geographical area. An inclusion criterion of at least one year of employment (at least 50% of full-time) was applied. A postal questionnaire was used and the response rate after two reminders was 61%; in total, 3,717 subjects responded. The majority were female (87%) and the average age of the participants was 47 years. From this population an age- and gender-stratified sample, comparable to the patient population, was randomly selected (n = 319). This was to ensure that the full range of burnout (i.e. low to high) was available to the psychometric analysis.
Internal construct (factorial) validity
Factor analysis
The paucity of published evidence concerning the factorial structure of the original SMBQ led to an initial exploration of its structure with a Confirmatory Factor Analysis (CFA) [13]. Two solutions were tested: a unidimensional solution (i.e. all 22 items together) and a four-factor solution representing the four domains listed above. A robust weighted least squares estimator (WLSMV) for categorical variables was chosen. The fit statistics chosen for this analysis were the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI) and the Root Mean Square Error of Approximation (RMSEA), with guidelines for appropriate fit being > 0.95, > 0.95 and < 0.08, respectively [13]. Modification Indices were examined to give insight into possible structural aspects of model misfit (e.g. local dependency).
An Exploratory Factor Analysis (EFA) was undertaken where the CFA failed, in order to gain further insight into a possible item structure that would be appropriate for the Rasch analysis (a confirmatory procedure). A Promax rotation, which is oblique (non-orthogonal), was used to allow for correlated factors.
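The decision rule implied by the fit guidelines above can be written as a short check. The sketch below is illustrative only: the function names and the example fit values are hypothetical, and the cut-offs are simply those quoted in the text.

```python
# Cut-offs quoted in the text: CFI > 0.95, TLI > 0.95, RMSEA < 0.08.
CFI_MIN, TLI_MIN, RMSEA_MAX = 0.95, 0.95, 0.08


def adequate_fit(cfi, tli, rmsea):
    """True when all three fit indices meet the quoted guidelines."""
    return cfi > CFI_MIN and tli > TLI_MIN and rmsea < RMSEA_MAX


def next_step(cfi, tli, rmsea):
    """Follow-up analysis suggested by the CFA result."""
    if adequate_fit(cfi, tli, rmsea):
        return "retain CFA structure"
    return "explore structure with EFA"


# Hypothetical fit values, for illustration only.
print(next_step(cfi=0.90, tli=0.89, rmsea=0.12))
```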
Rasch analysis
The Rasch model is the formal measurement model required to construct quantitative measurement from dichotomous or ordinal data [11, 14, 15]. It is used whenever a set of items is intended to be summed to give a total score. The pattern of responses is checked against the expectations of the model, which is a parametric, probabilistic form of Guttman scaling [16].
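As a simplified illustration of the probabilistic Guttman pattern, the dichotomous form of the Rasch model gives the probability of endorsing an item as a logistic function of the difference between the person location and the item location, both in logits. The SMBQ items are polytomous, so the analysis itself would rely on a polytomous extension; the dichotomous case and the function name below are used here only for clarity.

```python
import math


def rasch_probability(theta, b):
    """Probability of endorsing a dichotomous item under the Rasch model.

    theta: person location in logits; b: item location in logits.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))


# A person located exactly at the item location has a 50% chance of
# endorsing it; the chance falls as the item location rises.
for item_location in (-1.0, 0.0, 1.0):
    print(item_location, round(rasch_probability(0.0, item_location), 3))
```

For a fixed person, the probability of endorsement decreases steadily as items become harder, which is the probabilistic counterpart of the deterministic Guttman ordering.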
The process of Rasch analysis is therefore concerned with testing whether the data accord with model expectations and satisfy the various assumptions of the model, together with other key measurement issues such as the absence of differential item functioning [17]. For example, the assumption of local independence can be characterised as comprising two elements, response dependency and trait dependency [18]. The former arises where items are linked in some way, such as a series of walking items reflecting increasing distances. The latter is multidimensionality. Both are tested by analysis of the residuals: response dependency is judged to be absent when residual correlations are below 0.3, and the scale is judged to be unidimensional when sets of items defined by patterns in the residuals (as identified by a Principal Component Analysis, PCA) give similar person estimates [19]. Response dependency can be accommodated by grouping locally dependent sets of items into 'testlets' [20]. Where testlets of different lengths are constructed, the item residual standard deviation may be inflated.
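The residual-correlation screen for response dependency can be sketched as follows. This is a minimal sketch under stated assumptions: the residuals are taken to be standardised item residuals exported from the Rasch software, the data and item names are invented, and only positive correlations at or above the 0.3 threshold are flagged.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def dependent_pairs(residuals, threshold=0.3):
    """Item pairs whose residual correlation is at or above `threshold`.

    residuals: dict mapping item name -> list of residuals, one per person.
    """
    flagged = []
    for item_i, item_j in combinations(sorted(residuals), 2):
        r = pearson(residuals[item_i], residuals[item_j])
        if r >= threshold:
            flagged.append((item_i, item_j, r))
    return flagged


# Invented residuals for three items and five persons.
residuals = {
    "item_a": [0.5, -1.2, 0.8, -0.3, 1.1],
    "item_b": [0.6, -1.0, 0.9, -0.2, 1.0],
    "item_c": [-0.4, 0.9, -0.7, 0.5, -0.8],
}
flagged = dependent_pairs(residuals)
print([(i, j) for i, j, _ in flagged])
```

Flagged pairs would be candidates for grouping into a testlet before the model is re-estimated.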
Another assumption is that of the stochastic ordering of items, which tests the probabilistic Guttman pattern. This is confirmed by a series of fit statistics, where chi-square-based statistics are shown to be non-significant (i.e. no deviation from model expectation) after adjustment for multiple testing [21]. Summary residual statistics, under conditions of perfect fit, are expected to have a mean of zero and a standard deviation of one; in practice the latter should be below 1.4, except where testlets have been used to accommodate local dependency, in which case the standard deviation becomes inflated [22]. Individual item residuals are expected to be within the range ± 2.5. Differential Item Functioning (DIF) is deemed absent when there is no significant difference in the residuals (via ANOVA) across key contextual groups, such as age or gender. For the analysis of DIF three age groups were used: persons aged 38 years or younger (N = 116), 39 to 46 years (N = 99) and 46 years or older (N = 104). These groups were based upon the age distribution, to obtain similar numbers within groups to support an ANOVA of the residuals.
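The item-level fit criteria above can be combined into a simple screen. The sketch below assumes a Bonferroni correction (the nominal significance level divided by the number of items tested) as the adjustment for multiple testing; the text does not name the method, so this is one common choice rather than necessarily the one applied. The item names and fit values are invented.

```python
def item_misfit(fit, alpha=0.05, residual_limit=2.5):
    """Flag items that breach the residual range or the chi-square criterion.

    fit: dict mapping item name -> (fit residual, chi-square p-value).
    The significance level is Bonferroni-adjusted (an assumption here):
    alpha divided by the number of items tested.
    """
    adjusted_alpha = alpha / len(fit)
    flagged = {}
    for item, (residual, p_value) in fit.items():
        reasons = []
        if abs(residual) > residual_limit:
            reasons.append("residual")
        if p_value < adjusted_alpha:
            reasons.append("chi-square")
        if reasons:
            flagged[item] = reasons
    return flagged


# Invented fit statistics for three items; adjusted alpha = 0.05 / 3.
fit = {
    "item_1": (0.8, 0.20),
    "item_2": (1.9, 0.01),
    "item_3": (2.7, 0.04),
}
print(item_misfit(fit))
```

With 22 items and a nominal level of 0.05, the same adjustment would give a per-item threshold of 0.05 / 22, or roughly 0.0023.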
Reliability is reported as a Person Separation Index, which is similar to Cronbach's alpha when data are normally distributed. As both items and persons are calibrated on the same metric, where data fit the Rasch model it is possible to examine the targeting of the items in the scale. A properly targeted instrument would have a mean population value of zero logits, which is also where the items of the scale are centred. In addition, when data fit the model, a transformation from the raw score to an interval scale becomes available. This means that the ordinal score, obtained by simply summing the items, can be transformed into an interval scale latent estimate for use in parametric statistics and for calculating change scores. This is possible because under the Rasch model the raw score is a sufficient statistic for the estimate of person ability, and the property of specific objectivity (parameter separation) satisfies the axioms of conjoint measurement required for interval scaling [23–26]. In summary, the process of Rasch analysis tests the viability of a set of items for use as a valid and reliable additive scale, including aspects of invariance across groups and compliance with the requirements for constructing interval scale measurement. Further details of the process are given elsewhere [27–29].
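The raw score to interval scale transformation can be illustrated with a small numerical sketch. It assumes dichotomous items with known (here invented) item locations and finds the person location at which the expected score equals the observed raw score, using Newton-Raphson iteration. The Rasch software uses its own estimation routines for polytomous items; the code below, including its function names, only illustrates the principle that the estimate depends on the raw score alone.

```python
import math


def expected_score(theta, difficulties):
    """Expected raw score at person location theta (dichotomous items)."""
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties)


def person_estimate(raw_score, difficulties, tol=1e-8, max_iter=100):
    """Maximum likelihood person location (logits) for a given raw score."""
    if not 0 < raw_score < len(difficulties):
        raise ValueError("extreme scores have no finite estimate")
    theta = 0.0
    for _ in range(max_iter):
        probs = [1.0 / (1.0 + math.exp(-(theta - b))) for b in difficulties]
        gradient = raw_score - sum(probs)            # observed minus expected
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta


# Invented item locations, centred on zero logits.
difficulties = [-1.0, 0.0, 1.0]
for raw in (1, 2):
    print(raw, round(person_estimate(raw, difficulties), 3))
```

Because the function takes only the raw score and the item locations, any two persons with the same raw score receive the same estimate, whichever items they endorsed. This is the sufficiency property referred to above.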
The sample size of 638 is sufficient both for a factor analysis of 22 items and to give a high degree of precision (i.e. item location estimates within 0.3 logits with 99% confidence) for the Rasch analysis [30].
The study was approved by the Regional Ethical Review Board in Gothenburg and conducted in compliance with the Declaration of Helsinki. All subjects included in the study gave written informed consent allowing their data to be used for research purposes.
The Rasch software used was RUMM2030 [31]. CFA and EFA were conducted in MPlus6 [32], and all other analyses were conducted in SPSS Version 18 [33].