Length of sick leave – Why not ask the sick-listed? Sick-listed individuals predict their length of sick leave more accurately than professionals

Background The knowledge of factors that accurately predict long-lasting sick leaves is sparse, but information on the medical condition is believed to be necessary to identify persons at risk. Based on the current practice of identifying sick-listed individuals at risk of long-lasting sick leaves, the objectives of this study were to investigate the diagnostic accuracy of the length of sick leave predicted in the Norwegian National Insurance Offices, and to compare these predictions with the self-predictions of the sick-listed. Methods Based on medical certificates, two National Insurance medical consultants and two National Insurance officers predicted, at day 14, the length of sick leave in 993 consecutive cases of sick leave, resulting from musculoskeletal or mental disorders, in this 1-year follow-up study. Two months later they reassessed 322 cases based on extended medical certificates. Self-predictions were obtained from 152 sick-listed subjects when their sick leave passed 14 days. Diagnostic accuracy of the predictions was analysed by ROC area, sensitivity, specificity and likelihood ratio; positive predictive value was included in the analyses of predictive validity. Results The sick-listed identified sick leave lasting 12 weeks or longer with an ROC area of 80.9% (95% CI 73.7–86.8%), while the corresponding estimates for medical consultants and officers had ROC areas of 55.6% (95% CI 45.6–65.6%) and 56.0% (95% CI 46.6–65.4%), respectively. The predictions of sick-listed males were significantly better than those of female subjects, and older subjects predicted somewhat better than younger subjects. Neither formal medical competence nor additional medical information noticeably improved the diagnostic accuracy based on medical certificates. Conclusion This study demonstrates that the accuracy of a prognosis based on medical documentation in sickness absence forms is lower than that of one based on direct communication with the sick-listed themselves.


Background
The increasing rate of sick leave experienced in most Western countries challenges insurance companies, employers, and public authorities to identify measures to reduce burdens at the individual, workplace and societal levels.
To reduce the expenses of sick leave and the risk of expulsion from work, the Norwegian government introduced legislation in 1993 that anticipated early and more vigorous interventions of the Norwegian National Insurance Scheme [1]. The Norwegian Public Report no. 27 [2], 2000, underscored the importance of early intervention by the National Insurance Offices (NIOs). A major challenge for the NIOs is to identify newly sick-listed individuals at risk of prolonged sick leave, and who are therefore potential candidates for rehabilitating interventions.
The selection process is currently based on information in medical sickness certificates, supplemented by access to the register of previous sickness benefits. A medical sickness certificate (Sickness Certificate 1; SC1) is required if sick leave exceeds 3 days, and after 8 weeks an extended medical certificate is mandatory (Sickness Certificate 2; SC2) [3]. In addition to diagnosis and certified period, the majority of SC1s contain information on the occupation and employee, whereas information on chronic disease, previous sick leave episodes, prognosis and comments is more scattered. SC2s include updated medical information on work ability, planned diagnostics and treatments, and on the prognosis. The value of this information as a guideline for selective intervention has, however, never been established, either as an indicator of potential prolonged absence, or as an indicator of the need for occupational or vocational rehabilitation [4].
Based on the current practice of identifying sick-listed individuals at risk of long-lasting sick leaves, the objectives of this study were to investigate the diagnostic accuracy of predictions within the NIOs, and to compare these predictions with the self-predictions of the sick-listed.

Methods
In October and November 1997 and March and April 1998, newly sick-listed persons with musculoskeletal or mental disorders (ICPC, L- and P-diagnoses) [5] were included consecutively if they were certified sick for longer than 2 weeks (Figure 1). Five hundred persons were included in each period. The study took place in the cities of Tromsø and Harstad in Northern Norway. The total length of sickness benefits was registered during the following year in the National Sickness Benefit Register. Missing data on the length of sick leave reduced the number of included subjects to 993. The mean ages of these 391 men and 602 women were 41.4 and 39.7 years, respectively. Musculoskeletal disorders were the main reason for sick leaves (83% of the cases).
A total of 495 randomly selected persons received a questionnaire on the expected length of their ongoing sick leave period. The answer categories were: less than 4 weeks, 4 to 7 weeks, 8 to 11 weeks, 12 to 15 weeks, 16 to 25 weeks, 26 to 51 weeks, and at least 1 year. Some 152 persons (30.7%), called the responder group, returned the questionnaire with this question filled in.
Based on SC1s available after 14 days of sick leave, two NIO officers without formal medical competence, but experienced in working with sick-listed persons, and two experienced physicians working part time as insurance medical officers (NIO medical consultants), assessed the expected length in each of the 993 ongoing sick leave cases. In 496 randomly chosen cases, the NIO assessors had additional access to information on sick leave periods during the previous 3 years. Of potentially 1986 assessments in each profession, the officers and medical consultants had 18 and 25 missing assessments, respectively.
SC2s became available in 322 of the 459 cases where sick leave exceeded 8 weeks, and the NIO assessors reassessed these cases.
Reproducibility of assessments by medical consultants was analysed in 20 cases that were reassessed by the two NIO medical consultants and also assessed by another eight of their colleagues.

Observed length of sick leaves
The reference standard lengths of individual sick leaves within 1 year were collected from the National Sickness Benefit Register. Sick leaves interrupted by only 1-2 days without sickness benefits, typically on weekends, were registered as a single period. The observed length of sick leave thus comprised the total period of continuous fulltime or part-time absence due to sickness within 1 year.
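The merging rule described above can be illustrated with a minimal sketch. It is not the Register's actual procedure: the function names, the example dates, and the choice to count the benefit-free gap days as part of the merged period are all assumptions made for illustration.

```python
from datetime import date


def merge_periods(periods, max_gap_days=2):
    """Merge (start, end) periods separated by at most max_gap_days benefit-free days."""
    merged = []
    for start, end in sorted(periods):
        if merged:
            # Number of benefit-free days between the previous end and this start.
            gap = (start - merged[-1][1]).days - 1
            if gap <= max_gap_days:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
                continue
        merged.append((start, end))
    return merged


def length_in_days(period):
    """Length of a period, counting both the first and the last day (assumption)."""
    start, end = period
    return (end - start).days + 1


# Hypothetical benefit periods for one person (not data from the study).
periods = [
    (date(1997, 10, 6), date(1997, 10, 17)),   # ends on a Friday
    (date(1997, 10, 20), date(1997, 11, 7)),   # resumes the following Monday
    (date(1998, 1, 5), date(1998, 1, 9)),      # a separate, later episode
]

merged = merge_periods(periods)
for period in merged:
    print(period[0], period[1], length_in_days(period), "days")
```

In this example the first two periods are separated only by a weekend and are therefore treated as one continuous sick leave, while the January episode remains separate.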

Statistics
The diagnostic accuracy of predicted lengths was compared on the basis of sensitivity, specificity, likelihood ratio and the area under the receiver operating characteristics curve (ROC area) [6,7]. The non-parametric standard error and 95% CI for the ROC area were calculated in SPSS-11. The ROC curve represents plots of the true-positive rate (sensitivity) against the false-positive rate (1 - specificity) at the average of two consecutive categories of the assessments (>= 0 weeks, >= 4 weeks, >= 8 weeks etc). The ROC curves of the mean assessment by NIO officers and medical consultants also include intermediate points representing half categories.
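As an illustration of how an ROC area can be obtained from ordinal predictions, the following sketch uses the non-parametric (Mann-Whitney) formulation, in which ties count one half. It is not the SPSS routine used in the study, it does not reproduce the plotting at category averages, and the data and function names are hypothetical.

```python
def roc_area(scores, outcomes):
    """Non-parametric ROC area: chance that a long case scores higher than a short one."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    # Each positive/negative pair counts 1 if ranked correctly and 0.5 if tied.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def roc_points(scores, outcomes, cutoffs):
    """Return (1 - specificity, sensitivity) for each rule 'predicted >= cutoff'."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    points = []
    for c in cutoffs:
        sensitivity = sum(s >= c for s in pos) / len(pos)
        false_positive_rate = sum(s >= c for s in neg) / len(neg)
        points.append((false_positive_rate, sensitivity))
    return points


# Hypothetical data: lower bound (weeks) of the predicted category, and whether
# the observed sick leave lasted at least 12 weeks.
predicted = [0, 4, 4, 8, 8, 12, 16, 26]
long_leave = [False, False, True, False, True, True, False, True]

area = roc_area(predicted, long_leave)
points = roc_points(predicted, long_leave, cutoffs=[4, 8, 12])
print(f"ROC area: {area:.4f}")
for cutoff, (fpr, sens) in zip([4, 8, 12], points):
    print(f">= {cutoff} weeks: 1 - specificity = {fpr:.2f}, sensitivity = {sens:.2f}")
```

An area of 0.5 corresponds to chance-level ranking of long and short sick leaves, and 1.0 to perfect ranking.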

Figure 1 Flow chart of inclusions and assessments. Inclusions (N = 993): the National Insurance Offices in the cities of Harstad and Tromsø consecutively included 993 newly sick-listed persons, certified sick beyond 14 days due to musculoskeletal or mental disorders, during October-November 1997 and February-March 1998. Self-assessments (N = 495): 495 randomly chosen sick-listed persons were invited to assess their expected length of sick leave as their sick leave passed 14 days. Responders N = 152 (half the answers were received within 7 days, and 80% within 12 days, of the sick leave passing 14 days); no answer N = 343. Expected length after 8 weeks of sick leave (N = 322): the two National Insurance officers and two National Insurance medical consultants reassessed the expected lengths of the 322 sick leaves where an extended medical certificate (Sickness Certificate 2) was received; pre-selected answer categories were 8-11 weeks, 12-15 weeks, 16-25 weeks, 26-51 weeks and at least one year. Returned to work within 8 weeks N = 534; missing extended medical certificate at 8 weeks N = 137. Observed length of the included sick leave periods (N = 993): the actual lengths of the sick leaves were collected from the National Sickness Register after one year.

The predictive validity is presented as sensitivity, specificity, positive predictive value (PPV) and likelihood ratio at different thresholds (cut-offs) in predicted length [8].
Reliability of predicted length was analysed with agreement between assessors, the kappa value [9,10].
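The measures of predictive validity named above can be sketched for a single cut-off as follows. The data are hypothetical and the function name is an assumption; the sketch only shows how a 2x2 classification at one threshold in predicted length yields sensitivity, specificity, PPV and the positive likelihood ratio.

```python
def diagnostic_summary(predicted_weeks, observed_weeks, cutoff, target):
    """Classify 'predicted >= cutoff' against 'observed >= target' and summarise."""
    tp = fp = fn = tn = 0
    for pred, obs in zip(predicted_weeks, observed_weeks):
        test_positive = pred >= cutoff
        truly_long = obs >= target
        if test_positive and truly_long:
            tp += 1
        elif test_positive:
            fp += 1
        elif truly_long:
            fn += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    # Positive likelihood ratio; infinite when there are no false positives.
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else float("inf")
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "lr_pos": lr_pos}


# Hypothetical predicted and observed lengths in weeks (not data from the study).
predicted = [4, 8, 8, 12, 4, 16, 8, 4, 12, 4]
observed = [3, 14, 6, 20, 5, 30, 13, 12, 9, 2]

summary = diagnostic_summary(predicted, observed, cutoff=8, target=12)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Raising the cut-off in predicted length typically trades sensitivity for specificity, which is why the measures are reported at several thresholds.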

Approval
The Regional Ethical Committee approved the protocol, and the Norwegian Data Inspectorate licensed the necessary register of sick-listed subjects.

Results
The 65.6-93.1 days) in non-responders. Stratification on gender or musculoskeletal or mental disorders did not reveal any significant differences in the length of sick leave between responders and non-responders.
All assessors, including the sick-listed themselves, systematically overestimated the length of short sick leaves (lasting 4-11 weeks) and underestimated the length of long sick leaves (exceeding 16 weeks; Table 1). The proportions of sick leaves lasting longer than 8, 12 or 26 weeks did not differ significantly between the responder group and the rest.

Receiver operating characteristics of prediction
The sick-listed subjects predicted sick leaves equal to or longer than 12 weeks more accurately than the NIO medical consultants and officers, as shown by the ROC curve in Figure 2. The differences in ROC area between responders and non-responders were most marked among younger subjects and in females (Table 2). Generally, the length of sick leave was predicted more accurately in older subjects than in younger subjects, and better in males than in females. Access to past history of sick leaves improved the ROC area of NIO consultants from 60.6% (95% CI 51.3-69.9%) to 75.4% (95% CI 68.2-82.6%) in male sick-listed, but did not improve the ROC area in assessments of female sick-listed. Changing the observed length to be identified from 12 weeks to 8 or 26 weeks did not significantly change the diagnostic accuracy as assessed by the ROC area. The sick-listed identified sick leaves lasting 8 weeks or longer with an ROC area of 79.5% (95% CI 72.2-85.6%), and sick leaves lasting 26 weeks or longer with an ROC area of 75.5% (95% CI 67.9-82.1%). Sick-listed persons with mental disorders or with neck, or shoulder and arm disorders, were most accurate in their assessment (Figure 3). This was in contrast to NIO assessors, who demonstrated the lowest predictive ability in these diagnostic groups, particularly in responders. The impact on diagnostic accuracy of knowing the occupation was small.

Figure 2 ROC curves of identifying sick leaves lasting at least 12 weeks. The ROC curve of ability to identify sick leaves lasting at least 12 weeks, plotted at the average of two consecutive categories, in length predicted by sick-listed (n = 152), and mean length predicted by National Insurance officers and medical consultants in the responder group (n = 149, 150) and for all the data (n = 972, 975). The points representing cut-offs in predicted length >= 4 weeks (red), >= 8 weeks (pink) and >= 12 weeks (blue) are identified.

Sensitivity, specificity, predictive value and likelihood ratio
The sick-listed subjects predicted their sick leaves with higher sensitivity and PPV than the NIO assessors (Tables 3, 4). Male sick-listed predicted sick leaves lasting at least 12 weeks with a sensitivity of 0.82 (95% CI 0.60-0.95) and a PPV of 0.78 (95% CI 0.56-0.93) using a predicted length of at least 8 weeks. The corresponding sensitivity and PPV of female sick-listed were both 0.61 (95% CI 0.44-0.77).
Duration of at least 8 weeks was the preferable cut-off in predicted length, to identify sick leaves lasting at least 12 weeks (Table 3). A predicted length of at least 12 weeks reduced the sensitivity in all the data to 0.17 in medical consultants and 0.25 in officers. The corresponding improvement in PPV was modest, reaching 0.54 in medical consultants and 0.45 in officers. Using a predicted length of at least 4 weeks would have markedly reduced the specificity (Figure 2).
The sensitivity of identifying sick leaves lasting at least 26 weeks was generally low when medical consultants and officers predicted on the basis of SC1s (Table 4). The sensitivity was improved somewhat by introducing SC2 information, but the effects on the likelihood ratio and on the prevalence-corrected PPV were minor.
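Because PPV depends on how common long sick leaves are in the assessed group, PPVs from the SC2 subsample (only sick leaves exceeding 8 weeks) are not directly comparable with those from all cases. One common way to correct for prevalence is Bayes' theorem, sketched below. This is offered as an assumption about how such a correction can be made, not as the authors' exact procedure, and the sensitivity, specificity and prevalence values are hypothetical.

```python
def ppv_at_prevalence(sensitivity, specificity, prevalence):
    """PPV implied by sensitivity and specificity at a given prevalence (Bayes' theorem)."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


# Hypothetical test characteristics (not values from the study).
sensitivity, specificity = 0.5, 0.8

ppv_low = ppv_at_prevalence(sensitivity, specificity, prevalence=0.2)
ppv_high = ppv_at_prevalence(sensitivity, specificity, prevalence=0.5)
print(f"PPV at 20% prevalence: {ppv_low:.3f}")
print(f"PPV at 50% prevalence: {ppv_high:.3f}")
```

The same sensitivity and specificity give a markedly higher PPV in a group where long sick leaves are more common, which is why a raw PPV from a selected subsample can overstate the gain from additional information.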
The effects of the different predictive strategies can be illustrated by considering a program designed to intervene, at 14 days of sick leave, in all cases where the subject is expected to be sick-listed for more than 12 weeks. Out of every 1000 sick-listed persons, 333 will be sick-listed for more than 12 weeks according to the prevalence in this study. A random selection of 333 persons will include 111 true positives, while 333 persons selected by officers will include 133 of the 333 persons that will be sick-listed at least 12 weeks. The evaluation of 1000 sick-listed individuals by officers thus increases the number of true positives by 22 in a selection of 333 sick-listed persons. The alternative strategy of asking the sick-listed themselves will include 210 true positives in a selection of 333 persons.
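The arithmetic of this illustration can be retraced in a few lines. The counts (1000, 333, 133 and 210) are taken from the paragraph above; the variable names and the derived proportions are added here for illustration only.

```python
population = 1000          # sick-listed persons evaluated
n_selected = 333           # persons selected for intervention
prevalence = n_selected / population  # share sick-listed more than 12 weeks (0.333)

# True positives among the 333 selected, per strategy. The random figure is the
# expected value under chance; the other two are the counts stated in the text.
true_positives = {
    "random": round(n_selected * prevalence),
    "officers": 133,
    "sick-listed": 210,
}

gains = {}
for strategy, tp in true_positives.items():
    gains[strategy] = tp - true_positives["random"]
    share = tp / n_selected
    print(f"{strategy}: {tp} true positives ({share:.0%} of selected), "
          f"gain over random selection: {gains[strategy]}")
```

Selection by officers thus adds 22 true positives to what chance alone would give, whereas asking the sick-listed themselves adds 99.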

Reliability and reproducibility of the predicted length
Agreement between medical consultants in their initial prediction of sick leaves lasting at least 12 weeks, was fair,

The ability to identify sick leaves lasting at least 12 weeks in the responder group (n = 152) and in all participants (N = 993), presented as ROC area, calculated from length of sick leave predicted by sick-listed, and mean length predicted by National Insurance medical consultants and officers. The range of the individual National Insurance ROC areas is presented for all participants.
Figure 3 ROC area in different diagnostic groups. ROC area representing ability to identify sick leaves 12 weeks or longer in different diagnostic groups, calculated on length predicted by sick-listed, and mean of lengths predicted by NIO assessors. The ROC areas are presented with blue bars of 95% CI in the responder group (n = 152/), and red bars without horizontal lines between upper and lower individual ROC areas of the NIO assessors for all sick leaves (n = /958).

In the prediction of sick leaves lasting at least 12 weeks based on the SC2, agreement was moderate between medical consultants (kappa = 0.42, 95% CI 0.29-0.54) and fair between officers (kappa = 0.26, 95% CI 0.10-0.42). The corresponding agreements in the prediction of sick leaves lasting at least 26 weeks were moderate between medical consultants (kappa = 0.55, 95% CI 0.40-0.70) and fair between insurance officers (kappa = 0.31, 95% CI 0.17-0.47).
The differences in diagnostic accuracy, between the two participating medical consultants and their eight colleagues in the reproducibility group, were not significant.
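For readers unfamiliar with the kappa statistic, the sketch below computes Cohen's kappa for two assessors making the same yes/no prediction. The ratings are hypothetical. The verbal labels follow the widely used Landis and Koch bands, which appear consistent with the "fair" and "moderate" labels used here, although the text does not state which scale was applied; that mapping is therefore an assumption.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters beyond that expected by chance."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                     for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)


def agreement_label(kappa):
    """Verbal label using the Landis and Koch bands (assumed scale)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical predictions by two assessors: 1 = "at least 12 weeks", 0 = shorter.
assessor_1 = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
assessor_2 = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]

kappa = cohen_kappa(assessor_1, assessor_2)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

In this example the assessors agree in 8 of 10 cases, but because about half of that agreement would be expected by chance, kappa is considerably lower than the raw agreement.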

Discussion
The results of the present study question any practical value of using information in medical sickness certificates in predicting the length of sick leave, as is the current practice in Norwegian NIOs. Instead, the sick-listed themselves predicted their length of sick leaves far more accurately, but this information is not routinely sought.

Representativeness
The officers in the present study were selected from experienced officers who had shown an interest in the field of sick leave. This might introduce a bias of overestimating the officers' general ability to predict the length of sick leaves. The performances of the two medical consultants were representative of eight of their colleagues who participated in the reproducibility part of the study. We therefore consider the diagnostic accuracy of the assessors to be representative of their professional groups, or at least not underestimated due to bias. Although the diagnostic accuracy varied within each group, the main conclusion of better predictive ability among the sick-listed, was challenged neither by comparing with the mean length predicted by assessors, nor by comparing with the best-performing NIO assessor. The distributions of gender and diagnosis among the 993 persons included in the study were comparable with those in the National Sickness Benefits Register. The findings of longer sick leaves in women with musculoskeletal disorders, and longer sick leaves in men with mental disorders, are consistent with the Register and other studies [11][12][13].
The low responder rate among the sick-listed introduced a possible selection bias, although we could not identify any selection bias in gender, age, diagnosis or occupation [14]. If there was a selection towards more predictable sick leaves, this should have been reflected in the assessments of officers and medical consultants. The general trend of lower diagnostic accuracy of NIO assessors in the responder group indicates that if any selection bias contributes to the results, it is an underestimate of the self-predictive ability.

Why did the sick-listed make better predictions?
If the lengths of sick leaves were predominantly related to loss of function caused by sickness, in line with the legislation, we would expect that the medical consultants' professional competence would favour them in predictions of the lengths of sick leaves. The differences we observed between medical consultants and officers in mean ROC area were, however, minor. Furthermore, we could not demonstrate any significant differences in diagnostic accuracy between medical consultants and officers when aggregate information on disease, treatment, function related to work, and prognoses was available in the SC2. The improvement in ROC area with this aggregated information was minor, with the area just reaching 70%, which is considered borderline useful for some purposes [7]. The result is in line with Bjørndal's findings of low prognostic impact of the SC2 [15], and is supported by findings of a low predictive power of symptoms and signs in neck and shoulder disorders [16]. The better prediction of the length of sick leave by the sick-listed themselves is supported by studies that have identified different non-disease determinants of sick leave, such as job satisfaction [17], attitudes towards pain [18], irreplaceability [19] and psychosocial work environment [20][21][22]. Studies identifying that at least the initial sickness certification is predominantly patient controlled [23,24] indicate the competence of the sick-listed. Self-rated health seems to be an independent predictor of return to work [17], disability pension [25] and early retirement [26]. Our findings can be interpreted as indicating that the subjective perception of sickness and work ability is more predictive of the length of sick leave than the apparently more objective description in medical terms.
The differences in predictive ability were especially significant in persons with mental and neck disorders, while the NIO assessors performed as well as the sick-listed in the more clear-cut injuries with more standardised treatment and prognosis. Mental disorders, with high prevalence in the population, and an increasing cause of absence [27], are of special interest [13]. This increasing prevalence of sick leaves indicates the presence of factors separate from the diagnostic criteria. It seems that the more clear-cut the disease and the recommended treatment, the lesser the gain in predictive ability achieved by asking the sick-listed, and vice versa. The modest gain in predictive ability caused by introducing more medical information by the inclusion of the SC2 supports this interpretation. A more complete description of symptoms and treatment does not necessarily give better prognostic information when this includes little knowledge of the consequences related to occupation, and the effects of treatment are undocumented or, at best, marginal.

Diagnostic accuracy - practical implication
The Norwegian NIO is obliged by legislation to perform early intervention on the sick-listed in an effort to reduce the length of sick leave and the risk of expulsions from work. Limited resources and the large number of sick-listed individuals make selection desirable before any intervention is initiated. An alternative to selection on the basis of medical certificates is to communicate directly with the sick-listed themselves. This selection for intervention by NIOs might be seen as screening. The aim is to reach, at an acceptable cost, as many as possible of those that might profit from intervention. The potential individual gain by intervention will be greater when longer lasting sick leaves can be anticipated, and greater the sooner individual intervention programs are established.
The marginal predictive ability and modest agreement between NIO assessors question the use of resources in selection based on information from medical certificates. The predictions of medical consultants tend to be better than those of officers, but not to an extent that makes it more meaningful to use medical consultants in the selection process, rather than officers.
With limited resources for intervention, it might be more cost effective to identify those whose sick listing will last longer than 26 weeks instead of 12 weeks. Based on self-reporting, eight out of ten would be true positives, and one fourth of the individuals would be reached. To reach the same number of true positives at 14 days of sick leave, the ratio of true positives would be reversed from eight out of ten, to two or three out of ten, if the selection were based on medical certificates.
In the search for tests predicting long-lasting sick leaves, such as The Örebro Musculoskeletal Pain Questionnaire [28], the present study indicates that the results of any such tests should be compared with the results of crude self-estimated length.

Conclusions
Sick-listed individuals predicted their length of sick leave far more accurately than did NIO medical consultants and officers based on information from sickness certificates and the history of past sick leaves. The predictions of sick-listed males were better than those of females, and older persons predicted better than younger persons. The availability of more information, as through the SC2, had only a minor effect on the predictive ability of the medical consultants and officers. Neither reliability nor validity of their predictions was satisfactory.
This study demonstrates the need to re-consider the diagnostic usefulness of documentation on sickness absences, and supports a change in strategy from collecting more medical information to more direct communication with the sick-listed themselves, for effective and early interventions to prevent long sick leaves and expulsions from work.