The SA days model discriminated poorly between office workers with and without high SA days, whereas the SA episodes model showed fair discrimination and acceptable calibration. Although gender was associated with SA, particularly SA days, adding gender did not improve the predictive performance of the models. It would have been interesting to add other readily available work-related or person-related variables from the health check-up, but the number of high SA events restricted the number of predictors in the SA prediction models. Generally, it is advised to have at least 15 events per predictor. With an effective sample size of 66 employees with high SA days and 67 employees with high SA episodes, the prediction models could include only four predictors in the present validation setting.
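The arithmetic behind this limit can be sketched as follows. This is a minimal illustration, assuming the rule is applied by integer division of the number of events by 15; the function name `max_predictors` is introduced here for illustration only.

```python
def max_predictors(n_events: int, events_per_predictor: int = 15) -> int:
    """Largest number of predictors allowed under an events-per-predictor rule.

    Assumption: the rule is applied by rounding down, so a partial
    'slot' of fewer than 15 events does not permit an extra predictor.
    """
    if n_events < 0 or events_per_predictor <= 0:
        raise ValueError("n_events must be >= 0 and events_per_predictor > 0")
    return n_events // events_per_predictor


# Event counts reported in the text above.
predictors_sa_days = max_predictors(66)      # 66 // 15
predictors_sa_episodes = max_predictors(67)  # 67 // 15
print(predictors_sa_days, predictors_sa_episodes)
```

Both event counts allow four predictors under this rule, and a fifth predictor would require at least 75 events.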
Although SRH is easy to obtain without the need for questionnaire surveys, employees have to be asked to rate their health. Thus, SRH can only be gathered at worksite health fairs or from employee visits to health care departments. Our study showed that the predictive performance of the SA episodes model was maintained after deleting SRH from the prediction model. This implies that age and prior SA, which are regular SA register data, would suffice to identify white-collar workers at risk of high SA episodes. However, it should be noted that SRH was a stronger predictor in the health care setting where the prediction models were developed. Excluding strong predictors considerably reduces the predictive ability of prediction models. Thus, if available, SRH should be included in the SA episodes model, because SRH is a health measure and SA is, at least partly, a health-related phenomenon.
The discriminative ability of both prediction models degraded in the population of office workers, although the SA episodes model still showed fair performance. Furthermore, the cut-off probabilities of the SA episodes model confirm those of the development setting. At a cut-off risk of high SA of 10%, the sensitivity was acceptable, but the specificity was low due to high false-positive rates. A sensitive cut-off point can be used to identify as many office workers at risk of high SA as possible. For example, workers with high SA episodes may suffer from chronic recurrent conditions that are not yet diagnosed or treated. From a societal perspective, it may be desirable to select workers with a ≥10% probability of high SA episodes for further diagnosis and treatment to prevent worsening of chronic conditions, long-term SA and subsequent disability pensioning. Alternatively, more specific cut-off points can be used to reduce false-positive rates, for instance to select high-risk office workers for costly interventions.
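The trade-off between sensitive and specific cut-off points can be illustrated with a short sketch. The predicted risks and outcomes below are made up for illustration and are not study data; the function name `sensitivity_specificity` is likewise an assumption.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    risks: Sequence[float], outcomes: Sequence[int], cutoff: float = 0.10
) -> Tuple[float, float]:
    """Sensitivity and specificity when workers with risk >= cutoff are flagged."""
    tp = fn = tn = fp = 0
    for risk, outcome in zip(risks, outcomes):
        flagged = risk >= cutoff
        if outcome == 1:      # worker had high SA
            if flagged:
                tp += 1
            else:
                fn += 1
        else:                 # worker did not have high SA
            if flagged:
                fp += 1
            else:
                tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical predicted risks and observed outcomes (1 = high SA).
risks = [0.05, 0.12, 0.30, 0.08, 0.15, 0.02, 0.40, 0.11]
outcomes = [0, 0, 1, 1, 0, 0, 1, 0]

for cutoff in (0.10, 0.25):
    sens, spec = sensitivity_specificity(risks, outcomes, cutoff)
    print(f"cut-off {cutoff:.2f}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy example, raising the cut-off from 0.10 to 0.25 removes the false positives without losing further true positives; in real data, a higher cut-off usually lowers sensitivity as well.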
Why did the prognostic performance degrade?
The purpose of a prediction model is to provide valid predictions for new subjects [24–26]. External validation refers to the transportability of a prediction model to other settings than where the model was developed [18, 30]. Prediction models tend to perform better in the subjects used to develop the model than in other subjects, a phenomenon known as over-optimism. For internal model validation, bootstrapping methods are recommended to provide bias-corrected estimates of model performance. In the development sample of health care workers, internal validation by bootstrapping revealed an over-optimism of 0.06 for the SA days model and 0.03 for the SA episodes model. Subsequently, the performance parameters were shrunken to adjust for this over-optimism [24–26, 28]. Although adjustment for over-optimism by bootstrap techniques may not be sufficient in relatively small data sets, this low over-optimism made it unlikely that the poorer performance of the prediction models in the sample of office workers was due to overfitting to the development sample.
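For reference, the usual bootstrap correction for over-optimism can be written as below. This is the standard textbook form and is given as an assumption about the general approach, not as the exact procedure used in the development study. Here $P$ denotes a performance measure, $B$ the number of bootstrap samples, $P_b^{\text{boot}}$ the performance of the model fitted in bootstrap sample $b$ and evaluated in that same sample, and $P_b^{\text{orig}}$ the performance of that model evaluated in the original sample.

```latex
\[
\hat{O} \;=\; \frac{1}{B}\sum_{b=1}^{B}\left( P_b^{\text{boot}} - P_b^{\text{orig}} \right),
\qquad
P_{\text{corrected}} \;=\; P_{\text{apparent}} - \hat{O}
\]
```

Under this form, an over-optimism of 0.03 means that the apparent performance in the development sample is lowered by 0.03 to obtain the bias-corrected estimate.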
Alternatively, underfitting occurs when important predictors are missing from the prediction models. Internal validation by bootstrapping techniques will not detect underfitting because the bootstrap samples are drawn from the same population. The poorer performance of the prediction models in the present study may well be explained by underfitting, in particular because the Nagelkerke pseudo R² values were lower than in the development sample of health care workers. Nagelkerke's pseudo R² reflects the variance in high SA between office workers that is explained by the covariates fitted in the prediction models. Low Nagelkerke pseudo R² values indicate that factors other than those included in the model may be important for predicting high SA among office workers. Hence, future studies should further update the prediction models with other predictors, e.g. work variables and personal characteristics, provided that these variables are readily available or easy for physicians to obtain.
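As a reminder of how this measure is obtained, the sketch below computes Nagelkerke's pseudo R² from the log-likelihoods of the intercept-only model and the fitted model, using the standard definition (the Cox and Snell R² rescaled by its maximum attainable value). The function name and the example values are illustrative assumptions, not study results.

```python
import math


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke's pseudo R-squared for a logistic regression model.

    ll_null  -- log-likelihood of the intercept-only model (must be < 0)
    ll_model -- log-likelihood of the fitted model
    n        -- number of subjects
    """
    if n <= 0 or ll_null >= 0:
        raise ValueError("n must be positive and ll_null must be negative")
    # Cox and Snell R-squared.
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # Maximum value Cox and Snell R-squared can reach for this null model.
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_cox_snell


# Hypothetical log-likelihoods for illustration only.
r2 = nagelkerke_r2(ll_null=-100.0, ll_model=-90.0, n=200)
print(round(r2, 3))
```

A model that does not improve on the intercept-only model yields 0, and a model that fits the outcomes perfectly yields 1.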
Another explanation for the lower performance may be the different case-mix in the population of office workers. Case-mix refers to the distribution of known and unknown predictors of SA in the studied populations. The population in which the prediction models were developed consisted of 535 health care workers, predominantly female nurses who were younger than the office workers in the present study. One-third of the development population of health care workers reported excellent health as compared to a quarter of the present population of office workers. Furthermore, 8% of health care workers reported less than good health as compared to 18% of office workers. The distribution of prior SA did not differ between the development and the validation populations.
Finally, the regression coefficients may truly differ between the two working populations, i.e. the working populations were not plausibly related. The prediction models were developed in health care workers, predominantly working in physically and emotionally demanding nursing care. Possibly, this development sample differed too much from the current validation sample of office workers performing mentally demanding work at an insurance company. Furthermore, the ‘healthy worker effect’, which selects the healthiest employees to work until older age, may be greater in nursing care, which is more physically demanding than office work. This may explain why the inverse association between age and high SA was stronger in the development sample of health care workers than in the validation sample of office workers. The ‘healthy worker effect’ may also explain why SRH was a stronger predictor of high SA in health care workers than in office workers, particularly since SRH was found to reflect physical functioning rather than mental health.
Practical implications and future directions
Prediction models are of practical use if they accurately predict outcomes for different populations [18, 30]. This study showed that the SA episodes model accurately predicted the risk of high SA episodes in both health care workers and office workers. Therefore, this prediction model may be a promising tool to select employees at risk of high SA episodes for preventive occupational health consultations. Such consultations were found to reduce SA duration [7, 8], but not SA frequency. Duijts et al. reported that in employees who received preventive coaching the mean SA duration was 11.7 days during 8–12 months of follow-up as compared to 13.1 days in the control group. The mean SA frequencies were 1.07 and 1.40 respectively, though none of the differences in SA measures was statistically significant. In the current study, the SA episodes model identified employees at risk of a high SA frequency, but the model may also indirectly identify employees at risk of future long SA duration, because frequent SA has been recognized as a risk factor for long-term SA [35–37]. Further research is needed to clarify which frequent absentees develop long-term SA in the future.
It is also important to further validate the SA episodes model, for example in large heterogeneous populations and in multiple settings [18, 20]. The more numerous and diverse the settings in which the SA episodes model accurately predicts high SA, the more likely it is to generalize to untested working populations. Furthermore, the SA episodes model should be developed into a nomogram or score chart that is easier to understand and use in daily practice than the regression formula. Simpler presentation formats provide only approximate predictions, but this will not be problematic for identifying employees at risk of high SA.
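One common way to derive such a score chart is to divide each regression coefficient by the smallest absolute coefficient and round to whole points. The sketch below illustrates that idea only; the coefficient values and predictor names are hypothetical and are not the coefficients of the SA episodes model.

```python
from typing import Dict


def to_points(coefficients: Dict[str, float]) -> Dict[str, int]:
    """Convert regression coefficients into integer points for a score chart.

    Each coefficient is divided by the smallest non-zero absolute coefficient
    and rounded, so the weakest predictor is worth one point (or minus one).
    """
    nonzero = [abs(b) for b in coefficients.values() if b != 0]
    if not nonzero:
        raise ValueError("at least one non-zero coefficient is required")
    base = min(nonzero)
    return {name: round(b / base) for name, b in coefficients.items()}


# Hypothetical coefficients (log odds ratios), for illustration only.
example_coefficients = {
    "age_per_10_years": -0.25,
    "prior_high_sa_episodes": 1.50,
    "srh_less_than_good": 0.60,
}

points = to_points(example_coefficients)
print(points)
```

The rounding step is what makes the chart approximate: in this toy example a coefficient ratio of 2.4 becomes 2 points, which changes the predicted risk slightly but is unlikely to change the ranking of high-risk employees much.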