### Study design and participants

The Northern Swedish Cohort was initiated in 1981, when all pupils in their last year of compulsory school in a middle-sized town in Northern Sweden, most of whom were born in 1965, were invited to participate. The cohort has since been followed up four times (1983, 1986, 1995, and 2007) [11]. Comprehensive questionnaires were distributed at baseline and at each follow-up. Response rates have been very high, from 99.7% (1080 of 1083 invited individuals) at baseline to 94.3% of those still alive (1010 individuals) at the latest follow-up in 2007. Further information about the Northern Swedish Cohort is available elsewhere [11].

### Inclusion criteria

In our study, we used survey data from the follow-ups in 1995 (when participants were about 30 years old) and 2007 (when they were about 42 years old), which were available for 1001 individuals. We restricted our selection to the 654 participants who were active in the labor market at the 1995 follow-up and who reported no unemployment between the 1995 and 2007 follow-ups. These inclusion criteria were chosen so that differences in health in 2007 would reflect the long-term consequences of unemployment in the years before 1995 rather than unemployment spells between 1995 and 2007. For all analyses, we required valid responses on all candidate variables. This excluded 34 individuals and left a final sample of 620 individuals, corresponding to 58% of those invited from the original cohort who were still alive at the time of the 2007 follow-up. The definitions of “active in the labor market” and “no self-reported unemployment between follow-ups” are given in the section “Definition of labor market status”. A flow chart of the inclusion criteria is shown in Fig. 1.

### Questionnaire data

From the 2007 questionnaire, apart from the self-reported unemployment used for the inclusion criterion, we used only the outcome variable: self-rated health, with three response alternatives (“good”, “fairly good”, and “poor”). “Fairly good” and “poor” were combined to represent poor health, and “good” represented good health. From the 1995 questionnaire, we used the same self-rated health question, recoded identically, together with questions about labor market status, education level, marital status, occupation, social support, cash margin, smoking, alcohol intake, weight, and height. The labor market status variables were used to define exposure to unemployment. The other variables were chosen for two reasons. First, a recent review listed them among the variables most commonly used in studies similar to ours [2], which suggests that they are potential confounders of the relationship between health and labor market status. Second, they were collected in the Northern Swedish Cohort [11]. Including self-rated health in 1995 allowed us to account for health differences already associated with current or recent unemployment in 1995.

### Definition of labor market status

For labor market status, unemployment was the exposure and employment the non-exposure. We defined participants’ labor market status from their self-reported status during the three years preceding the 1995 follow-up. In the 1995 questionnaire, participants ticked one or more employment statuses for each half year between autumn 1986 and autumn 1995 (the period since the previous follow-up). The listed statuses were grouped into three categories:

- “Full-time employment”, “Part-time (20–39 hours a week) employment”, and “Labor market measure” were defined as “Employed”.
- “Unemployed” was defined as “Unemployed”.
- “University/high-school”, “Other education”, “Casual job (< 20 hours a week)”, “Sick leave”, “On parental leave”, and “Other” were defined as not being active in the labor market.

A participant who ticked “Unemployed” in any of the six half-year periods between autumn 1992 and autumn 1995 was defined as unemployed in our study. Participants not defined as unemployed were defined as employed if they ticked any of the “Employed” alternatives in at least three half-year periods during the same period (1992–1995).
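The classification rule above can be sketched as a small Python function. This illustration rests on stated assumptions and is not the authors' code. The short status labels (such as `"full_time"`), the function name `classify_1995`, and the representation of each half-year period as a set of ticked statuses are all hypothetical.

```python
from typing import List, Optional, Set

# Hypothetical short labels for the alternatives defined as "Employed" in the text.
EMPLOYED = {"full_time", "part_time_20_39h", "labor_market_measure"}
UNEMPLOYED = "unemployed"
# Any other label (e.g. "education", "casual_job", "sick_leave", "parental_leave",
# "other") counts as not being active in the labor market.


def classify_1995(periods: List[Set[str]]) -> Optional[str]:
    """Classify labor market status from the six half-year periods, autumn 1992 to autumn 1995.

    Each element of `periods` is the set of statuses ticked for one half year.
    Returns "unemployed", "employed", or None (not active in the labor market).
    """
    # A single unemployment tick in the window places the participant in the exposed group.
    if any(UNEMPLOYED in ticks for ticks in periods):
        return "unemployed"
    # Otherwise, at least three half years with an "employed" tick are required.
    n_employed = sum(1 for ticks in periods if ticks & EMPLOYED)
    if n_employed >= 3:
        return "employed"
    return None
```

For example, `classify_1995([{"full_time"}] * 5 + [{"unemployed"}])` returns `"unemployed"`. A participant with only two employed half years and no unemployment tick returns `None` and would be excluded.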

For the inclusion criterion of no unemployment between follow-ups, we used participants’ employment statuses between spring 1996 and autumn 2007 from the 2007 questionnaire. A participant was defined as having no unemployment during this period if they reported no unemployment and also had at least three ticks for alternatives defined as employed between spring 1996 and autumn 2007. Thus, we compared those who were unemployed at ages 28–30 years (the exposed group) with those who were employed at these ages (the reference group), and all compared individuals had no unemployment between 30 and 42 years of age. Requiring employment in at least 12 of the 24 half-year periods would have reduced our sample from 620 to 608 individuals and would therefore have had only a small influence on the results.
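The following sketch shows the inclusion check for the 2007 questionnaire. Passing `min_employed=12` gives the stricter variant mentioned above. As before, the labels and the function name `no_unemployment_between_followups` are hypothetical.

```python
from typing import List, Set

# Hypothetical short labels for the alternatives defined as "Employed" in the text.
EMPLOYED = {"full_time", "part_time_20_39h", "labor_market_measure"}


def no_unemployment_between_followups(periods: List[Set[str]], min_employed: int = 3) -> bool:
    """Inclusion check over the half-year periods from spring 1996 to autumn 2007.

    True if no period has an unemployment tick and at least `min_employed`
    periods have a tick for an "employed" alternative.
    """
    if any("unemployed" in ticks for ticks in periods):
        return False
    n_employed = sum(1 for ticks in periods if ticks & EMPLOYED)
    return n_employed >= min_employed
```

A participant with three employed half years and 21 half years of, say, education passes the default check but fails the stricter one with `min_employed=12`.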

### Potential confounding variables at age 30

Education level was divided into three groups: at most 2 years of secondary education (the reference group), 3–4 years of secondary education (referred to as “upper secondary education”), and university studies (a bachelor’s degree or completion of other higher education). For marital status, the alternatives “living with wife/husband” and “living with cohabitant/partner” were defined as “married” and used as the reference group. The remaining alternatives (“alone”, “with a friend”, and “other”) were defined as “single” and used as the exposure group. A socio-economic index was derived for each respondent from their description of their occupation, based on the nomenclature used by Statistics Sweden [22]. Blue-collar workers (codes 11–22 and 89) formed the reference group, and low-level white-collar workers (codes 31–36) and medium- to high-level white-collar workers (codes 42–79) formed the exposure groups. For gender, men were used as the exposure group and women as the reference group.
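Three-level covariates can be coded as in the sketch below. It maps the occupation codes to the three groups, treating the code ranges as inclusive, and creates 0/1 dummy variables with the reference group coded as all zeros. The code ranges come from the text. The level names and function names are hypothetical.

```python
from typing import List, Optional, Sequence

# Hypothetical level names; the first entry of each list is the reference group.
EDUCATION_LEVELS = ["max_2y_secondary", "upper_secondary", "university"]
OCCUPATION_LEVELS = ["blue_collar", "low_white_collar", "mid_high_white_collar"]


def occupation_group(sei_code: int) -> Optional[str]:
    """Map a socio-economic code to the three occupation groups used in the text."""
    if 11 <= sei_code <= 22 or sei_code == 89:
        return "blue_collar"
    if 31 <= sei_code <= 36:
        return "low_white_collar"
    if 42 <= sei_code <= 79:
        return "mid_high_white_collar"
    return None  # code outside the three groups


def dummies(value: str, levels: Sequence[str]) -> List[int]:
    """0/1 indicators for each non-reference level; the reference (levels[0]) gives all zeros."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [int(value == level) for level in levels[1:]]
```

For instance, `dummies(occupation_group(33), OCCUPATION_LEVELS)` gives `[1, 0]`, and `dummies("max_2y_secondary", EDUCATION_LEVELS)` gives `[0, 0]`.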

To measure social support, we used the Availability of Social Integration (AVSI) and Availability of Attachment (AVAT) instruments, which are part of the Interview Schedule for Social Interaction [23]. The AVSI consists of four questions with six response alternatives each, and the AVAT consists of six questions with four response alternatives each. For both instruments, the item scores are summed to give a maximum score of 24. For the AVSI, a score of 13 or lower was used as the reference group. For the AVAT, a score of 10 or higher was used as the reference group. For cash margin, being able to obtain 13,000 SEK (corresponding to €1366 on 28 March 2017) within a week was used as the reference group, and not being able to do so was used as the exposure. For smoking, “not a current smoker” was used as the reference group and was compared with (i) “smoking at most 10 cigarettes a day” or “smoking pipe or smoking cigar”, and (ii) “smoking more than 10 cigarettes a day”.
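The social support and cash margin variables can be dichotomized as in the sketch below, where 1 marks the exposure group. The text does not state how items are scored, so the sketch assumes that AVSI items are scored 1–6 and AVAT items 1–4, which is consistent with the stated maximum of 24. The function names are hypothetical.

```python
from typing import Sequence


def sum_score(items: Sequence[int], n_items: int, max_item: int) -> int:
    """Sum item scores after checking item count and range (items assumed scored 1..max_item)."""
    if len(items) != n_items or any(not 1 <= v <= max_item for v in items):
        raise ValueError("unexpected number of items or item score out of range")
    return sum(items)


def avsi_exposed(items: Sequence[int]) -> int:
    # AVSI: four items with six alternatives; a total of 13 or lower is the reference group.
    return int(sum_score(items, 4, 6) > 13)


def avat_exposed(items: Sequence[int]) -> int:
    # AVAT: six items with four alternatives; a total of 10 or higher is the reference group.
    return int(sum_score(items, 6, 4) < 10)


def cash_margin_exposed(can_obtain_13000_sek_within_a_week: bool) -> int:
    # Not being able to obtain 13,000 SEK within a week is the exposure.
    return int(not can_obtain_13000_sek_within_a_week)
```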

Each participant’s total alcohol consumption, in centiliters of pure alcohol per year, was calculated from six questions. For each of beer, wine, and spirits, one question asked about drinking frequency and one about the amount consumed per drinking occasion. The frequency questions were almost identical for all three beverages, with “never” scored as 0, “every or every second day” as 250, “1–2 times a week” as 80, “a few times a month” as 12, and “more seldom” as 6. The questions on the amount consumed per occasion had 7–9 response alternatives for each beverage. The scores for these responses are presented in Additional file 1: Table S1. Each beverage was also given a weight corresponding to its alcohol content: 0.05 for beer, 0.14 for wine, and 0.40 for spirits. For each beverage, the annual alcohol intake was calculated by multiplying the frequency score, the amount score, and the weight. The total alcohol consumption was the sum of these intakes over the three beverages. This alcohol intake score has been used in previous studies of the Northern Swedish Cohort, where it was considered to work well [7]. An alcohol intake score below 140 was used as the reference value in our analyses.
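The alcohol intake score is a sum of products and can be sketched as follows. The frequency scores and weights are taken from the text. The amount scores are given in Additional file 1: Table S1 and are not reproduced here, so the amount values in the example are hypothetical placeholders. The function names and the ASCII answer labels are also hypothetical.

```python
from typing import Dict, Tuple

# Frequency scores (occasions per year) as listed in the text.
FREQUENCY_SCORE = {
    "never": 0,
    "every or every second day": 250,
    "1-2 times a week": 80,
    "a few times a month": 12,
    "more seldom": 6,
}
# Weights for the alcohol content of each beverage, as listed in the text.
ALCOHOL_WEIGHT = {"beer": 0.05, "wine": 0.14, "spirits": 0.40}


def alcohol_score(answers: Dict[str, Tuple[str, float]]) -> float:
    """Total alcohol intake score (cl of pure alcohol per year).

    `answers` maps each beverage to (frequency answer, amount score); the amount
    score would be looked up in Additional file 1: Table S1.
    """
    total = 0.0
    for beverage, (frequency, amount) in answers.items():
        # Annual intake for one beverage: frequency score x amount score x weight.
        total += FREQUENCY_SCORE[frequency] * amount * ALCOHOL_WEIGHT[beverage]
    return total


def alcohol_exposed(score: float, cutoff: float = 140) -> int:
    # Scores below the cut-off form the reference group.
    return int(score >= cutoff)
```

For example, with hypothetical amount scores, `alcohol_score({"beer": ("1-2 times a week", 50), "wine": ("a few times a month", 20), "spirits": ("never", 0)})` gives 80 × 50 × 0.05 + 12 × 20 × 0.14 = 200 + 33.6 = 233.6 (up to floating-point rounding). That score is above the cut-off of 140.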

Cut-off values for the AVSI, AVAT, and alcohol intake were chosen to split the participants into two groups of approximately equal size. Body mass index (BMI) was derived from self-reported weight and height and calculated as weight in kilograms divided by the square of height in meters. Participants with BMI ≥ 30 kg/m^{2} were defined as obese, and those with BMI between 25 and 30 kg/m^{2} were defined as overweight. These two groups were used as the exposure groups.
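BMI uses weight in kilograms and height in meters, so a height reported in centimeters must first be divided by 100. The text does not state how the boundary values were handled, so the sketch below assumes that a BMI of exactly 25 counts as overweight and exactly 30 as obese. The function names are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    return weight_kg / height_m ** 2


def bmi_group(value: float) -> str:
    # Assumed boundaries: < 25 reference, [25, 30) overweight, >= 30 obese.
    if value >= 30:
        return "obese"
    if value >= 25:
        return "overweight"
    return "reference"
```

For example, 70 kg and 175 cm give `bmi(70, 175 / 100)` ≈ 22.9, which falls in the reference group.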

### Statistics

Logistic regression, G-computation, and a propensity score method were used to analyze the effect of unemployment on health. Logistic regression estimates the effect of unemployment on health by comparing groups, and it is the most commonly used method in studies of the relationship between unemployment and health [2]. The other two methods estimate the risk difference using counterfactual arguments. The risk difference is estimated as E[Y(1)] − E[Y(0)], where E[Y(1)] is the expected outcome if all individuals were unemployed and E[Y(0)] is the expected outcome if all individuals were employed. The advantage of these estimators is therefore that they correspond to the marginal effect of becoming unemployed.

Our analysis first included all identified potential confounders in a *full* model. We then fitted a *reduced* model containing only the variables that were significant in the full model. To allow comparisons between models, the reduced model was fitted to the same participants as the full model. This meant that 15 individuals who had complete information on the reduced-model variables, but not on all candidate variables, were excluded from these analyses. Interactions between variables were not considered. We tested models that included and excluded our candidate variables in the reduced and full models, but these changes had only a limited effect on the estimated effect of unemployment on health. We therefore included education level, marital status, and occupation in both the reduced and the full models, even though these variables are potentially collinear.

Propensity scores were introduced by Rosenbaum and Rubin in 1983 [24], but they have rarely been used to study the effect of unemployment on health [2]. The propensity score is the conditional probability of being assigned to the exposure group given baseline covariates. Consequently, an exposed and an unexposed individual with the same propensity score had the same probability of being exposed. If the estimated propensity scores were unbiased, which cannot be assumed, these groups would correspond to those in a randomized controlled trial (RCT). The bias of the estimates depends on how well the propensity score balances confounders, both measured and unmeasured. The strength of an RCT is that randomization balances confounders well, provided that the study protocol is followed. An observational study cannot achieve this through its design.

The inverse probability weight (IPW) was defined as \( w=\frac{X}{PS}+\frac{1- X}{1- PS} \), where *X* is the exposure (1 = unemployed, 0 = employed) and *PS* is the estimated propensity score. We used an IPW estimator, as suggested by Lunceford and Davidian [25], to estimate the risk difference

$$ {RD}_{IPW}=\frac{1}{n}\left[\sum_{i=1}^n{Y}_i\left(\frac{X_i}{PS_i}-\frac{\left(1-{X}_i\right)}{1-{PS}_i}\right)\right], $$

where *Y* is the outcome (self-rated health). The marginal effect from this estimator corresponds to the average treatment effect [26]. In our study, the propensity scores were estimated by logistic regression with the covariates from the statistical model. Each propensity score is the individual’s probability of being unemployed given his or her characteristics.
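The following sketch computes the IPW weights and the estimator \( RD_{IPW} \) defined above. To keep it self-contained, it uses a simpler propensity score model than the one in the study: the proportion of unemployed individuals within each covariate stratum. This equals the fit of a saturated logistic regression with one indicator per stratum, not the main-effects logistic regression described in the text. The function names and toy data are hypothetical, and the coding *Y* = 1 for poor health is an assumption.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def _check_positivity(ps: Sequence[float]) -> None:
    # The weights are undefined unless every propensity score lies strictly in (0, 1).
    if any(not 0.0 < p < 1.0 for p in ps):
        raise ValueError("propensity scores must lie strictly between 0 and 1")


def saturated_propensity_scores(x: Sequence[int], strata: Sequence[Hashable]) -> List[float]:
    """Estimate P(X = 1 | stratum) by the share of exposed individuals in each stratum.

    These equal the fitted values of a saturated logistic regression
    (one indicator per covariate stratum), not of a main-effects model.
    """
    n = defaultdict(int)
    n_exposed = defaultdict(int)
    for xi, s in zip(x, strata):
        n[s] += 1
        n_exposed[s] += xi
    return [n_exposed[s] / n[s] for s in strata]


def ipw_weights(x: Sequence[int], ps: Sequence[float]) -> List[float]:
    """w = X / PS + (1 - X) / (1 - PS)."""
    _check_positivity(ps)
    return [xi / p + (1 - xi) / (1 - p) for xi, p in zip(x, ps)]


def rd_ipw(y: Sequence[int], x: Sequence[int], ps: Sequence[float]) -> float:
    """RD_IPW = (1/n) * sum_i Y_i * (X_i / PS_i - (1 - X_i) / (1 - PS_i))."""
    _check_positivity(ps)
    total = sum(yi * (xi / p - (1 - xi) / (1 - p)) for yi, xi, p in zip(y, x, ps))
    return total / len(y)


# Hypothetical toy data: two covariate strata, X = 1 unemployed, Y = 1 poor health.
x = [1, 1, 0, 0, 1, 0, 0, 0]
y = [1, 0, 0, 0, 1, 1, 0, 0]
strata = ["a"] * 4 + ["b"] * 4
ps = saturated_propensity_scores(x, strata)
print(rd_ipw(y, x, ps))  # 7/12, about 0.583
```

With these toy data, the propensity scores are 0.5 in stratum "a" and 0.25 in stratum "b". The estimate, 7/12, equals the stratum-size-weighted average of the within-stratum risk differences. This agreement is expected when the propensity score model is saturated.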

To assess the balance of covariates between the employed and unemployed groups, we calculated the standardized difference for each covariate in the reduced model, both with and without weighting [27]. For the unweighted sample, the standardized difference was defined as

$$ d=100\frac{\left({\widehat{p}}_{unemployed}-{\widehat{p}}_{employed}\right)}{\sqrt{\frac{{\widehat{p}}_{unemployed}\left(1-{\widehat{p}}_{unemployed}\right)+{\widehat{p}}_{employed}\left(1-{\widehat{p}}_{employed}\right)}{2}}}, $$

where the denominator is the pooled standard deviation and \( \widehat{p} \) is the estimated proportion of individuals in the respective group who are exposed for the covariate. For categorical variables with three categories, two dummy variables were created, with the reference category coded as 0. For the weighted sample, the pooled standard deviation was calculated as

$$ \sqrt{\frac{\sum w_i}{\left(\sum w_i\right)^2-\sum w_i^2}\sum w_i\left(x_i-\bar{x}_{\mathrm{weight}}\right)^2}, $$

where \( \bar{x}_{\mathrm{weight}}=\frac{\sum w_i x_i}{\sum w_i} \) is the weighted prevalence of the covariate. \( \widehat{p} \) was calculated with the same formula, but separately for the employed and unemployed groups.
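The two standardized differences can be sketched as follows for a 0/1 covariate. The weighted version follows one literal reading of the formulas above. The numerator uses group-specific weighted prevalences. The denominator uses the weighted standard deviation computed over the whole weighted sample. That reading, the function names, and the toy data are assumptions. A three-level covariate would be handled by passing each of its two dummy variables separately.

```python
import math
from typing import List, Sequence, Tuple


def _prevalence(x: Sequence[int], w: Sequence[float], idx: List[int]) -> float:
    # Weighted share of rows in idx with the covariate coded 1.
    return sum(w[i] * x[i] for i in idx) / sum(w[i] for i in idx)


def _groups(group: Sequence[int]) -> Tuple[List[int], List[int]]:
    # group: 1 = unemployed, 0 = employed.
    unemployed = [i for i, g in enumerate(group) if g == 1]
    employed = [i for i, g in enumerate(group) if g == 0]
    return unemployed, employed


def std_diff_unweighted(x: Sequence[int], group: Sequence[int]) -> float:
    """d = 100 (p_unemployed - p_employed) / pooled SD (in percent)."""
    unemployed, employed = _groups(group)
    ones = [1.0] * len(x)
    p1 = _prevalence(x, ones, unemployed)
    p0 = _prevalence(x, ones, employed)
    pooled_sd = math.sqrt((p1 * (1 - p1) + p0 * (1 - p0)) / 2)
    return 100 * (p1 - p0) / pooled_sd


def std_diff_weighted(x: Sequence[int], group: Sequence[int], w: Sequence[float]) -> float:
    """Weighted standardized difference, denominator from the weighted-SD formula."""
    unemployed, employed = _groups(group)
    p1 = _prevalence(x, w, unemployed)
    p0 = _prevalence(x, w, employed)
    sw = sum(w)
    sw2 = sum(wi * wi for wi in w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted prevalence, whole sample
    var = sw / (sw ** 2 - sw2) * sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    return 100 * (p1 - p0) / math.sqrt(var)
```

In a hypothetical sample where the weights make the covariate equally prevalent in both groups, the weighted standardized difference is 0. The unweighted one is not 0 in such a sample.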

G-computation is a regression method similar to logistic regression, but it aims to estimate marginal effects [28]. For G-computation, we first fitted a logistic regression with all variables in the statistical model, including labor market status. From this model, the risk difference was then estimated as the difference between the expected outcome if all individuals were unemployed and the expected outcome if all individuals were employed. Results from our G-computation estimator can therefore be compared directly with those from the IPW estimator.

We performed stratified analyses for the variables in the reduced model. We also performed stratified analyses for men and women, because several studies have shown that the effect of unemployment on health differs between them [2]. Some of our stratified analyses used smaller samples than recommended for logistic regression [29], which is at least 10 events per variable for the less common outcome. This occurred only rarely in the logistic regressions with self-rated health in 2007 as the outcome. It was more of a problem when labor market status was the outcome, as in the estimation of propensity scores. We indicate in the Results section where this criterion was not fulfilled.
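The events-per-variable rule mentioned above can be checked with a short sketch. It assumes that the number of parameters counts every estimated coefficient except the intercept, so a three-level covariate counts as two. The function names are hypothetical.

```python
from typing import Sequence


def events_per_variable(outcome: Sequence[int], n_parameters: int) -> float:
    """Events per variable, counting events in the less common outcome category."""
    n_events = sum(outcome)
    n_less_common = min(n_events, len(outcome) - n_events)
    return n_less_common / n_parameters


def meets_epv_rule(outcome: Sequence[int], n_parameters: int, threshold: float = 10.0) -> bool:
    # Recommended minimum: at least `threshold` events per variable.
    return events_per_variable(outcome, n_parameters) >= threshold
```

For a propensity score model, `outcome` would be labor market status (1 = unemployed). The less common category then usually determines whether the rule is met.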

All statistical analyses were performed in R using RStudio (RStudio, Boston, MA). Logistic regression models were fitted with the glm function in R, and confidence intervals for the estimates were derived from the profile likelihood [30]. Bootstrap resampling with replacement was used to estimate the mean squared error of the IPW and G-computation estimators [31]. The 2.5th and 97.5th percentiles of the bootstrap estimates were used as the limits of the 95% confidence intervals. Sensitivity analyses were performed for the criterion of no unemployment during follow-up. In the first sensitivity analysis, no individuals were excluded because of unemployment during follow-up. In the second, a variable was introduced with those who had unemployment during follow-up as the exposed group and those without unemployment as the reference group. Pearson’s χ^{2} test was used to test whether the exposure variable (labor market status) was associated with the potential confounders. Statistical significance was set at the 5% level.
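The percentile bootstrap described above can be sketched with the Python standard library. The mean squared error is computed here as the mean squared deviation of the bootstrap estimates from the full-sample estimate, which is one common definition. The function name, the seed, and the default of 1000 resamples are assumptions, because the text does not give them.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def bootstrap(data: Sequence[T], estimator: Callable[[List[T]], float],
              n_boot: int = 1000, seed: int = 2017) -> Tuple[float, float, float]:
    """Return (bootstrap MSE, 2.5th percentile, 97.5th percentile) for an estimator.

    Records are resampled with replacement. The MSE is the mean squared deviation
    of the bootstrap estimates from the full-sample estimate.
    """
    rng = random.Random(seed)
    theta_hat = estimator(list(data))
    estimates = [estimator(rng.choices(data, k=len(data))) for _ in range(n_boot)]
    mse = sum((e - theta_hat) ** 2 for e in estimates) / n_boot
    # quantiles(n=40) returns 39 cut points; the first is the 2.5th percentile,
    # the last the 97.5th percentile.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return mse, cuts[0], cuts[-1]
```

To apply this sketch to the IPW or G-computation estimators, `data` would be a list of individual records (outcome, exposure, covariates). The `estimator` argument would be a function that returns the risk difference from such a list. Small resamples may occasionally contain no exposed individuals in a stratum, and that function would need to handle this case.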