### Data sources

The data material comes from a cohort consisting of all males born in Norway between 1971 and 1976 (n =184 951). The gender restriction allowed for using military conscript data (IQ, BMI, military eligibility check). Baseline characteristics and data on work participation, education and health-related absence are available from several national population-wide registries. Personal identification numbers allowed for linking within study subjects and between subjects and parents across several registries: Statistics Norway’s events database on employment and welfare, FD-Trygd, The National Education Database, The Armed Forces Personnel Data Base and the registries of the Norwegian Labour and Welfare Administration.

### Study population

All individuals were included in the study from the 1st of July the year they turned 23 (1994–1999) and followed for 12.5 years, until 31st of December (2006–2011). Individuals who emigrated before the study start were completely removed from the dataset. Emigration occurring after the start of follow-up resulted in temporary removal from the dataset, given that the individual returned. This is also how paternal leave (paid leave a father takes off work at the birth or adoption of a child) was handled. Furthermore, we only included those who had started an upper secondary education program before the year they turned 21 (96.2% of the alive population). Individuals also needed complete conscript data, information on parental SEP (education and income) and valid follow-up data to be included. Invalid follow-up data in our study, could for instance be to start at disability pension or only being observed in paternal leave. After applying exclusion criteria, we ended up with a sub-cohort of 155 852 individuals for analysis; 62803 students of general studies and 93049 students in vocational tracks. See Fig. 1 for an overview of the study selection and exclusion process.

Throughout the follow-up period we had individual state trajectories for every subject. The register databases from where we create state trajectories exhibit a high degree of completeness. More so, near 100 percent of the individuals in the study (after applying the exclusion criteria listed in Fig. 1) were under observation at the inclusion date. There is a small number of individuals not registered in any state at that time who enters observation shortly after. At the end of follow-up, 95 percent were still under observation, and it was assumed that missingness was not informative, although this could be investigated and, if necessary, alleviated with inverse probability of censoring weights [32, 33].

### Multi-state model for work participation, health-related absence and education

To assess the effects of completion, we fitted a multi-state model for analysing time-to-event outcomes. An event is here a transition between different states. These states were *work*, *unemployment*, *education* (tertiary), *sick leave* and *disability*. The states and all possible transitions between them are summarised in Fig. 2. In some cases e.g. for students working part time or individuals on partial sick leave while working part-time, a decision had to be made regarding which state they belonged to. To handle such issues and keep the number of states manageable, states were given different precedence when individuals qualified to more than one state at the same time. In order of decreasing precedence we prioritized disability, sick leave, unemployment, work over lastly, education. An exception was made whenever work resulted in yearly income less than 2G; 1G being defined by the Norwegian Labour and Welfare Administration to calculate various types of welfare pensions (1G as of 1. May 2017 equals 93 634 NOK). In the case of income less than 2G, education would have precedence over work. 2G was used to separate between students and workers, as it is slightly over the maximum allowed income for entitlement to full student loan and grant. This ensured that most students working part-time were considered students (education state), while full-time employees, earning more than 2G, could be attending educational courses and still be considered workers (work state). Only a few observations were made of people leaving the disability state during follow-up, we therefore considered disability as an absorbing state. Additionally, we did not consider transitions directly into disability from work and education due to the rarity of such events, and because such transitions would normally not follow regulatory laws concerning disability allowance and are likely to be purely administrative artefacts.

### Exposure variable

We considered our exposure *completing upper secondary education* as having obtained a degree by July the 1st the year the subject turned 23. Individuals who did not finish within this time frame were either dropouts or had delayed completion, hereafter referred to as non-completers. The information on obtained degrees came from The National Education Database.

### Baseline covariates

Through various administrative registers we had access to numerous covariates at individual level. The variables included in our analysis fell into three categories: 1) *family background variables*: parental education, income, disability history and mother’s marital status – together accounting for an individual’s background and upbringing. 2) *Individual variables*: IQ, BMI, military eligibility check (mental and physical health) and childhood chronic disease history – accounting for cognitive ability and health. 3) *Societal/contextual variables*: regional unemployment rate and year of birth. Year of birth is included to adjust for an increasing completion rate in later birth cohorts and economic cycles of recession and boom periods. Summarising statistics of the covariates are shown in Table 3. Reference categories used in the Cox models are marked by (*) in the table.

### Fitting multi-state models

To estimate state probabilities, we first introduce some standard multi-state notation. Let *X*(*t*) denote the state of an individual at time *t*. The transition probability matrix **P**(*s*,*t*) shall consist of elements *P*_{
hj
}(*s*,*t*)=*P*(*X*(*t*)=*j*|*X*(*s*)=*h*), which represent the probability of going from state *h* to state *j* during the time interval (*s*,*t*]. By assuming that the model is Markov, which is an assumption we discuss later, this transition probability matrix can be estimated with the matrix product-formula:

$$ \hat{\mathbf{P}}(s,t) = \underset{u \in (s,t]}{\prod} (\mathbf{I} + d \hat{\mathbf{A}}(u)), $$

(1)

where \(\hat {\mathbf {A}}(u)\) is the cumulative transition intensity matrix at time *u* [34]. The matrices \(\hat {\mathbf {A}}(u)\) can for instance be identified with transition specific Nelson-Aalen estimates.

If we model transitions by conditioning on baseline covariates *Z*, the matrix product-formula becomes:

$$ \hat{\mathbf{P}}_{Z}(s,t) = \underset{u \in (s,t]}{\prod} \left(\mathbf{I} + d \hat{\mathbf{A}}_{Z}(u)\right), $$

(2)

Now \(\hat {\mathbf {A}}_{Z}(u)\) is the conditional cumulative transition intensity matrix at time *u*. The elements of \(\hat {\mathbf {A}}_{Z}(u)\) can, for example, be estimated from Cox proportional hazards models together with a non-parametric estimator for the baseline hazard [34].

Given the transition probability matrix, we can calculate the probability of being in state *j* at time *t* when starting in state *h* by *P*_{
hj
}(0,*t*). More generally, the probability of being in state *j* at time *t*, so-called state probabilities, can be calculated by:

$$ \hat{P}(X(t)=j) = \sum_{k} \hat{P}_{kj}(0,t) \cdot \hat{P}(X(0)=k). $$

(3)

Without covariates, *P*(*X*(0)=*k*) is calculated by the proportion of subjects entering the study in state *k* at time equal zero. In covariate adjusted models, we may need to estimate *P*(*X*(0)=*k*|*Z*) – for instance by predicting from a multinomial logistic regression using starting state as outcome.

The multi-state models in our analyses rely on a Markov assumption, which requires that the instantaneous risk of transition to any other state, only depends on the current state and not the state history. With data on unemployment and health-related absence, which abides regulatory laws regarding maximum allowed length of stay, the Markov assumption will typically be violated for some types of transitions. Deviations from the assumption could be explored and perhaps alleviated through semi-Markov models [26, 27, 29, 30]. However, when focusing on estimating state probabilities, the estimator in Eq. 3 has been proven to be consistent, also in the presence of violations to the Markov assumption [35].

### Inverse probability weighted multi-state models

We can adjust for covariates by fitting a multivariate hazard model, e.g. by Cox regression, for each possible transition in the multi-state model. However, when calculating state probabilities, we must then explicitly specify values for all covariates. It is impractical and often infeasible to do such calculations for all covariate patterns. When the aim is to adjust for confounding when identifying the effect of a main exposure, a multivariate regression approach will also not give such an effect directly. A better alternative is then to estimate the *average effect* of the exposure over all observed covariate patterns in the population using inverse probability of treatment weighting (IPTW) [36]. In order to do this, we estimate each subject’s probability of exposure given covariates. The idea is then to weight all observations with the corresponding individual’s inverse probability of treatment. Before weighting, completion and non-completion are unevenly spread over different covariate patterns, while in the weighted dataset both exposures are balanced (equally represented) across the patterns. This means completion and non-completion can be compared without adjusting for other baseline covariates.

To estimate the weights, we fitted a logistic regression model for completion versus non-completion, which was used to calculate probabilities of each individual’s exposure based on their covariates. After applying the weights, the only covariate that remains to be controlled for is the main exposure. This means we may predict state probabilities for completion and non-completion using either weighted univariate hazard regression models that only includes the exposure, or weighted Nelson-Aalen estimates for transition hazards in each exposure group separately.

Interpreting the effect from the IPTW analysis as the marginal effect of completion in the population, depends on various model and causal assumptions [18]. The most central among the latter assumptions, is the one of no unmeasured confounding; that we sufficiently adjust for all common causes of completing upper secondary education and later states. Furthermore, there must be a positive probability of completion and non-completion for all observed covariate values and consistency of the exposure [37, 38]. In addition, the model for estimating the weights must also be correctly specified.

### Technical remarks on model fitting

All analyses were done in R [39] version 3.4.1 using the survival- and mstate libraries [34]. Note that fitting multivariate Cox models for multi-state data of this magnitude and subsequently applying the matrix-product formula are computationally intensive tasks and could be problematic on standard computers and laptops. This work was partly performed on the Abel Cluster, owned by the University of Oslo and the Norwegian Metacenter for High Performance Computing (NOTUR), and operated by the Department for Research Computing at USIT, the University of Oslo IT Department [40].

The inverse probability weighted analysis, on the other hand, is more manageable on a regular computer. This is due to the fact that the logistic model is quick to optimise with an iterative reweighted least squares algorithm. The subsequent multi-state analysis can either be based on a univariate weighted hazard regression model or weighted Nelson-Aalen estimates for each exposure group, which is considerably less demanding in terms of computing power than a multivariate model.