Data
This study utilised data from the Understanding Society UK Household Longitudinal Study [43]. This longitudinal survey collects data through a self-completion online survey or through face-to-face interviews with UK adults aged 16 and above. It aims to study the effect of social, economic and policy change on the well-being of the UK population. Data from wave 9 (2017–19) and wave 10 (2018–2020) were used in this study as loneliness was only included as a variable in data collection from wave 9 onwards. Analysis was restricted to the working age population, with those aged over 65 excluded from the sample.
Loneliness was assessed though the direct question ‘how often do you feel lonely?’. Responses were recorded using three levels: hardly ever/never, some of the time, often. Missing responses for loneliness were excluded from analysis and loneliness was transformed into a binary variable. Loneliness was considered present where an individual reported feeling lonely often. Unemployment was assessed by the question ‘which of these best describes your current employment situation?’ Employment status was described across 12 categories and transformed into a binary variable, being unemployed or employed, for analysis. The data did not differentiate between unemployed individuals looking for work and those not looking for work. Therefore, in this study unemployment includes both individuals experiencing involuntary and voluntary unemployment. Individuals in full-time education, or with ambiguity on work status were excluded from analysis (see Table A1, Appendix 1). Those permanently sick or disabled were considered unemployed but were omitted from the data in sensitivity analysis given the uncertain extent to which long-term sickness or disability were due to loneliness in our sample.
Econometric analysis
The analysis was based on propensity score matching (PSM) [44,45,46,47]. PSM enables causal inference from observational data by mimicking experimental randomisation [48] of observed individuals to treatment and control groups, defined respectively by their having and not having a relevant condition, as individuals are matched across the two groups on the basis of the propensity score. The propensity score is defined as the latent probability of exposure to the condition and encompasses all selected confounders in a single value bringing simplicity and reducing bias in estimation of a treatment effect as confounders are balanced between treatment groups [47]. The use of PSM in this study builds on recent research in health, loneliness, and occupation status where PSM is used to balance covariates, minimise potential confounding and emulate randomisation [32,33,34,35,36,37]. The main purpose of this study was to use PSM to estimate the treatment effect of loneliness on unemployment. A secondary objective was to estimate the effect of unemployment on loneliness, conducted as exploratory analysis. The use of PSM rather than regression adjustment provides greater flexibility, particularly in analysis of the effect of unemployment on loneliness where exposure to unemployment (risk factor) is common but loneliness (outcome) reported less frequently [48]. PSM also mimics a randomised trial where participant characteristics do not dictate exposure and thus reduces selection bias. Finally, PSM demands overlap in propensities across exposure groups to ensure a sufficient number of similar individuals in the treated (or exposed) and untreated (or non-exposed) groups are included in the analysis. This ensures a representative comparison between exposed and unexposed as a whole, rather than focus on the upper and lower bounds or best and worst case scenarios [48].
Data were analysed in STATA/SE 16.0. PSM was conducted in three steps: estimation of propensity score; matching; and stratification. First, given the shortcomings of the linear model, particularly with skewed data, the propensity score was estimated using a probit regression [44]. Covariates were selected based on known risk factors for loneliness. Proposed variables were confirmed by comparison with studies considering the effect of loneliness on daily outcomes [49, 50] and variable availability in the survey data. The final selection of covariates included: age, gender, ethnicity, education, marital status, household composition, number of own children in household and region. Consistent with recent PSM studies in health and employment [38, 39], square terms were not used for matching. The propensity score (e(xi)) is the individual probability of exposure (z = 1, in this case loneliness) vs non-exposure (control (z = 0), in this case no experience of loneliness), given a number of observed characteristics (xi) [51]:
$$e({x}_{i}) = Pr(z=1|{x}_{i})$$
where xi is a vector of covariates including agei, genderi, ethnicityi, educationi, marital statusi, household compositioni, number of own children in householdi and regioni for individual i in the sample.
The propensity score provides an indicator of similarities between individuals by combining a set of covariates (xi) into a single dimension (scalar) thus facilitating their comparison. Covariates included in the model simultaneously affect the treatment decision and the outcome under consideration, and are unaffected by participation in the treatment decision or the anticipation of participation [44]. This allows us to mimic a randomised trial where the participant characteristics (covariates) do not dictate treatment allocation in the sample selected for analysis using PSM. However, a balance had to be struck between excluding only covariates unrelated to the outcome [44, 52], while also avoiding over-parameterisation which exacerbates the support problem and increases variance [44, 53]. Covariate selection was also informed by existing PSM analyses of loneliness [33, 34, 37], however these focussed on loneliness as an outcome and therefore not all cited covariates were considered appropriate at this stage.
Secondly, individuals were matched based on the aforementioned covariates across exposed and non-exposed groups, according to their propensity scores. Loneliness was the exposure and unemployment the outcome. Matching was based on the nearest-neighbour method as used in previous studies utilising PSM with loneliness as an outcome [33, 37]. Nearest-neighbour matching ensured a matching partner for an exposed individual was selected from the comparison group based on the closest propensity score. As fewer respondents had experience of loneliness (exposed) than no experience of loneliness (unexposed) each individual in the exposed group was matched with at least one individual from the unexposed. Meanwhile a single match was found for each of the unexposed individuals. Respondents were matched to their closest neighbour. There was no maximum distance for matches as no caliper was specified in order to retain all observations and ensure no data were lost. This approach was consistent with recent PSM study in loneliness and major life events [37]. Finally, stratification was conducted by comparing mean outcomes across people in different groups (strata), exposed vs non-exposed (or treated vs non-treated), with similar propensity scores.
Following the process of matching based on age, gender, ethnicity, education, marital status, household composition, number of own children in household and region, the Average Treatment Effect (ATE) comparing outcomes between those with any experience of loneliness to no experience of loneliness was estimated [51]:
$$ATE=E\left({r}_{1}\right)-E({r}_{0})$$
where E(rj) denotes the expected probability of outcome r (unemployment) in group j. Here j = 1 for individuals with any experience of loneliness in waves 9 or 10; j = 0 otherwise (no experience of loneliness).
The exposure effect of loneliness on unemployment was tested from both a cross-sectional and longitudinal perspective using waves 9 and 10 of the Understanding Society dataset. This allows comparison to existing cross-sectional studies while additionally expanding findings to achieve causal inference. In the cross-sectional analysis, each wave was considered independently estimating whether present day loneliness contributes to current unemployment in wave 9 (model 1) and wave 10 (model 2). In longitudinal analysis, data from both waves 9 and 10 of the dataset were utilised to understand the ATE of loneliness at any time point (j = 1), when compared to no experience of loneliness (j = 0), on unemployment in wave 10 (model 3). In all models exposure to loneliness was compared to a control of no experience of loneliness. Statistical significance was tested by p-values and 95% confidence intervals, and the null hypothesis rejected where at least 5-percent statistical significance was not achieved.
Model heterogeneity and balance
Heterogeneous treatment effects were assessed in cross-sectional data using the matching-smoothing method. Matching-smoothing retains individual-level information before making cross-individual comparisons and so overcomes the assumption of homogeneity within strata in detecting heterogeneous treatment effects [54]. Results are represented by a plot of exposed and non-exposed individuals against a continuous propensity score before a local polynomial fit of degree 1 (local-linear smoothing) is fitted to the matched difference yielding a pattern of treatment effect heterogeneity [54]. Additionally, the two cross-sectional models were combined in single panel data probit random effect models allowing for contemporaneous effects while adjusting for unobserved heterogeneity across individuals. Random effect models were run on the propensity score matched sample using panel data probit regression. As no caliper was specified all respondents had a match and so random effects were run on the full sample. The model was first run with only loneliness included as a covariate then run again including the additional aforementioned covariates:
$${Y}_{it}+1={X}_{it}\beta +\alpha +{\varepsilon }_{it}$$
where Yit is the unemployment outcome; Xit is a vector of covariates including lonelinessit, ageit, genderit, ethnicityit, educationit, marital statusit, household compositionit, number of own children in householdit and regionit for individual i in the sample; and α is the unobserved normally distributed mean zero random effects varying across individuals but not waves, and εit is a mean zero normally distributed random error varying across individuals and waves. Age was included as a continuous variable in years; female, white ethnicity, lower than higher education, married/civil partnership, presence of ≥ 2 adults in household, presence of ≥ 1 children in household, Southern England residence, loneliness and unemployment were coded as binary. Further detail of this coding is provided in Table A2, Appendix 1.
Random effects were then run using baseline propensity score as an importance weight in panel data probit regression. Fixed effect methods were also explored however not included in this paper since only a few individuals changed employment status. Covariate balance was assessed through analysis of standardised differences and variance ratios across raw and matched data. Balance was signified by standardised differences less than 0.1 and variance ratios close to 1 [48].
Sensitivity analysis
Sensitivity analysis was first conducted by reclassifying loneliness to include those individuals who reported experiencing loneliness ‘sometimes’. Unemployment was reclassified in sensitivity analysis by excluding those who were permanently sick or disabled at baseline from the analysis. Analysis was also run excluding early retirees from the sample. The addition of covariates for life-satisfaction and physical health limitations was also explored. Overall life satisfaction was measured by a single item question with seven response levels ranging from completely dissatisfied to completely satisfied. Physical health was measured with respect to how much current physical health limited the amount of work conducted by an individual in the last 4 weeks. While neither were included as a covariate in previous studies, they are risk factors for loneliness [7, 9, 10, 55] and their impact was assessed in sensitivity analysis. Furthermore, the inclusion of work limiting physical health facilitated some understanding of the potential for health-related interaction effects. While included in previous studies, it is conceivable that both loneliness in the baseline period and incident unemployment in the intervening period may influence marital status and household composition. These variables are more likely to affect than be affected by loneliness, however for robustness they were excluded in sensitivity analysis.
Subgroup analysis
Subgroup analysis was conducted on the matched sample to understand the impact of change in loneliness over time (between waves 9 and 10). Indicator variables were created to distinguish between sustained loneliness across the two waves, the onset of loneliness in the intervening period, the change from experiencing to not experiencing loneliness, and no experience of loneliness in either time point. These variables were included in a probit regression model with wave 10 unemployment as the outcome (model 4a) and latent linear index:
where D1 = 1 if sustained loneliness, 0 otherwise; D2 = 1 if onset of feeling lonely, 0 otherwise; D3 = 1 if moving from experiencing loneliness to not experiencing loneliness, 0 otherwise; and e is the error term. The constant a represents the expected outcome for the reference group without any feeling of loneliness throughout the period of analysis (i.e. D1 = D2 = D3 = 0). A dummy variable for ‘never’ feeling lonely was not included thus avoiding the dummy variable trap. This analysis was also modelled including all aforementioned baseline covariates (model 4b) to further explore potential heterogeneity. Covariates were coded as in the random effects models (Table A2, Appendix 1) to aid interpretation of coefficients.
Additionally, analysis of change in employment status was conducted in the subgroup of individuals employed in wave 9 to eliminate the contemporaneous impact of unmeasured background confounders. Probit regression of whether the respondent became unemployed in wave 10 or not as a function of loneliness in wave 9 was conducted. Analysis was first undertaken with only wave 9 loneliness included as an independent variable (simple model) and then repeated including covariates at baseline (full model). Again, variables were coded as in Table A2, Appendix 1. Finally, subgroup analysis considered the difference in outcomes across male and female respondents, and across age groups.
Exploration of bidirectionality
Additional exploratory analysis was conducted with unemployment treated as the exposure to consider the reverse impact of being unemployed on experience of loneliness. PSM analysis evaluated the impact of unemployment in wave 9 and/or wave 10 on the outcome of loneliness at wave 10. Furthermore, subgroup analysis on the matched sample revealed the differential impact of sustained unemployment, becoming unemployed, and becoming employed on loneliness, relative to being employed in both waves. A covariate for general health was included in the main model given the potential association of health outcomes and unemployment, while covariates for household composition and number of own children in the household were omitted, in line with existing PSM research on the effects of unemployment [34, 37]. Thus, individuals were matched based on age, gender, ethnicity, education, marital status, region of residence, and health at baseline.