Study design and participants
Data were drawn from the COVID-19 Social Study, a large panel study of the psychological and social experiences of over 75,000 adults (aged 18+) in the UK during the COVID-19 pandemic. The study commenced on 21 March 2020 and involves online data collection from participants for the duration of the COVID-19 pandemic. Data were initially collected weekly (through August 2020) and monthly thereafter. The sample was not randomly selected and is therefore not representative of the UK population, but it is well stratified and was recruited using three primary approaches outlined in the Supplemental Materials and in the study User Guide (https://osf.io/jm8ra/). The study was approved by the UCL Research Ethics Committee [approval number 12467/005] and performed in accordance with the Declaration of Helsinki, and all participants gave informed consent. Participants were not compensated for participation.
We included participants who met the five criteria outlined in Fig. 1. First, participants were included if they had participated in the November 2021 survey and reported that they had at some prior point been infected with COVID-19 (see Supplemental Table S1 for question wording). Second, the date given for their COVID-19 infection had to be non-missing, no earlier than 27 April 2020, and at least 5 weeks prior to completion of the specific questions on long COVID. 27 April 2020 was chosen because we were interested in health behaviours in the month prior to COVID-19 infection, and the collection of all individual items comprising these variables commenced on 13 April 2020. Five weeks was chosen as the minimum time period because many studies on long COVID require “more than four weeks of symptoms” for the term long COVID to be applied [5, 6]. Third, only participants who had had COVID-19 once were included; participants who reported more than one infection were excluded to avoid overlapping symptoms from multiple infections. Fourth, participants had to have participated in the study in the month prior to the date of their infection so that health behaviour data were available. Fifth, participants had to have non-missing data on long COVID outcome variables (presence/absence and specific long COVID symptoms) and on the study variables required to calculate statistical weights (gender, age, ethnicity, country, and education). The final analytic sample comprised 1581 participants.
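As an illustration only, the date window in the second criterion could be checked as in the sketch below. The function and constant names are hypothetical and are not taken from the study code; the two thresholds (27 April 2020 and 5 weeks) come from the text above.

```python
from datetime import date, timedelta
from typing import Optional

# Thresholds stated in the text; the names themselves are hypothetical.
EARLIEST_INFECTION = date(2020, 4, 27)
MIN_GAP_BEFORE_SURVEY = timedelta(weeks=5)


def infection_date_eligible(infection: Optional[date],
                            long_covid_survey: date) -> bool:
    """Return True if the infection date satisfies the second criterion.

    The date must be non-missing, no earlier than 27 April 2020, and at
    least 5 weeks before the long COVID questions were completed.
    """
    if infection is None:
        return False
    latest_allowed = long_covid_survey - MIN_GAP_BEFORE_SURVEY
    return EARLIEST_INFECTION <= infection <= latest_allowed
```

A participant infected on 10 January 2021 who answered the long COVID questions on 15 November 2021 would pass this check, whereas one infected on 20 October 2021 would not.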
We used multiple imputation by chained equations to generate 50 imputed datasets for participants who met all study inclusion criteria but had missing data on other study variables (Supplemental Table S2). Imputation models included all study variables as well as auxiliary variables (e.g., home ownership status, depressive symptoms at baseline). Substantive results using cases without any missing data and the imputed sample were similar (Supplemental Tables S3-S6). See Supplemental Table S7 for a comparison of excluded and included participants on study variables.
Patient and public involvement
The research questions in the UCL COVID-19 Social Study built on patient and public involvement as part of the UKRI MARCH Mental Health Research Network, which focuses on social, cultural and community engagement and mental health. This highlighted priority research questions and measures for this study. Patients and the public were additionally involved in the recruitment of participants to the study and are actively involved in plans for the dissemination of findings from the study.
Measures
Outcome variables
The presence of long COVID was measured with a binary variable in response to a study-developed question (Supplemental Table S1): no vs yes (formally diagnosed or suspected). Sensitivity analyses tested whether results were consistent when including participants who were “unsure” about whether they had had long COVID within the case group.
To look at the presence of three specific long COVID symptoms, three variables were operationalised from questions assessing the extent to which participants had difficulty with (i) mobility, (ii) cognition, and (iii) self-care (Supplemental Table S7). Response options were treated as binary (present vs absent) in analyses due to low numbers within response categories.
Predictor variables
Health behaviours
Six health behaviours in the month prior to COVID-19 infection were considered (Supplemental Table S1). Data from 2 weeks before the COVID-19 infection were used where available; if unavailable, data from 3 weeks before were used, then 4 weeks, and so on up to 6 weeks (Supplemental Table S8). Weekly exercise frequency was operationalised as none vs < 30 minutes to 2 hours vs 3 hours or more, the latter of which reflects current weekly physical activity recommendations in the UK [23]. A count of the number of days in the past week on which participants had left the house for at least 15 minutes was also included. Weekly sleep quality was operationalised as very good/good vs average vs not good/very poor. Smoking (non-smoker/no smoking vs any smoking) and a binary variable indicating 14 or more weekly alcoholic drinks (vs < 14) were also included. Fourteen was chosen as the cut-off for alcohol consumption to reflect current recommendations on weekly alcohol intake in the UK [24]. Finally, the number of mental health care behaviours was included (e.g., taken medications, spoke to somebody on a support line). Because increasing weight and obesity are associated with long COVID [6, 8, 9], and are also risk factors for chronic disease independent of physical activity [25], we conducted sensitivity analyses with a variable reflecting overweight status collected in June 2020 (slightly underweight or normal weight vs slightly overweight or very overweight).
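The fallback rule for selecting pre-infection data can be sketched as follows. This is a minimal illustration that assumes each participant's responses are held in a mapping keyed by the number of weeks before infection; that data layout and the function name are hypothetical.

```python
from typing import Mapping, Optional, TypeVar

T = TypeVar("T")


def pre_infection_value(by_weeks_before: Mapping[int, Optional[T]]) -> Optional[T]:
    """Pick the response closest to infection within the 2 to 6 week window.

    Tries 2 weeks before infection first, then 3, 4, 5 and 6 weeks.
    Returns None if no response is available anywhere in that window.
    """
    for weeks in range(2, 7):  # 2, 3, 4, 5, 6
        value = by_weeks_before.get(weeks)
        if value is not None:
            return value
    return None
```

For example, a participant with no response at 2 weeks but responses at 3 and 4 weeks would contribute the 3-week response.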
Covariates
COVID-19 infection variables
COVID-19 infection severity in the first 2 weeks was categorised into (i) asymptomatic, (ii) mild (experienced symptoms but was able to carry on with daily activities), (iii) moderate (experienced symptoms and had to rest in bed), and (iv) severe (participant was hospitalised).
A variable indicating which strain of the virus was dominant in the UK [26] at the time of infection was coded as (0) the original COVID-19 variant (31 January to 31 October 2020), (1) Alpha (1 November 2020 to 30 June 2021), (2) Delta (1 July 2021 to 30 November 2021), and (3) Omicron (1 December 2021 onwards).
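The coding of the dominant strain follows directly from the infection date. A minimal sketch using the date ranges given above (the function name is hypothetical):

```python
from datetime import date


def dominant_variant_code(infection: date) -> int:
    """Map an infection date to the dominant-variant code used in the text.

    0 = original variant, 1 = Alpha, 2 = Delta, 3 = Omicron.
    """
    if infection < date(2020, 1, 31):
        raise ValueError("infection date precedes the first coded period")
    if infection <= date(2020, 10, 31):
        return 0  # original variant
    if infection <= date(2021, 6, 30):
        return 1  # Alpha
    if infection <= date(2021, 11, 30):
        return 2  # Delta
    return 3      # Omicron
```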
Socio-demographics
Socio-demographics were collected at baseline, which was participants’ first time taking part in the study: gender (male vs female), age (60+, 45–59, 30–44, and 18–29), ethnicity (white vs ethnic minority groups [i.e., Asian/Asian British, etc. See Supplemental Table S1 for a full listing of response options]), education (undergraduate degree or higher, A-levels/vocational training, and up to GCSE [General Certificate of Secondary Education]), low income (<£30,000), employment status (not employed [i.e., at school/university, unable to work due to disability, etc.] vs employed), government’s identified key worker status (vs not a key worker), crowded household (< one room per person), living arrangement (living alone vs living with others but not including children vs living with others, including children), and area of dwelling (urban vs rural).
Pre-existing health conditions
Participants reported whether they had received a clinical diagnosis of a mental health condition (e.g., depression, anxiety) or chronic physical health condition (e.g., high blood pressure, diabetes). Two binary variables were created to indicate the presence of pre-existing physical and mental health conditions, respectively.
Statistical analysis
First, binary logistic regression models were fitted to examine associations between health behaviours in the month before COVID-19 infection and the development of long COVID. Second, binary logistic regression models were fitted to examine associations between health behaviours in the month prior to COVID-19 infection and the presence of each of the three specific long COVID symptoms (difficulty with mobility, cognition, and self-care) amongst participants with long COVID.
For both sets of analyses, Model 1 included all health behaviours together and no other variables, Model 2 additionally adjusted for COVID-19 infection variables, Model 3 additionally adjusted for socio-demographic characteristics, and Model 4 additionally adjusted for pre-existing health conditions. Robust standard errors were used in all analyses. Coefficients from the binary logistic regressions were exponentiated and presented as odds ratios (OR).
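The cumulative adjustment sets and the conversion from a logit coefficient to an odds ratio can be sketched as below. This is an illustration rather than the analysis code: the function names are hypothetical, and the Wald-type 95% interval (z = 1.96, computed on the log-odds scale) is an assumption about how intervals would be formed, since the text specifies only that coefficients were exponentiated.

```python
import math
from typing import Dict, List, Tuple

# Covariate blocks in the order they are added (block labels from the text).
BLOCKS = [
    "health behaviours",
    "COVID-19 infection variables",
    "socio-demographics",
    "pre-existing health conditions",
]


def adjustment_sets(blocks: List[str]) -> Dict[str, List[str]]:
    """Model k contains the first k covariate blocks."""
    return {f"Model {k}": blocks[:k] for k in range(1, len(blocks) + 1)}


def odds_ratio_with_ci(coef: float, se: float,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Exponentiate a logit coefficient and its Wald interval bounds.

    Assumption: the interval is symmetric on the log-odds scale.
    """
    return (math.exp(coef),
            math.exp(coef - z * se),
            math.exp(coef + z * se))
```

A coefficient of 0 corresponds to an odds ratio of 1 (no association), with the interval bounds falling either side of 1.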
To increase representativeness of the UK general population, weights were applied throughout all analyses. The sample was weighted to the proportions of gender, age, ethnicity, country, and education in the UK population obtained from the Office for National Statistics [27]. A multivariate reweighting method was implemented using the user-written Stata command ‘ebalance’ [28]. Analyses were conducted using Stata version 16 [29].
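To give a sense of what reweighting a sample to population margins involves, the sketch below uses raking (iterative proportional fitting). This is a simpler, related technique substituted for illustration; it is not the entropy balancing procedure implemented by ‘ebalance’ and not the study's code. The function names, the data layout, and any target proportions passed in are hypothetical and are not Office for National Statistics figures.

```python
from typing import Dict, List


def rake(rows: List[Dict[str, str]],
         targets: Dict[str, Dict[str, float]],
         iterations: int = 100) -> List[float]:
    """Adjust weights until each variable's weighted shares match targets.

    rows: one dict per participant, e.g. {"gender": "female", "age": "18-29"}.
    targets: hypothetical population shares per variable, each summing to 1.
    Every target level is assumed to occur at least once in the sample.
    """
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        for var, shares in targets.items():
            total = sum(weights)
            # Current weighted total for each level of this variable.
            current: Dict[str, float] = {}
            for row, w in zip(rows, weights):
                current[row[var]] = current.get(row[var], 0.0) + w
            # Scale each level so its share equals the target share.
            factor = {lvl: shares[lvl] * total / current[lvl] for lvl in current}
            weights = [w * factor[row[var]] for row, w in zip(rows, weights)]
    # Normalise so the weights average 1 across the sample.
    total = sum(weights)
    return [w * len(rows) / total for w in weights]


def weighted_share(rows: List[Dict[str, str]], weights: List[float],
                   var: str, level: str) -> float:
    """Weighted proportion of the sample at a given level of a variable."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == level)
    return hit / sum(weights)
```

After convergence, the weighted sample proportions for each weighting variable should equal the supplied targets, which is the property the weighted analyses rely on.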