Data source and study population
Data were drawn from the Korean Children & Youth Panel Survey (KCYPS), a Korean national cohort study conducted by the National Youth Policy Institute. This longitudinal survey was administered annually from 2010 to 2016 to monitor the development of children and adolescents over time, including their health-related behaviors and environments. Using stratified multistage cluster sampling, the KCYPS drew a nationally representative sample of Korean children and adolescents consisting of three cohorts: students in the first and fourth grades of elementary school and students in the first grade of middle school. Schools were stratified by 16 administrative districts and randomly selected. In each selected school, one class was randomly chosen, and all students in that class were interviewed by survey interviewers. Questionnaires in which more than 80% of items were left unanswered because of a student’s disability or illness were excluded from the final data. Information that students could not easily provide themselves, such as household income, was collected through telephone interviews with parents.
To examine developmental smoking patterns from adolescence to young adulthood, this study used data from the cohort of first-grade middle school students (n = 2,351; 78 schools). Participants entered the survey at 12–15 years of age (first grade of middle school) and were followed up until 18–21 years of age. The response rate for the final survey wave in 2016 was 80.0% (n = 1,881). The final follow-up point corresponds to young adulthood, when most participants were attending college or had begun their careers. Only respondents who participated in the final wave were included in the group-based trajectory modeling (GBTM) analysis. Because the first wave (2010) did not include questions on smoking status, its data were excluded from the trajectory analysis. To reduce bias, data from the second wave (2011) served as the baseline measure for the outcome variable and for the independent variables, including age, gender, family income, number of days not supervised by a guardian after school, smoking friends, drinking experience, school adjustment, experience of health-related education, and mobile phone dependency. Participants with missing sample weights and those who had not provided smoking status information in any wave were excluded. Missing covariate values were handled with multiple imputation. The final study population comprised 1,723 participants (853 males and 870 females).
Measurements
Tobacco use
Smoking experience and frequency were measured at each survey wave. Participants who had smoked occasionally within the past year reported how often they had smoked during that year, and those who smoked regularly reported how many times they smoked per day. Because most participants were nonsmokers and the distribution of smoking frequency was skewed, we categorized participants as ‘nonsmokers’ (no smoking within the past year), ‘experimenters’ (smoked occasionally within the past year), ‘daily smokers’ (smoked 1–9 times per day), or ‘heavy daily smokers’ (smoked ≥ 10 times per day) [25]. The smoking trajectory groups identified through GBTM served as the outcome variable.
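The categorization rule above can be written as a small Python function, offered only as a hedged sketch. Several elements are hypothetical: the input fields (whether the participant smoked in the past year, whether smoking was regular, and the number of times smoked per day), the function name `smoking_category`, and the integer codes 0–3. The codes assume that the four categories were treated as counts in the ZIP trajectory model. The rule also assumes that exactly 10 times per day falls in the heavy category.

```python
from typing import Optional

# Hypothetical integer codes; treating the four categories as 0-3 counts
# for the ZIP trajectory model is an assumption, not stated in the source.
NONSMOKER, EXPERIMENTER, DAILY, HEAVY_DAILY = 0, 1, 2, 3


def smoking_category(smoked_past_year: bool,
                     smokes_regularly: bool,
                     times_per_day: Optional[int] = None) -> int:
    """Classify one participant-wave into the four smoking categories."""
    if not smoked_past_year:
        return NONSMOKER
    if not smokes_regularly:
        # Occasional smoking within the past year.
        return EXPERIMENTER
    if times_per_day is None or times_per_day < 1:
        raise ValueError("regular smokers must report at least 1 time per day")
    # Assumption: 10 or more times per day counts as heavy daily smoking.
    return DAILY if times_per_day <= 9 else HEAVY_DAILY


# One hypothetical participant observed at three waves.
waves = [(False, False, None), (True, False, None), (True, True, 12)]
print([smoking_category(*w) for w in waves])  # [0, 1, 3]
```

Applying the function wave by wave yields the repeated 0–3 series that a trajectory model would take as input for each participant.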
Covariates
Sociodemographic, environmental, and intrapersonal characteristics were examined. The sociodemographic factors were age, gender, and family income at baseline (wave 2), type of high school, and college status at the last follow-up (aged 18–21 years). All analyses were adjusted for age at baseline. Family income was classified as low (tertile 1), medium (tertile 2), or high (tertile 3). These categories corresponded approximately to less than 35 million won, 35–49.99 million won, and 50 million won or more, respectively. Type of high school, measured in wave 4, was included as a dichotomous variable (general/vocational). General high schools included all types of academic high schools, whereas vocational high schools included agricultural, technical, and commercial schools, among others. Based on college status at the last follow-up, participants were classified as ‘college students’ or ‘non-college students.’ The environmental characteristics were the number of smoking peers at baseline (classified as ‘none’, ‘almost none’, or ‘more than one’) and the number of days per week not supervised by a guardian after school at baseline (classified as ‘1–2 days’ or ‘> 3 days’).
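Tertile grouping of a continuous variable such as family income can be sketched in Python as follows. This is a minimal, unweighted illustration that takes its cut points from the sample distribution using `statistics.quantiles` with its default 'exclusive' method. The helper `assign_tertiles` and the example incomes are hypothetical. The study's reported cut points (approximately 35 and 50 million won) may have been derived differently, for example with sampling weights applied.

```python
from statistics import quantiles
from typing import List, Sequence


def assign_tertiles(values: Sequence[float],
                    labels: Sequence[str] = ("low", "medium", "high")) -> List[str]:
    """Hypothetical helper: label each value by its (unweighted) sample tertile."""
    # quantiles(n=3) returns the two cut points that split the data into thirds.
    lower_cut, upper_cut = quantiles(values, n=3)
    groups = []
    for v in values:
        if v < lower_cut:
            groups.append(labels[0])
        elif v < upper_cut:
            groups.append(labels[1])
        else:
            groups.append(labels[2])
    return groups


# Hypothetical incomes in million won, not study data.
incomes = [10, 20, 30, 40, 50, 60]
print(assign_tertiles(incomes))  # ['low', 'low', 'medium', 'medium', 'high', 'high']
```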
Intrapersonal factors included school adjustment, alcohol drinking experience within the past year, experience with health-related educational activities, and mobile phone dependency. School adjustment was measured at baseline using a five-item scale with four-point response options (Supplementary Table S1). Responses were recoded so that higher scores reflected better adjustment. Participants were then classified into tertiles (low/middle/high) corresponding to scores of less than 13, 13–14, and 15 or above. For alcohol drinking experience, participants were asked whether they had drunk alcohol within the past year, and alcohol drinking experience at baseline was used as a dichotomous variable (yes/no). Experience with health-related educational activities was included to examine the effects of such activities during early adolescence. Youth extracurricular activities involve experiential learning within the school environment, and students typically participate in them voluntarily [26]. In this study, experience with health-related educational activities at baseline was dichotomized (yes/no). Mobile phone dependency was measured using a seven-item scale with four-point response options (Supplementary Table S1), which was developed and validated by Lee et al. (2002) [27]. Responses were recoded so that higher scores reflected greater dependency. To examine the third hypothesis, we used two specifications of mobile phone dependency. In Model 1, responses at baseline were summed, and participants were classified into tertiles (low/medium/high) corresponding to scores of less than 14, 14–17, and 18 or above. In Model 2, responses were summed at each wave, and the mobile phone dependency trajectory groups identified by GBTM were used. This allowed us to explore the association between trajectories of mobile phone dependency and smoking across the life course from adolescence to young adulthood.
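The recoding and tertile steps for the two four-point scales can be illustrated with a short Python sketch. The cut points are those reported in the text. The helper names and the example responses are hypothetical. The assumption that scores are simple sums of items coded 1–4 is inferred from the reported score ranges. Which items are reverse-coded is not stated in the source and is also an assumption here.

```python
from typing import Iterable, Sequence, Tuple


def reverse_code(response: int, low: int = 1, high: int = 4) -> int:
    """Flip a four-point item so that 1<->4 and 2<->3."""
    if not low <= response <= high:
        raise ValueError(f"response {response} outside {low}-{high}")
    return low + high - response


def scale_score(responses: Sequence[int], reversed_items: Iterable[int] = ()) -> int:
    """Sum item responses after reverse-coding the listed item indices."""
    flip = set(reversed_items)
    return sum(reverse_code(r) if i in flip else r for i, r in enumerate(responses))


def tertile_group(score: int, cuts: Tuple[int, int],
                  labels: Tuple[str, str, str] = ("low", "middle", "high")) -> str:
    """score < cuts[0] -> labels[0]; cuts[0] <= score < cuts[1] -> labels[1]; else labels[2]."""
    lower, upper = cuts
    if score < lower:
        return labels[0]
    if score < upper:
        return labels[1]
    return labels[2]


# Cut points reported in the text.
SCHOOL_ADJUSTMENT_CUTS = (13, 15)  # <13, 13-14, >=15
MOBILE_DEPENDENCY_CUTS = (14, 18)  # <14, 14-17, >=18

# Hypothetical responses; the reverse-coded item indices are assumptions.
school = scale_score([4, 4, 3, 2, 1], reversed_items=[3, 4])
mobile = scale_score([3, 2, 2, 2, 2, 2, 2])
print(school, tertile_group(school, SCHOOL_ADJUSTMENT_CUTS))  # 18 high
print(mobile, tertile_group(mobile, MOBILE_DEPENDENCY_CUTS,
                            ("low", "medium", "high")))  # 15 medium
```

Under Model 2, the same summed score would instead be computed at every wave and passed to the trajectory model rather than cut into tertiles.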
Statistical analysis
To address missing covariate data and allow comparison of the two multinomial logistic regression models, we used multiple imputation. Multiple imputation improves efficiency and, when data are missing at random, yields unbiased estimates of the association between outcome and predictor variables [28]. At baseline, the proportion of missing data ranged from 0.1% to 5.0% across the variables with missing values: family income (n = 90), type of high school (n = 17), number of days not supervised by a guardian after school (n = 53), experience of health-related education (n = 1), and mobile phone dependency (n = 83). Under conventional listwise deletion, about 6–10% of the 1,723 participants would have been excluded.
Frequency analyses were conducted to describe participant characteristics and to compare covariates across smoking trajectory groups. A group-based trajectory modeling approach was used to identify distinct developmental trajectory groups of tobacco use. Data from wave 1, which did not include questions on smoking status, were excluded from this analysis. The GBTM analysis was performed with PROC TRAJ, an add-on procedure for SAS software (SAS Institute, Cary, NC, USA). Because most participants were nonsmokers and the outcome distribution was skewed, a zero-inflated Poisson (ZIP) model was used for the smoking trajectory analysis [9, 29]. To identify life-course trajectory groups of mobile phone dependency from adolescence to young adulthood, we used a censored normal (CNORM) model, which is appropriate for continuous data. The optimal number of trajectory groups and the best-fitting model were identified using the Bayesian information criterion (BIC) as a measure of goodness of fit. Following the PROC TRAJ convention, in which larger (less negative) BIC values indicate better fit, we selected the model with the BIC value closest to zero [9, 30].
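The two model-fitting ingredients named above can be illustrated with a minimal Python sketch that does not reproduce PROC TRAJ's estimation. `zip_pmf` gives the zero-inflated Poisson probability of a count k, given a Poisson mean λ and an extra-zero probability π. `nagin_bic` computes BIC in the form log L − 0.5·k·ln(N), under which larger values indicate better fit. The function names and the example log-likelihoods and parameter counts are hypothetical. In PROC TRAJ, λ also varies by trajectory group and over time through a polynomial on the log scale, and N is assumed here to be the number of subjects.

```python
import math
from typing import Dict, Tuple


def zip_pmf(k: int, lam: float, pi: float) -> float:
    """Zero-inflated Poisson: P(Y=0) = pi + (1-pi)e^-lam; P(Y=k) = (1-pi)Poisson(k; lam)."""
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    extra_zero = pi if k == 0 else 0.0
    return extra_zero + (1.0 - pi) * poisson


def nagin_bic(log_likelihood: float, n_params: int, n_subjects: int) -> float:
    """BIC on the log-likelihood scale: larger (closer to zero) is better."""
    return log_likelihood - 0.5 * n_params * math.log(n_subjects)


def select_model(fits: Dict[str, Tuple[float, int]], n_subjects: int) -> str:
    """Return the candidate whose (log L, number of parameters) gives the largest BIC."""
    return max(fits, key=lambda name: nagin_bic(*fits[name], n_subjects))


# Hypothetical fits: (log-likelihood, number of parameters); not study results.
fits = {
    "3 groups": (-4200.0, 9),
    "4 groups": (-4150.0, 12),
    "5 groups": (-4145.0, 15),
}
for name, (ll, k) in fits.items():
    print(name, round(nagin_bic(ll, k, 1723), 2))
print("selected:", select_model(fits, 1723))  # selected: 4 groups
```

In this made-up example, the five-group model has the highest log-likelihood, but its extra parameters cost more than the gain in fit, so the four-group model has the BIC closest to zero.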
Finally, multinomial logistic regression analyses were conducted to identify associations between covariates and smoking trajectory groups. Longitudinal weights from the last survey wave were applied to adjust for attrition and for non-representativeness of the sample. Weighted odds ratios (ORs) between covariates and smoking trajectories were estimated using the PROC SURVEYLOGISTIC procedure of SAS. For all multinomial regression analyses, five imputed data sets were created with the PROC MI procedure in SAS. Each set was analyzed with PROC SURVEYLOGISTIC, and PROC MIANALYZE was used to combine the estimates into final pooled parameter estimates [28]. Model 1 examined the associations between baseline predictors and smoking trajectory groups. Model 2 included the same baseline predictors except baseline mobile phone dependency, which was replaced by the trajectory groups of mobile phone dependency. For comparison, complete-case analyses were also conducted. Further information on the missing covariates is given in the Supplementary Material (Supplementary Tables S2, S3 and S4).
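The pooling step performed by PROC MIANALYZE follows Rubin's rules, which can be sketched in Python for a single regression coefficient. This is a simplified illustration, not the SAS implementation. It takes hypothetical log-odds estimates and standard errors from m imputed data sets, uses Rubin's original degrees-of-freedom formula, and builds a normal-approximation 95% interval, so its intervals may differ slightly from PROC MIANALYZE output. The survey weights and design are assumed to be already reflected in each per-imputation standard error from PROC SURVEYLOGISTIC.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def pool_rubin(estimates: Sequence[float],
               variances: Sequence[float]) -> Tuple[float, float, float]:
    """Combine m per-imputation estimates; returns (pooled estimate, total variance, df)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)        # pooled point estimate
    u_bar = mean(variances)        # average within-imputation variance
    b = variance(estimates)        # between-imputation variance (m - 1 denominator)
    total = u_bar + (1 + 1 / m) * b
    if b > 0:
        r = (1 + 1 / m) * b / u_bar  # relative increase in variance due to missingness
        df = (m - 1) * (1 + 1 / r) ** 2
    else:
        df = math.inf
    return q_bar, total, df


def pooled_odds_ratio(log_odds: Sequence[float], ses: Sequence[float],
                      z: float = 1.96) -> Tuple[float, float, float]:
    """Pooled OR with a normal-approximation CI (a simplification of the t-based CI)."""
    q_bar, total, _ = pool_rubin(log_odds, [se ** 2 for se in ses])
    se_total = math.sqrt(total)
    return (math.exp(q_bar),
            math.exp(q_bar - z * se_total),
            math.exp(q_bar + z * se_total))


# Hypothetical log-odds and SEs from five imputed data sets; not study results.
log_odds = [0.40, 0.50, 0.60, 0.50, 0.50]
ses = [0.20] * 5
print(pool_rubin(log_odds, [se ** 2 for se in ses]))
print(pooled_odds_ratio(log_odds, ses))
```

The total variance adds the between-imputation spread, inflated by 1 + 1/m, to the average within-imputation variance. As a result, the pooled intervals are wider than those from any single imputed data set whenever the imputations disagree.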