Sleep quality, ability to concentrate, and fruit consumption suffice to predict young adult wellbeing: Findings from Bayesian predictive projection

The search for reliable lifestyle predictors of psychological well-being is a key research pursuit in psychology and public health. Researchers often gather data on many candidate predictors and are then faced with the difficult task of variable selection – selecting only a few "relevant" predictors from the large set of candidates. Many statistical methods currently used for variable selection suffer from known shortfalls. Fortunately, newer methods are available. The aim of this study is to illustrate the use of predictive projection – a novel Bayesian method for variable selection with many attractive properties – and to identify the predictors of psychological well-being in a sample of young adults.

An important challenge when investigating the predictors of well-being is how to identify the truly important and reliable predictors from a large set of candidates. Ideally, the goal is to construct the smallest, most parsimonious model with few relevant predictors that explains as much of the variance in well-being as possible. With this goal in mind, however, researchers run into two complex statistical issues – variable selection and overfitting.
Variable selection is the process of finding a small subset of "relevant" predictors that can maximally predict the outcome measure (Heinze et al., 2018; Höge et al., 2018; Müller et al., 2013). Traditional regression modelling assumes that the exact set of predictors to be included in the model is known a priori. However, the exact set of relevant predictors is often not known a priori (as is the case when searching for predictors of well-being), and so the issue of variable selection arises commonly across many fields (Heinze et al., 2018; Kadane & Lazar, 2004; van Erp et al., 2019). Under perfect circumstances, variable selection would be performed by fitting all possible models (all possible combinations of predictors) and selecting the best one. However, with an increasing number of predictors and model complexity, fitting all possible models quickly becomes computationally unfeasible – with K predictors there are 2^K possible models to fit (Heinze et al., 2018). Even when an optimal subset of predictors is found, a further issue is how to do inference with the selected model, i.e. determine the "significance" of model parameters, given that it is now necessary to condition on the selection procedure; this is known as post-selection inference (Taylor & Tibshirani, 2015). Although psychology and public health researchers have been dealing with variable selection and post-selection inference for a very long time (Heinze et al., 2018), the two issues are by no means "solved" and remain active areas of research in statistics and machine learning (Benjamini, 2010; Taylor & Tibshirani, 2015).
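The combinatorial explosion is easy to see directly: each of the K predictors is either in or out of a model, so the candidate set doubles with every added predictor. A short Python sketch:

```python
def n_candidate_models(k: int) -> int:
    # Each of the k predictors is either included or excluded,
    # so there are 2**k possible subsets (models) to fit.
    return 2 ** k

print(n_candidate_models(3))   # 8
print(n_candidate_models(28))  # 268435456 -- over 268 million models
```

With the 28 candidate predictors used in this study, exhaustive search would thus require fitting over 268 million models.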
When doing variable selection, one also always runs the risk of overfitting – selecting predictors that fit pure noise in the data (Höge et al., 2018; James et al., 2013). Overfitting occurs because extra predictors always make a model more flexible, and thus even irrelevant predictors can seemingly improve a model's fit (James et al., 2013). However, while overfit models seem to explain the data at hand well (as indicated by, e.g., high R² values or low root-mean-squared error; RMSE), they show poor predictive performance on new data (Hawkins, 2004). Additionally, the fact that overfit models often fail to generalize to new data may be one of the less known causes of the replication crisis (Yarkoni & Westfall, 2017). Research findings may fail to replicate not because of any ill-will of the researchers but because they rest on fragile trends idiosyncratic to the training data, resulting from overfitting. The issue of overfitting is particularly relevant to psychological and public health research in that the goal often is to find reliable and modifiable individual-level predictors of well-being (or health outcomes), and the results of studies are often evaluated based on in-sample explanatory power (R²).
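This behaviour is easy to reproduce with simulated data. The following Python sketch (our own toy illustration, not the study's analysis) fits ordinary least squares with and without thirty pure-noise predictors: the noise predictors always improve the in-sample RMSE but typically worsen the out-of-sample RMSE:

```python
import numpy as np

rng = np.random.default_rng(0)

def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

n_train, n_test = 50, 1000
x = rng.normal(size=(n_train + n_test, 1))       # one truly relevant predictor
noise = rng.normal(size=(n_train + n_test, 30))  # thirty irrelevant predictors
y = x[:, 0] + rng.normal(size=n_train + n_test)

def fit_and_score(features):
    # Ordinary least squares with an intercept, fit on the training rows only.
    X = np.column_stack([np.ones(len(features)), features])
    beta, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)
    return rmse(y[:n_train], X[:n_train] @ beta), rmse(y[n_train:], X[n_train:] @ beta)

small_train, small_test = fit_and_score(x)
big_train, big_test = fit_and_score(np.hstack([x, noise]))

print(f"in-sample RMSE:     {small_train:.2f} -> {big_train:.2f} (noise 'helps')")
print(f"out-of-sample RMSE: {small_test:.2f} -> {big_test:.2f} (noise hurts)")
```

In-sample error can only shrink as columns are added to a least-squares fit, which is exactly why in-sample R² is a misleading criterion for selecting predictors.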
The issues of variable selection and overfitting are just as relevant in the well-being literature as in other fields. There are three traditional regression-based approaches for identifying predictors of well-being. The first is to fit many simple regression models, regressing well-being scores on the various candidate predictors individually, and then compare the size or significance of the predictor slopes across these models. The main drawback of this simple regression approach is that it can only assess the effect of each candidate predictor in isolation, without determining whether the effects remain after controlling for the other predictors. The second approach is to fit a single large multiple regression model with all predictors and then compare the size or significance of the predictor slopes within this one model. However, a challenge with this large multiple regression approach is that when two or more predictors are correlated (as is almost always the case with many candidate predictors), the slopes and their estimated errors can be wildly distorted due to multicollinearity (Alin, 2010; Graham, 2003), and thus become uninterpretable. The last traditional approach is to fit a single multiple regression model with several predictors, manually picking the predictors based on theoretical considerations or using automatic stepwise variable selection, often based on p-values (e.g. Boyes et al., 2011; Etxeberria et al., 2019; Keyes & Simoes, 2012; Sletta et al., 2019). Even with strong theory, however, there are many possible choices as to which predictors to include and exclude (Gelman & Loken, 2013), and, while it may not be common knowledge, stepwise selection does not account for the selection process and as a result fails to adequately control type-I error and tends to overfit (Flom & Cassell, 2007; Kuhn & Johnson, 2013; Smith, 2018; Steyerberg et al., 2001; Whittingham et al., 2006).
Predictive projection feature selection is a novel method for variable selection within the Bayesian framework (Piironen et al., 2018) that helps to address the issues of variable selection and overfitting.
The method consists of two steps. In the first step, a flexible reference model is fitted using all candidate predictors. The reference model can be a "simple" multiple regression model or a more complex, flexible model such as a random forest or even a neural network (Afrabandpey et al., 2019; Piironen et al., 2018). In the second step, smaller submodels with subsets of the candidate predictors are sequentially fitted via projection to approximate the reference model's predictions. After that, it is possible to select the smallest submodel that makes predictions "similar enough" to those of the reference model (i.e. within some uncertainty bound such as a standard error; Piironen et al., 2018). To avoid overfitting, the submodels can be compared with the reference model on cross-validated prediction accuracy, most conveniently via a Bayesian approximation of leave-one-out cross-validation – Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO; Vehtari, Gelman, & Gabry, 2015). The key advantages of predictive projection are that it can select a parsimonious model with good predictive accuracy, is robust to overfitting, and has a valid posterior, so it can be used for inference just like any other Bayesian model (Piironen et al., 2018).
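To make the two-step logic concrete, the sketch below implements a stripped-down analogue in Python: a least-squares "reference model" with all predictors, followed by a greedy forward search in which each submodel is fitted to the reference model's predictions rather than to the raw outcome. This is only a caricature of projpred (which projects full posterior draws by minimising KL divergence and uses L1 search), but it conveys the core idea:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 8
X = rng.normal(size=(n, k))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(size=n)  # only predictors 0 and 3 matter

# Step 1: fit a "reference model" using all candidate predictors.
# (Plain least squares stands in for the Bayesian reference model here.)
Xc = np.column_stack([np.ones(n), X])
beta_ref, *_ = np.linalg.lstsq(Xc, y, rcond=None)
y_ref = Xc @ beta_ref  # the reference model's predictions

# Step 2: greedy forward search, fitting each candidate submodel to the
# REFERENCE PREDICTIONS y_ref rather than to the raw outcome y.
selected, remaining = [], list(range(k))
while remaining:
    best_j, best_err = None, np.inf
    for j in remaining:
        cols = np.column_stack([np.ones(n)] + [X[:, i] for i in selected + [j]])
        b, *_ = np.linalg.lstsq(cols, y_ref, rcond=None)
        err = float(np.mean((y_ref - cols @ b) ** 2))
        if err < best_err:
            best_j, best_err = j, err
    selected.append(best_j)
    remaining.remove(best_j)

# The truly relevant predictors enter the selection path first.
print(selected)
```

Because submodels chase the reference model's predictions rather than the noisy data, irrelevant predictors contribute little and enter the path late, which is what makes the subsequent "smallest submodel that is similar enough" rule meaningful.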
In the present study, we used predictive projection feature selection to construct a parsimonious model that would predict well-being in a sample of 791 young adults. The young adults had participated in the 2013 and 2014 waves of the Daily Life Study, a daily diary study of the health and well-being of young adults in Dunedin, New Zealand. Well-being was surveyed every day for 13 days using the 8-item Flourishing Scale (Diener et al., 2010) adapted for daily measurement. We first fitted a large Bayesian multiple regression reference model, predicting average daily flourishing cross-sectionally from 28 candidate well-being predictors, including demographic and background variables (e.g., age, gender, childhood SES, BMI) and an extensive range of lifestyle variables also assessed in the daily diary, related to stress (stress, most stressful event of the day), somatic symptoms (e.g. tiredness, lack of ability to concentrate), diet (e.g. fruit and vegetable consumption), and health habits (e.g. sleep quality/quantity, exercise). Then, we used predictive projection to find the smallest possible submodel that would predict average daily flourishing almost as well as the reference model.

Participants and procedure
Participants came from the Daily Life Study, a daily diary study assessing psychological well-being and daily health habits in a large sample of young adults living in Dunedin, New Zealand (total n = 1470). The majority of the participants were University of Otago students, and all were between 17 and 25 years old. The participants first completed an initial survey on demographics and were then tracked across 13 days via an online daily diary completed each night between 3-8 pm. Participants also attended a clinic visit where they were measured for height and weight to compute BMI. Figure 1 presents a flow diagram of participant selection. The Daily Life Study was run across four years from 2011 to 2014; however, we selected only participants from the 2013 and 2014 years because the assessment of daily health habits during those years was more extensive (with more relevant variables measured). We further excluded those participants who had provided fewer than 7 diary records (out of 13 possible) across the course of the study (n = 25). Before data cleaning, our sample consisted of 796 participants (72.4% female), aged between 17-25 years (m = 19.72, sd = 1.73). The majority of participants were New Zealand European/pākehā (77.5%), followed by Asian (10.8%) and Māori/Pacifica participants (5.5%).
Participants of all other ethnicities made up only 6.2% of the sample and were thus aggregated into one category.
Measures, data cleaning, and preprocessing

A list of measures with reliabilities and descriptive statistics can be found in the Supplementary Materials. We selected the following demographic variables (n = 5): age, gender, ethnicity, and self-reported childhood socioeconomic status (SES; Griskevicius, Delton, Robertson, & Tybur) from the initial survey, and BMI from the clinic visit. From the daily survey, we selected the daily flourishing scale items (Diener et al., 2010), as well as all stress self-assessment, somatic self-assessment, diet-related, and health habit variables (total n = 22) that were measured for the entirety of 2013-2014 and could be theoretically linked with flourishing. While most of the 22 variables assessed the participants' state on the day of reporting, some inquired about the participants' state the night before. Of the 22 variables, the stress self-assessment variables (n = 2) were: felt stressed today and most stressful event today. The somatic self-assessment variables (n = 5) were: felt tired today, felt rundown today, felt cold or flu today, had hangover today, and had trouble concentrating today. The diet-related variables (n = 11) were: servings of fruit today, servings of vegetables today, servings of sweets today, servings of soft drink today, servings of chips today, servings of fruit last night, servings of vegetables last night, servings of sweets last night, servings of soft drink last night, servings of chips last night, and standard drinks of alcohol last night. The health habits (n = 4) were: hours slept last night (sleep quantity), felt refreshed after waking up today (sleep quality), minutes physically active today, and minutes spent in nature today. Most variables were measured on a 5-point Likert scale (0 = not at all, 1 = a little, 2 = somewhat, 3 = moderately, 4 = very).
Drinking alcohol, sleep quantity, time spent in nature, physical activity, and the diet-related variables were freely reported in their respective units (standard drinks, hours, minutes, and servings).
Across all daily variables, we dropped the first two days of observations to account for initial elevation bias – the tendency of participants to over-report symptoms at the beginning of longitudinal studies (Shrout et al., 2018). To make sure that dropping the first two days did not bias our results, we ran a sensitivity analysis (see Supplementary Materials). Additionally, 0.5% of values were missing across all 29 daily variables (8 flourishing items + 22 lifestyle habits). All variables except minutes spent in nature today were missing less than 1.5% of values. The minutes spent in nature variable was missing 4.35% of values, and there were several participants for whom all or the majority (> 50%) of values were missing. Because of the number of missing values, we decided to drop this variable after conducting sensitivity analyses showing that its inclusion/exclusion did not affect predictive projection feature selection (see Supplementary Materials). After excluding the time spent in nature variable, the daily (n = 21) and demographic (n = 5) variables made up a total of 26 predictors. Because the ethnicity variable contained four levels, which added two additional dummy predictors, there were in the end 28 predictors.
We did not impute the 0.4% of missing values that remained after excluding the minutes spent in nature today variable, as the values for each variable were averaged across days for each participant. As for the demographic variables, four participants were missing their BMI information, and one participant was missing two of the three childhood SES items. These five participants were dropped from the analysis. Thus, the final sample consisted of 791 participants.
The eight flourishing scale items were averaged into one daily flourishing variable. Following that, all variables from the daily survey were averaged across days (including the newly created daily flourishing variable). Additionally, childhood SES was averaged across its items as well. Finally, all continuous variables were centred and scaled to unit standard deviation.
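As a minimal illustration of this preprocessing pipeline (with made-up numbers, not the study's data), each participant's daily responses are first averaged across days, and the resulting person-level means are then standardized across participants:

```python
import numpy as np

# Toy daily diary: rows = participants, columns = days of one item.
daily = np.array([
    [3.0, 4.0, 2.0, 3.0],
    [1.0, 2.0, 1.0, 2.0],
    [4.0, 4.0, 5.0, 3.0],
])

# Average each participant's responses across days...
person_means = daily.mean(axis=1)  # one value per participant

# ...then centre and scale to unit standard deviation across participants,
# so slopes are comparable across predictors measured in different units.
z = (person_means - person_means.mean()) / person_means.std(ddof=1)

print(np.round(z, 2))
```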

Statistical analyses and modeling
Prior to fitting any models, we randomly split our data into a training set (75%; n = 593) and a test set (25%; n = 198). The training data were used to fit all models. The test data were used to validate the models' predictions on an independent dataset. All relevant statistics are reported for both the training and test data.
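In Python terms, such a split amounts to the following (the seed and index mechanics here are hypothetical, not the study's actual assignment):

```python
import numpy as np

rng = np.random.default_rng(42)  # hypothetical seed, not the study's

n = 791
indices = rng.permutation(n)     # shuffle participant indices
n_train = round(0.75 * n)        # 593 participants for training
train_idx, test_idx = indices[:n_train], indices[n_train:]

print(len(train_idx), len(test_idx))  # 593 198
# No participant appears in both sets, so test performance is honest.
assert set(train_idx).isdisjoint(test_idx)
```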
We first fitted a Bayesian multiple regression as the reference model, predicting average daily flourishing from all 28 candidate well-being predictors (21 daily predictors, 5 demographic predictors, and 2 additional dummy predictors for ethnicity). We used weakly informative normal priors for the predictor slopes and the intercept (normal, mean = 0, sd = 1). For the model standard deviation, we also used a weakly informative prior (half-normal, mean = 0, sd = 1). As for sampling, we ran four chains of 4,000 iterations each, with 2,000 iterations of warm-up and 2,000 iterations of sampling. The chains were run in parallel to speed up computation.
After fitting the reference model, we used predictive projection feature selection to fit a submodel with fewer predictors that would give predictions similar to the full model's. To implement predictive projection, we used the projpred package (Piironen et al., 2018). The variables were sequentially entered into the submodels using L1 search (the projpred default for > 20 variables), and the submodels' predictive performance was compared using the expected log predictive density (ELPD) obtained via PSIS-LOO. For the optimal submodel, we chose the smallest submodel for which the difference between its own and the reference model's ELPD was no more than 1 standard error from zero (the projpred default); that is, we chose the smallest submodel that was expected to perform the same as or outperform the reference model with at least 16% probability (and perform worse with at most 84% probability).
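The quoted 16% follows from treating the ELPD difference as approximately normal: a submodel sitting exactly 1 standard error below the reference still performs at least as well whenever the true difference lands in the upper tail beyond one standard deviation. A quick check in Python:

```python
from statistics import NormalDist

# P(submodel >= reference), given the estimated ELPD difference is
# 1 SE below zero and the difference is treated as normal:
# the upper tail beyond one standard deviation.
p_at_least_as_good = 1 - NormalDist().cdf(1.0)
print(round(p_at_least_as_good, 3))  # 0.159, i.e. roughly the 16% quoted
```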
Finally, we compared the predictive performance of predictive projection feature selection to five other variable selection approaches: the original Bayesian reference multiple regression model, a frequentist multiple regression model with all predictors entered simultaneously, frequentist stepwise selection using p-values, frequentist stepwise selection using the Akaike information criterion (Akaike, 1974), and many simple regression models corrected for multiple comparisons. R does not have a default function for stepwise selection using p-values, so we used publicly available R code that implements an SPSS-like stepwise selection with p-values (Meys, 2009). The p-values for the simple regressions were adjusted via Bonferroni correction. Besides the 1 SE rule projected submodel featured above, we also included in the comparison a projected submodel with cross-validated predictive performance matching the reference model, that is, the smallest possible submodel for which there was at least 50% probability that it would perform as well as or better than the reference model. In total, seven models were compared.
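The Bonferroni adjustment used for the simple regressions simply multiplies each raw p-value by the number of tests (capped at 1), which bounds the family-wise error rate at alpha. A sketch with hypothetical p-values:

```python
def bonferroni(p_values):
    # Multiply each p-value by the number of tests, capping at 1.0.
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

raw = [0.001, 0.004, 0.03, 0.2]  # hypothetical raw p-values from 4 tests
print(bonferroni(raw))
```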
In the Bayesian models, we interpreted predictors as "significant" when their 95% credible interval did not overlap zero and, in the case of predictive projection, when they were selected into the optimal submodel. For the frequentist models, we used the conventional but arbitrary alpha < 0.05 threshold. To evaluate the performance of all models, we used the root mean squared error (RMSE) and Bayesian R².
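The two performance metrics can be sketched as follows; the Bayesian R² here follows the variance-ratio definition of Gelman and colleagues (variance of the fitted values over that variance plus the residual variance), which stays within [0, 1]:

```python
import numpy as np

def rmse(y, yhat):
    # Root mean squared error: the typical size of a prediction error,
    # in the (standardized) units of the outcome.
    return float(np.sqrt(np.mean((y - yhat) ** 2)))

def bayes_r2(yhat, resid_var):
    # Bayesian R^2: Var(fitted) / (Var(fitted) + residual variance).
    fit_var = float(np.var(yhat))
    return fit_var / (fit_var + resid_var)

y = np.array([1.0, 2.0, 3.0, 4.0])     # toy observed outcomes
yhat = np.array([1.1, 1.9, 3.2, 3.8])  # toy model predictions
print(round(rmse(y, yhat), 3))
```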

Results
The reference model converged well, with no divergent transitions and good R-hat values (all ≈ 1), indicating that the chains mixed well (Gelman & Rubin, 1992). All parameters had a good effective sample size (all > 5,000). As a posterior predictive check, we compared the distribution of observed daily flourishing to that of 500 simulated datasets drawn from the posterior predictive distribution and found that the observed data matched the densities of the simulated data fairly well (see Supplementary Materials). There were a few outliers with very low average daily flourishing in the observed data; however, when we evaluated the model via Pareto-smoothed importance sampling leave-one-out cross-validation (PSIS-LOO CV; Vehtari, Gelman, & Gabry, 2016), we found no evidence of these observations having a disproportionate influence on the model fit, as indicated by satisfactory Pareto-k values (all "good", k < 0.5; see Supplementary Materials). Visual checks of predicted values vs residuals did not indicate any heteroscedasticity or gross non-linear trends (see Supplementary Materials). On the training data, the reference model had a Bayesian R² of 0.359 (0.308-0.405, 95% credible interval; CI) and a model standard deviation/RMSE of 0.821 (0.774-0.870, 95% CI), indicating moderately good predictive performance.
Using predictive projection, we found that a submodel with only 3 predictors made predictions that were "similar enough" to those of the reference model with all 28 predictors, and the sensibility of this rule-based approach was confirmed with a visual check of the feature selection trajectory (see Figure 2A).
On the training data, the optimal submodel had a Bayesian R² of 0.267 (0.213-0.323, 95% CI) and a model standard deviation/RMSE of 0.878 (0.826-0.931, 95% CI). The optimal submodel's predictions were strongly correlated with the reference model's predictions (Pearson r = 0.897), and this was again confirmed by visual checks (see Figure 2B).
We tested the submodel's performance on the holdout test data. The submodel performed well on the test data (slightly worse than on the training data but within the 95% CI bounds), with an observed Bayesian R² of 0.253 and an observed RMSE of 0.884 (see Figure 3B).
The results of the comparison of different variable selection methods are shown in Table 1. The two multiple regression models had the best out-of-sample predictive accuracy, as assessed by a test data R² of 0.33. After that, stepwise selection based on AIC had the best out-of-sample predictive accuracy, followed by the projection for which the mean expectation matched the reference model, stepwise selection based on p-values, and the projection with the 1 SE rule. One simple regression (felt refreshed after waking up) had a high out-of-sample R² of 0.218; however, the other simple regressions had much lower predictive accuracy, with a mean R² of 0.028 and sd of 0.046. Projection led to the most parsimonious inference, however, with the 1 SE projection submodel selecting 3 predictors and the matched projection submodel selecting only 6 predictors, fewer than any of the other methods. Overall, larger models tended to have better out-of-sample predictive accuracy; however, the projection methods resulted in more parsimonious models – the multiple, stepwise, and simple regression variable selection approaches all resulted in a greater number of "significant" predictors. Across all models, felt refreshed after waking up and had (less) trouble concentrating were universally selected predictors. Additional predictors that tended to be selected relatively often across models included gender: female, servings of fruit today, (fewer) servings of sweets last night, (fewer) servings of sweets today, (fewer) servings of soft drink last night, felt (less) tired today, and lower BMI.

Discussion
Using predictive projection feature selection, we found that only three of the 28 candidate well-being predictors were sufficient to approximate the predictions of a large reference model with all 28 predictors. Specifically, participants who reported feeling more refreshed after waking up, having less trouble concentrating, and eating more servings of fruit scored highest in their average daily flourishing. We tested the optimal 3-predictor submodel on test data and found that it predicted the unseen participants' average daily flourishing well, with performance measured by observed RMSE only slightly worse than on the training data. Lastly, when comparing predictive projection to other variable selection methods, we found that larger models generally tended to have better predictive accuracy on test data; however, predictive projection tended to select more parsimonious models, and the projection submodel matched to the reference model performed better than stepwise selection based on p-values while using fewer predictors.
The fact that only three predictors were sufficient to predict almost as much variation in average daily well-being as the reference model with all 28 candidate predictors does not suggest that the 25 left-out predictors have no relationship with flourishing. In fact, with simple regression, we saw that 10 predictors had a statistically significant association with flourishing after Bonferroni correction was applied. However, whereas the predictors selected by simple regressions only described the relationship between themselves and flourishing in isolation, the three predictors selected by predictive projection were sufficient to explain most of the variance in average daily flourishing while simultaneously controlling for each other. As such, our results suggest that the three selected predictors are the strongest predictors of average daily flourishing, at least among groups of related predictors. For example, inasmuch as diet is related to flourishing, our results suggest that fruit consumption may be the strongest indicator of good diet and flourishing, as indicated by the fact that it was the earliest of the diet-related predictors to enter the feature selection trajectory and the only diet-related predictor present in the optimal submodel. Likewise, while there were several correlated candidate predictors related to fatigue and somatic issues, the fact that having (less) trouble concentrating was the first and only predictor from this group selected into the optimal submodel suggests that it may be the strongest somatic predictor of well-being.
The projected submodel based on the 1 SE rule had worse predictive accuracy on test data than the other variable selection methods; however, we still chose to do inference with this model. The reason we chose the 1 SE rule submodel over the projection submodel matched on expected predictive accuracy is that the 1 SE rule leads to simpler models, more robust inference, and, thus, greater expected reproducibility. The choice of rule for the predictive projection submodel size is a trade-off between explanation and robustness (Piironen et al., 2018), similar to the 4-to-1 trade-off between type-I and type-II error that underlies the conventional 0.05 alpha and 0.8 power recommendations (Chase & Tucker, 1976; Cohen, 1965; Lakens, 2013). Additionally, it is important to highlight that while the two stepwise models had higher predictive accuracy on the test data, these models cannot be used for inference, as the p-values and standard errors in the selected models are not adjusted for the selection procedure and as such do not adequately control type-I error (Flom & Cassell, 2007; Smith, 2018). The marginal credible intervals from the projection, on the other hand, come from a valid posterior distribution (Piironen et al., 2018) and thus can be used for inference.
The first, and therefore strongest, predictor selected into the submodel was sleep quality – how refreshed a participant felt after waking up on their average day. Sleep quality has been consistently shown to be one of the strongest predictors of well-being, especially in young adults, with poor sleep quality being strongly linked to poor mental health outcomes, including symptoms of depression (Pilcher et al., 1997; Ridner et al., 2016; Wallace et al., 2017; Wilson et al., 2014). Further, it is important to note that while sleep quality is often shown to be an important predictor of well-being, sleep quantity is not (Pilcher et al., 1997; Wallace et al., 2017). Likewise, in our study, while sleep quality was entered into the submodels early along the feature selection trajectory, reflecting its predictive strength, sleep quantity was only entered long after any improvement in predictive accuracy was shown. Our findings thus strengthen past evidence showing that sleep quality (but not sleep quantity) is one of the strongest predictors of young adult well-being.
The second predictor selected into the submodel was how much trouble concentrating a participant had on their average day. More trouble concentrating was associated with lower well-being. Having trouble concentrating is one of the key symptoms of major depressive disorder (MDD) and is often assessed by diagnostic scales, such as the popular CES-D scale (Kohout et al., 1993) and the DSSS scale (Hung et al., 2006). Less research has linked lack of concentration to lower well-being per se. However, there is evidence that "mind wandering" – thinking about things not present in the immediate environment – is associated with less happiness (Killingsworth & Gilbert, 2010; Smallwood & Schooler, 2015). As such, lacking the ability to concentrate could be a proxy measure for MDD symptoms and mind wandering, and may potentially be a stronger predictor of well-being than other somatic symptoms measured in our study, such as feeling tired or run down.
The third and last predictor entered into the submodel was daytime fruit consumption. As discussed previously, the fact that daytime fruit consumption was the first diet-related predictor to be selected in the feature selection trajectory and the only diet-related predictor that made it into the submodel suggests that it may be one of the strongest indicators of diet quality as it relates to flourishing. Fruit and vegetable consumption has previously been shown to predict flourishing (Blanchflower et al., 2013; Conner et al., 2015, 2017; Hong & Peltzer, 2017; Mujcic & Oswald, 2016). Other diet-related predictors were entered early along the feature selection trajectory, namely (lower) night-time soft drink consumption and daytime vegetable consumption as the 4th and 5th predictors, respectively. However, based on the 1 SE rule, fruit consumption alone was sufficient to approximate the reference model's predictions. The most straightforward interpretation of this fact is that fruit consumption may contain enough information about the quality of one's diet (as it relates to flourishing) to make other diet-related predictors redundant. Additionally, there is evidence that raw fruit and vegetables are stronger predictors of well-being than cooked fruit and vegetables (Brookie et al., 2018), and since fruit is more often eaten raw, general fruit consumption may be a stronger indicator of a good diet than vegetable consumption. Lastly, given that poor diet quality is associated with ill-being (Jacka et al., 2010), and that our dependent variable (flourishing) may capture more positive aspects of mental health and well-being, it is possible that some of the predictors associated with unhealthy diet would have greater predictive utility if the dependent variable specifically focused on ill-being, such as depressive symptoms. Be that as it may, our findings suggest that fruit consumption may be one of the strongest diet-related predictors of flourishing.
Interestingly, despite the fact that physical exercise has often been linked to well-being in past literature (Diener et al., 2018; Goodwin, 2003; Hassmén et al., 2000), exercise was not selected into the optimal submodel in our study and was in fact added very late along the feature selection trajectory, as the 20th predictor. Additionally, as a predictor in simple regression, the coefficient for exercise did not survive Bonferroni correction either. One possible explanation for why exercise was not associated with flourishing in our sample may be that the sample was relatively homogeneous (all young university students), and so the range of exercise across participants may have been too narrow to predict flourishing in a meaningful way. On a related note, the cited studies on the associations of well-being and exercise investigated older populations, and so it is possible that the association between exercise and well-being becomes more pronounced over the course of one's life and might not be as strong initially in young adults.
The fact that no demographic predictors were selected into the optimal submodel is somewhat surprising. Gender was the first demographic factor along the selection trajectory, entered the submodels as the sixth predictor, and had a moderately large point estimate for its slope in the reference model (with female gender predicting greater flourishing). The other categorical demographic predictor, ethnicity, had one of its dummy variables (ethnicity: Māori/Pacific Islander) entered as the 16th predictor. One possible explanation for why gender and ethnicity were not included in the optimal submodel is that there was large uncertainty around their estimates in the reference model, as evidenced by wide marginal posterior credible intervals (see Supplementary Materials). The width of the credible intervals likely reflected the fact that the sample was unbalanced (72.6% female, 27.4% male; 77.6% European/Caucasian, 10.6% Asian, 5.4% Māori/Pacific Islander, 6.2% Other). Alternatively, some of the variance in flourishing explainable by gender and ethnicity may have already been explained by the predictors entering earlier in the selection trajectory. Lastly, while some studies do show evidence of higher flourishing in women (Piqueras et al., 2011), there is also evidence suggesting that gender differences in flourishing may not be very large (Huppert & So, 2009). However, in our study, gender did seem to play some role, as indicated by the moderately large point estimate for gender in the reference model (see Supplementary Materials).
Rather, the more likely explanation for why gender and ethnicity were not selected into the optimal submodel is that there was large uncertainty around their estimates and that they did not improve the out-of-sample prediction of average daily flourishing above and beyond the preceding predictors.
The three continuous demographic predictors (age, childhood SES, and BMI) were not selected into the optimal submodel either. All three predictors had relatively narrow marginal credible intervals in the reference model (see Supplementary Materials), and while the marginal CIs of age and SES overlapped with zero, the CI of BMI did not. A potential explanation for why BMI was not included in the optimal submodel may be its moderately weak correlation with fruit consumption; that is, the bulk of the flourishing variance explainable by BMI may have already been explained by fruit consumption. It is interesting that neither age nor SES was selected into the submodel; a potential explanation may be that our sample was relatively homogenous with regard to these two variables, as it consisted of young university students.
Our findings have several practical implications for the study of well-being and policy. First, we employed rather conservative variable selection and model validation methods, such as PSIS-LOO cross-validation and the use of a hold-out data set. Thus, we believe that the results of our study should be safe from overfitting, and in turn more robust and replicable. Second, using predictive projection, we were able to select a very parsimonious model with few theoretically relevant predictors and show good predictive accuracy on a hold-out sample. Thus, we have demonstrated that predictive projection can be a very powerful tool when dealing with the issue of selecting relevant predictors from a set of many potential candidates. We believe that predictive projection may be especially useful in the study of health and well-being, where the issue of variable selection arises frequently. Lastly, our study identified potential targets for future interventions. While poor ability to concentrate is more likely to be a symptom rather than a cause of low flourishing and may be difficult to change systematically, sleep quality and day-time fruit consumption are modifiable health habits that could be targeted in future studies.
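For readers unfamiliar with the method, the core idea of predictive projection can be sketched in a few lines. Our actual analysis was carried out in R, with the reference model fit by MCMC and the projection and forward search handled by the projpred package; the following is only a conceptual numpy sketch for the Gaussian case, using ordinary least squares as a stand-in for the Bayesian reference model. The simulated data and all names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 8
X = rng.normal(size=(n, k))
# Only the first two predictors truly matter in this toy data set.
beta_true = np.array([5.0, -4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(scale=2.0, size=n)

# Reference model fit; plain least squares stands in here for the
# Bayesian posterior used in the actual analysis.
beta_ref, *_ = np.linalg.lstsq(X, y, rcond=None)
mu_ref = X @ beta_ref  # the reference model's fitted values

def projection_loss(cols):
    # Project the reference fit onto the subspace spanned by the chosen
    # columns; the squared distance measures how much of the reference
    # model's predictive information the submodel loses.
    Xs = X[:, cols]
    b, *_ = np.linalg.lstsq(Xs, mu_ref, rcond=None)
    return float(np.sum((mu_ref - Xs @ b) ** 2))

# Forward search along the feature selection trajectory: at each step,
# add the predictor whose projection stays closest to the reference fit.
selected, remaining = [], list(range(k))
while remaining:
    best = min(remaining, key=lambda j: projection_loss(selected + [j]))
    selected.append(best)
    remaining.remove(best)

print(selected)  # the two truly relevant predictors (0 and 1) enter first
```

In the real workflow, the projection is defined in terms of Kullback-Leibler divergence rather than raw squared distance, submodels are compared on cross-validated predictive performance (PSIS-LOO), and the submodel size is chosen where predictive accuracy plateaus near that of the reference model.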
Our study has several limitations. First, as was mentioned earlier, the fact that only three variables were selected into the optimal submodel does not mean that the other predictors are not related to well-being. Therefore, we cannot conclude that the three predictors are the only predictors of flourishing, only that they are sufficient to predict it with a similar degree of accuracy as all 28 predictors. Second, given that our data are observational, we cannot make causal claims about the predictors' influence. Inasmuch as we say that a predictor predicts higher or lower average daily flourishing, we mean that the predictor has a positive or negative association with flourishing and that the model with the fitted predictors has been shown to perform well on unseen data. However, correlational data can be used in an exploratory way to determine possible targets of intervention in follow-up research. Third, our sample consisted almost entirely of young, college-age adults (age range 17–25 years), and thus some of the findings may not generalize to an older (or younger) population. For example, young adults tend to be at an increased risk for poor sleep quality (Lund et al., 2010), and thus it is possible that the influence of sleep quality on well-being might be more pronounced in this population. In future research, it may be interesting to apply predictive projection to well-being and health habit data from other age cohorts to see whether there are age-related differences in the sufficient predictors of flourishing. Finally, while our data come from a micro-longitudinal daily diary study, we only analysed the data cross-sectionally. There are several reasons why we did not analyse the trends within participants. First, predictive projection feature selection is not yet implemented for mixed effects models (Piironen et al., 2018), and so we could not build a parsimonious within-participants model in the same way as we did across participants.
Second, there is no guarantee that the three strongest cross-sectional predictors of flourishing across participants will also be the strongest predictors of flourishing within participants; in fact, it is rather unlikely (Simpson's paradox; see Wagner, 1982). As such, we believe that a proper repeated measures analysis using mixed effects models is outside the scope of this article and may require more involved, theory-driven manual modelling and model comparison.
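The worry about between-person versus within-person trends can be made concrete with a small invented example: person-day data in which participants who sleep more flourish more on average (a positive pooled, cross-sectional correlation), even though within every single participant, higher-sleep days are lower-flourishing days. All numbers below are hypothetical and chosen purely to illustrate the reversal.

```python
import numpy as np

# Hypothetical person-day data (three participants, three days each).
# Between persons: more sleep goes with more flourishing.
# Within each person: more sleep goes with less flourishing.
sleep = {
    "A": [5.0, 6.0, 7.0],
    "B": [7.0, 8.0, 9.0],
    "C": [9.0, 10.0, 11.0],
}
flourish = {
    "A": [3.0, 2.5, 2.0],
    "B": [5.0, 4.5, 4.0],
    "C": [7.0, 6.5, 6.0],
}

# Pooled (cross-sectional) correlation over all person-days:
x = np.concatenate([sleep[p] for p in sleep])
y = np.concatenate([flourish[p] for p in flourish])
pooled_r = np.corrcoef(x, y)[0, 1]

# Within-person correlations:
within_r = {p: np.corrcoef(sleep[p], flourish[p])[0, 1] for p in sleep}

print(f"pooled r = {pooled_r:.2f}")     # clearly positive
print(f"within-person r = {within_r}")  # negative for every participant
```

The pooled correlation is strongly positive while every within-person correlation is negative, which is exactly why a cross-sectional selection result cannot be assumed to carry over to within-person dynamics.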

Conclusion
Using predictive projection feature selection, we were able to accurately predict average daily flourishing in young adults using just the information about how refreshed they felt after waking up, how much trouble they had concentrating, and how many servings of fruit they ate on an average day. That is, we were able to approximate the predictive accuracy of a large model with 28 candidate predictors with a much smaller submodel containing only the above three predictors. Our findings reinforce and extend our understanding of modifiable behaviours that might contribute to well-being: sleep quality is more strongly associated with better well-being than sleep quantity, having trouble concentrating is related to lower well-being, and having good dietary habits (as indicated by fruit consumption) is related to higher well-being. Moreover, our findings suggest that having trouble concentrating may be one of the strongest indicators of poor outcomes among somatic-related predictors, and that fruit consumption may be one of the strongest indicators of good outcomes among diet-related predictors. We have used conservative methods to arrive at our results, and as such they should be robust to overfitting and more replicable. Finally, while the search for predictors of well-being is challenging, we believe that it can be greatly helped by employing the new statistical methods used in the present paper.

Availability of data and materials

The Daily Life Study dataset contains sensitive demographic information that could potentially be used to identify individual participants. A de-identified dataset will be made available from the corresponding author upon reasonable request. All R code used for the analysis is available as part of the Supplementary Materials.

Figure 1. Flow diagram of the data selection procedure.

Supplementary Files
This is a list of supplementary files associated with this preprint: dailylife5.pdf