The usefulness of “corrected” body mass index vs. self-reported body mass index: comparing the population distributions, sensitivity, specificity, and predictive utility of three correction equations using Canadian population-based data

Background National data on body mass index (BMI), computed from self-reported height and weight, is readily available for many populations including the Canadian population. Because self-reported weight is found to be systematically under-reported, it has been proposed that the bias in self-reported BMI can be corrected using equations derived from data sets which include both self-reported and measured height and weight. Such correction equations have been developed and adopted. We aim to evaluate the usefulness (i.e., distributional similarity; sensitivity and specificity; and predictive utility vis-à-vis disease outcomes) of existing and new correction equations in population-based research.
Methods The Canadian Community Health Surveys from 2005 and 2008 include both measured and self-reported values of height and weight, which allows for construction and evaluation of correction equations. We focused on adults age 18–65, and compared three correction equations (two correcting weight only, and one correcting BMI) against self-reported and measured BMI. We first compared population distributions of BMI. Second, we compared the sensitivity and specificity of self-reported BMI and corrected BMI against measured BMI. Third, we compared the self-reported and corrected BMI in terms of association with health outcomes using logistic regression.
Results All corrections outperformed self-report when estimating the full BMI distribution; the weight-only correction outperformed the BMI-only correction for females in the 23–28 kg/m² BMI range. In terms of sensitivity/specificity, when estimating obesity prevalence, corrected values of BMI (from any equation) were superior to self-report. In terms of modelling BMI-disease outcome associations, findings were mixed, with no correction proving consistently superior to self-report.
Conclusions If researchers are interested in modelling the full population distribution of BMI, or estimating the prevalence of obesity in a population, then a correction of any kind included in this study is recommended. If the researcher is interested in using BMI as a predictor variable for modelling disease, then both self-reported and corrected BMI result in biased estimates of association.


Background
Obesity's rise in prevalence over the past 30 years [1], coupled with knowledge of its public health burden [2][3][4][5][6], has opened debate over the best way to measure adiposity in populations. The body mass index (BMI; weight (kg)/height² (m²)) is a common measure in population-based surveys: it is relatively inexpensive, simple, and nonintrusive. Notwithstanding the limitations of BMI as a measure of adiposity [7][8][9][10], it continues to be recommended by the World Health Organization [10] as the appropriate criterion for assessing obesity status in populations, owing to the high correlation of high BMI with excess body fat and poor health outcomes [1].
The practice of gathering self-reported height and weight data from survey respondents has raised concerns about the inaccuracy of self-reported data. Where comparison to measured weight is possible, studies have demonstrated misreporting (both under- and over-reporting) of weight, which varies by sex, age, race/ethnicity, and BMI [11][12][13][14][15][16][17][18][19]. The consequence of this misreporting is potentially large bias in estimates of prevalence and measures of association in studies that use self-reported BMI [7].
By using population-based datasets which contain both measured and self-reported values of BMI for the same individuals, it is possible to develop statistical adjustments that bring self-reported values of BMI closer to measured values. These correction methods can then be applied to datasets that only contain self-reported height and weight, resulting in a corrected height and weight and thus improving the usability of those datasets. Attempts to develop correction methods for use with Canadian data have resulted in two important papers. Connor Gorber et al. [20] used data from the 2005 Canadian Community Health Survey (CCHS) and, after considering many potential covariates, concluded that the most parsimonious and effective correction equations for men and women rely on one variable: self-reported BMI. Shields et al. [21] used data from the 2007-2009 Canadian Health Measures Survey to develop a correction equation, which was then used to correct self-reported BMI measures from the 2008 CCHS. This second paper likewise concluded that a correction equation using only self-reported BMI is appropriate for correcting BMI values. In both studies, additional covariates did not add enough predictive power to the models to justify the added complexity of including them in the correction equations.
The practice of correcting based on BMI alone (i.e., no other covariates) is appealing in its simplicity. Other authors have included covariates in an effort to increase the accuracy of their correction equations, including age [22], leisure time physical activity, self-reported health [14], education level [23], and ethnicity [24], among others. These studies find that including additional covariates in a correction equation can increase the correction equation's accuracy when adjusting BMI or categorizing individuals by obesity status (e.g., BMI > 30). To maximize comparability with the existing Canadian correction equations, we did not include additional covariates in this paper.
A separate issue is whether corrections should be based on BMI or on weight alone. Studies from the United States [11,13], Sweden [14], and France [15] verify that average misreporting increases as measured weight and/or BMI increases, suggesting that correcting on either BMI or weight may be acceptable. Correcting on weight only raises the question of whether or how to deal with height. It is well-documented that height is subject to bias in the elderly, i.e., those over age 60 [11,22], and that this bias can be substantial [25,26]. Furthermore, international evidence shows that height bias also exists among those under age 60, is strongest for the shortest males and for females, and might be changing over time [23,24,27]. Correction equations developed by other authors have either corrected BMI as a whole [14,20,21,23,26], or corrected height and weight separately and calculated a corrected BMI from those values [12,23,24] to incorporate both weight and height. However, there is evidence to suggest that average bias in self-reported height among adults as a whole is quite small: based on five national surveys conducted in Canada and the United States on males and females aged 18 to 74, the average bias in self-reported height ranged from 0.2 cm to 1.4 cm [28]. Because the bias in working age adults is almost wholly from bias in self-reported weight [11][12][13][14][15][16][17][26], it is worth considering the value of correcting on weight only rather than overall BMI for the general population. This paper will consider such a correction.
The purpose of this study is to evaluate the usefulness of existing and new correction equations for BMI in population-based research. To accomplish this, we have three objectives: 1) compare the self-reported and corrected BMI distributions; 2) compare self-reported and corrected BMI to measured BMI based on sensitivity and specificity of measured obesity; and 3) compare self-reported and corrected BMI to measured BMI in regression models of various health conditions, in terms of statistical significance, coefficient magnitude, and direction of the coefficient (above or below the measured coefficient). We compare three correction equations: first, an existing Canadian correction equation [20] (a correction that used self-reported BMI, so will be referred to as the "BMI-only" correction); second, a new correction equation developed here which corrects values of weight only; and third, another correction equation developed here which is a computationally simpler version of the weight-only correction. The term "weight-only" means that we use a corrected value for weight but self-reported height to correct overall BMI.

Data
We used data from two cycles (2005 and 2008) of Statistics Canada's Canadian Community Health Survey (CCHS). The CCHS is a repeated cross-sectional survey that provides socio-demographic and health information for individuals living in the ten Canadian provinces. The CCHS uses a multi-stage cluster sampling procedure to derive a sample that is representative of the Canadian population, excluding those who live in institutions, on First Nations reserves, on Canadian Forces bases, and in certain remote areas; the CCHS is representative of approximately 98% of the Canadian population over age 12 [29]. The overall response rates for households were 87.0% for the 2005 CCHS and 85% for the 2008 CCHS [21].
In the 2005 and 2008 iterations of the survey, a random sub-sample of individuals was asked to self-report their height and weight; those values were subsequently measured by the interviewer. The respondents were not told they would be measured when they self-reported their height and weight. The response rate for the subsample was 64.2% for the 2005 CCHS and 59.7% for the 2008 CCHS; no information is available on the reasons for refusal to be measured [21]. We focused on individuals aged 18 to 65 for whom both self-reported and measured BMI data were available. We excluded adults over age 65 because of observed over-reporting of height in that age group [11,25,26].
We used the master file versions of the CCHS, accessed through the Research Data Centres (secure data laboratories) program in Canada. Access was granted by Statistics Canada via the Canadian Research Data Centre Network (CRDCN) through a standardized application process. All analyses incorporated sampling weights as directed by Statistics Canada and were conducted in Stata 11.2. Ethics approval for this project was obtained from the University of Calgary's Conjoint Health Ethics Research Board (Ethics ID: E-23704).

Procedure
Below, we first describe the development of the weight-only correction equations. Then, we describe the procedure for achieving our two objectives that compare self-reported and corrected BMI to measured BMI. All analyses, including those involved in developing the correction equations, are conducted for males and females separately.
The justification for modelling misreporting based on weight only is best shown graphically. Additional file 1 shows three quantile-quantile plots comparing measured BMI to three other BMI measures: self-reported BMI (graph a); a BMI constructed from self-reported weight and measured height (graph b); and a BMI constructed from measured weight and self-reported height (graph c), for males and females. The graphs show the average BMI for each percentile of measured BMI against the average BMI for each percentile of the BMI measures containing at least one self-reported value. Note that the quantile-quantile plots in graphs a and b look similar, that is, there is very little improvement in modelling measured BMI by replacing self-reported height with measured height. Graph c shows that there is a large improvement in modelling measured BMI using only measured weight, which indicates that the majority of the measurement error of the distribution of self-reported BMI comes from self-reported weight, not height, in our sample. Thus, the weight-only correction should be addressing the main source of measurement error in the sample of working age individuals.

a) Development and estimation of weight-only correction equation
We can model the self-reported value of BMI as a function of an individual's measured (true) BMI multiplied by a misreporting term; this model is equation (1). That is, an individual's self-reported BMI (their self-reported weight, $W_{SR}$, over their self-reported height squared, $h_{SR}^{2}$) is equal to their measured BMI multiplied by a misreporting term that is made up of random noise, $\varepsilon$, and measured weight, $W_{M}$. Equation (1) was chosen because, with the right parameters (see endnote a), we can mimic the nonlinear relationship between measured and self-reported BMI described in the literature, whereby the discrepancy increases across measured BMI at an increasing rate due to increases in weight. By including weight in the exponential term, we allow the difference between self-reported BMI and measured BMI to grow at an increasing rate as measured weight increases, a relationship that is supported by published literature [11][12][13][14][15][16][17]. A linear error term would not accurately capture that association.
Next, we can take the natural logarithm of both sides of the equation to reduce the BMI relationship into its constituent parts, which can then be rearranged so that $\ln(W_{SR})$, the natural logarithm of an individual's self-reported weight, is a function of measured weight and the ratio of self-reported to measured height (the relative misreport in height). This is an equation for which we can estimate regression parameters, using a sample of individuals with both measured and self-reported height and weight. In the estimated regression equation, equation (2), $i$ denotes individual values. If the ratio of self-reported to measured height is not related to self-reported weight on average (which we would expect from the literature on non-seniors), then $\beta_3$ would be statistically equal to zero, leaving us with equation (3). The restriction necessary for equation (3) to be an appropriate step, that $\beta_3$ is statistically equal to 0, was tested using our dataset during the model building exercise and is not an assumption. Specifically, using a t-test, we could not reject the null hypothesis that the coefficient $\beta_3$ was equal to 0 (p-value of 0.609 in males and 0.559 in females). Removal of the ratio of self-reported to measured height did not impact the values of the other coefficients in the model.
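Equations (1) to (3) are described only in prose in the two paragraphs above. One way to write them that is consistent with that description is sketched below. The exact functional form is an assumption made for illustration (in particular, the height ratio entering in logarithms, and the symbols $h_{M}$ for measured height and $\beta_0,\dots,\beta_3$ for the regression parameters); it should not be read as a reproduction of the original typesetting.

```latex
\begin{align}
\frac{W_{SR}}{h_{SR}^{2}} &= \frac{W_{M}}{h_{M}^{2}}\, e^{\,W_{M}+\varepsilon} \tag{1}\\
\ln(W_{SR}) &= \ln(W_{M}) + W_{M} + 2\ln\!\left(\frac{h_{SR}}{h_{M}}\right) + \varepsilon \notag\\
\ln(W_{SR,i}) &= \beta_0 + \beta_1 \ln(W_{M,i}) + \beta_2 W_{M,i}
  + \beta_3 \ln\!\left(\frac{h_{SR,i}}{h_{M,i}}\right) + \varepsilon_i \tag{2}\\
\ln(W_{SR,i}) &= \beta_0 + \beta_1 \ln(W_{M,i}) + \beta_2 W_{M,i} + \varepsilon_i \tag{3}
\end{align}
```

The unnumbered line is the logarithm of equation (1) with every coefficient equal to 1; equation (2) relaxes those coefficients to estimable parameters, and equation (3) imposes $\beta_3 = 0$.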
Equation (3) can be rearranged to put measured weight in terms of self-reported weight and estimated parameters (constants), giving equation (4). Because measured weight enters this equation both in kilograms and in natural-log form, it has no simple closed-form solution and is solved numerically, with parameters estimated from a dataset that contains both measured and self-reported weight. Using self-reported values, we back-solved for measured weight by iteratively substituting in values for measured weight until the equality held at a predetermined tolerance (number of decimal places, in our case 0.001), yielding a corrected weight in place of measured weight.
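The back-solving step can be sketched in code. The example below assumes equation (3) has the form ln(W_SR) = b0 + b1·ln(W_M) + b2·W_M and finds W_M by bisection, which is swapped in here because the details of the substitution scheme are not given. The function name, search bounds, and coefficient values are hypothetical and are not the estimates obtained in this study.

```python
import math


def correct_weight(w_sr, b0, b1, b2, lo=20.0, hi=400.0, tol=0.001):
    """Solve ln(w_sr) = b0 + b1*ln(w_m) + b2*w_m for w_m (kg) by bisection.

    Assumes the right-hand side is increasing in w_m on [lo, hi]
    (i.e. b1 / w_m + b2 > 0 there) and that the solution lies in [lo, hi].
    """
    target = math.log(w_sr)

    def gap(w_m):
        # Positive when the candidate measured weight is too large.
        return b0 + b1 * math.log(w_m) + b2 * w_m - target

    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("solution not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2.0


# Hypothetical coefficients, chosen only to illustrate the mechanics.
B0, B1, B2 = 0.05, 1.0, -0.001
corrected_kg = correct_weight(80.0, B0, B1, B2)
# Weight-only correction: corrected weight with self-reported height (m).
corrected_bmi = corrected_kg / 1.75 ** 2
```

Bisection is used because it converges whenever the solution is bracketed; any root-finding routine that reaches the same tolerance would serve equally well.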
The parameter associated with the measured weight term in misreporting, $\beta_2$ in equation (2), could be quite small in practice because it is the coefficient of a variable that is measured in kilograms, while the other terms in the equation are measured in the natural log of kilograms. Thus, measured weight, the variable that $\beta_2$ is associated with, need only have a small coefficient to have a large impact on the natural log of self-reported weight. It might be the case that, for the ranges of BMI exhibited by the majority of the population, $\beta_2$ is essentially zero. To accommodate this possibility, we also consider an alternative form of equation (1), equation (5). Equation (5) assumes that the natural log of self-reported weight depends only on the natural log of measured weight and on a constant term that captures the average effect of unobserved variables. Following the same steps as outlined for equation (1), this leads to a much simpler regression equation, equation (6), and correction equation, equation (7). For the remainder of this paper, equation (4) will be referred to as the "weight-only correction", and equation (7) as the "simple weight-only correction". Equation (6) is similar to the equation developed by Connor Gorber et al. [20] in the sense that it includes no covariates other than the measured and self-reported versions of the variable being corrected. The evaluation process below aims to test correction equations that would be widely usable due to their simplicity. Connor Gorber et al. [20] showed that the inclusion of other covariates such as age, perception of one's own weight, life dissatisfaction, ethnicity, and activity limitations did not importantly improve the accuracy of their models. Thus, we do not include any other variables in the correction equations to facilitate comparison with the existing recommended Canadian correction equations.
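For reference, equations (4) to (7) can be sketched as follows. As with any reconstruction from prose, the forms are assumptions: the constant $\alpha$ in equation (5) is a hypothetical symbol for the constant term described above, and the parameters $\beta_0$ and $\beta_1$ in equations (6) and (7) are re-estimated, so they need not take the same values as in equations (3) and (4).

```latex
\begin{align}
\ln(W_{M}) &= \frac{\ln(W_{SR}) - \beta_0 - \beta_2 W_{M}}{\beta_1} \tag{4}\\
\frac{W_{SR}}{h_{SR}^{2}} &= \frac{W_{M}}{h_{M}^{2}}\, e^{\,\alpha+\varepsilon} \tag{5}\\
\ln(W_{SR,i}) &= \beta_0 + \beta_1 \ln(W_{M,i}) + \varepsilon_i \tag{6}\\
W_{M} &= \exp\!\left(\frac{\ln(W_{SR}) - \beta_0}{\beta_1}\right) \tag{7}
\end{align}
```

Equation (4) still contains $W_{M}$ on the right-hand side, which is why it requires a numerical solution, whereas equation (7) can be evaluated directly from self-reported weight.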
We first defined the full sample and identified outliers. The full sample consists of the pooled cross-section of the 2005 and 2008 CCHS respondents who provided measured and self-reported height and weight (n = 6294: 3208 female, 3086 male), restricted to working-age individuals (age 18 to 65) and non-breastfeeding, non-pregnant women. Outliers (n = 145) were defined as those for whom the discrepancy between self-reported and measured height or weight exceeded three standard deviations from the sex- and cycle-specific mean discrepancy. The identification and removal of outliers served to remove their potentially undue influence during correction generation.
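As an illustration of the three-standard-deviation rule, the sketch below flags outliers within a single sex and cycle group. The function name and data are hypothetical, and the use of the sample standard deviation is an assumption, since the estimator is not specified.

```python
from statistics import mean, stdev


def flag_outliers(self_reported, measured, n_sd=3.0):
    """Return one boolean per respondent: True when the self-report
    discrepancy lies more than n_sd standard deviations from the group mean.

    Intended to be called separately for each sex and survey cycle, and
    separately for height and for weight.
    """
    diffs = [sr - m for sr, m in zip(self_reported, measured)]
    centre = mean(diffs)
    spread = stdev(diffs)  # sample standard deviation (assumed)
    return [abs(d - centre) > n_sd * spread for d in diffs]


# Hypothetical weights (kg): 19 respondents under-report by 1 kg, one by 25 kg.
measured_kg = [70.0 + i for i in range(20)]
reported_kg = [m - 1.0 for m in measured_kg]
reported_kg[-1] = measured_kg[-1] - 25.0
flags = flag_outliers(reported_kg, measured_kg)
```

A respondent would be treated as an outlier if flagged on either height or weight, matching the "height or weight" wording of the definition.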
The full sample of non-outliers was split randomly in half. One half, randomly selected, was used to calibrate the weight-only correction model parameters (the "model generating group", n = 3084). The other half ("test group", n = 3210) was used to test the model parameters to see how well the adjusted BMI values compared to measured BMI. The previously excluded outliers were included in the test group to simulate a real dataset where outliers may appear to have reasonable values of self-reported height or weight and may be impossible to identify and exclude. The regression equations of interest, (3) and (6), were run on the model generating group for males and females separately to obtain the necessary parameters for the correction equations. Male- and female-specific correction equations were developed separately to allow for known different trends in misreporting weight for males and females [12,13], and to match the convention used in other Canadian corrections [20,21] and international corrections [14,23,24]. This stratification by sex was maintained for all analyses.
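The grouping described above can be sketched as follows. This is a minimal illustration that assumes an exact half split of the non-outliers by shuffling; the function name, seed, and identifiers are hypothetical, and the actual assignment mechanism (which produced groups of slightly unequal size) is not described in the text.

```python
import random


def split_sample(ids, outlier_ids, seed=0):
    """Randomly split non-outliers into two groups, then add the
    previously excluded outliers to the test group only."""
    outliers = set(outlier_ids)
    clean = [i for i in ids if i not in outliers]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(clean)
    half = len(clean) // 2
    model_group = clean[:half]
    test_group = clean[half:] + [i for i in ids if i in outliers]
    return model_group, test_group


# Hypothetical respondent identifiers, three of them flagged as outliers.
respondent_ids = list(range(100))
model_ids, test_ids = split_sample(respondent_ids, outlier_ids=[3, 50, 97])
```

Keeping the outliers out of the calibration half while returning them to the test half mirrors the stated aim of simulating a dataset in which outliers cannot be identified.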
The model-generating group consisted of only working age adults (i.e., 18-65), but the correction equations are appropriate to apply to adults over age 65. This was confirmed by a Chow test for equation (3), which was run separately for males and females. The mutually exclusive groups under consideration were 1) working age adults and 2) adults over age 65. The null hypothesis that the models have the same coefficients for both working age adults and adults over age 65 could not be rejected (p = 0.167 for males and p = 0.292 for females).
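For reference, the textbook form of the Chow statistic for two groups is shown below. It is a sketch of the unweighted, homoskedastic case; with sampling weights, as used in this study, software would typically report a Wald-type analogue, so this should not be read as the exact statistic computed here. The symbols are assumptions for illustration: $RSS_{P}$ is the residual sum of squares from the pooled regression, $RSS_{1}$ and $RSS_{2}$ are those from the two group-specific regressions, $n_{1}$ and $n_{2}$ are the group sizes, and $k$ is the number of estimated parameters (three for a model with an intercept and two slopes).

```latex
\begin{equation}
F = \frac{\bigl(RSS_{P} - (RSS_{1} + RSS_{2})\bigr)/k}
         {(RSS_{1} + RSS_{2})/(n_{1} + n_{2} - 2k)}
\end{equation}
```

Under the null hypothesis of equal coefficients in both groups, $F$ follows an $F$ distribution with $k$ and $n_{1} + n_{2} - 2k$ degrees of freedom, so a large p-value gives no evidence that the coefficients differ between the groups.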
After obtaining estimates for the parameters, we applied correction equations (4) and (7) to the test group. This model solves for corrected weight, so to make a corrected value of BMI with the weight-only model we used corrected weight with self-reported height. For the test group, we report weight-only as well as simple weight-only corrections, along with BMI-only corrections using the Connor Gorber et al. model [20]. The different weight-only correction equations (regular and simple), which are calibrated versions of equations (4) and (7), for males and females, are given below:
Males (Weight-only Correction):

b) Evaluation of correction equations: comparison of BMI distributions; estimation of sensitivity and specificity for weight categories; and prediction of health outcomes
We evaluate the correction equations for usefulness in three ways. First, we compare self-reported, corrected, and measured BMI distributions. Second, we compare self-reported and corrected BMI to measured BMI by estimating the sensitivity/specificity of the equations vis-à-vis BMI categories of normal weight (18 < BMI < 25), overweight (25 ≤ BMI < 30), and obese (BMI ≥ 30). Lastly, we compare self-reported and corrected BMI to measured BMI by association with selected health outcomes in regression models. We compare the regression results in terms of statistical significance, coefficient magnitude, and direction of the coefficient (above or below the measured coefficient). Assessment of association with health outcomes entailed comparing coefficients (statistical significance, magnitude, and direction) across the different corrected BMI values in regression equations modeling arthritis, heart disease, diabetes, high blood pressure, self-reported health, and activity limitation, following Connor Gorber et al. [20]. We use the BMI categories of normal weight (18 < BMI < 25), overweight (25 ≤ BMI < 30), obese (30 ≤ BMI < 35), and obese class II or higher (BMI ≥ 35) for this part of the analysis. We further restrict this part of the analysis to individuals aged 40 or older, to follow the convention set by Connor Gorber et al. [20]; however, we also test the disease association models with the full age range. Throughout our analysis we do not report results for underweight individuals because there were so few underweight individuals by measured BMI.

Distribution of BMI
Figures 1 and 2 show the BMI distributions estimated from the measured, self-reported, and corrected BMI values. For males the corrected distributions all trend together, and all are closer to the measured distribution than to the self-reported distribution. For females, the weight-only corrections resemble the measured BMI distribution more closely than the BMI-only correction does, most notably between BMI 23 kg/m² and 28 kg/m².

Figure 1 The distribution of BMI as measured by different correction equations and self-report compared to measured BMI in males.
Figure 2 The distribution of BMI as measured by different correction equations and self-report compared to measured BMI in females.

Sensitivity and specificity of corrected and self-reported BMI
Table 1 displays the sensitivity and specificity estimates based on the self-reported and corrected values for BMI for males and females. Focusing on those instances with at least a 5 percentage point difference in sensitivity or specificity within sex-BMI category groups, we observed the following patterns: all corrections were similar to one another and superior to self-report in specificity of normal weight among women, sensitivity of overweight among both men and women, and sensitivity of obese among both men and women. There were two instances in which self-report was superior to corrected values: sensitivity of normal weight among both men and women. Overall, any corrected BMI was superior to self-reported BMI in estimating prevalence within weight categories.

Predictive utility of corrected BMI in health condition models
Tables 2 and 3 show the results of models regressing six health conditions on BMI (measured; self-report; BMI-only correction; weight-only corrections) for men and women, controlling for age. Focusing on differences in statistical significance (i.e., presence/absence) between coefficients in the measured BMI model and coefficients in each of the other models, the following is apparent: for men, of the 18 coefficients in each column, 16 in the self-reported BMI column had the same statistical significance status as measured BMI. The numbers for the BMI-only, weight-only, and simple weight-only corrections were 15/18, 16/18, and 15/18, respectively. For women, of the 18 coefficients in each column, 13 in the self-reported BMI column had the same statistical significance status as measured BMI. The numbers for the BMI-only, weight-only, and simple weight-only corrections were 15/18, 15/18, and 14/18, respectively. From this preliminary consideration of statistical significance, the correction equations and self-report BMI appear to perform similarly for men, while for women the correction equations appear to perform similarly to each other and better than self-reported BMI.
In terms of the magnitude and direction of the coefficients for the relationships between BMI (self-reported and corrected) and disease outcomes (Tables 2 and 3), the self-reported and corrected BMI measures did not exhibit a consistent pattern for males or females across diseases. For males, the corrected odds ratios were higher than the measured odds ratios in almost all comparisons; exceptions were heart disease and diabetes (obese class I), high blood pressure (obese class II+), and self-reported health (overweight). All of the corrected odds ratios for activity limitation were lower than the measured estimates. For females, the corrected odds ratios are higher than the measured odds ratios for heart disease, diabetes, high blood pressure, and self-reported health (except for the simple weight-only correction for obese class I). The corrected odds ratios for activity limitation are all lower than the measured odds ratios, and for arthritis the weight-only corrections were lower than the measured odds ratios for overweight (simple correction only) and obese class II+. Thus, the magnitudes and directions of the estimated coefficients in the health condition models do not clearly point to a superior correction equation (see endnote b). Furthermore, with respect to estimates in Tables 2 and 3, the corrected odds ratios tended to have wider confidence intervals (suggesting less precision of estimate) than the corresponding measured odds ratios, but not in every case. Considering only obese class II+ as an example, for males the corrected confidence intervals were wider than the measured confidence intervals for arthritis, heart disease, diabetes, and self-reported health. For females the corrected confidence intervals were wider than the measured confidence intervals in all cases but one (simple weight-only correction for arthritis).

Distribution of BMI
A corrected BMI distribution, regardless of whether BMI-only or weight-only, was found to be more accurate than the self-reported BMI distribution (see Figures 1 and 2). However, this only refers to the ability of the correction equations to bring the distribution of self-reported BMI into line with the distribution of measured BMI, unconditional on any other variables or restrictions. In terms of which correction is better, for males, it is not obvious that there is a superior corrected distribution. For females, the weight-only corrections follow the measured BMI distribution more closely than the BMI-only correction, making the weight-only corrections the best overall candidate correction for simply constructing the BMI distribution.

Sensitivity and specificity
Findings from the sensitivity and specificity analyses were mixed, with the weight-only and BMI-only corrections performing similarly well in some cases (specificity for normal weight women, sensitivity for overweight men and women, and sensitivity for obese men and women), and self-report best in others (sensitivity for normal weight men and women). The high sensitivity for self-reported BMI in the normal categories for females is consistent with past observations that normal BMI females, on average, under- or over-report weight to a lower degree than other BMI groups [20,21]. In the absence of a superior correction equation, researchers must trade off sensitivity and specificity when choosing a correction equation. The outcome of these trade-offs depends on the situation: for example, if a researcher were interested in studying a sample of predominantly normal weight males, where the weight-only corrections result in losing 10 percentage points in sensitivity in exchange for 6 percentage points in specificity, it is not clear that a weight-only correction is superior to self-reported BMI. Simply stating that one percentage point gain offsets another is inappropriate, especially when specificity may be more important for conditions like obesity, where the majority of the population is not obese [7].
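To make the two quantities being traded off concrete, the sketch below computes sensitivity and specificity for one BMI category, treating the category implied by measured BMI as the truth. It is unweighted, whereas the analyses in this study incorporated sampling weights; the function names and the six BMI values are hypothetical.

```python
def bmi_category(bmi):
    """Classify BMI (kg/m^2) using the cut-points given in the text."""
    if bmi >= 30:
        return "obese"
    if bmi >= 25:
        return "overweight"
    if bmi > 18:
        return "normal"
    return "underweight"


def sensitivity_specificity(test_bmi, measured_bmi, category):
    """Sensitivity and specificity of test_bmi for one category,
    with the category from measured BMI taken as the reference."""
    tp = fp = tn = fn = 0
    for t, m in zip(test_bmi, measured_bmi):
        predicted = bmi_category(t) == category
        actual = bmi_category(m) == category
        if actual and predicted:
            tp += 1
        elif actual:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical values for six respondents.
measured = [22.0, 26.0, 31.0, 33.0, 24.0, 29.5]
reported = [21.5, 24.8, 29.4, 32.0, 23.7, 28.9]
sens_obese, spec_obese = sensitivity_specificity(reported, measured, "obese")
```

In this toy sample, under-reporting moves one of the two truly obese respondents below the threshold, so sensitivity for obesity falls while specificity is unaffected, which is the pattern a correction equation is meant to counteract.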

Predictive utility of corrected BMI in health condition models
Taking the population distribution and the sensitivity/specificity findings as a whole, our results suggest that if a researcher is interested in BMI statistics across a population, including estimating prevalence within weight categories, any correction presented here will be preferable to self-reported BMI. However, findings from the analysis of associations with health outcomes tell a different story: namely, findings were mixed, such that no correction was uniformly superior and in some cases the self-reported data outperformed the corrected data. None of the correction equations were able to consistently provide coefficient estimates closer to the measured BMI estimates than those provided by the self-reported estimates. The implication of this finding is that in a disease modelling context correction equations are not necessarily more useful than self-reported BMI. Our findings differ from other Canadian correction equation studies that have shown that the BMI-only correction consistently provides coefficient estimates closer to the measured values [20,21]. In statistics, it is recognized that including a variable exhibiting measurement error on the right hand side of the regression equation can provide a biased estimate of effect [30]. The magnitude of this bias depends on the variance of the mis-measured variable (in our case, measured BMI) and the variance of the measurement error itself (in our case, the difference between measured and self-reported BMI). While the corrections presented here adequately correct the average BMI at percentiles of interest and can be used to estimate the distribution of BMI or the prevalence of obesity, they are unable to correct the variance of self-reported BMI. As a consequence, corrected BMI measures provide biased and inconsistent estimates of association when used as regressors, just as any mis-measured variable would. The key issue in our case is whether the magnitude of the bias is substantial (i.e., clinically or socially significant). We suggest that the magnitude of this bias is too large, and the direction of the bias too unpredictable, for corrected BMI variables to be used in this context of modelling health outcomes.
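The dependence on the two variances can be made concrete with the textbook result for a linear regression with a single regressor and classical measurement error (error uncorrelated with the true value and with the outcome). This is offered as an illustration only: the models in this study are logistic with categorical BMI, and misreporting of weight is related to true weight, so the bias here need not be toward zero. The symbols are assumptions for illustration: $x$ is the true (measured) value, $x^{*}$ the observed (self-reported or corrected) value, $u$ the measurement error, and $\hat{\beta}$ the coefficient obtained by regressing $y$ on $x^{*}$.

```latex
\begin{align}
x^{*} &= x + u, \qquad y = \alpha + \beta x + e,\\
\operatorname{plim}\,\hat{\beta} &= \beta\,\frac{\sigma_{x}^{2}}{\sigma_{x}^{2} + \sigma_{u}^{2}}
\end{align}
```

A correction that shifts the mean of $x^{*}$ toward that of $x$ leaves the error variance $\sigma_{u}^{2}$ in place, which is one way to see why correcting averages does not by itself remove bias in estimated associations.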
The health outcome association analysis shows that, given a disease in a logit model and a categorical BMI variable (a very common modelling convention), researchers should not necessarily use a correction equation in an effort to improve self-reported BMI. Further, if a researcher, seeking a conservative range of estimates, chose to report both a corrected and a self-reported estimate, it is not clear that the difference between the two should be meaningful anyway in this regression framework: when the corrected estimate is larger than the self-reported coefficient, it does not conform to the idea that the self-reported BMI estimate serves as an upper bound; when the corrected estimate is smaller than the self-reported coefficient, it is not a guarantee that the corrected estimate is closer to the measured BMI estimate or even on the correct side of the null value.
Reverting to self-reported BMI, however, will also provide biased estimates of association between BMI and the dependent variable, as well as biased and inconsistent estimates of association for all the other regressor variables in the model [30]. In short, our results suggest that if a researcher is interested in using BMI as a predictor variable for modelling disease, then both self-reported and corrected BMI result in biased estimates of association.

Limitations
It is important to acknowledge that, because correction equations are based on particular populations, which change over time and place, the equations themselves are likewise somewhat time and place dependent and should be updated over time. Changing misreporting patterns over time have been shown using data from the United States [27] and Ireland [31], and may reflect, in part, changing social attitudes about obesity. The fact that correction equations can change across time is important for modelling the BMI distribution, but updating a correction equation will not fix the issue of a corrected BMI providing biased and inconsistent regression results unless the update somehow deals with the variance of the misreported variable.
Although the response rate for the CCHS surveys as a whole was reasonably high (87.0% for the 2005 CCHS and 85% for the 2008 CCHS), another limitation of the study is the lower response rate for the subsamples among which both self-reported and measured data were available and which constituted the basis for this study. The overall response rate for the subsample that provided measured height and weight was 55.9% for the 2005 CCHS and 50.7% for the 2008 CCHS; most of the non-response was from refusal to be measured [21]. It is unlikely that those who refuse to be measured are randomly distributed across the BMI distribution, so if the individuals who are heavier are refusing to be measured, any correction equation based on this dataset will be inaccurate, including others that have been developed [20,21]. Strategies to improve response rate across the population are thus desirable.
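The overall subsample figures quoted here are consistent with multiplying the household response rate by the subsample response rate reported in the Data section (64.2% and 59.7%). That this is how they were derived is an inference from the arithmetic rather than something stated in the text.

```latex
\begin{align}
\text{2005 CCHS:}\quad 0.870 \times 0.642 &\approx 0.559\\
\text{2008 CCHS:}\quad 0.850 \times 0.597 &\approx 0.507
\end{align}
```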

Conclusions
The BMI-only correction has been applied to Canadian data extensively (e.g., Orpana et al. [32], Janssen et al. [33], and Barberio and McLaren [34]), attesting to the value of such corrections in the literature. Our findings support the use of BMI-only corrections if the researcher is interested in reporting the distribution of BMI, the prevalence of those above and below the obesity threshold of BMI 30 kg/m², or any other cut-point. On the other hand, if the researcher is interested in estimating the effect of BMI on a health condition, then our findings suggest that corrected BMI, using any of the methods examined here, does not represent an improvement over self-report data.

Endnotes
a "The right parameters" implies that the correct equation would not have a coefficient of 1 on every term, as it stands in equation (1). For instance, a data generating process with the coefficients 0.01 on the measured weight term in the error and 0.01 on the noise term would generate an exaggerated version of the relationship we observe in the literature. So it is just a matter of finding the right parameters to match the data.
b The disease models were repeated for the entire sample (age 18 to 65). The results of those models (not reported) are consistent with those from the truncated sample; namely, they do not clearly point to any correction equation being superior.

Additional file
Additional file 1: Quantile-quantile plots comparing measured BMI with BMI constructed from combinations of measured and self-reported height and weight.

Abbreviations
BMI: Body mass index; CCHS: Canadian Community Health Survey.

Competing interests
The authors declare that they have no competing interests.

Authors' contributions
DJD conceptualized the study and led the analysis. LM contributed to study conceptualization and interpretation of results. DJD and LM contributed equally to the writing of the paper. Both authors read and approved the final draft.