Comparing hemoglobin distributions between population-based surveys matched by country and time

Background: Valid measurement of hemoglobin is important for tracking and targeting interventions. This study compares hemoglobin distributions between surveys matched by country and time from The Demographic and Health Survey (DHS) Program and the Biomarkers Reflecting Inflammation and Nutritional Determinants of Anemia (BRINDA) project.
Methods: Four pairs of nationally representative surveys measuring hemoglobin using HemoCue® with capillary (DHS) or venous (BRINDA) blood were matched by country and time. Data included 17,719 children (6–59 months) and 21,594 non-pregnant women (15–49 y). Across paired surveys, we compared distributional statistics and anemia prevalence.
Results: Surveys from three of the four countries showed substantial differences in anemia estimates (9 to 31 percentage point differences), which were consistently lower in BRINDA compared to DHS (2 to 31 points for children, 1 to 16 points for women).
Conclusion: We identify substantial differences in anemia estimates from surveys of similar populations. Further work is needed to identify the cause of these differences to improve the robustness of anemia estimates for comparing populations and tracking improvements over time.

surveys, the HemoCue® analyzer (HemoCue AB, Angelholm, Sweden) is the device most routinely used to measure hemoglobin concentrations in field settings [4]. This device is relatively inexpensive, easily portable, does not require a cold chain, and produces a result within a minute. It is generally considered capable of providing precise and accurate measurements of hemoglobin concentrations using either capillary blood (obtained by puncturing the ball of the finger or heel) or venous blood (obtained by direct puncture of a vein) [13,14]. However, emerging evidence indicates that many methodological factors, including pre-analytic factors [15], can influence hemoglobin readings from the HemoCue®. Variation in the collection of capillary blood can lead to falsely low readings and, in some cases, higher readings [7,16,17]. Incorrect techniques include not allowing the alcohol solution to dry before puncturing the finger, shallow puncture, squeezing or milking the finger, air bubbles in the microcuvette, incomplete filling or re-dipping of the microcuvette, use of damaged or expired microcuvettes, and blood clotting [17]. Apart from blood collection technique, studies have cited inherent drop-to-drop variability in hemoglobin concentrations [18].
Compared to research on anthropometric data quality in population-based surveys [19][20][21], research aimed at assessing the robustness of hemoglobin and anemia estimates in population-based surveys is still in its infancy [6,22]. Given the importance of international population-based anemia estimates for tracking progress toward global health goals and prioritizing action for the most vulnerable populations, there is a need for work exploring the stability and replicability of these estimates from nationally representative surveys. For this study, we identified pairs of nationally representative surveys collected from the same country within relatively short periods of time (0 to 2 years) and compared hemoglobin concentration distributions (e.g., means, standard deviations, skewness and kurtosis) and estimates of anemia prevalence between these surveys. The rationale is that anemia estimates from such surveys would have a high likelihood of being used interchangeably to derive estimates for a country close to that point in time [2]. Paired surveys were selected from two sources of population-based survey data in low- and middle-income countries (i.e., the BRINDA project and The DHS Program). Our objective was to identify potential differences between these population-based surveys in these key indicators.

Data
The Demographic and Health Surveys (DHS) Program is one of the principal sources of global data on hemoglobin and uses capillary blood to assess hemoglobin concentrations with the HemoCue® device. The second set of surveys came from the BRINDA project, a database composed of harmonized micronutrient surveys. This database was created as part of a multiagency collaboration to address programmatic issues related to inflammation, micronutrient biomarkers, and anemia etiology [23][24][25]. Like surveys from The DHS Program, the existing set of surveys in the BRINDA project database includes low- and middle-income countries from a range of world regions, including sub-Saharan Africa, South Asia, and Central Asia. The surveys in the BRINDA project database considered here used venous blood to assess hemoglobin concentrations using the HemoCue® device.
To control for population characteristics, we compared different nationally representative surveys from the BRINDA project and The DHS Program in the same country within two years of each other. Both platforms sample households from the official, national sampling frame. As such, while samples from survey pairs were independent, the target populations were approximately the same. One potentially crucial distinction across all survey pairs was the use of venous versus capillary blood collection, with BRINDA project surveys using venous and surveys from The DHS Program using capillary blood.
The DHS Program collects data on anemia through two kinds of surveys: Demographic and Health Surveys (DHS) and Malaria Indicator Surveys (MIS). For each nationally representative survey from The DHS Program with hemoglobin data, we searched for a nationally representative survey in the BRINDA project database from the same country that was conducted within two years of the DHS survey. When more than one eligible DHS or MIS survey was matched to a single survey in the BRINDA project database, we chose the DHS or MIS survey with interview dates closest to the interview dates of the survey in the BRINDA project database. Two BRINDA project datasets were derived from the same data as the corresponding DHS survey, and so these were excluded from analyses. This selection process resulted in four pairs of surveys conducted between 2009 and 2016 from four countries in sub-Saharan Africa and South Asia.
All surveys were based on cluster-based sampling designed to provide nationally representative population estimates. In the DHS surveys used in this analysis, hemoglobin concentrations were assessed either in all households or in a randomly selected subpopulation of households. In half of all sampled households in one DHS survey (Cameroon), and in one-third of households in two DHS surveys (Bangladesh and Malawi), hemoglobin measurements were taken on all children (6-59 months) and reproductive age women (15-49 years) present in the households. In the only MIS survey included in these analyses (Liberia), all sampled households were selected, and hemoglobin was collected from all children, but not reproductive age women, in the selected households (Table S1).
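The matching rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how "closest" interview dates were measured, so the sketch compares fieldwork midpoints, reads "within two years" as 730 days, and uses hypothetical survey records and field names (`name`, `country`, `start`, `end`).

```python
from datetime import date

# Assumption: "within two years" is read as a gap of at most 730 days
# between fieldwork midpoints. The article does not define the metric.
MAX_GAP_DAYS = 2 * 365


def midpoint(start, end):
    """Midpoint of a fieldwork period (fractions of a day are dropped)."""
    return start + (end - start) / 2


def closest_match(target, candidates):
    """Return the same-country candidate closest in time to `target`, or None."""
    target_mid = midpoint(target["start"], target["end"])
    best, best_gap = None, None
    for cand in candidates:
        if cand["country"] != target["country"]:
            continue  # only pair surveys from the same country
        gap = abs((midpoint(cand["start"], cand["end"]) - target_mid).days)
        if gap > MAX_GAP_DAYS:
            continue  # outside the two-year window
        if best_gap is None or gap < best_gap:
            best, best_gap = cand, gap
    return best


# Hypothetical records, for illustration only (not the surveys in this study).
micronutrient = {"name": "micronutrient survey", "country": "A",
                 "start": date(2012, 1, 1), "end": date(2012, 3, 31)}
candidates = [
    {"name": "DHS", "country": "A",
     "start": date(2011, 7, 1), "end": date(2011, 12, 31)},
    {"name": "MIS", "country": "A",
     "start": date(2009, 1, 1), "end": date(2009, 3, 31)},
    {"name": "DHS elsewhere", "country": "B",
     "start": date(2012, 1, 1), "end": date(2012, 3, 31)},
]
match = closest_match(micronutrient, candidates)
print(match["name"] if match else "no eligible match")
```

The additional exclusion of datasets derived from the same underlying data as the DHS survey is a manual judgment and is not represented here.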
Among the surveys in the BRINDA project database, all sampled households in Liberia and Cameroon were selected, whereas 20% of sampled households were selected in Bangladesh for hemoglobin measurement. In Malawi, all sampled households were selected for hemoglobin data collection for children (6-59 months) but only 45% of sampled households were selected for reproductive age women (15-49 years). Within selected households in Cameroon, one eligible child (12-59 months) and one eligible woman (15-49 years) were randomly selected. In Bangladesh, Liberia and Malawi, all eligible individuals within selected households were selected [children (6-59 months), reproductive age women (15-49 years), except in Liberia children (6-35 months)].
The Malawi 2015-2016 survey in the BRINDA project database was conducted in conjunction with the Malawi 2015-2016 DHS survey and was collected from a subsample of the households selected for the DHS survey.
In both The DHS Program and BRINDA project surveys analyzed here, hemoglobin concentrations were assessed with a portable hemoglobinometer. The type of device used in the four surveys included here is unknown; during the time period of these surveys the DHS and MIS surveys typically used the HemoCue 201+ but did occasionally use the HemoCue 301. All surveys in the BRINDA project database used the HemoCue 201+, except Malawi, which used the HemoCue 301. In the DHS and MIS surveys, the protocol was to obtain blood from heel pricks for children 6-11 months and finger pricks for individuals 12 months or older [22]. The DHS and MIS final reports did not specify the blood drop used, but the typical procedure has been to use the third drop of blood. The exception is when the assessment of hemoglobin concentrations has been combined with malaria or HIV testing, in which case the fourth and fifth drops of blood have been used, respectively [22]. In the surveys in this analysis, malaria was measured in children in Liberia and Cameroon, and HIV was measured in women in Cameroon and Malawi. The four BRINDA project surveys used a drop of blood from the vacutainer (pooled sample).

Exclusions
Consistent with prior analyses of surveys from The DHS Program, we included only usual or de jure residents in analyses to ensure that altitude measurements were relevant to the participants' living conditions [22]. Such information was not available for BRINDA project surveys. A number of exclusions were also made to increase comparability between the BRINDA project and The DHS Program. First, we restricted analyses to cases that had values for age, sex, and hemoglobin. We considered children aged 6-59 months. Due to restricted samples in the surveys in the BRINDA project database, the Cameroon analysis considered children 12-59 months and the Liberia analysis considered children 6-35 months. We excluded women who reported being pregnant. Based on guidelines proposed by previous researchers [26], we excluded adjusted hemoglobin concentrations outside the range of 4.0 g/dL to 18.0 g/dL for both women and children [22,26].
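As a rough illustration, the exclusion rules above could be applied record by record as follows. This is a minimal sketch, not the authors' code: the field names (`age_months`, `age_years`, `sex`, `hb`, `pregnant`, `de_jure`, `country`) are hypothetical, ages are assumed to be recorded in completed months or years, and a missing `de_jure` value is kept because residency information was not available for the BRINDA project surveys.

```python
HB_MIN, HB_MAX = 4.0, 18.0  # plausible adjusted hemoglobin range in g/dL (from the text)

# Child age windows in months; country-specific windows follow the text.
CHILD_AGE_MONTHS = {"Cameroon": (12, 59), "Liberia": (6, 35)}
DEFAULT_CHILD_AGE_MONTHS = (6, 59)


def plausible_hb(hb):
    """True when hemoglobin lies inside the 4.0 to 18.0 g/dL range."""
    return HB_MIN <= hb <= HB_MAX


def keep_child(rec):
    """Apply the child exclusions to one record (a dict with hypothetical keys)."""
    if any(rec.get(k) is None for k in ("age_months", "sex", "hb")):
        return False  # require age, sex, and hemoglobin
    if rec.get("de_jure") is False:
        return False  # drop visitors where residency is known
    low, high = CHILD_AGE_MONTHS.get(rec.get("country"), DEFAULT_CHILD_AGE_MONTHS)
    if not low <= rec["age_months"] <= high:
        return False
    return plausible_hb(rec["hb"])


def keep_woman(rec):
    """Apply the exclusions for women to one record."""
    if any(rec.get(k) is None for k in ("age_years", "hb")):
        return False
    if rec.get("de_jure") is False:
        return False
    if rec.get("pregnant"):
        return False  # women who reported being pregnant are excluded
    if not 15 <= rec["age_years"] <= 49:
        return False
    return plausible_hb(rec["hb"])
```

For example, a 10-month-old child would be retained in the Bangladesh or Malawi analysis but dropped from the Cameroon analysis, where the window starts at 12 months.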

Adjusting for altitude and smoking
Only some of the surveys in the BRINDA project database included data for altitude. To ensure comparability between surveys from the same country while also adjusting for altitude when possible, we adjusted for altitude only in countries where both surveys included values for it (Malawi). Two other countries were uniformly below 1000 m altitude and thus required no adjustment (Bangladesh and Liberia). Altitude was missing from the Cameroon survey in the BRINDA project database and therefore no adjustment was made; however, altitude may have been relevant in this case.
There were no BRINDA-DHS country pairs where both surveys included information on smoking. In those countries where smoking was recorded, it was very rare (< 1% of women in Cameroon and Malawi). In the two countries without smoking data for either the BRINDA project or The DHS Program (Bangladesh and Liberia), the World Bank estimates that fewer than 3% of women smoke [27]. Given the negligible prevalence of smoking among women in all BRINDA-DHS country pairs, we did not adjust for smoking, in order to maintain comparability between surveys from the same country.

Key variables
For the primary analyses, we examine distributional properties of hemoglobin, including dispersion, skewness and kurtosis, and estimates of any, mild, moderate, and severe anemia. We also report survey-specific digit preference in supplemental materials (Table S1).
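For illustration, the anemia categories can be derived from a hemoglobin value with the cutoffs given in this article's table notes (any anemia < 11.0, mild 10.0-10.9, moderate 7.0-9.9, severe < 7.0). The sketch below assumes values in g/dL; these cutoffs appear in the table notes without a stated population, and the function names are hypothetical.

```python
def classify_anemia(hb):
    """Classify one hemoglobin value (assumed g/dL) using the table-note cutoffs."""
    if hb < 7.0:
        return "severe"
    if hb < 10.0:
        return "moderate"
    if hb < 11.0:
        return "mild"
    return "none"


def prevalence_of_any_anemia(hb_values):
    """Unweighted share of values below 11.0, as a percentage."""
    anemic = sum(1 for hb in hb_values if classify_anemia(hb) != "none")
    return 100.0 * anemic / len(hb_values)


# Made-up values, for illustration only.
sample = [6.5, 9.2, 10.4, 11.0, 12.3]
print([classify_anemia(hb) for hb in sample])
print(prevalence_of_any_anemia(sample))
```

This gives an unweighted share only; the prevalence estimates reported in the article account for survey design.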

Distributions of hemoglobin concentrations
To examine differences in the distributions of hemoglobin between samples from the same country, we estimated the standard deviation, skewness, and kurtosis of the distributions of concentrations. To our knowledge, there are no systematically validated guidelines for dispersions of hemoglobin concentrations that reflect acceptable data quality. However, a commonly cited review article proposed that "surveys or surveillance systems with apparently acceptable quality technique using HemoCue© tend to be in the 1.1-1.5 range", an observation based on the authors' empirical experience [26]. Based on this guidance, we flagged standard deviations that lie outside of 1.1 to 1.5 as potentially worth investigation [26]. We report these because they may be of interest to researchers, but using this range to identify acceptable quality technique is premature pending more systematic study.
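A minimal sketch of these distributional summaries is shown below. It assumes simple moment-based estimators (the article does not state which estimators were used), reports kurtosis on the scale where a normal distribution equals 3, and flags standard deviations outside the 1.1 to 1.5 range quoted from [26]. The function name and output keys are hypothetical.

```python
import math

SD_LOW, SD_HIGH = 1.1, 1.5  # range quoted from [26]; not a validated quality threshold


def distribution_summary(values):
    """Unweighted SD, skewness, and kurtosis of hemoglobin values."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (population form).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = math.sqrt(m2 * n / (n - 1))  # sample standard deviation
    return {
        "sd": sd,
        "skewness": m3 / m2 ** 1.5,   # negative values mean a longer left tail
        "kurtosis": m4 / m2 ** 2,     # non-excess: 3 for a normal distribution
        "sd_flag": not (SD_LOW <= sd <= SD_HIGH),
    }


# Made-up values, for illustration only.
print(distribution_summary([10.0, 11.0, 12.0, 13.0, 14.0]))
```

As the text notes, a flagged standard deviation is only a prompt for closer inspection, since the range comes from empirical experience rather than a validation study.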

Contextual variables
When available, we also report survey-level variables (months in which the survey was conducted, and prevalence of iron supplementation and malaria among children) that may be relevant for anemia estimates (Table 2). Reported iron supplementation among children in the last seven days was collected for DHS surveys in Bangladesh and Malawi and BRINDA project surveys in Bangladesh and Cameroon. In the Malawi survey in the BRINDA project, iron supplementation was reported in the last month. Children were tested for malaria with: (1) rapid diagnostic tests in DHS surveys in Liberia and Cameroon and BRINDA project surveys in Liberia and Malawi, and (2) blood smears in the DHS survey from Liberia and the BRINDA project survey from Cameroon. Population estimates are reported for children ages 6-59 months in Bangladesh and Malawi, 6-35 months in Liberia, and 12-59 months in Cameroon.

Analyses
For each pair of surveys, we used unweighted analyses that did not account for survey design for distribution estimates (e.g., standard deviations, skewness, kurtosis), to focus on the variance of individual measurements and not population-level estimates. Mean adjusted hemoglobin and the prevalence of anemia classified as any, mild, moderate or severe were adjusted for survey design to compare nationally representative estimates between surveys. Analyses in Additional file 1 (Table S2) show that estimates unadjusted and adjusted for survey design were highly correlated (of 20 indicators, 15 had r > 0.90 and all but one, child skewness, had r > 0.80). We also reported mean hemoglobin (Additional file 1: Figure S2) and prevalence of anemia (Fig. 3) by age (6-11.9 months, 12-23.9 months, 24-35.9 months, and 36-59 months for children; 15-19.9 years, 20-29.9 years, 30-39.9 years, 40-49.9 years for women).
To assess any potential differences between paired surveys, we used tests adjusted for survey design to ensure inferences are based on appropriate standard errors. We used a design-based F-test for comparing proportions for dichotomous variables (implausible values, any, mild, moderate, and severe anemia), and t-tests of mean difference for continuous variables (adjusted hemoglobin concentrations). We used the delta method to test differences in variance, skewness, and kurtosis [28]. To test whether distributions of hemoglobin concentrations deviate from normality, we used a Shapiro-Wilk test as an overall test of normality as well as specific tests of skewness and kurtosis [29]. Given the large number of tests of differences between paired surveys and of deviations from normality, we set alpha at 0.001 for these tests. To examine the correlation between mean and standard deviation of adjusted hemoglobin across all surveys, we used a Pearson correlation coefficient.
[...] Table 2). [...] and heavy-tailed (mean child kurtosis = 3.9, range = (3.2, 5.7); mean women kurtosis = 4.6, range = (3.6, 5.7)). Six of the fourteen surveys had hemoglobin distributions that fell outside of previously described bounds based on surveys with "acceptable quality technique using the HemoCue® system" [26]. In two of the 8 child surveys, the standard deviation of hemoglobin was greater than 1.5, and in one survey (Bangladesh BRINDA) it was less than 1.1 [26]. In three of the 6 women's surveys, the standard deviation was greater than 1.5 (Tables 3 and 4). Distributions with dispersions that fell outside of these bounds did not differ between the BRINDA project and The DHS Program surveys.
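The distinction between unweighted and survey-weighted estimates can be illustrated with a short sketch. It covers point estimates only, using hypothetical sampling weights; design-based standard errors and tests additionally require the cluster and stratum structure of each survey and dedicated survey software, which is not shown here.

```python
def prevalence(anemic, weights=None):
    """Share of anemic individuals as a percentage.

    `anemic` is a list of booleans; `weights` is an optional list of
    sampling weights (hypothetical here). Without weights, every
    individual counts equally.
    """
    if weights is None:
        weights = [1.0] * len(anemic)
    total = sum(weights)
    return 100.0 * sum(w for a, w in zip(anemic, weights) if a) / total


# Made-up data, for illustration only.
anemic = [True, False, False, True]
weights = [1.0, 1.0, 1.0, 3.0]
print(prevalence(anemic))           # unweighted
print(prevalence(anemic, weights))  # weighted
```

The two estimates diverge when anemia is more common among individuals who carry larger weights, which is why the article reports how closely the unweighted and design-adjusted estimates agree.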

Distributions of adjusted hemoglobin concentrations
The adjusted hemoglobin distributions from the six women's surveys and eight children's surveys all significantly deviated from normality (Shapiro-Wilk p < 0.001). Moreover, all of the women's surveys had negative skew and kurtosis that significantly deviated from a normal distribution (p < 0.001). Six of the eight child distributions had significantly heavy-tailed kurtosis (p < 0.001). The exceptions were the DHS and the BRINDA Project surveys from Cameroon, and the MIS survey from Liberia.

Differences between distributions and prevalence estimates
Mean adjusted hemoglobin concentrations were significantly higher in BRINDA project child surveys in three countries (difference of 0.7 to 1.0 g/dL) and BRINDA project women's surveys in two of the same countries (difference of 0.3 to 0.4 g/dL; Tables 3 and 4, Fig. 2). These three BRINDA project surveys also showed significantly lower prevalence of any anemia (18-31 percentage point difference in children, 9-16 percentage point difference in women) compared to the matched survey in The DHS Program (Table 5). Overall, the median difference in anemia prevalence between the BRINDA project and The DHS Program surveys was 19 percentage points for children and 9 percentage points for women. Fig. 3 illustrates these differences by age. At the same time, there was also substantial variation in the magnitude of these differences across countries, ranging from 2 percentage points in Cameroon to 31 percentage points in Malawi for children (Table 5).
Standard deviations were significantly different in one pair of child surveys and two pairs of women's surveys, and in each case the standard deviation was greater in the survey from The DHS Program. There were no significant differences in kurtosis or skewness between any of the paired surveys (Tables 3 and 4).
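Because the differences above are expressed in percentage points rather than percent, a small worked example may help. It uses the Malawi estimates quoted in the Discussion (61% versus 30%); the function names are hypothetical.

```python
def percentage_point_difference(p1, p2):
    """Absolute difference between two prevalences given in percent."""
    return p1 - p2


def relative_difference_percent(p1, p2):
    """Difference expressed relative to the second prevalence, in percent."""
    return 100.0 * (p1 - p2) / p2


high, low = 61.0, 30.0  # Malawi estimates quoted in the Discussion
print(percentage_point_difference(high, low))
print(round(relative_difference_percent(high, low), 1))
```

The absolute gap is 31 percentage points, whereas the higher estimate is roughly double the lower one in relative terms, so the two ways of reporting should not be confused.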

Contextual differences between surveys
Two survey pairs (Bangladesh and Malawi) were collected during overlapping periods (Table 2), while the two others (Liberia and Cameroon) were collected in non-overlapping periods. Notably, the two Malawi surveys were collected as part of a joint effort, with the micronutrient survey in the BRINDA project collected from a subsample of the DHS survey. Child iron supplementation was generally quite low in all surveys. In the two countries (Liberia and Cameroon) where both surveys tested children for malaria, the results were roughly comparable, though somewhat higher in the DHS surveys.

Discussion
In this study, we compared hemoglobin distributions and anemia estimates between nationally representative surveys from two sources (The DHS Program and the BRINDA project database) matched by country, conducted within two years of each other, and using a similar method for hemoglobin assessment (HemoCue®). Overall, we found marked differences in mean hemoglobin concentrations and anemia prevalence estimates between paired surveys in three of the four survey pairs, with consistently higher prevalence estimates in The DHS Program surveys (median difference of +19 percentage points for children and +9 percentage points for women). These differences are generally larger in the child surveys. The differences between paired surveys were also generally consistent across age categories (Fig. 3). Given that some paired surveys were conducted in different seasons, it is possible that malarial seasonality may have played a role in these differences. Specifically, lower levels of anemia are typically found in the dry season and higher levels in the rainy season in places with a high burden of malaria [8][9][10]. However, the paired surveys with the largest difference in anemia estimates (61% versus 30% in Malawi) were conducted within weeks of each other. Moreover, the other two survey pairs with available malaria data (Liberia and Cameroon) showed comparable prevalence of malaria among children. Another possible explanation for such differences in anemia estimates is variation in child iron supplementation. However, the only country with data on iron supplementation for both paired surveys (Bangladesh) showed practically no iron supplementation (2% in both the 2011 DHS and the 2012 BRINDA survey) despite large differences in anemia estimates.
Table notes: [...] a within-pair difference significant at alpha = 0.001. A prevalence of 0 indicates a prevalence of less than 0.5%. Mean hemoglobin and % anemia estimates account for complex survey design; SD, skew, and kurtosis do not. All tests of between-survey differences are based on complex survey design. BR: BRINDA, Biomarkers Reflecting Inflammation and Nutritional Determinants of Anemia project. DHS: The Demographic and Health Survey Program. Anemia cutoffs (g/dL): any anemia < 11.0, mild 10.0-10.9, moderate 7.0-9.9, severe < 7.0. Hemoglobin concentrations adjusted for altitude.
Differences in blood specimen type (venous or capillary) may also contribute to the observed differences in anemia prevalence [6]. However, if specimen type were the primary reason for differences in estimates, one would expect the percentage point differences in prevalence to be similar across country contexts. Instead, we observed substantial variability in the differences between paired survey prevalence estimates (ranging from 1 to 31 percentage points). This variability in the difference between surveys using venous and capillary specimens is consistent with previous studies that have shown capillary blood can variously produce higher [30][31][32], lower [33], and comparable [5,14] average hemoglobin concentrations when compared to venous blood [6,34]. A recent paper found that venous and capillary blood produced comparable results in a controlled environment, which points to the importance of good training to ensure the pre-analytical phase of data collection is done well [5]. A recent large-scale meta-analysis of hemoglobin measurement methods suggests that capillary fingerprick blood usually produces higher hemoglobin concentrations than venous blood, the opposite of our findings [35]. Others have found variability between blood drops even when collected by well-trained technicians in a controlled environment, which may indicate that a pooled sample would reduce variability [18,36].
Thus, there remain a number of unanswered questions about the direction and magnitude of differences in results based on the type and collection of blood specimen. There may also be other as yet unmeasured factors that can lead to differences between surveys in anemia estimates. These may include environmental factors that differentially affect the accuracy of each method (e.g., humidity) [5], HemoCue® model [7,37,38], and biological variability [6,11]. Differences in survey sampling procedures may also have played a role [39]. Surveys from The DHS Program selected all eligible participants, whereas some of the surveys from the BRINDA project database selected only a subset of individuals within the household. Nevertheless, this is unlikely to solely account for the substantial differences seen between surveys. Another possible source of variation is that surveys from The DHS Program are implemented with the support and guidance of a single organization, while micronutrient surveys included in the BRINDA project are carried out by varying organizations and are less harmonized. Further, the HemoCue® model was not consistent across surveys (e.g., 201+ versus 301). For example, in Malawi the DHS Program used the 201+ model, while the BRINDA survey used the 301.
While it is unlikely that any of these differences solely account for the substantial differences in estimates observed, the relative contribution of each is unknown, limiting the ability to interpret trends in estimates of anemia from national, population-representative surveys. At a minimum, these findings suggest that further study is necessary to understand the causes of such substantial variation. This will help determine what additional variables need to be recorded and reported to assist in contextualizing and interpreting hemoglobin estimates. These variables potentially include the specific details of blood specimen collection, HemoCue® model, and ambient humidity. A better understanding of how these and other nuisance variables influence hemoglobin estimates will help researchers and policymakers interpret and compare estimates derived from diverse global sources.
Despite these unexpected differences in mean hemoglobin and anemia prevalence, we also found some common properties of hemoglobin distributions across all samples. First, hemoglobin distributions deviated significantly from a normal distribution, all with negative skew and heavy tails (i.e., kurtosis > 3). This is consistent with other studies of hemoglobin that have had sufficiently large samples to identify deviations from normality [22,40,41].
Notably, nearly half of the distributions (6 of 14 surveys) had standard deviations outside of the range (1.1 to 1.5) noted in a previous review of cross-sectional surveys and surveillance systems with apparently acceptable quality technique using the HemoCue® [26]. However, as noted earlier, these guidelines were based on the authors' empirical experience rather than a systematically designed validation study. Future systematic study of larger datasets and clear validation measures should assist in determining how properties of hemoglobin distributions may indicate lower data quality and increased measurement error.
In these analyses, we made every effort to identify comparable surveys and to restrict samples to maximize comparability between surveys from the same countries. However, some critical differences deserve attention. First, we were unable to compare data completeness between survey types because there was insufficient information on refusal rates or other reasons for missing data among those eligible for hemoglobin testing for the surveys in the BRINDA database. Conceivably, if more caregivers of younger children refused testing, the prevalence of anemia could be artificially lower, given that younger children are at a greater risk of anemia [22]. Second, comparing surveys that were collected in the same year and where data collection covered similar months would be ideal for a more controlled comparison. Controlling by month may be particularly important in situations where disease burden and nutrition follow seasonal changes. The small number of survey pairs in this study also constrains our ability to examine potential reasons for those differences.

Conclusion
By comparing nationally representative surveys from the same country conducted within two years of each other, we identified substantial differences in anemia estimates (median 16 percentage points (pp), range 1-31 pp) across all children's and women's surveys. Substantial differences in anemia estimates from population-based surveys conducted close in time can cause confusion for governments, program planners, and global anemia reduction trackers. Further work is needed to identify the cause of these different estimates, including potential effects of survey implementation or supervision, systematic method variance, and ecological conditions such as seasonal variation in disease burden and food insecurity.
Additional file 1: Table S1. Percentage of sampled households with hemoglobin measurements. Table S2. Correlations of unweighted and survey-weighted estimates of key variables. Figure S2