  • Research article
  • Open access

Neighborhood clustering of non-communicable diseases: results from a community-based study in Northern Tanzania



In order to begin to address the burden of non-communicable diseases (NCDs) in sub-Saharan Africa, high quality community-based epidemiological studies from the region are urgently needed. Cluster-designed sampling methods may be most efficient, but designing such studies requires assumptions about the clustering of the outcomes of interest. Currently, few studies from sub-Saharan Africa have been published that describe the clustering of NCDs. Therefore, we report the neighborhood clustering of several NCDs from a community-based study in Northern Tanzania.


We conducted a cluster-designed cross-sectional household survey between January and June 2014. We used a three-stage cluster probability sampling method to select thirty-seven sampling areas from twenty-nine neighborhood clusters, stratified by urban and rural. Households were then randomly selected from each of the sampling areas, and eligible participants were tested for chronic kidney disease (CKD), glucose impairment including diabetes, hypertension, and obesity as part of the CKD-AFRiKA study. We used linear mixed models to explore clustering across each of the sampling units, and we estimated absolute-agreement intra-cluster correlation (ICC) coefficients (ρ) for the neighborhood clusters.


We enrolled 481 participants from 346 urban and rural households. Neighborhood cluster sizes ranged from 6 to 49 participants (median: 13.0; 25th–75th percentiles: 9–21). Clustering varied across neighborhoods and differed by urban or rural setting. Among NCDs, hypertension (ρ = 0.075) exhibited the strongest clustering within neighborhoods followed by CKD (ρ = 0.044), obesity (ρ = 0.040), and glucose impairment (ρ = 0.039).


The neighborhood clustering was substantial enough to contribute to a design effect for NCD outcomes including hypertension, CKD, obesity, and glucose impairment, and it may also highlight NCD risk factors that vary by setting. These results may help inform the design of future community-based studies or randomized controlled trials examining NCDs in the region, particularly those that use cluster-sampling methods.



Non-communicable diseases (NCDs) are a growing global epidemic that disproportionately affects low- and middle-income countries (LMICs) [1]. In sub-Saharan Africa, they are now one of the leading causes of death among adults, and in order to begin to address this burden, high quality community-based epidemiological studies from the region are urgently needed [2–5]. Additionally, outcomes-related research either through observational cohort studies or randomized-controlled trials (RCTs) will be an important component of the public health response moving forward [6].

Nonetheless, many challenges exist in carrying out these studies. Poor infrastructure and a lack of resources in many of the sub-Saharan African countries limit rigorous studies, in part due to inadequate methodological capabilities. Physical addresses, phonebooks, and reliable census data are often unavailable for many populations in the region which means that representative community-based samples often require labor-intensive, prospective household surveys. In this context, cluster-designed sampling methods offer an efficient, practical, and cost-effective means of obtaining a representative sample from the population of interest [7, 8].

However, studies that use cluster sampling methods require extra considerations in their design and analyses, and cluster-designed studies in sub-Saharan Africa continue to inadequately address many of these considerations [9]. Because study participants or households are drawn from clusters, which serve as the primary sampling unit, they can demonstrate more homogeneity than would otherwise be expected from a simple, random sample. For NCDs, similar lifestyles, environmental risks, economic stress, and genetic backgrounds may all increase homogeneity within clusters, and consequently, this increased homogeneity within clusters, or intra-cluster correlation (ICC), can significantly affect the precision of population parameter estimates [10, 11]. The ICC is typically quantified by the ICC coefficient, and although the ICC coefficient can be calculated post hoc during the analysis stage, this method may not be preferable or ethical in many sub-Saharan African settings due to cost and limited resources. Accounting for the design effect beforehand allows for more accurate estimations of sample size, budget requirements, and logistical needs; however, for NCD-related research, few ICCs have been reported in the region [9, 10].

The Comprehensive Kidney Disease Assessment for Risk Factors, epidemiology, Knowledge, and Attitudes (CKD-AFRiKA) study is an ongoing project in northern Tanzania with the goal of understanding and addressing the health burden of chronic kidney disease (CKD) and CKD-related NCDs. As part of the study, we conducted a cluster-designed, community-based epidemiologic survey. In the design stage, we were unable to identify any comparable ICCs for health outcomes related to CKD or CKD-related NCDs, and we had to extrapolate them from data derived from high-income settings. To fill this gap, we report here the observed intra-cluster correlations for multiple NCD-related factors from a community-based, sub-Saharan African setting [12].


Ethics, consent, and permissions

The study protocol was approved by Duke University Institutional Review Board (#Pro00040784), the Kilimanjaro Christian Medical College Ethics Committee (EC#502), and the National Institute for Medical Research in Tanzania. Written informed consent (by signature or thumbprint) was obtained from all participants.

Study setting

We conducted a stratified, cluster-designed cross-sectional household survey between January and June 2014 in the Kilimanjaro Region of Tanzania, which has an adult population of more than 900,000 people [13, 14]. The region comprises seven districts, and our study was conducted in two of these districts, Moshi Urban and Moshi Rural, which served as strata for our sampling scheme. Within these districts, there are 21 and 31 administrative wards respectively that range in size from 1500 to 25,000 people. Each ward is then further sub-divided into neighborhoods (also known as streets). Neighborhoods are the most basic governmental administrative unit in Tanzania, and they range in population size from 500 to 5000 people. The 65 urban neighborhoods have a median population size of 2000 people and a median area of 0.50 km2. The 165 rural neighborhoods have a median population size of 2200 people and a median area of 4.00 km2. In total, there are 230 neighborhoods/streets in the Moshi Urban and Moshi Rural districts [14].

Sampling methods

We used a three-stage cluster probability sampling method stratified by urban and rural. We used a random-number generator to select twenty-nine neighborhoods within the Moshi Urban and Moshi Rural districts. We based the random neighborhood selection on probability proportional to size sampling according to the 2012 national census [14]. From the twenty-nine neighborhoods, we then randomly selected the starting point for each sampling area (37 in total) using geographic coordinates randomly generated by Arc Geographic Information Systems (ArcGIS), v10.2.2 (Environmental Systems Research Institute, Redlands, CA). From the randomly-selected geographic point, we then chose households based on a coin-flip and die-rolling technique (Appendix 1). All non-pregnant adults (age ≥ 18 years old) living in the selected households were recruited. A neighborhood cluster, therefore, included a group of individuals living in geographically-related households within the boundaries of an administrative neighborhood.
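As an illustration of the first sampling stage, probability proportional to size (PPS) selection can be sketched as follows. This is a minimal sketch of systematic PPS, one common way to implement PPS sampling; the text does not specify which PPS variant was used, and the function name and the neighborhood populations below are hypothetical.

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n_select, rng):
    """Return indices of units chosen by systematic PPS sampling.

    sizes    : population size of each unit (e.g. each neighborhood)
    n_select : number of units to draw
    rng      : a random.Random instance (allows reproducible draws)

    Note: a unit larger than the sampling interval can be drawn more than once.
    """
    total = sum(sizes)
    interval = total / n_select
    start = rng.random() * interval          # random start in [0, interval)
    cumulative = list(accumulate(sizes))     # running population totals
    selected = []
    idx = 0
    for j in range(n_select):
        target = start + j * interval
        # Advance to the unit whose cumulative range contains the target.
        while idx < len(cumulative) - 1 and cumulative[idx] <= target:
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical neighborhood populations (not taken from the census).
example_sizes = [500, 2000, 5000, 1200, 800]
example_draw = pps_systematic_sample(example_sizes, 2, random.Random(2014))
```

Larger neighborhoods occupy a wider stretch of the cumulative population line, so they are proportionally more likely to contain one of the evenly spaced selection points.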

We targeted an enrollment between 15 and 25 participants per sampling area based on the requirements of the CKD-AFRiKA study. The total sample size was designed to estimate the community prevalence of CKD with a precision of 5 % when accounting for the cluster-design effect, assuming a CKD prevalence up to 20 % and an ICC coefficient of 0.05. To reduce non-response rates, we attempted a minimum of two additional visits during off-hours (evenings and weekends) and multiple phone calls using mobile phone numbers.
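The sample-size reasoning above can be sketched with the standard formula for estimating a proportion, inflated by the design effect 1 + (m − 1)ρ. This is a rough sketch under stated assumptions, not the authors' actual calculation: the 95 % confidence multiplier (z = 1.96) and the cluster size of m = 20 (the midpoint of the 15 to 25 target) are assumptions, and the function names are hypothetical.

```python
import math


def srs_sample_size(prevalence, precision, z=1.96):
    """Sample size for a proportion under simple random sampling."""
    return math.ceil(z ** 2 * prevalence * (1 - prevalence) / precision ** 2)


def design_effect(icc, cluster_size):
    """Variance inflation from clustering: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def clustered_sample_size(prevalence, precision, icc, cluster_size, z=1.96):
    """Simple-random-sample size multiplied by the design effect."""
    n_srs = z ** 2 * prevalence * (1 - prevalence) / precision ** 2
    return math.ceil(n_srs * design_effect(icc, cluster_size))


# Planning values from the text: prevalence 20 %, precision 5 %, ICC 0.05.
# The cluster size of 20 is an assumed midpoint of the 15-25 target.
n_unclustered = srs_sample_size(0.20, 0.05)
n_clustered = clustered_sample_size(0.20, 0.05, icc=0.05, cluster_size=20)
```

Under these assumptions, an ICC of 0.05 with 20 participants per cluster nearly doubles the required sample size relative to simple random sampling.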

Data collection

Our data collection methods have been previously described in detail [12]. In brief, participants were tested for CKD and CKD-related conditions including diabetes and hypertension, and anthropometric data (including height, weight, and body mass index) were recorded for each participant.

CKD was defined as the presence of albuminuria (≥30 mg/dL; confirmed by repeat assessment) and/or a reduction in the estimated glomerular filtration rate (eGFR) ≤60 ml/min/1.73 m2 according to the Modification of Diet in Renal Disease equation without the race factor [15]. Hypertension was defined as a single blood pressure measurement of greater than 160/100 mmHg, a two-time average measurement of greater than 140/90 mmHg, or the ongoing use of anti-hypertensive medications. Glucose impairment was defined as an HbA1C >6.0 % in the presence or absence of ongoing treatment with anti-hyperglycemic medications. Diabetes mellitus was defined as an HbA1C level ≥7.0 % or current known use of anti-hyperglycemic medications for the purpose of treating diabetes. Participants with an HbA1C between 6.0 % and 6.9 % in the absence of treatment with anti-hyperglycemic medications were considered to have pre-diabetes. Overweight was defined as a body mass index (BMI) greater than 25 kg/m2 and obesity was defined as a BMI greater than 30 kg/m2.

Data analysis

We used STATA version 13 (STATA Corp., College Station, TX) for all data analyses. Continuous variables were summarized by the mean and standard deviation (SD) or median and inter-quartile range (IQR). Categorical variables were summarized using counts and percentages. To address potential non-response bias, mean and prevalence estimates were sample-adjusted using age- and gender-weights based on the 2012 urban and rural district-level census data [14]. To estimate the level of clustering in health outcome variables at the household level, the sampling area level, and the neighborhood cluster level, we first fitted a mixed effect model with separate random intercepts for neighborhood, sampling area, and household for each of the outcomes of interest. In these models, after accounting for neighborhood, very little clustering (<15 %) remained at the sampling area level and household level indicating that most of the variation in these outcomes was explained at the individual and neighborhood cluster-levels. As such, we estimated the ICC for the neighborhood clusters only.

To estimate the absolute-agreement ICC coefficient for neighborhood clusters (ρ) we used a one-way, random effects analysis of variance (ANOVA) estimator which has been shown to perform well for both binary and continuous outcomes across a wide range of ρ and cluster sizes [16–19]. These estimations were performed in STATA using the ‘loneway’ command which uses the F statistic to calculate ρ as described by Hayes and Moulton. Although alternative estimators are available for binary outcomes, given that the ANOVA estimator has been shown to perform well for binary outcomes, we chose to present all estimates based on the common, easily implementable approach as described above [17, 18].
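For readers who want to see the mechanics, the one-way ANOVA estimator can be sketched in a few lines. This is an illustrative sketch of the textbook estimator for unequal cluster sizes, ρ = (MSB − MSW) / (MSB + (n0 − 1) MSW), and is not a reimplementation of the ‘loneway’ command; the function name and the toy data are hypothetical. Negative estimates are truncated at zero, as in the text.

```python
def anova_icc(clusters):
    """One-way random-effects ANOVA estimate of the ICC.

    clusters : list of lists, one inner list of outcome values per cluster
               (binary outcomes coded 0/1 or continuous measurements).
    """
    k = len(clusters)
    sizes = [len(c) for c in clusters]
    n_total = sum(sizes)
    grand_mean = sum(sum(c) for c in clusters) / n_total
    means = [sum(c) / len(c) for c in clusters]

    # Between-cluster and within-cluster sums of squares.
    ssb = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    ssw = sum((y - m) ** 2 for c, m in zip(clusters, means) for y in c)
    msb = ssb / (k - 1)
    msw = ssw / (n_total - k)

    # Adjusted mean cluster size, which accounts for unequal cluster sizes.
    n0 = (n_total - sum(n * n for n in sizes) / n_total) / (k - 1)

    denominator = msb + (n0 - 1) * msw
    if denominator <= 0:
        return 0.0  # no variation at all in the outcome
    return max(0.0, (msb - msw) / denominator)


# Toy binary outcome (1 = condition present) in three hypothetical clusters.
toy_rho = anova_icc([[1, 1, 1, 0], [0, 0], [1, 0, 0]])
```

When every member of a cluster shares the same outcome, the within-cluster mean square is zero and the estimate equals 1; when cluster means are identical, the raw estimate is negative and is truncated to 0.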

We calculated ρ for the social characteristics, self-reported medical histories, physical and laboratory measurements, and measured health outcomes. Negative values were truncated at zero, and our reporting of ρ is in accordance with the guidelines suggested by Campbell et al. [20].

Variance estimation was based on asymptotic theory, as implemented in the ‘loneway’ command, which accommodates differing cluster sizes. The 95 % confidence intervals for each ICC coefficient were derived from the asymptotic standard error, which has been shown to provide good coverage probabilities for a wide range of parameter combinations including clusters, cluster sizes, and ρ [18, 21, 22]. Confidence intervals with negative values were truncated at zero.
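A confidence interval of this kind can be sketched with a widely used large-sample approximation to the variance of the ANOVA estimator, Var(ρ) ≈ 2(N − 1)(1 − ρ)²[1 + (n0 − 1)ρ]² / [n0²(N − k)(k − 1)]. This is offered only as an illustration and may differ from the exact asymptotic formula used by ‘loneway’; the function name is hypothetical, and approximating n0 by the mean cluster size (481/29) is an assumption.

```python
import math


def icc_confidence_interval(rho, n_clusters, n_total, n0, z=1.96):
    """Approximate confidence interval for an ICC estimate.

    rho        : estimated ICC
    n_clusters : number of clusters (k)
    n_total    : total number of participants (N)
    n0         : adjusted mean cluster size
    z          : normal quantile (1.96 for a 95 % interval)
    """
    variance = (
        2 * (n_total - 1) * (1 - rho) ** 2 * (1 + (n0 - 1) * rho) ** 2
    ) / (n0 ** 2 * (n_total - n_clusters) * (n_clusters - 1))
    se = math.sqrt(variance)
    # Truncate the bounds to the admissible range, as done in the text.
    return max(0.0, rho - z * se), min(1.0, rho + z * se)


# Hypertension ICC from this study; n0 approximated by the mean cluster size.
lower, upper = icc_confidence_interval(0.075, n_clusters=29, n_total=481, n0=481 / 29)
```

With only 29 clusters the interval is wide relative to the point estimate, which is one reason to treat any single reported ICC as a planning value rather than a precise quantity.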


Between January 2014 and June 2014, we enrolled 481 participants from 346 households from a total of 37 sampling areas (30 urban and 7 rural) within 29 neighborhoods (23 urban and 6 rural) (Table 1). These 29 neighborhoods were located within 18 wards (13 urban and 5 rural). The mean age was 46.9 years (SD 15.1). The household non-response rate was 15.0 %. Men (p < 0.001) and adults 18–39 years old (p = 0.001) were more likely to be non-responders. The median neighborhood cluster size was 13.0 participants (IQR 9–21), and neighborhood cluster size ranged from 6 to 49 participants (Appendix 2).

Table 1 Unweighted proportions for demographic, social characteristics, self-reported medical histories, health outcomes, and design parameters stratified by setting; N = 481 (CKD-AFRiKA, 2014)

The majority of participants lived in an urban setting (n = 370; 77.0 %), were women (n = 358; 74.4 %), ethnically Chagga (n = 288; 59.9 %), and had obtained a primary school level of education (n = 349; 72.6 %), and most participants were occupied as farmers or daily wage-earners (n = 199; 41.4 %) (Table 1). Many participants reported an ongoing use of alcohol (n = 198; 41.2 %) and many reported a history of malaria (n = 427; 88.8 %), diabetes (n = 61; 12.7 %), or hypertension (n = 134; 28.0 %). Few reported a history of stroke, heart disease, tuberculosis, hepatitis, HIV/AIDS, COPD/asthma, cancer or kidney disease. From our assessment of NCD-related health outcomes, 149 participants (31.0 %) had hypertension, 138 (28.7 %) were obese, 57 (11.9 %) had CKD, and 129 (26.8 %) had glucose impairment of which 84 (17.5 %) had pre-diabetes and 45 (9.4 %) had diabetes.

Clustering varied across neighborhoods and differed by urban or rural setting. Overall ICC coefficients ranged from 0.000 to 0.125 with a mean value of 0.030 (SD 0.033) (Table 2). In the rural setting, ICC coefficients ranged from 0.000 to 0.331, and in the urban setting, ICC coefficients ranged from 0.000 to 0.109. Ongoing alcohol use exhibited the strongest neighborhood clustering (ρ = 0.125), which was most prominent in rural neighborhoods (ρ = 0.331). Ongoing tobacco use exhibited modest neighborhood clustering in both rural (ρ = 0.022) and urban settings (ρ = 0.042). Neighborhood clustering of self-reported medical histories was most significant for diabetes (ρ = 0.045), hypertension (ρ = 0.100), HIV (ρ = 0.054), and CKD (ρ = 0.020).

Table 2 Population-based intra-cluster correlation coefficients (ρ) for neighborhood clustering; N = 481 (CKD-AFRiKA, 2014)

Among the NCDs, neighborhood clustering varied with ρ ranging from 0.000 to 0.075. Hypertension (ρ = 0.075) exhibited the strongest clustering within neighborhoods followed by CKD (ρ = 0.044), obesity (ρ = 0.040), and glucose impairment (ρ = 0.039) (Fig. 1). Among those with glucose impairment, neighborhood clustering was more significant for pre-diabetes (ρ = 0.031) than for diabetes (ρ = 0.000). Neighborhood clustering for physical and laboratory measurements paralleled the NCD outcomes. Both systolic (ρ = 0.064) and diastolic (ρ = 0.056) blood pressures exhibited strong neighborhood clustering. Clustering for albuminuria was modest (ρ = 0.038), but it accounted for most of the neighborhood clustering observed for CKD when compared to serum creatinine or eGFR measurements. Similar to obesity and glucose impairment, clustering of BMI was more significant in urban neighborhoods (ρ = 0.049) while clustering of HbA1C was more significant in rural neighborhoods (ρ = 0.025).

Fig. 1

Neighborhood clustering of non-communicable diseases in northern Tanzania. Intra-cluster correlation coefficients, presented by prevalence, for CKD, obesity, glucose impairment, and hypertension


In northern Tanzania, prevalence of NCDs, including hypertension, CKD, obesity, and glucose impairment, exhibited clustering by neighborhood. This clustering varied across urban and rural settings, and for NCD prevalence, it was most significant for hypertension and CKD. Based on the ICC coefficients that we observed, cluster-designed studies examining NCDs in the region should account for the design effect on precision or variance caused by clustering. In a region where the NCD burden is quickly growing, these results will be valuable in designing such studies, including cluster RCTs [5, 12, 23].
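To make the practical consequence concrete, the reported neighborhood ICCs can be converted into approximate design effects with 1 + (m − 1)ρ. This is a rough sketch, not an analysis from the study: using the mean cluster size (481 participants over 29 neighborhoods) for m is an assumption that ignores variation in cluster size, and the names below are hypothetical.

```python
# Neighborhood ICCs for the NCD outcomes, as reported in the Results.
REPORTED_ICC = {
    "hypertension": 0.075,
    "CKD": 0.044,
    "obesity": 0.040,
    "glucose impairment": 0.039,
}

# Assumed cluster size: the mean across 29 neighborhoods and 481 participants.
MEAN_CLUSTER_SIZE = 481 / 29


def design_effect(icc, cluster_size):
    """Variance inflation from clustering: 1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc


def effective_sample_size(n_total, deff):
    """Number of independent observations the clustered sample is worth."""
    return n_total / deff


design_effects = {
    outcome: design_effect(rho, MEAN_CLUSTER_SIZE)
    for outcome, rho in REPORTED_ICC.items()
}
effective_n = {
    outcome: effective_sample_size(481, deff)
    for outcome, deff in design_effects.items()
}
```

Even ICCs well below 0.1 produce design effects noticeably above 1 at these cluster sizes, so ignoring the clustering would overstate the precision of prevalence estimates.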

The urban and rural differences in neighborhood clustering of NCDs may highlight important environmental and lifestyle risk factors for the development of hypertension, glucose impairment, obesity, and CKD. The neighborhood clustering for hypertension and glucose impairment was most pronounced in the rural settings where families tend to remain more environmentally clustered, share meals, and work in similar agricultural jobs which may all contribute to the development of such NCDs that are known to be highly associated with lifestyle [24–26]. On the other hand, obesity and CKD were most clustered in the urban neighborhoods. For obesity, this urban clustering highlights the importance that urban lifestyles, which may be clustered within neighborhoods on the basis of socioeconomic status, transportation, or occupation, play in the development of obesity. In the context of CKD, living in an urban setting has been shown to be a significant risk factor, yet specific etiologies associated with the urban environment remain unknown [12]. The clustering of CKD within urban neighborhoods that we observed may be important in highlighting causes of CKD, and it further stresses that public health efforts targeting CKD must take a broad approach that includes urban planning with sanitation improvement, safe drinking water, pollution reduction, and infection control.

Among all measured variables, ongoing alcohol use, hypertension, a self-reported history of hypertension, and a self-reported history of HIV were most highly correlated among cluster-sampled individuals, and the latter two variables may reflect an increased awareness and/or prevalence of these conditions within certain neighborhoods. In northern Tanzania, alcohol is commonly homemade and shared among households which may in part explain the significant clustering that we observed.

To our knowledge, this is the first community-based, household-level survey to report on the neighborhood clustering of NCDs in East Africa. As such, these are the first ICC coefficients reported for hypertension, CKD, obesity, and glucose impairment in the region, and compared to reports of ICC coefficients in high-income countries there are significant differences in several of the physical and laboratory variables [27–29]. Because we also measured clustering in both urban and rural settings, we were able to demonstrate important differences which may help inform future studies examining the demographic transition of NCDs in sub-Saharan Africa where rapid urbanization is occurring [30].

Despite these strengths, we also noted a few limitations. Caution must be taken when applying these estimates to other populations and settings. Although the paucity of data currently available for NCD-related measurements and outcomes may make these results useful to researchers more broadly across the region, differences in prevalence and risk factors for NCDs, particularly those that are geographic or environmental-based, mean that even NCDs can cluster at different rates within villages, neighborhoods, or households. Additionally, although we used sample-balancing approaches to address potential non-response bias, the effect of participant non-response upon these estimates is not fully known. Finally, some results, such as self-reported medical history, rely upon the subjective response of individual participants, and as such, they may be prone to recall or response bias.


In conclusion, we have reported on the observed neighborhood clustering for several NCDs from a community-based study in northern Tanzania. The neighborhood clustering, which varied by urban or rural setting, was substantial enough to contribute to a design effect for NCD outcomes including hypertension, CKD, obesity, and glucose impairment, and it may also highlight NCD risk factors that vary by setting. These results may help inform the design of future community-based studies or randomized controlled trials examining NCDs in the region, particularly those that use cluster-sampling methods.


  1. Fuster V. Cardiovascular disease and the UN millennium development goals: a serious concern. Nat Clin Pract Cardiovasc Med. 2006;3(8):401.


  2. Stanifer JW, Jing B, Tolan S, Helmke N, Mukerjee R, Naicker S, et al. The epidemiology of chronic kidney disease in sub-Saharan Africa: a systematic review and meta-analysis. Lancet Global Health. 2014;2(3):e174–81.


  3. Renzaho AM. The post-2015 development agenda for diabetes in sub-Saharan Africa: challenges and future directions. Global Health Act. 2015;8:27600.


  4. Abegunde DO, Mathers CD, Adam T, Ortegon M, Strong K. The burden and costs of chronic diseases in low-income and middle-income countries. Lancet. 2007;370(9603):1929–38.


  5. Unwin N, Setel P, Rashid S, Mugusi F, Mbanya JC, Kitange H, et al. Noncommunicable diseases in sub-Saharan Africa: where do they feature in the health research agenda? Bull World Health Organ. 2001;79(10):947–53.


  6. Holmes MD, Dalal S, Volmink J, Adebamowo CA, Njelekela M, Fawzi WW, et al. Non-communicable diseases in sub-Saharan Africa: the case for cohort studies. PLoS Med. 2010;7(5), e1000244.


  7. World Health Organization. Training for mid-level managers: The EPI coverage survey. Geneva: WHO Expanded Programme on Immunization; 1991.


  8. Luman ET, Worku A, Berhane Y, Martin R, Cairns L. Comparison of two survey methodologies to assess vaccination coverage. Int J Epidemiol. 2007;36(3):633–41.


  9. Isaakidis P, Ioannidis JP. Evaluation of cluster randomized controlled trials in sub-Saharan Africa. Am J Epidemiol. 2003;158(9):921–6.


  10. Donner A, Birkett N, Buck C. Randomization by cluster. Sample size requirements and analysis. Am J Epidemiol. 1981;114(6):906–14.


  11. Donner A, Koval JJ. Design considerations in the estimation of intraclass correlation. Ann Hum Genet. 1982;46(Pt 3):271–7.


  12. Stanifer JW, Maro V, Egger J, Karia F, Thielman N, Turner EL, et al. The epidemiology of chronic kidney disease in Northern Tanzania: a population-based survey. PLoS One. 2015;10(4), e0124506.


  13. United Republic of Tanzania. Education sector performance report, 2010–2011. Dar es Salaam: Education Sector Development Committee; 2011.


  14. United Republic of Tanzania. 2012 Population and housing census. Dar Es Salaam: Central Census Office and National Bureau of Statistics; 2013.


  15. Wyatt CM, Schwartz GJ, Owino Ong’or W, Abuya J, Abraham AG, Mboku C, et al. Estimating kidney function in HIV-infected adults in Kenya: comparison to a direct measure of glomerular filtration rate by iohexol clearance. PLoS One. 2013;8(8), e69601.


  16. Donner A, Koval JJ. The large sample variance of an intraclass correlation. Biometrika. 1980;67(3):719–22.


  17. Wu S, Crespi CM, Wong WK. Comparison of methods for estimating the intraclass correlation coefficient for binary responses in cancer prevention cluster randomized trials. Contemp Clin Trials. 2012;33(5):869–80.


  18. Zou G, Donner A. Confidence interval estimation of the intraclass correlation coefficient for binary outcome data. Biometrics. 2004;60(3):807–11.


  19. Hayes R, Moulton L. Cluster randomised controlled trials. New York: Chapman and Hall/CRC Press; 2009.


  20. Campbell MK, Grimshaw JM, Elbourne DR. Intracluster correlation coefficients in cluster randomized trials: empirical insights into how should they be reported. BMC Med Res Methodol. 2004;4:9.


  21. Donner A. A review of inference procedures for the intraclass correlation coefficient in the one-way random effects model. Int Stat Rev. 1986;54(1):67–82.


  22. Mian IU, Shoukri MM. Statistical analysis of intraclass correlations from multiple samples with applications to arterial blood pressure data. Stat Med. 1997;16(13):1497–514.


  23. Eldridge SM, Ashby D, Kerry S. Sample size for cluster randomized trials: effect of coefficient of variation of cluster size and analysis method. Int J Epidemiol. 2006;35(5):1292–300.


  24. Kuate DB. Demographic, epidemiological, and health transitions: are they relevant to population health patterns in Africa? Global Health Act. 2014;7:22443.


  25. Addo J, Smeeth L, Leon DA. Hypertension in sub-saharan Africa: a systematic review. Hypertension. 2007;50(6):1012–8.


  26. Assah FK, Ekelund U, Brage S, Mbanya JC, Wareham NJ. Urbanization, physical activity, and metabolic health in sub-Saharan Africa. Diabetes Care. 2011;34(2):491–6.


  27. Parker DR, Evangelou E, Eaton CB. Intraclass correlation coefficients for cluster randomized trials in primary care: the cholesterol education and research trial (CEART). Contemp Clin Trials. 2005;26(2):260–7.


  28. Smeeth L, Ng ES. Intraclass correlation coefficients for cluster randomized trials in primary care: data from the MRC Trial of the Assessment and Management of Older People in the Community. Control Clin Trials. 2002;23(4):409–21.


  29. Singh J, Liddy C, Hogg W, Taljaard M. Intracluster correlation coefficients for sample size calculations related to cardiovascular disease prevention and management in primary care practices. BMC Research Notes. 2015;8:89.


  30. Agyei-Mensah S, de-Graft Aikins A. Epidemiological transition and the double burden of disease in Accra, Ghana. J Urban Health. 2010;87(5):879–97.




We would like to thank Professors G Ralph Corey and John Bartlett and all the staff of the KCMC-Duke Collaboration in Moshi, Tanzania for all of their efforts. We give a special thanks to Carol Sangawe, Cynthia Asiyo, and Nicola West for their integral role implementing the study and Jeffrey Hawley and Audrey Brown of the Duke Office of Clinical Research for their help in data management. We are also grateful to Estomih Mduma and his team for their help with translation needs. This study was supported by an NIH Research Training Grant (#R25 TW009337) funded by the Fogarty International Center and the National Institute of Mental Health; a Research and Prevention Grant funded by the International Society of Nephrology Global Outreach Committee; and a Master’s of Science research stipend from the Duke Global Health Institute. There was no involvement by the funding sources in the design, analysis, or writing of this study. All authors had full access to the data and had final responsibility for the decision to publish.

Author information




Corresponding author

Correspondence to John W. Stanifer.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

JWS contributed to the study design, data collection, data analysis, and manuscript preparation. JE and ELT contributed to the data analysis and manuscript preparation. NT and UDP contributed to the study design, data analysis, and manuscript preparation. All authors read and approved the final manuscript.


Appendix 1: Standard Operating Protocol (SOP) for household selection


To provide a reproducible, systematic, and random method of selecting households for sampling.


  • Cluster = randomly, pre-selected geographic location that includes multiple households for sampling

  • Dwelling = A free-standing building that is covered by a roof. Buildings that share a foundation or appear to share a foundation should be considered as one dwelling.

  • Household = Persons residing within a dwelling whose food is prepared by the same person(s)

  • ID = Unique Identification Number that is assigned to each participant and each household.

  • Household ID = Two digit Unique Identification Number contained in the study ID number that is assigned to each household.

  • Eligible Individuals: Adults over the age of 18 who are not pregnant. Ex-pats or Temporary Residents should be excluded unless they are FULL citizens who reside in Tanzania full time (i.e. more than 9 months out of every year).


  • Cluster site identification

  • Household identification

  • Household selection process


  1. Cluster site identification: the starting point from which household selection will occur has been identified based on random GPS coordinates.

  2. The dwelling physically closest to the starting point will be approached first.

  3. Household Selection Process

    a. The first dwelling should be approached:

      i. If that dwelling fulfills the definition of a household then assign it a household ID and assess the eligibility of the household adults according to the enrollment protocol.

      ii. If that dwelling does NOT fulfill the definition of a household then move on to the next dwelling.

      iii. Unless the dwelling is clearly marked as a business, shop, or restaurant then the field surveyors should assume that it could be a household. They should then approach to confirm. (Remember that sometimes people who own shops also live in the back – if any doubt then they should always approach to confirm).

    b. To identify the next dwelling to approach for sampling, the following methods should be used:

      i. The field surveyor will stand with his/her back to the main entrance of the first dwelling.

      ii. Flip a coin.

      iii. If the coin lands on TAILS then proceed to your LEFT. If the coin lands on HEADS then proceed to your RIGHT.

      iv. Next, roll the die to determine which house to approach. The numbers on the die represent which house number (in sequential order according to physical distance to the front door) will be chosen.

      v. If the surveyor comes to an intersection or dead-end before reaching the house number on the die, then flip the coin again to determine the continuing direction. Again, TAILS will be LEFT and HEADS will be RIGHT.

      vi. In instances where there is only one physical direction to go, then proceed in that direction.

      vii. If a dwelling repeats, then repeat the coin-flip and die process.

  4. Protocol for Gated Houses

    a. House with a Gatekeeper

      i. First, contact the gatekeeper to explain our intentions. If agreeable, he may allow entry.

      ii. If not agreeable to entry, then leave a study overview pamphlet along with our contact information.

      iii. Arrange a follow-up time to see if the owners have expressed interest.

    b. Closed Gate House

      i. If a gatekeeper is present then proceed as above.

      ii. If no gatekeeper and no way to contact the household members, then record as non-response.

      iii. Two additional visits, including one off-hours visit (i.e. evening or weekend day), should be attempted according to the follow-up protocol.

    c. Open Gate

      i. First, ensure that there is no gatekeeper.

      ii. If no gatekeeper, then approach the household as you would any other dwelling.
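The coin-flip and die-roll step of the protocol can be simulated in a few lines, for example when training field surveyors or checking how far a walk tends to travel. This is only a sketch of the randomization step, assuming a fair coin and a standard six-sided die; the function name is hypothetical, and the handling of intersections, dead-ends, and repeated dwellings is left to the surveyor as described above.

```python
import random


def next_dwelling_instruction(rng):
    """Simulate one coin flip and one die roll from the household SOP.

    Returns (direction, house_number): TAILS means LEFT, HEADS means RIGHT,
    and the die gives the sequential house number to approach.
    """
    coin = rng.choice(["HEADS", "TAILS"])
    direction = "LEFT" if coin == "TAILS" else "RIGHT"
    house_number = rng.randint(1, 6)  # standard six-sided die, inclusive
    return direction, house_number


# Example: draw instructions for five consecutive household selections.
rng = random.Random(2014)
walk = [next_dwelling_instruction(rng) for _ in range(5)]
```

Passing in a seeded `random.Random` instance keeps the simulated walk reproducible, which is convenient for training materials.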

Appendix 2

Table 3 Detailed characteristics of the urban neighborhood clusters
Table 4 Detailed characteristics of rural neighborhood clusters

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.


About this article


Cite this article

Stanifer, J.W., Egger, J., Turner, E.L. et al. Neighborhood clustering of non-communicable diseases: results from a community-based study in Northern Tanzania. BMC Public Health 16, 226 (2016).
