The Kenya 2015 STEPS survey was a cross-sectional household survey carried out in Kenya from April to June 2015, targeting individuals aged 18 to 69 years. The survey used the fifth National Sample Survey and Evaluation Programme (NASSEP V) sampling frame from the Kenya National Bureau of Statistics, which was developed from the enumeration areas of the 2009 Kenya Population and Housing Census. The sample size was determined to be 6,000 to allow national estimates by sex and by residence (rural or urban).
A three-stage cluster sample design was used. In the first stage, 200 clusters (100 urban and 100 rural) were selected. In the second stage, a uniform sample of 30 households was selected from the listed households in each cluster. In the third stage, one individual was randomly selected from all eligible members listed in each household. This yields 200 × 30 = 6,000 selected individuals, matching the target sample size.
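The three selection stages can be sketched in Python as follows. This is an illustration only. The following are all assumptions:
- the frame layout (a dict of clusters, each with its listed households and their eligible members)
- the function name `three_stage_sample`
- simple random selection at every stage

The actual NASSEP V first-stage selection probabilities are not described here.

```python
import random


def three_stage_sample(frame, n_urban=100, n_rural=100, households_per_cluster=30, seed=2015):
    """Illustrative three-stage cluster sample (simple random selection assumed at each stage).

    frame: hypothetical dict mapping cluster_id -> {"residence": "urban" | "rural",
           "households": {household_id: [eligible member ids]}}
    Returns a list of (cluster_id, household_id, selected_member) tuples.
    """
    rng = random.Random(seed)
    # Stage 1: select clusters separately within the urban and rural strata
    urban = sorted(c for c, info in frame.items() if info["residence"] == "urban")
    rural = sorted(c for c, info in frame.items() if info["residence"] == "rural")
    clusters = rng.sample(urban, n_urban) + rng.sample(rural, n_rural)

    selected = []
    for c in clusters:
        households = frame[c]["households"]
        # Stage 2: a uniform number of households from the listed households in the cluster
        for hh in rng.sample(sorted(households), households_per_cluster):
            members = households[hh]
            if members:
                # Stage 3: one eligible household member chosen at random
                selected.append((c, hh, rng.choice(members)))
    return selected


# Toy frame (hypothetical): 250 urban and 250 rural clusters, 40 listed households each,
# every household having 1 to 3 eligible members
toy_rng = random.Random(1)
frame = {
    f"EA{i:03d}": {
        "residence": "urban" if i < 250 else "rural",
        "households": {h: [f"p{h}_{m}" for m in range(toy_rng.randint(1, 3))] for h in range(40)},
    }
    for i in range(500)
}
sample = three_stage_sample(frame)
print(len(sample))  # 200 clusters x 30 households = 6000
```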
Data collection
Socio-demographic and behavioural information was collected in Step 1. Physical measurements such as height, weight and blood pressure were collected in Step 2. Biochemical measurements of blood glucose and cholesterol were taken in Step 3, with respondents in a fasting state.
The survey focused on the four main behavioural risk factors for NCDs (tobacco use, harmful alcohol consumption, unhealthy diet and lack of physical activity). It also covered the four key physiological risk factors for NCDs (overweight and obesity, raised blood pressure, raised blood lipids and raised blood glucose). The survey questionnaire was adapted from the WHO STEPS instrument [9], with information gathered in three sequential steps. Step 1 asked about demographic information such as age, sex, marital status, education and occupation, and housing and social amenities. It also asked about dietary history, covering intake of salt, sugar, fat, fruits and vegetables. Data were collected using personal digital assistants (PDAs) loaded with eSTEPS software provided by WHO.
Twenty multidisciplinary teams collected the data. Each team comprised a supervisor, two research assistants, a clinician and a laboratory technologist. Before fieldwork, the teams completed a six-day training on the survey background, sampling method, questioning techniques, PDA use and ethical procedures.
Key variables
Twelve traditional non-communicable disease (NCD) risk factors and nine injury risk factors were used in our analysis. Some of these measures were self-reported and others were objectively measured. Risk factors were included if complete data were available for the study population. Cut-off points for these variables were based on international recommendations [10,11,12,13].
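The sketch below illustrates how continuous measurements become 0/1 risk indicators. It uses commonly cited WHO STEPS-style thresholds:
- BMI ≥ 30 kg/m² for obesity
- systolic ≥ 140 or diastolic ≥ 90 mmHg for raised blood pressure
- fasting glucose ≥ 7.0 mmol/L for raised blood glucose
- fewer than five servings of fruit and vegetables per day for inadequate intake

These thresholds, the field names and the function name are assumptions for illustration. The cut-offs actually applied are those in the cited references. The sketch also ignores current medication use, which WHO definitions of raised blood pressure and glucose also count.

```python
def recode_risk_factors(person):
    """Return 0/1 risk indicators for one respondent.

    ASSUMPTION: illustrative WHO STEPS-style cut-offs and hypothetical field names;
    medication use is ignored for simplicity.
    """
    bmi = person["weight_kg"] / (person["height_m"] ** 2)
    return {
        # General obesity: BMI >= 30 kg/m^2
        "general_obesity": int(bmi >= 30),
        # Raised blood pressure: systolic >= 140 or diastolic >= 90 mmHg
        "raised_bp": int(person["sbp"] >= 140 or person["dbp"] >= 90),
        # Raised fasting blood glucose: >= 7.0 mmol/L
        "raised_glucose": int(person["fasting_glucose_mmol"] >= 7.0),
        # Inadequate fruit/vegetable intake: fewer than 5 servings per day
        "low_fruit_veg": int(person["fruit_veg_servings_per_day"] < 5),
    }


example = {"weight_kg": 90, "height_m": 1.70, "sbp": 130, "dbp": 92,
           "fasting_glucose_mmol": 5.4, "fruit_veg_servings_per_day": 2}
result = recode_risk_factors(example)
print(result)  # BMI is about 31.1, so general_obesity = 1; dbp 92 gives raised_bp = 1
```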
Risk variables for NCDs and injury
NCD risk variables
Inadequate fruit/vegetable intake, high sugar intake, insufficient physical activity, harmful alcohol use, tobacco use, excessive sitting time, general obesity, central obesity, high blood sugar, high salt consumption, high fat intake, and increased blood pressure.
Injury risk variables
Not using a seatbelt, not using a helmet, involvement in a traffic crash, accidental injury, inappropriate road crossing, driving under the influence of alcohol, being a passenger of a drunk driver, involvement in violence, and substance use (e.g. khat).
Data management and analysis
We used Stata 14.1 (Stata Corporation, College Station, TX) to analyse the data. Analysis was restricted to individuals with complete data on the key analytic variables listed above. Those with a missing value on any of these variables were excluded.
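The complete-case restriction amounts to a simple filter. This minimal sketch assumes records stored as dicts, with `None` or an absent key marking a missing value. The function name and variable names are hypothetical.

```python
def complete_cases(records, key_variables):
    """Return only the records with a non-missing (not None) value for every key variable."""
    return [r for r in records if all(r.get(v) is not None for v in key_variables)]


# Hypothetical records; None (or an absent key) marks a missing value
records = [
    {"id": 1, "tobacco_use": 0, "raised_bp": 1},
    {"id": 2, "tobacco_use": None, "raised_bp": 0},
    {"id": 3, "raised_bp": 1},
]
analytic = complete_cases(records, ["tobacco_use", "raised_bp"])
print([r["id"] for r in analytic])  # [1]
```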
Cluster identification
For both categories of risk factors, each variable was recoded as 0 (low risk) or 1 (higher risk). Because the data were binary, we used k-medians cluster analysis, with the simple matching coefficient as the measure of proximity. A scree plot was used to determine the optimal number of clusters.
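Below is a minimal k-medians sketch for 0/1 data. It assumes the matching dissimilarity, which is 1 minus the simple matching coefficient: the share of risk factors on which two respondents differ. For binary variables, the per-variable median of a cluster is simply its majority value. The function names, the random initialization and the tie rule (ties go to 0) are assumptions. The sketch does not describe the internals of Stata's `cluster kmedians`. The total within-cluster dissimilarity for k = 1, 2, 3, … is one quantity that could be drawn as a scree plot; what was actually plotted is not stated here.

```python
import random


def matching_dissimilarity(a, b):
    """1 minus the simple matching coefficient: share of variables on which a and b disagree."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def column_median(rows):
    """Per-variable median of 0/1 rows: 1 if more than half are 1 (ties go to 0 here)."""
    n = len(rows)
    return tuple(int(sum(col) * 2 > n) for col in zip(*rows))


def k_medians(data, k, n_iter=100, seed=0):
    """Partition 0/1 tuples into k clusters; returns (labels, centres)."""
    rng = random.Random(seed)
    centres = rng.sample(data, k)  # initial centres: k randomly chosen observations
    labels = [0] * len(data)
    for _ in range(n_iter):
        # Assignment step: each observation joins its nearest centre (first centre wins ties)
        new_labels = [min(range(k), key=lambda j: matching_dissimilarity(x, centres[j]))
                      for x in data]
        # Update step: each centre becomes the per-variable median of its members;
        # an empty cluster keeps its previous centre
        new_centres = []
        for j in range(k):
            members = [x for x, lab in zip(data, new_labels) if lab == j]
            new_centres.append(column_median(members) if members else centres[j])
        if new_labels == labels and new_centres == centres:
            break
        labels, centres = new_labels, new_centres
    return labels, centres


def total_within_dissimilarity(data, labels, centres):
    """Sum of each observation's dissimilarity to its own cluster centre."""
    return sum(matching_dissimilarity(x, centres[lab]) for x, lab in zip(data, labels))


# Toy data (hypothetical): two clear risk profiles over six binary risk factors
data = [(1, 1, 1, 0, 0, 0)] * 10 + [(0, 0, 0, 1, 1, 1)] * 10
scree = {k: total_within_dissimilarity(data, *k_medians(data, k)) for k in range(1, 4)}
labels, centres = k_medians(data, 2)
print(scree)
```

On these toy data, the total dissimilarity drops sharply from k = 1 to k = 2 and does not improve at k = 3. That pattern is the kind of elbow a scree plot is used to spot.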
Cluster characterization
The distribution of the risk factors across clusters was examined to characterize each cluster's risk profile. Each cluster was named after its dominant risk profile. The background characteristics of participants in each cluster were summarized using proportions. Associations between cluster membership and these characteristics were tested using chi-square tests.
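The sketch below covers the two summaries described above. The first is risk-factor prevalence within each cluster. The second is a Pearson chi-square test of cluster membership (rows) against a background characteristic (columns). Function names and toy counts are hypothetical. The sketch returns only the statistic and degrees of freedom. If SciPy is available, `scipy.stats.chi2_contingency` also returns a p-value. Note that it applies Yates' continuity correction to 2×2 tables by default (`correction=True`).

```python
def risk_profile(data, labels, names):
    """Prevalence (proportion coded 1) of each risk factor within each cluster."""
    profile = {}
    for lab in sorted(set(labels)):
        members = [x for x, c in zip(data, labels) if c == lab]
        profile[lab] = {name: sum(col) / len(members)
                        for name, col in zip(names, zip(*members))}
    return profile


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical toy inputs
data = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = [0, 0, 1, 1]
profile = risk_profile(data, labels, ["tobacco_use", "raised_bp"])
# Rows: two clusters; columns: urban, rural counts
stat, dof = pearson_chi_square([[30, 10], [10, 30]])
print(profile, stat, dof)
```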
Predictors of cluster distribution
Predictors of cluster membership were examined using logistic regression models. The background characteristics included in the models were age, gender, education, employment, residence, wealth index and marital status. The results are presented in tables.
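A minimal logistic regression sketch follows. It assumes a binary outcome (membership of one cluster versus all others) and maximum-likelihood estimation by batch gradient ascent. The function name, learning rate, iteration count and toy data are assumptions. Exponentiated slope coefficients are odds ratios.

```python
import math


def fit_logistic(X, y, lr=0.5, n_iter=3000):
    """Maximum-likelihood logistic regression by batch gradient ascent.

    X: list of covariate rows (an intercept column is added here); y: 0/1 outcomes.
    Returns coefficients [intercept, b1, b2, ...] on the log-odds scale.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for r, yi in zip(rows, y):
            # Predicted probability from the current linear predictor
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            # Gradient of the log-likelihood: sum of (observed - predicted) * covariate
            for j, v in enumerate(r):
                grad[j] += (yi - p) * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


# Hypothetical toy data: x = urban (1) vs rural (0); y = membership of one cluster
X = [[1]] * 40 + [[0]] * 40
y = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 30
beta = fit_logistic(X, y)
odds_ratios = [math.exp(b) for b in beta[1:]]
print(odds_ratios)  # close to (30/10) / (10/30) = 9
```

In practice a packaged routine would be used, such as Stata's `logistic` command, which reports odds ratios and also gives standard errors. If cluster membership has more than two categories, one option is a binary model per cluster and another is a multinomial model (Stata's `mlogit`). This section does not say which of these was used.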
Ethical considerations
Written informed consent was obtained from every participant. Personal identifiers were delinked from the data through coding. Consent forms containing personal identifiers were stored separately from the coded data. The data collection team was trained in ethical procedures and appropriate data collection techniques.