Ethnicity classification systems for public health surveys: experiences from HIV behavioural surveillance among men who have sex with men

Background Race and ethnicity classification systems have considerable implications for public health, including the potential to reveal or mask inequities. Given increasing “super-diversity” and multiple racial/ethnic identities in many global settings, especially among younger generations, different ethnicity classification systems can underrepresent population heterogeneity and can misallocate and render invisible Indigenous people and ethnic minorities. We investigated three ethnicity classification methods and their relationship to sample size, socio-demographics and sexual health indicators. Methods We examined data from New Zealand’s HIV behavioural surveillance programme for men who have sex with men (MSM) in 2006, 2008, 2011, and 2014. Participation was voluntary, anonymous and self-completed; recruitment was via community venues and online. Ethnicity allowed for multiple responses; we investigated three methods of dealing with these: Prioritisation, Single/Combination, and Total Response. Major ethnic groups included Asian, European, indigenous Māori, and Pacific. For each classification method, statistically significant associations with ethnicity for demographic and eight sexual health indicators were assessed using multivariable logistic regression. Results Overall, 10,525 MSM provided ethnicity data. Classification methods produced different sample sizes, and there were ethnic disparities for every sexual health indicator. In multivariable analysis, when compared with European MSM, ethnic differences were inconsistent across classification systems for two of the eight sexual health outcomes: Māori MSM were less likely to report regular partner condomless anal intercourse using Prioritisation or Total Response but not Single/Combination, and Pacific MSM were more likely to report an STI diagnosis when using Total Response but not Prioritisation or Single/Combination. 
Conclusions Different classification approaches alter sample sizes and identification of health inequities. Future research should strive for equal explanatory power of Indigenous and ethnic minority groups and examine additional measures such as socially-assigned ethnicity and experiences of discrimination and racism. These findings have broad implications for surveillance and research that is used to inform public health responses.


Keywords: Ethnicity classification, Public health, Surveillance, Sexual health, Health equity, Race, Racism, Surveys, Men who have sex with men (MSM), New Zealand

Background
Race and ethnicity identity classification systems have considerable implications for public health. Surveillance data are frequently used to identify disparities between racial/ethnic groups and to monitor trends [1,2]. Ethnic and racial identities are often used as a proxy for the impact of social processes of racialization such as experiences of racism. The systems used to categorise respondents into race/ethnicity variables have potential to reveal or mask important public health inequities and therefore require attention. Not only do racial/ethnicity classification systems influence research processes such as data collection, analysis, and reporting, but the outcomes of these systems can also activate public health responses, allocate resources, and stimulate further inquiry. As such, poor applications of these systems can undermine social justice-informed public health research and practice, but if used meaningfully, they have the power to redress ethnoracial discrimination and ameliorate health and social inequities by race and ethnicity.
In the human immunodeficiency virus (HIV) field, for example, differences in health outcomes and behaviours between Black men who have sex with men (MSM) and non-Black MSM were reported in the United States, Canada, and the United Kingdom [3]. This review identified that elimination of these disparities would not be achieved without addressing structural barriers such as unemployment, low income, incarceration, and low education [3]. Consequently, race and ethnicity information is an important feature of country-level responses to HIV and sexually transmitted infections (STI) and is fundamental to monitoring progress on social justice objectives [1]. However, few studies have empirically compared alternative racial/ethnicity classification systems used in public health research or surveillance [4,5] to understand better whether they influence findings, how, and for whom.
Given increasing "superdiversity", which describes enhanced levels of ethnoracial diversity as a result of contemporary immigration, [6] and multiple racial/ethnic identities in many global settings, especially among younger generations, different ethnicity classification systems can underrepresent population heterogeneity and can misallocate and render invisible Indigenous people and ethnic minorities. Aotearoa New Zealand offers a unique setting to examine these issues. Ethnicity variables are routinely included in public health research as mandated by crown responsibilities under the Treaty of Waitangi, New Zealand's founding document [7,8]. In New Zealand, ethnicity is a measure of cultural affiliation (self-identity), and is not a measure of race, ancestry, nationality, or citizenship. Ethnicity data in New Zealand are commonly collected using the national census question that allows for multiple responses. Three optional classification systems or methods have been developed to deal with this: prioritisation, single/combination, and total response [9]. Like Canada and Australia, New Zealand's HIV epidemic is concentrated among MSM, but, unlike these countries, Indigenous MSM are not overrepresented in HIV diagnosis statistics [10]. New Zealand also has an established programme of HIV behavioural surveillance among MSM with large and diverse samples, [2] enabling us to explore the effect of different classification systems on ethnoracial disparities in HIV risk and health seeking behaviours.
The overall objective of the current paper is to provide a quantitative evidentiary foundation for future methodological consideration and use of ethnicity data. Our aim was to investigate three different ethnicity classification methods (prioritisation, single/combination, and total response) and their impact on the sample size, resulting demographics, and eight behavioural and health screening indicators among MSM recruited into HIV behavioural surveillance in New Zealand. Although focused on HIV, we believe this study has implications for all medical and public health research interested in disparities by race or ethnicity.

Methods
The research was predominantly led by the first four co-authors, who identify as Pākehā or European men, some of whom self-identify as gay. The remaining co-authors identify with a variety of ethnicities.
New Zealand's HIV behavioural surveillance among MSM consists of repeated cross-sectional surveys conducted in multiple settings using consistent sampling over time: the Gay Auckland Periodic Sex Survey (GAPSS) and Gay men's Online Sex Survey (GOSS). In 2006, 2008, 2011 and 2014, GAPSS participants were recruited to self-complete voluntary and anonymous questionnaires from a community fair day, gay bars, and sex-on-site venues in Auckland (New Zealand's largest city), and GOSS participants from online dating sites nationally. Eligible men had to be at least 16 years old, report sex with a man in the last 5 years, and not have completed the survey previously in that round. Responses from all four rounds were pooled to increase statistical power, and individuals who indicated participation in a previous year were removed in order to satisfy statistical assumptions of independent observations; this resulted in each participant being represented once in the dataset based on their earliest response. The Northern X Regional Ethics Committee granted ethical approval. Detailed surveillance methods are reported elsewhere [2].
Ethnicity, the primary independent variable, was self-identified by participants using Statistics New Zealand's 2013 census question (Table 1), asked in both English and te reo Māori. Results for the four largest ethnic groups are reported in alphabetical order: Asian (including Chinese and Indian), European, Māori (the Indigenous people of Aotearoa New Zealand), and Pacific (including Samoan, Cook Island Māori, Tongan, and Niuean) people. Importantly, people could indicate more than one ethnicity, without a requirement to indicate a sole primary or main one. Three alternative ethnicity classification systems were applied to deal with the multiple-response data from "multi-ethnic" participants: prioritisation, single/combination, and total response.

Prioritisation
If an individual reported multiple ethnicities they were categorised into a single discrete group and prioritised in the following hierarchy: first, Māori; second, Pacific; third, Asian; fourth, Other, and last, European (typically used as the referent group). For example, if someone identified as Māori and any other ethnicity, they were classified as Māori; if someone identified as Pacific and any other ethnicity except Māori, they were classified as Pacific; and if someone identified as Asian and any other ethnicity except Māori or Pacific, they were classified as Asian. Only those who identified as New Zealand European and/or any other European ethnicity were identified as European. This classification system produces a single ethnicity variable with no ability to identify multiethnic participants. Participants were coded as 1 "Māori", 2 "Pacific", 3 "Asian", 4 "Other" or 0 "European".
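The prioritisation hierarchy described above can be sketched in Python. This is a minimal illustration only, not the code used in the study (the analyses were run in Stata); the function name `prioritise` and the representation of a participant's answers as a set of major-group labels are assumptions made for this example.

```python
# Priority order taken from the text: Māori, Pacific, Asian, Other, European.
PRIORITY = ["Māori", "Pacific", "Asian", "Other", "European"]

# Numeric codes taken from the text.
CODES = {"Māori": 1, "Pacific": 2, "Asian": 3, "Other": 4, "European": 0}


def prioritise(responses):
    """Return the single prioritised group for a set of major-group labels.

    `responses` is a hypothetical input format: a set of labels from PRIORITY.
    Returns None when no ethnicity was reported (missing data).
    """
    for group in PRIORITY:
        if group in responses:
            return group
    return None


# Illustrative (made-up) response patterns, not study data.
for answer in [{"Māori", "European"}, {"Pacific", "Asian"}, {"European"}]:
    group = prioritise(answer)
    print(sorted(answer), "->", group, "code", CODES[group])
```

Because the loop returns at the first match, a multi-ethnic participant contributes to exactly one group, which is why this method cannot identify multi-ethnic participants.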

Single/combination
Similar to prioritisation, an individual is assigned to a discrete group within a single variable, but this method explicitly specifies multiple ethnicity combinations (e.g., European-Māori, Asian-Pacific) alongside exclusive major ethnicities (e.g., only Māori, only Pacific). This approach allows for identification of multi-ethnic participants as one or several discrete groups/levels in a single variable. This method adheres better to principles of self-identity for participants who report multiple ethnicities, but can be hampered by the large number of possible ethnic combinations and limited counts for multiple combination groups. There is also a highly heterogeneous remainder category of multi-ethnic participants whose ethnic combinations are not explicitly reported as separate levels of this variable. Participants were coded as 1 "Māori only", 2 "Pacific only", 3 "Asian only", 4 "Other only", 5 "Māori-European", 6 "Pacific-European", 7 "Other ≥ 2 combinations", or 0 "European only".
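The single/combination coding above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's own code: the function name `single_combination` and the set-of-labels input format are hypothetical, while the numeric codes follow the text.

```python
# Codes for participants reporting exactly one major ethnic group.
SINGLE_CODES = {"European": 0, "Māori": 1, "Pacific": 2, "Asian": 3, "Other": 4}

# Pairwise combinations that the text reports as their own levels.
PAIR_CODES = {
    frozenset({"Māori", "European"}): 5,
    frozenset({"Pacific", "European"}): 6,
}

# Every other combination of two or more groups ("Other >= 2 combinations").
REMAINDER_CODE = 7


def single_combination(responses):
    """Return the single/combination code for a set of major-group labels."""
    responses = frozenset(responses)
    if not responses:
        return None  # ethnicity question not answered
    if len(responses) == 1:
        (only,) = responses
        return SINGLE_CODES[only]
    return PAIR_CODES.get(responses, REMAINDER_CODE)
```

Unlike prioritisation, a participant reporting Māori and European ethnicities is kept apart from the "Māori only" level, which preserves self-identity but splits the Māori sample across levels.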

Total response
Separate variables are established for each major ethnic group and an individual may be included in more than one of these. For example, a respondent identifying with both Māori and Pacific ethnicities would be included in two variables, both the Māori and the Pacific variables. Referent categories then become all other participants who did not indicate that ethnicity (e.g., Māori versus non-Māori). This multiple variable approach does not provide an explicit identification of multi-ethnic participants. In our analysis we use a common "European-only" comparator group for each major ethnicity variable (i.e., Māori versus European-only, Pacific versus European-only, and Asian versus European-only), similar to previously published ethno-specific analyses of hospital intake data [11]. This "modified total response" approach enables comparisons with the ethnic group of greatest population size and social power (i.e. European). It also reduces heterogeneity of the referent group by removing Indigenous Māori and minority ethnicities that might otherwise obfuscate between-group differences. The use of a modified total response approach also means that all three classification methods have a consistent "European-only" referent. Participants were coded into a different variable for each major ethnic group: Māori variable coded as 1 "Māori", 0 "European-only", and 99 for "neither Māori nor European-only", Pacific variable coded as 1 "Pacific", 0 "European-only", and 99 for "neither Pacific nor European-only", etc.
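The "modified total response" coding (1, 0, 99) described above can be sketched in Python. This is an illustrative sketch rather than the study's implementation; the function name `modified_total_response` and the set-of-labels input are assumptions made for the example.

```python
EUROPEAN_ONLY = frozenset({"European"})


def modified_total_response(responses, group):
    """Code one participant for the variable belonging to `group`.

    Returns 1 if the participant reported `group` (alone or in combination),
    0 if they are European-only (the common referent), 99 if they are neither,
    and None if no ethnicity was reported.
    """
    responses = frozenset(responses)
    if not responses:
        return None
    if group in responses:
        return 1
    if responses == EUROPEAN_ONLY:
        return 0
    return 99


# A participant reporting both Māori and Pacific ethnicities (made-up example)
# is counted in both variables.
answer = {"Māori", "Pacific"}
codes = {g: modified_total_response(answer, g) for g in ("Māori", "Pacific", "Asian")}
print(codes)
```

One variable is built per major ethnic group, so the same participant can be coded 1 in several variables while the referent stays the same European-only group in each.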
Full descriptions of each method can be found elsewhere [9]. However, prioritisation is noteworthy: although it was discontinued in official government statistics as of 2005 because it under-counted Pacific and other minority ethnicities and purportedly had an inadequate theoretical basis, [12] it is still commonly used in the health sector, particularly when drawing comparisons between groups [13]. HIV behavioural surveillance in New Zealand has also continued to use this method [2]. Indigenous Māori perspectives on ethnicity classification include their right to name individual and collective identities [14]. We acknowledge and support the rights to self-identify and self-determination, which should be central to any work examining ethnicity.
The dependent variables used to compare methods included recruitment method (online versus in-person), three demographic factors (age < 30 years versus older, gay identity versus other, and at least some tertiary education versus none), and eight sexual health indicators (greater than 20 male sexual partners in the 6 months prior to survey, STI testing or treatment in the year prior to survey, any STI diagnosis in the year prior to survey, lifetime HIV testing, HIV testing in the year prior to survey, any HIV positive test result, any condomless anal intercourse (CAI) with casual partners in the 6 months prior to survey, and any condomless anal intercourse with regular partners in the 6 months prior to survey). Casual partners were defined as someone the participant had sex with up to three times in the 6 months prior to survey; regular partners were those they had sex with four or more times in the last 6 months.
All statistical analyses were conducted using Stata/SE 13.1. We compared the recruitment method and demographics of respondents who completed the ethnicity question with those who did not using chi-square tests of independence. Descriptive statistics for each variable were prepared based on each of the three ethnicity classification methods. Each demographic and sexual health outcome was regressed on ethnicity separately for each classification method described above using univariate logistic regression (p < 0.05 considered significant). Multivariable logistic regression was used to analyse further differences in sexual health outcomes after controlling for recruitment and demographic factors (i.e., recruitment year and method, age, education, and sexual identity), which were all forced into the models. Adjusted odds ratios (AOR) and 95% confidence intervals (95% CI) are shown.
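For a single binary predictor (e.g., Māori versus European-only), the unadjusted odds ratio from a univariate logistic regression equals the cross-product ratio of the 2×2 table, and a Wald 95% confidence interval can be computed from the cell counts. The sketch below illustrates that calculation in Python; it is not the Stata code used in the study, the function name `odds_ratio_wald` is hypothetical, and the counts are invented for illustration and are not study data. The adjusted odds ratios reported in this paper require a multivariable model and cannot be reproduced this way.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: group of interest, outcome present    b: group of interest, outcome absent
    c: referent group, outcome present       d: referent group, outcome absent
    z: normal quantile for the interval (1.96 gives an approximate 95% CI)
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts, for illustration only (NOT from the surveillance data).
or_, lo, hi = odds_ratio_wald(20, 80, 10, 90)
print(f"OR = {or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

An interval that excludes 1 corresponds to p < 0.05 on the Wald test, matching the significance threshold stated above.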

Results
A total of 12,816 questionnaires were completed across the 2006, 2008, 2011 and 2014 national HIV behavioural surveillance rounds. Of these, 1896 (14.8%) questionnaires were removed as they indicated completion of an earlier survey. Of 10,920 unique responses, 10,525 MSM completed the ethnicity question (n = 395, 3.6% missing). Men who did not complete the ethnicity question were more likely to have participated in-person than online (p < 0.001) and to have some tertiary/post-secondary education than none (p = 0.003); there was no difference in missing ethnicity data by age (p = 0.261) or sexual identity (p = 0.480).
The sample size for each major ethnic group varied by classification method (Table 2). Multiple ethnicities were reported by 762 (7.2%) participants. All three classification systems identify the same number of individuals as European-only (n = 7867, 74.7% of respondents), which was the referent group used for comparisons within each classification method. Using Māori ethnicity as an example, all respondents who reported being Māori (n = 1058, 10.1%) were categorised as Māori for Prioritisation and Total Response classification, and since Māori have first priority for classification in Prioritisation the sample size of Māori is identical for both classification methods (10.1%). However, for Single/Combination, individuals who reported being Māori and no other ethnicity (n = 538, 5.1% of the sample or 50.9% of all Māori-identifying respondents) were categorised into a Māori only group, with the remaining 520 respondents (4.9%) reporting Māori and another ethnicity (most of whom reported being Māori and European, n = 428, 4.1%). The next largest and only other pairwise ethnic combination reported for Single/Combination was European-Pacific (n = 111, 1.1%). The remaining respondents who indicated any other combination of major ethnic groups (n = 223) were grouped together because there were fewer than 100 individuals (< 1% of entire sample) in each category.
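The counts reported in this paragraph can be cross-checked with simple arithmetic. The Python sketch below uses only figures stated in the text; the variable names and the helper `pct` are illustrative.

```python
TOTAL = 10525              # respondents who completed the ethnicity question

maori_any = 1058           # Māori under Prioritisation and Total Response
maori_only = 538           # "Māori only" under Single/Combination
maori_european = 428       # Māori-European combination
european_pacific = 111     # European-Pacific combination
other_combinations = 223   # remaining multi-ethnic combinations
multi_ethnic = 762         # participants reporting multiple ethnicities


def pct(n, denom=TOTAL):
    """Percentage of `denom`, rounded to one decimal place."""
    return round(100 * n / denom, 1)


# Māori respondents who also reported at least one other ethnicity.
maori_multi = maori_any - maori_only

# The three Single/Combination multi-ethnic levels account for all
# multi-ethnic participants.
combination_total = maori_european + european_pacific + other_combinations

print(maori_multi, pct(maori_any), pct(maori_only), pct(maori_only, maori_any))
print(combination_total == multi_ethnic)
```

The check makes the trade-off concrete: Single/Combination roughly halves the Māori group available for comparison relative to Prioritisation or Total Response.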
Differences by recruitment method, age, sexual identity and educational attainment between ethnic groups and across the classification methods are shown in Table 3. Regardless of classification method, compared with European MSM, Māori MSM and ethnic minority MSM were less likely to be recruited online and more likely to be younger with the exception of Māori-only MSM using the Single/Combination classification, which did not have a statistically significant difference by age. Gay identity was more likely among Asian MSM, less likely among Pacific and Māori MSM, and no difference was seen for MSM of other ethnicities. When compared with European MSM, Māori and Pacific MSM were less likely to report a tertiary degree while Asian and Other MSM were more likely to report one.
Descriptive statistics and univariate differences by ethnicity are shown in Table 4. Overall, 932 men (9.1%) reported having more than 20 male sexual partners in the past 6 months, 2828 men (27.8%) reported CAI with a casual partner in the past 6 months, and 2617 men (25.7%) reported CAI with a regular partner in the past 6 months. Further, 4858 men (48.9%) reported STI testing in the past year, and 977 men (9.5%) reported an STI diagnosis in the past year. With respect to HIV, 7022 men (67.3%) reported an HIV test in their lifetime, 4185 men (40.1%) reported an HIV test in the past year, and 316 men (3.1%) reported an HIV diagnosis. There were univariable differences by ethnicity for each outcome. Table 5 presents the multivariable differences between sexual health outcomes by ethnicity classification method, which controlled for recruitment method and year, age, sexual identity, and education. The patterns of differences between ethnic groups, or lack thereof, were similar across all classification methods for six of the sexual health outcomes: number of sexual partners, any condomless anal intercourse with casual partners, recent STI testing, lifetime and recent HIV testing, and HIV status. However, associations between ethnicity and two sexual health outcomes (i.e. any recent condomless anal intercourse with regular partners and recent STI diagnoses) varied by ethnicity classification system.
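First, Māori MSM were significantly less likely to report condomless anal intercourse with a regular partner in the past 6 months when classified using Prioritisation or Total Response, but not when using Single/Combination. Second, Pacific MSM were more likely to report an STI diagnosis in the past year when classified using Total Response, but not when using Prioritisation or Single/Combination.
There were some differences for participants who reported multiple ethnicities when compared with European-only participants. For example, when identified using Single/Combination, multi-ethnic participants were significantly more likely to report greater than 20 sexual partners in the past 6 months (AOR = 1.73, 95%CI: 1.13-2.64) and to report an STI diagnosis in the past year (AOR = 1.61, 95%CI: 1.09-2.38).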

Discussion
Using a pooled sample of 10,525 respondents who reported an ethnicity within the 2006-2014 rounds of New Zealand's national HIV behavioural surveillance for MSM, we compared three ethnicity classification methods (Prioritisation, Single/Combination and Total Response) and their impact on sample size, demographics, and sexual health outcomes. Different classification methods to categorize participants who reported multiple ethnicities altered sample size, and also revealed and masked associations in two of the eight selected sexual health outcomes by ethnicity. For example, even after controlling for demographic differences, when compared with European MSM, Prioritisation and Total Response ethnicity classification methods showed statistically significantly higher condom use with regular partners among Māori MSM, while Single/Combination classification limited the sample size, resulting in no observed statistically significant difference.
Given the discontinued use of prioritisation as an official government method of ethnicity classification in New Zealand, [9] a decision that was opposed by Indigenous Māori based on their rights to name individual and collective identities, [14] our study aimed to determine the impact of different classification methods when participants select multiple ethnic identities. Each classification method has its own strengths and drawbacks, which range from socio-political to technical statistical issues. Prioritisation and Single/Combination produce single ethnicity variables that are simpler to report on and include in regression analyses. Ultimately, the choice of classification system should stem from the research question and population focus. If focused on overall populations and equity, we assert that prioritisation provides a single variable and upholds Indigenous Māori rights to self-identify, but undercounts Pacific and Asian participants.
Total Response follows tenets of self-determination for all groups by assigning participants to each major ethnic group with which they identify, thus producing individual variables for each major ethnic group. For within-group/specific ethnic group analyses (e.g. among Pacific MSM exclusively), Total Response allows all participants who identified with a specific ethnicity to be counted as such. This will ensure that Pacific and Asian participants will have the most complete and accurate data possible to inform targeted strategies and programmes.
Classification method decisions will impact sample size; classification methods that produce the largest possible sample size for Indigenous Māori (Prioritisation and Total Response) and ethnic minority groups (Total Response) minimize the chance of Type II "missed opportunity" errors as long as differential or nondifferential misclassification is not increased, which is crucial given public health's interest in the experiences of and equity for Indigenous and minority groups. Future research, especially that interested in ethnic/racial inequities, should strive for equal explanatory power to ensure adequate sample sizes of Indigenous and ethnic minority groups to answer research questions [15]. Developed for Indigenous Māori, these approaches provide appropriate sampling methodology, respect for community voices, and are feasible in general and subpopulation studies [16,17].
A major strength of the current analyses was the comparison of three ethnicity classification approaches in a country with a strong history of public health interest in ethnicity. We pooled data across four cycles of an established national HIV surveillance programme providing large samples [2]. We examined behavioural data, extending previous research beyond hospital records [11]. We used multivariable analyses of sexual health outcomes to control for distinct recruitment and demographic differences by ethnicity. In the current study, Māori and Pacific men (as well as European-Māori and European-Pacific men) were more likely to report greater than 20 male sexual partners compared with European men, which may increase the likelihood of reporting any condomless anal intercourse given the increased number of sexual partners, highlighting the sensitivity of using such measures of risk. Data were self-reported, which limits our ability to examine actual HIV status.
However, the indicators considered in our study address a causal pathway from risk behaviour (number of sexual partners and condom use) to health service use (HIV/STI testing) to health outcome (STI and HIV diagnoses). It is critically important that future work examine the temporal trends in these behavioural and health service outcomes, along with other structural factors which causally precede HIV exposure and transmission.
Previous analyses of MSM in New Zealand have considered ethnicity, but not as the primary explanatory variable or outcome. An exception is the 1996 Male Call/Waea Mai, Tāne Mā report, which compared Māori with non-Māori participants [18]. Compared with findings for Māori men from this 1996 national telephone survey of MSM, [18] the current study identified a persistent difference that Māori men were less likely to report condomless anal intercourse with regular partners. The current study identified two previously undocumented differences for Māori men, which were a greater likelihood of reporting greater than 20 sexual partners and any condomless anal intercourse with casual partners. Compared with European men in the current study, Pacific men were more likely to report greater than 20 partners and more likely to report any condomless anal intercourse with casual partners, the latter corroborating an analysis of infrequent condom use using 2014 data [19]. Other cross-sectional research focused on ethnic minority MSM in Britain [20] and the United Kingdom [21] found no effect by ethnicity with regard to reports of condomless anal intercourse, and cross-sectional research in China found that some ethnic minority MSM were less likely to report condomless anal sex [22].
With regards to HIV testing, Pacific MSM in the current study were less likely to have ever had an HIV test, an HIV testing disparity not present in the 1996 Male Call study in New Zealand [23], but noted in a research brief using the 2014 behavioural surveillance data [24]. Asian MSM in the current study were less likely to report any lifetime HIV testing and being HIV-positive, and also less likely to report any recent STI testing or diagnosis. The proportion of Asian MSM who reported never having been tested for HIV (35%) was higher than the proportion reported in the metro United States (8%) [25], Britain (Chinese: 23%, South Asian: 32%) [21], or Canada (21%) [26]. Previous studies had not found differences for Asian MSM in the United Kingdom [21] or in Canada [26] with respect to HIV testing uptake.
Public health and biomedical research has generally been poor at providing adequate detail about how race and/or ethnicity is operationalised [27]. Greater transparency is essential to improve confidence that public health responses are based on fair and rigorous data processes that can identify inequities [28]. Approaches to ethnicity classification should also be primarily driven by the research or public health objectives, particularly if they relate to Indigenous and ethnic minority identities and communities [13]. Measures of ethnicity should include locally relevant categories, which may be aggregated into broader categories given practical constraints of population-based questionnaires (e.g., limited counts in some sub-groups) [28]. However, the meaning and utility of any macro-ethnic groupings must be questioned for programming and policy (e.g., Samoan and Tongan persons combined together into Pacific people [29], diverse groups combined as Asian people [30]). Further, in the context where the current study took place, an individual's ethnicity is not an ingrained or permanent trait, but is instead influenced by social and environmental factors over time [31].
Ethnicity-focused research must also consider connections with other identities. For example, sexuality may be influenced by and mutually reinforce constructs of ethnicity, and vice versa [32]. Intersectionality scholars push further and challenge public health and researchers to consider also the meaning of categories, more complex social locations (e.g., interactions with social class, age, gender, immigration status, acculturation), and social and behavioural processes [33,34]. Research should acknowledge not only the characteristics and attributes of varying ethnicities, but also the social processes and hierarchies between ethnic groups [31]. This approach may be more relevant to informing health promotion and interventions aimed at achieving health equity by ethnic affiliation [33,34]. For example, previous work has demonstrated the added value of examining socially-assigned ethnicity to expand understandings of racialization and racism as a social determinant of health [35,36]. This may reduce issues of misclassification bias introduced when ethnic identity data are used as a proxy for these social processes and experiences. Future research would benefit from the inclusion of measures beyond ethnic/racial identity, such as socially-assigned race/ethnicity and experiences of discrimination and racism.

Conclusions
In conclusion, our analyses evaluated ethnicity classification methods and commented on the implications for public health equity and research. As the identification of inequities can activate public health responses, but also (re)produce stigma, our findings have implications for government officials, policy makers, funders, and community stakeholders. While these analyses draw on pooled data specifically related to sexual health, HIV/STIs, and the MSM population, the implications of this work are pertinent to any medical or public health work examining inequities by ethnicity.