Comparing databases: determinants of sexually transmitted infections, HIV diagnoses, and lack of HIV testing among men who have sex with men

Background: Early detection and treatment of STI/HIV are public health priorities. Our objective was to compare characteristics of men who have sex with men (MSM) in Dutch data available in 2010 from EMIS (an international internet survey), the Schorer Monitor (a Dutch internet survey), and STI clinic visits, since these sources might be subject to different and unknown biases.
Methods: Data from Dutch MSM internet surveys (EMIS-NL, N = 3,787; Schorer Monitor, SMON, N = 3,602) and 3,800 STI clinic visits (SOAP) were combined into one dataset. We included factors that were measured in all three databases. The socio-demographics included were age (at the time of the survey), zip code, and ethnicity. Behavioural variables included were the number of sexual partners, condom use with last sexual partner, drug use, being diagnosed with STI, being diagnosed with HIV, and HIV testing. The outcomes we investigated were being diagnosed with STI, being diagnosed with HIV, and never having been tested for HIV.
Results: Logistic regressions showed that determinants for being diagnosed with STI were having more sexual partners, drug use, and having had an HIV test (aORs 1.3 to 17.1) in EMIS and SMON. Determinants for being diagnosed with HIV in all three databases were older age, living in Amsterdam, and having more partners (aORs 1.8 to 4.4). In EMIS and SMON, drug use, non-condom use, and having STI were additional determinants (aORs 1.6 to 8.9). Finally, determinants associated with never having been tested for HIV were being younger (only SOAP), living outside of Amsterdam, having fewer partners, no drug use, and no STI (aORs 0.2 to 0.8).
Conclusions: Risk factors from the two internet surveys were largely similar, but differed from those found in STI clinic data, possibly because the surveys involve self-reports rather than diagnoses or because of differences in timing. The difference between the internet surveys and STI clinic data is much less pronounced for having never been tested, suggesting both are appropriate for this outcome. These findings shed light on conclusions drawn from different data sources, as well as the comparability of recruitment strategies, the robustness of risk factors, consequences of phrasing questions differently, and on (policy) implications based on different data sources.


Background
In 2013, 88 % of the new HIV infections in STI clinics in the Netherlands were diagnosed among men who have sex with men (MSM) [1]. Furthermore, MSM were more at risk for sexually transmitted infections (STI), such as gonorrhoea and syphilis, than heterosexual men or women [1]. Therefore, MSM are often targeted for prevention activities and research investigating sexual behaviour.
Comparing findings regarding sexual behaviour among MSM obtained from different sources is challenging for a variety of reasons. First, while most surveillance systems provide good data on diagnoses and positivity rates of STI/HIV among MSM, and these systems increasingly include measures of sexual behaviour, these measures are often less readily available and less standardized. Second, data on sexual behaviour among MSM are typically based on convenience samples, such as venue-based surveys. As a result, research has reported mixed influences of socio-demographic characteristics and behavioural risk factors [2][3][4]. Third, data sources might consist of identical (double participation) or similar (in terms of behaviour) MSM, possibly because of biases in self-selection. Since dating and cruising sites are often used in the recruitment of MSM, highly sexually active MSM are probably overrepresented [2].
In addition, when sexual behaviour is assessed, differences in question phrasing occur, which can result in pronounced shifts in meaning and ultimately answers. For example, in case of reference periods, if a researcher assesses a longer period this could influence participants' memory of and interpretation of the behaviour. Specifically, assessing larger time frames could lead to the interpretation that researchers are interested in more severe situations [5]. In terms of sexual risk behaviour, assessing longer periods could similarly result in shifts in severity of the reported risk behaviour. Besides language comprehension, other factors can also influence questionnaire answers. Specifically, psychological research has shown that minor changes in response formats, question context, question framing, and questionnaire source can have pronounced influences on answers [6][7][8][9].
It has been shown that Dutch participants in the European MSM Internet Survey (EMIS) are somewhat biased towards older, gay-identified, HIV-positive, and (sexually) active MSM, compared to participating MSM from other countries [10]. Comparing data from several Dutch data sources is helpful to gain insights into which MSM participated. Another reason to compare databases is that different studies found divergent risk factors. An epidemiological review, for instance, showed that having concurrent STI did not influence HIV transmission [11], whereas this effect was observed in clinical studies [12]. Notably, this review excluded results obtained from surveys.
Faced with the diversity of approaches to measuring socio-demographic and behavioural data, an unresolved question is what effect the use of different recruitment methods has on findings, and which findings are stable (comparable and reliable) across studies. The current study compares three databases to gain insights into differences between study populations, measurement methods, and the robustness of risk factors associated with being diagnosed with an STI or HIV, or never having been tested for HIV. Lack of HIV testing is increasingly becoming an important topic of investigation, as up to 90 % of new HIV infections could be transmitted by people unaware of their infection [13]. Moreover, some MSM who were never tested for HIV showed risk behaviour and were at risk of contracting HIV. With insights on the (lack of) differences, the findings of studies using these kinds of databases can be interpreted with more certainty, and recommendations for future studies and targeted control policies can be made.

Databases
This study was a secondary analysis of three anonymized databases described below. The European MSM Internet Survey (EMIS) is a multilingual, cross-sectional, online evaluation of HIV prevention needs of MSM in 38 countries. In total, 3,787 men living in the Netherlands completed the survey between June 4th and August 31st, 2010. MSM were recruited predominantly via instant messages on PlanetRomeo and Gaydar and via e-mails to Schorer Monitor participants, as well as via banners on websites that are frequently visited by MSM, through gay community organizations, and by using printed materials. An extensive description of the survey methods can be found elsewhere [14,15]. EMIS was approved by the Research Ethics Committee of the University of Portsmouth, United Kingdom (REC application number 08/09:21). Participants had to confirm that they had read the introductory text and consented to participate before proceeding to the questions.
The Schorer Monitor (SMON) was a yearly Dutch survey (up to 2011) investigating health, well-being, and sexuality among MSM in the Netherlands. In 2010, the SMON was filled out by 3,602 MSM between March 22nd and May 2nd. Recruitment was done via banners, printed materials, and snowballing (men could invite three friends to participate); in addition, as the SMON was a yearly initiative, men who had participated in 2009 were invited to participate again [16]. Participants read an introductory text containing information about the goals of the survey and privacy information. After this information, a button labelled 'I will participate' was presented, which routed them to the questions.
SOAP (Dutch abbreviation for 'SOA Peilstation', meaning STI registration system) is a database containing information on STI consultations, tests, and diagnoses from STI clinics in the Netherlands. It is maintained for surveillance purposes and is more limited regarding behavioural information [1]. In 2010, 19,579 MSM STI consultations took place. We selected 3,800 sequential cases from an uninterrupted period, starting January 4th and ending March 17th (the 3,800th case), to attenuate the chance of double cases (MSM visiting the STI clinic more than once in 2010) in our analyses. Ethical approval for the study was not necessary under Dutch law, as the study used anonymous patient data collected for routine surveillance [17].
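The sequential selection of SOAP consultations described above can be illustrated with a minimal Python sketch. The record layout (a `date` field per consultation), the function name `select_sequential_cases`, and the toy data are assumptions made for illustration only; they are not taken from the SOAP database or from the code used in this study.

```python
from datetime import date, timedelta


def select_sequential_cases(consultations, start, n_cases):
    """Return the first n_cases consultations on or after start, in date order.

    `consultations` is a list of dicts with a hypothetical "date" key.
    """
    eligible = [c for c in consultations if c["date"] >= start]
    eligible.sort(key=lambda c: c["date"])  # stable sort keeps same-day order
    return eligible[:n_cases]


# Toy data (hypothetical): one consultation per day from January 1st, 2010.
toy_consultations = [
    {"id": i, "date": date(2010, 1, 1) + timedelta(days=i)} for i in range(30)
]

# Take the first 10 consultations from January 4th onwards.
selected = select_sequential_cases(toy_consultations, date(2010, 1, 4), 10)
print(selected[0]["date"], selected[-1]["date"], len(selected))
```

Taking one short, uninterrupted window rather than spreading cases over the whole year is what attenuates the chance that the same man contributes more than one consultation.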

Measures
We included factors that were measured in all three databases in our comparison; as such, SOAP was the limiting database. Socio-demographics included age (at the time of the survey), zip code, and ethnicity. Behavioural variables included the number of sexual partners, condom use with last sexual partner, drug use, being diagnosed with STI, being diagnosed with HIV, and HIV testing. The questions differed between the databases (Table 1). For example, in both EMIS and SMON, data on STI/HIV are self-reported. In SOAP, laboratory diagnoses were available for STI/HIV, as well as self-reported HIV infection.
Another notable difference between databases is the assessment of the number of sexual partners (Table 1 contains the analysed variables). In SOAP, this variable includes female and steady partners in addition to casual male partners. The difference between SOAP, EMIS, and SMON could be reduced by adding steady partners to the non-steady male partner measures. We have not done this, as EMIS and SMON measured steady male partners differently. Specifically, EMIS considered the number of steady male partners that MSM had sex with over the last 12 months (e.g., more than 10 was an answer category), whereas SMON assessed sex with a steady male partner over the last 6 months (answer categories yes-no). Therefore, we decided to keep EMIS and SMON as similar as possible, and thus did not include steady and female partners.

Analyses
Uni- and multivariable logistic regression analyses were conducted to investigate associations between the outcomes and the socio-demographic and behavioural factors for each database. The outcomes were being diagnosed with (one or more) STI, being diagnosed with HIV, and never having been tested for HIV. Additionally, we analysed the interactions between the variables and databases to assess whether the effect of the variables on the outcomes differed significantly between databases.
We recoded residence to Amsterdam and a rest category, as numbers in the other cities were too limited to analyse separately and showed similar patterns. We also calculated a compound variable for drug use and for being diagnosed with STI within the last 6/12 months. Recoded answer options for the two questions were 'No' (I did not have an STI/I did not use any drugs), 'Yes one' (I was diagnosed with one STI/I used one kind of drug), and 'Yes more than one' (I was diagnosed with more than one STI/I used more than one kind of drug). Backward selection was performed, a priori including all variables for the likelihood ratio test. All statistical analyses
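To make the quantities behind these analyses concrete, the following is a minimal Python sketch of the simplest case: a univariable logistic regression with one binary predictor, where the estimated odds ratio equals the cross-product ratio of the 2x2 table. It also shows a Wald test for whether that odds ratio differs between two independent samples, which is the unadjusted analogue of a variable-by-database interaction. All counts, the database labels attached to them, and the function names are hypothetical; the multivariable models and backward selection reported in this paper are not reproduced here.

```python
import math


def odds_ratio_with_ci(exp_cases, exp_noncases, unexp_cases, unexp_noncases, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval from a 2x2 table."""
    odds_ratio = (exp_cases * unexp_noncases) / (exp_noncases * unexp_cases)
    # Standard error of the log odds ratio (Woolf's method).
    se_log_or = math.sqrt(
        1 / exp_cases + 1 / exp_noncases + 1 / unexp_cases + 1 / unexp_noncases
    )
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper, se_log_or


def compare_odds_ratios(table_a, table_b):
    """Wald z-test for a difference in log odds ratios between two samples."""
    or_a, _, _, se_a = odds_ratio_with_ci(*table_a)
    or_b, _, _, se_b = odds_ratio_with_ci(*table_b)
    z_stat = (math.log(or_a) - math.log(or_b)) / math.sqrt(se_a**2 + se_b**2)
    p_value = math.erfc(abs(z_stat) / math.sqrt(2))  # two-sided normal p-value
    return z_stat, p_value


# Hypothetical counts: (exposed cases, exposed non-cases,
#                       unexposed cases, unexposed non-cases)
survey_table = (40, 60, 20, 80)
clinic_table = (30, 70, 28, 72)

or_survey, lo_survey, hi_survey, _ = odds_ratio_with_ci(*survey_table)
z_stat, p_value = compare_odds_ratios(survey_table, clinic_table)
print(f"OR {or_survey:.2f} (95% CI {lo_survey:.2f}-{hi_survey:.2f})")
print(f"difference between samples: z = {z_stat:.2f}, p = {p_value:.3f}")
```

An adjusted odds ratio (aOR) has the same interpretation but is estimated while holding the other variables in the model constant, which requires fitting the full regression rather than a single table.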

Characteristics of study populations
Baseline characteristics of the databases are summarized in Table 2. Due to the large number of participants, even small differences became significant, and all variables differed between databases. Notably, EMIS had more non-Dutch participants (22.4 % vs 12.7 % and 17.9 % in SMON and SOAP, respectively), which can be explained by the European nature of this survey. The survey was available in 24 languages and was promoted internationally; it therefore obtained a more ethnically diverse sample. This is also clear from the distribution of ethnicities, as fewer participants are from the four biggest minority groups in the Netherlands (i.e., Turkey, Morocco, Surinam, and the Netherlands Antilles; 5.8 % of the non-Dutch participants in EMIS vs 19.9 % and 20.9 % in SMON and SOAP). In all three databases, a majority of MSM was from Amsterdam, had a Dutch ethnicity, used alcohol or poppers in the past 6 months, and was HIV negative. Most questions contained few missing values. However, in SMON and SOAP, condom use with last partner was an optional question, which explains why over 60 % of people did not provide an answer to this question. Concerning our outcomes, approximately a quarter of the participants had one or more STI (

Being diagnosed with STI
Multivariable analyses showed that having more partners, using drugs, and being tested for HIV (positive or negative) were associated with being diagnosed with STI in EMIS and SMON (Table 3). Not using condoms with last partner did not reach significance in the multivariable model of SMON, but it did in EMIS. In SOAP, not being Dutch and having used a condom with the last partner were associated with being diagnosed with STI. Specifically, analysing Dutch, non-Dutch Western, and non-Western respondents separately showed similar odds ratios (ORs) in the internet surveys: in EMIS, 1.0 (95 % CI 0.8-1.4) for Western and 1.2 (CI 1.0-1.5) for non-Western respondents; in SMON, 1.1 (CI 0.8-1.5) and 1.3 (CI 1.0-1.7), respectively. In SOAP, Western respondents (OR 1.5, CI 1.2-1.9; the only significant OR) showed higher odds of being diagnosed with STI, whereas non-Western respondents (OR 1.0, CI 0.8-1.4) did not differ from Dutch respondents.
Inspecting interactions between the variables and database showed an impact of database on the variables age, residence, number of partners, condom use with last partner, and HIV status (p's < .01). In EMIS and SMON, being older, living in Amsterdam, having more partners, and having been tested for HIV, especially testing positive, increased the odds of being diagnosed with STI. SOAP did not reveal these risk factors. Condom use with the last partner decreased the risk of reporting STI in EMIS (and univariably in SMON). In contrast, in SOAP it increased the risk of being diagnosed with an STI.

Being diagnosed with HIV
Multivariable analyses showed that for both EMIS and SMON, being older, living in Amsterdam (only in EMIS), not using condoms with last partner, using drugs, and being diagnosed with STI were significantly associated with being diagnosed with HIV (Table 4). In contrast, in SOAP only being older, living in Amsterdam, and having more partners were significantly associated with being diagnosed with HIV.
Inspecting the interactions between the variables and databases showed an effect of database on the number of partners, condom use with last partner, and being diagnosed with STI (p's < .01). Having more partners was associated with being diagnosed with HIV in SOAP, but not in EMIS and SMON. Moreover, not using a condom with their last partner and being diagnosed with STI were associated with being diagnosed with HIV in EMIS and SMON. In SOAP, these relations were lacking.

Never been tested for HIV
Multivariable analyses showed that being younger, living outside of Amsterdam, and having fewer partners were determinants for never having been tested for HIV in all databases (Table 5). Additionally, not using drugs and not being diagnosed with STI were associated with never having been tested in EMIS and SMON, but not in SOAP.
Inspecting the interactions between the variables and databases showed an effect of database on the number of partners and on being diagnosed with STI (p's < .03). In SOAP, the relation between having fewer partners and never having been tested for HIV seems stronger than in EMIS and SMON. In EMIS and SMON, being diagnosed with STI decreases the chance of never having had an HIV test; this association was not found in SOAP.

Discussion
Determinants of being diagnosed with STI or HIV differed remarkably between MSM recruited from STI clinics and MSM participating in internet surveys, whereas the data obtained from the two internet surveys were largely comparable. Moreover, the risk factors found via internet surveys for being diagnosed with STI/HIV are largely similar to findings from previous research [18,19]. The outcome of having never tested for HIV differed from the outcomes of being diagnosed with STI/HIV: the differences in risk factors between the internet surveys and the STI clinic data are much less pronounced for this outcome. We will discuss each determinant in the context of the outcomes of this study, the Dutch situation, and previous research.

Determinants for being diagnosed with STI/HIV and never been tested for HIV
In all three databases, older MSM were more likely to be diagnosed with HIV, as is to be expected. Older MSM will have had more sexual contacts and are more likely to have been exposed to HIV. Moreover, older MSM are also less likely to have never been tested for HIV, although multivariably this result only remains for STI clinic data. Overall, younger and older MSM seem to be underrepresented, assuming an equal fraction of MSM over age groups. However, younger MSM (<20 years) might genuinely be a smaller group, because they might not be sexually active or might not have 'come out' yet. Similarly, there might be fewer older MSM (>55 years), due to HIV/AIDS or due to the political, legal, and cultural climate of their youth [20]. Moreover, the Dutch sample seems somewhat more biased towards older, gay-identified, HIV-positive MSM, as also mentioned in the introduction [10]. Notably, the age distribution is similar in all three databases. Furthermore, in a panel study from 2013, the age of a comparable group of MSM was even higher than in our databases (mean age 50 years) [21]. This might suggest that people participating in panel research are older, but it is also an indication that the higher age of MSM in the Netherlands compared to other countries might not be a bias, but a true representation of the Dutch situation.
MSM from Amsterdam are more likely to be diagnosed with STI/HIV, and they are also more likely to be tested for HIV. Because of the large MSM population in Amsterdam, there are more facilities for sexual contacts and testing. On the one hand, more temptations might increase the chance of contracting STI/HIV. On the other hand, this also affects openness towards and opportunities for testing. A lot of the STI/HIV prevention efforts focus on Amsterdam, and this shows that even though these efforts are successful (fewer MSM never tested), there might still be an increased chance of contracting STI/HIV. This research suggests that interventions aimed at increasing testing uptake could focus more on people not living in Amsterdam, whereas STI and HIV prevention should still be targeted towards MSM in Amsterdam as well.
In the three databases, MSM were asked either about their ethnic group, their country of birth, or their cultural background. Surprisingly, the overall composition of ethnicity seems similar across the databases, indicating that, independent of recruitment method, some ethnic groups were not reached. This included some of the most important minority groups in the Netherlands (i.e., minorities from Turkey, Morocco, Surinam, and the Dutch Antilles). Future research on sexual behaviour should explicitly aim to recruit MSM with specific ethnic backgrounds, or find other ways to investigate behaviour among MSM from the biggest ethnic minority groups in the Netherlands.
Having more partners has a divergent influence on being diagnosed with STI/HIV depending on whether data were obtained via internet surveys or from STI clinic attendees, but no difference was found between the databases for having never tested for HIV. Having more partners was a determinant associated with testing, which makes sense, as having more partners implies more risk and hence more reasons to test for HIV. Furthermore, having more partners was indicative of being diagnosed with STI but not of being diagnosed with HIV in the internet surveys. STI were reported for the last 12 months, whereas HIV was naturally reported for lifetime. In that light, finding an association of the number of partners (over the last 6 or 12 months) with having STI and not with HIV is logical.
Internet surveys and STI clinic data differed for condom use with last sexual partner. Despite the high number of missing data, condom use with the last partner in the internet surveys was protective against STI and HIV, whereas in the STI clinic data it was a risk factor for STI and had no effect on HIV positivity. This difference could be explained by timing. Specifically, MSM might visit an STI clinic when there is an indication to do so. Possibly, people visiting an STI clinic suspect an STI and therefore used condoms in the period immediately before the visit.
The results for drug use were consistent between the internet surveys; unfortunately, we did not have information on drug use for STI clinic visitors. Using drugs was positively associated with being diagnosed with STI, being diagnosed with HIV, and having been tested for HIV. Possibly, drug use influenced risk behaviour directly, with drug use resulting in more risky behaviours. Drug use could also influence outcomes indirectly, as people who are more likely to use drugs may also take risks in other domains because they have an impulsive or sensation-seeking personality [22]. However, risk taking in one domain (doing drugs) is not necessarily indicative of risk taking in other domains [23]. It is important to note that frequency of use and the moment of drug use (before or during sexual intercourse) were not assessed. We found similar patterns of behaviour irrespective of the sort of drugs (i.e., uppers, downers, or party drugs); future research should take frequency, sort, and timing of drug use into account. Despite the lack of details, drug use was still an important risk factor related to all three outcomes. Finally, being diagnosed with one or more STI was also associated with being diagnosed with HIV and decreased the chance that MSM had never been tested for HIV. These factors are largely in line with previous studies using EMIS data [18,19,24]. In addition, being tested for HIV (and especially testing positive) increased the risk of being diagnosed with STI.

Strengths and Limitations
One important limitation of this study is the possible overlap between the databases. The same MSM might have contributed data to all three databases. Moreover, MSM who took part in EMIS after being invited via the e-mail address they had previously submitted to SMON probably filled out both surveys; unfortunately, we are unable to identify overlapping records. Since the internet surveys contained many questions and took some effort to fill out, self-selection bias could have taken place. MSM who are sexually active and attach importance to STI and HIV prevention might be inclined to participate in one or both surveys. Despite our efforts to limit the chances of double participation by our selection of cases in the STI clinic data, we cannot entirely rule out overlap. Another limitation is that these databases were not designed to be comparable, which is reflected most notably in differences in question phrasing, content of the questions, and the reference times. There have been, and still are, initiatives for harmonization in the collection of behavioural data within Europe [25]. Notably, even though the surveys are not always comparable, their results are quite similar.
Finally, there is a limited availability of behavioural data in SOAP. This database is intended for surveillance and not research purposes. In the future, we plan to compare additional variables of EMIS and SMON, which were omitted in the current study because they were unavailable in SOAP. It will be possible to look at other psychosocial factors, such as unsafe anal intercourse with steady and casual partners, knowledge, and beliefs [18,19,24]. Besides

Conclusion
This research sheds light on the comparability of different recruitment strategies, with the important finding that there seems to be a difference between MSM responding to surveys and MSM visiting STI clinics. Recruitment method seems especially important when the interest is in risk factors for contracting STI and HIV. The difference between the internet surveys and STI clinic data is much less pronounced for having never been tested, suggesting both are appropriate for this outcome. The robustness of risk factors across recruitment methods is limited, and large differences were found. Finally, this research sheds light on the differences in conclusions drawn from different data sources, which are pronounced between internet surveys and data from STI clinics.