Reliability, validity and invariance of the Simplified Medication Adherence Questionnaire (SMAQ) among HIV-positive women in Ethiopia: a quasi-experimental study

Background: Adherence to antiretroviral therapy is critical to the achievement of the third target of the UNAIDS Fast-Track Initiative goals of 2020-2030. Reliable and valid measurement of adherence is important for correctly assessing adherence and for predicting the efficacy of ART. The Simplified Medication Adherence Questionnaire is a six-item scale which assesses the perception of persons living with HIV about their adherence to ART. Despite recent widespread use, its measurement properties have yet to be carefully documented beyond the original study in Spain. The objective of this paper was to conduct internal consistency reliability, concurrent validity and measurement invariance tests for the SMAQ. Methods: HIV-positive women who were receiving ART services from 51 service providers in two sub-cities of Addis Ababa, Ethiopia completed the SMAQ in an HIV treatment referral network study between 2011 and 2012. Two cross-sections of 402 and 524 female patients of reproductive age, respectively, from the two sub-cities were randomly selected and interviewed at baseline and follow-up. We used Cronbach’s coefficient alpha (α) to assess internal consistency reliability, Pearson product-moment correlation (r) to assess concurrent validity and multiple-group confirmatory factor analysis to analyze factorial structure and measurement invariance of the SMAQ. Results: All participants were female with a mean age of 33 years (33.06-33.74; median: 34 years; range: 18-45 years). Cronbach’s alphas for the six items of the SMAQ were 0.66, 0.68, 0.75 and 0.75 for T1 control, T1 intervention, T2 control, and T2 intervention groups, respectively. Pearson correlation coefficients between T1 and T2 were 0.78, 0.49, 0.52, 0.48, 0.76 and 0.80 for items 1 to 6, respectively.
We found invariance for factor loadings, observed item intercepts and factor variances, also known as strong measurement invariance, when we compared latent adherence levels between and across patient-groups. Conclusions: Our results show that the six-item SMAQ scale has adequate reliability and validity indices for this sample, in addition to being invariant across comparison groups. The findings of this study strengthen the evidence in support of the increasing use of SMAQ by interventionists and researchers to examine, pool and compare adherence scores across groups and time periods.

Background
According to the United Nations AIDS Program (2019), by the end of 2018 nearly 38 million people were living with HIV/AIDS, of whom 23 million were on antiretroviral therapy (ART) [1]. HIV treatment using ART can improve functionality and decrease mortality, but lapses in adherence may render treatment permanently ineffective, for example, due to drug resistance [2]. The WHO has defined adherence as "the extent to which a person's behavior - taking medication, following a diet, and executing lifestyle changes, corresponds with agreed recommendations from a healthcare provider" [3]. Nonadherent patients have higher mortality rates than adherent ones with similar CD4+ counts, and adherence is the critical determinant of survival among persons living with HIV [4,5,6]. Non-adherence is also associated with poor health outcomes, increased healthcare costs and poor patient safety, due to increased risk of dependence, relapse and toxicity, among others [3]. Adherence is reported to be a major challenge in healthcare, estimated at 50 percent in high-income countries and even lower in some low- and middle-income countries [3]. Adherence is also critical to the achievement of the third target of the UNAIDS Fast-Track Initiative goals of 2020-2030, in which 90-95 percent of people with HIV are diagnosed with it, 90-95 percent of the diagnosed receive ART, and 90-95 percent of those on ART achieve viral suppression [7,8,9].
In Ethiopia, treatment adherence and retention were estimated to be on average 51-85 percent and 70 percent, respectively, among those who had been initiated on ART. Accurate measurement of adherence is important for correct assessment of health outcomes and in predicting the efficacy of ART [3]. Non-adherence compromises treatment efficacy, and without accurate treatment efficacy data, adherence rates necessary for planning and project evaluation cannot be achieved [3]. Further, accurate measurement of adherence is required for effective and efficient treatment planning, and for ensuring that changes in health outcomes can be attributed to recommended regimens. In addition, decisions to change recommendations, medications, and communication style in order to promote patient participation depend on valid and reliable measurement of the adherence construct [3].
Medication adherence has been measured using several methods, including direct measures, measures involving secondary database analysis, measures involving electronic medication packaging (EMP) devices, pill counts, and measures involving clinician assessments and self-report [12]. However, there is no "gold standard" for measurement of adherence, and each method has advantages and disadvantages [12,13]. For example, the WHO reported that there are challenges in measurement of the adherence construct even when more objective methods are used [3]. The report cited challenges including: counting inaccuracies using the "remaining dosage units" method; the inability to capture important information such as timing of dosage and pattern of missed dosage; the high cost of medication event monitoring systems (MEMS); the inability to tell whether patients actually use their medicine once it is removed from the bottle; difficulties faced when an individual acquires medication at multiple pharmacies; and inaccurate and incomplete records using the prescription refills method [3].
Self-reports include measures such as patient-kept diaries, patient interviews, and questionnaires and scales; they tend to overestimate adherence behavior. The SMAQ is one of the self-report questionnaires that is increasingly used globally to assess adherence to ART and non-HIV-related medications [15]. It was developed and validated among a sample of predominantly male (72%) HIV-positive individuals in Spain [15]. Having found 72 percent sensitivity, 91 percent specificity, and a likelihood ratio of 7.9 in identifying nonadherent patients as compared with medication event monitoring systems, the authors concluded that the SMAQ was reliable and valid for assessment of adherence among HIV-infected patients in most settings [15]. It has been used to assess adherence to ART in at least 12 countries, including South Africa and Kenya, in at least 25 studies and interventions between 2002 and 2018 [16, 17, 18, 19, 20, 21, 22, 23, 24, 25]. It has also been used to assess adherence to non-HIV medication in at least eight countries and 12 studies [26,27,28,29,30,31,32,33,34,35].
According to the WHO, standardized multi-item scales such as SMAQ that assess specific behaviors relating to medication recommendations may be better predictors of adherence than simple yes/no responses [3]. The underlying logic is that each indicator when used on its own may be insufficient to capture the construct, but when these indicators are combined, they represent a valid composite measure of the underlying construct of interest [36]. While standardized scales have potential advantages in understanding perceptions about adherence, literature assessing the measurement invariance of different scales in diverse settings is sparse. In addition, standardized scales are often used with populations that may be quite different from the one in which the scales were originally validated [36].
Also, there is a natural desire to make group comparisons and conclusions about effects of interventions on the mean scale scores of expected patient outcomes [37]. However, such comparisons are justified only to the extent that these comparisons approximate differences of means on the theoretical true score of the relevant constructs, and when the means are generated from data collected using questionnaires and scales exhibiting acceptable levels of reliability and validity [13,37]. Further, even when standardized scales are used, inferences and conclusions about observed mean differences are dependent on the between-group equivalence of the underlying measurement model [37].
However, an investigator's ability to assess true differences between groups or across time can be hindered by measurement errors, which can limit the ability to make accurate meaningful comparisons when determining program impacts [36].
Measurement invariance (MI) is a statistical criterion that is used to assess the extent to which a standardized scale measures the same construct in each group and at each time point studied [37]. It provides a way to assess whether respondents interpreted measures conceptually similarly across groups and time and whether participation in an intervention altered the conceptual frame of reference against which a group responded to an indicator over time [36]. MI requires that any two persons with the same level of the latent construct should obtain the same expected score on the indicators used to measure the underlying construct, regardless of the group they are in [38]. Assessment of MI helps in determining if a scale functions equivalently for all groups defined by factors such as gender, age, education, mother tongue, socioeconomic status, regional background, among others [38]. Demonstrating that a scale has MI allows an investigator to make valid comparison of construct scores such as means that yield meaningful interpretations and substantive inferences [39].
Despite increasing frequency of use of the SMAQ in assessment of adherence to antiretroviral therapy, to date no study has assessed its MI and other psychometric properties such as reliability and validity in sub-Saharan Africa. Using data from a quasi-experimental evaluation of an HIV/AIDS intervention among HIV-positive women of reproductive age in Ethiopia, hereinafter referred to as the parent study [40,41], this paper assesses the internal consistency reliability, concurrent and factorial validity, and MI for the SMAQ in this setting. These analyses build upon the parent study and add to the sparse literature about the validity of SMAQ as an HIV/AIDS treatment adherence measure.

Parent Study
Data for this paper were obtained from a parent study conducted by MEASURE Evaluation [1]. The MEASURE Evaluation team enrolled clients, using random selection, from one large home-based care service provider that operated in both sites [40,41].

And, "6. Over the past three months, how many days have you not taken any medicine at all?" Adherence was scored as a "no" response to questions 1, 2, 3 and 5, a zero response for question 4 and any response less than 2 for question 6. The six questions/items constituted the unidimensional model for measurement of adherence. The six questions assess three components of adherence to ART: intentional (question 3), unintentional (questions 1 and 2) and frequency or quantity (questions 4, 5 and 6). Intentional non-adherence occurs when a patient deliberately decides not to take their medication for some reason, for example feeling worse, whereas unintentional non-adherence occurs when a patient wishes to adhere to medication but is prevented by some reason, for example, forgetfulness [42]. Questions 4 to 6 assess various aspects of the frequency of non-adherence. An experienced Amharic-English speaker translated the questionnaire, and it was then back-translated by an Ethiopian survey coordinator.

Reliability and validity
Prior to conducting MI tests, we assessed the SMAQ's reliability and validity in an Ethiopian context. Reliability denotes the ability of a scale to produce consistent results when completed under similar conditions, whereas validity denotes the extent to which a scale measures the construct it is supposed to measure. Reliability is analogous to the scale's precision, whereas validity is analogous to its accuracy.
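The item-level scoring rule stated in the parent study description can be illustrated with a minimal sketch. The field names (q1 to q6), the response encoding and the all-items rule for an overall adherent classification are assumptions made for illustration; they are not taken from the parent study's data files.

```python
def score_smaq_items(resp):
    """Return one 0/1 indicator per SMAQ item (1 = response scored as adherent).

    Assumed encoding (hypothetical): q1, q2, q3, q5 are "yes"/"no" strings;
    q4 and q6 are non-negative counts.
    """
    indicators = {}
    # Questions 1, 2, 3 and 5: a "no" response is scored as adherent.
    for q in ("q1", "q2", "q3", "q5"):
        indicators[q] = 1 if resp[q].strip().lower() == "no" else 0
    # Question 4: only a zero response is scored as adherent.
    indicators["q4"] = 1 if resp["q4"] == 0 else 0
    # Question 6: any response less than 2 is scored as adherent.
    indicators["q6"] = 1 if resp["q6"] < 2 else 0
    return indicators


def is_adherent(resp):
    """Assumption: a respondent is adherent overall only if every item is."""
    return all(score_smaq_items(resp).values())


example = {"q1": "no", "q2": "no", "q3": "no", "q4": 0, "q5": "no", "q6": 1}
print(score_smaq_items(example), is_adherent(example))
```

The six binary indicators returned by `score_smaq_items` correspond to the categorical indicators that a unidimensional measurement model would take as input.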

Internal consistency reliability and concurrent and factorial validity
We conducted an internal consistency reliability test of the SMAQ data from Ethiopia using Cronbach's alpha (α). This index measures the homogeneity of both items and the construct being measured [43,44]; in this case, how closely related the six items of the SMAQ were as a set in measuring the adherence construct. A high value of α (α>0.6) is generally accepted as adequate evidence that the items measure an underlying or latent construct [45]. We used Pearson product-moment correlation coefficients (r) to assess concurrent validity of the domain scores at T1 and T2. In this context, concurrent validity represents the extent to which item scores at T1 related to those of the same scale administered to women at T2 [46]. Criteria for concurrent validity were based on the directionality of expected relationships of the six items between the two times and the strength of the observed correlation coefficient. The Pearson product-moment correlation coefficient ranges from -1 to +1 between two sets of scores, and coefficients close to 1 in absolute value indicate high concurrent validity [46]. Based on thresholds from previous studies, correlation coefficients less than or equal to 0.25 suggest a weak relationship; those between 0.25 and 0.50, a moderate relationship; those between 0.50 and 0.75, a strong relationship; and values greater than 0.75, a very strong relationship [47,48]. We measured adherence with six factor indicators corresponding to the six SMAQ items. In Figures 1-4, the latent factor of adherence is represented by the circular shape.
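As an illustration of the reliability and concurrent validity criteria described above, the following sketch computes Cronbach's alpha from item-level scores and labels a correlation coefficient using the stated thresholds. The function names and the toy data are hypothetical, and assigning exact boundary values (0.25, 0.50, 0.75) to the lower band is an assumption, since the thresholds leave the boundaries open.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the summed scale score.
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def correlation_strength(r):
    """Label |r| with the bands used in the text (boundary handling assumed)."""
    a = abs(r)
    if a <= 0.25:
        return "weak"
    if a <= 0.50:
        return "moderate"
    if a <= 0.75:
        return "strong"
    return "very strong"


# Toy data (not study data): four respondents answering six items identically.
toy = [[1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0]]
print(round(cronbach_alpha(toy), 3))
print([correlation_strength(r) for r in (0.78, 0.49, 0.52, 0.48, 0.76, 0.80)])
```

The second call applies the bands to the six T1-T2 item correlations reported in the abstract.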
The arrows represent factor loadings, which are direct effects of the latent construct of adherence on each adherence indicator. We report summary statistics, factor loadings and model fit indices for specific models, including chi-square values, root mean square error of approximation (RMSEA) values, comparative fit indices (CFI)/Tucker-Lewis indices and the final estimated measurement models. A significant chi-square test indicates a poor model fit, but this may also be due to moderate departures from normality and a large (n>200) sample size [50]. Therefore, we used other model fit indices to supplement the chi-square test in determining the model that best fit the data. The RMSEA is a measure of the estimated discrepancy between the population and model-implied covariance matrices per degree of freedom [37]. Values of RMSEA less than 0.05 indicate close model fit, whereas values up to 0.08 reflect adequate fit. The CFI varies from 0 to 1, representing extremely weak to perfect fit, respectively, and a value of 0.95 or higher is considered to represent adequate fit [37].
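The fit criteria above can be made concrete with a short sketch of the common textbook formulas for RMSEA and CFI. These are single-group forms based on N - 1; software packages differ in details (for example, multiple-group corrections and estimator-specific adjustments), so this is not a reproduction of the values reported here. The label for values above 0.08 is an assumption.

```python
import math


def rmsea(chi2, df, n):
    """Textbook single-group RMSEA from a model chi-square, its df and sample size."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model, df_model, chi2_base, df_base):
    """Comparative fit index relative to a baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def rmsea_label(value):
    """Apply the cutoffs in the text: <0.05 close, up to 0.08 adequate."""
    if value < 0.05:
        return "close"
    if value <= 0.08:
        return "adequate"
    return "above adequate range"  # assumed label; the text gives no name


# Hypothetical inputs, chosen only to show the arithmetic.
print(round(rmsea(100.0, 50, 201), 4), rmsea_label(0.06))
print(round(cfi(100.0, 50, 1050.0, 50), 3))
```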
Measurement invariance test
Measurement invariance testing is based on the overall assumption that comparison between groups is important, and the presence or absence of differences between groups has some meaningful implications [39]. We tested for the levels of invariance based upon the following assumptions: (1) the measure of interest, that is adherence to ART, was perceptually based; adherence comprises multiple manifest indicators (i.e., multiple items of the SMAQ); (2) the six items of SMAQ are combined additively to operationalize the underlying construct; (3) evidence exists of the SMAQ's psychometric soundness beyond the preliminary stages of scale development, i.e. evidence exists of the SMAQ's psychometric soundness in a Spanish sample, but it has yet to be demonstrated for Ethiopia or other sub-Saharan African locations; (4) the four study groups are independent of each other: T1 control, T1 intervention, T2 control and T2 intervention; and (5) the common factor model for describing relationships among items of the SMAQ holds across groups [39].
Following the independent groups assumption, we applied a multiple-group confirmatory factor analysis (CFA) to test three levels of invariance: configural, weak factorial and strong factorial [39]. Multiple-group CFA allowed us to simultaneously test four group-specific latent adherence factor models using robust weighted least squares (WLSMV). We fit models for each group/time and evaluated sample differences with a chi-square test. We used WLSMV to conduct chi-square difference testing because adherence indicators were categorical and non-normally distributed. A significant chi-square difference indicated that constraining the parameters of the nested model significantly worsened the fit of the model, which indicated measurement non-invariance and thereby sustained the unconstrained or less constrained model. A non-significant chi-square difference indicated that constraining the parameters of the nested model did not significantly worsen the fit of the model, which indicated MI of the parameters constrained to be equal in the nested model. We did not estimate the next, more restrictive model if the result was significant, as it suggested that the next level of parameter restriction would have significant differences with the previous model. We used Mplus 7 [51] and Stata 12 [52] to conduct data analysis. Additional details of the steps for invariance testing can be found in Appendix 2.

Table 2.

Correlations between the six items in the initial assessment ranged from -0.09 to 0.95 (see Table 3). Question five, "did you not take any of your medicine over the last weekend", was not significantly correlated (correlation coefficient = -0.09) with question three, "sometimes if you feel worse, do you stop taking your medicines?", in the T1 intervention group. This was not expected, as all indicators of a construct are expected to have significant positive correlations with each other.
Due to the negative correlation coefficient and its nonsignificance, we considered removing question five from our analysis, but sensitivity analysis with and without this item showed no differences in results for measurement invariance tests. Therefore, we included it in order to present results for the full SMAQ scale as it was originally designed and validated in Spain.

Table 4). The good model fit indices and significant factor loadings indicate factorial validity (see Tables 5 and 6).

(90% CI 0.04-0.08)), suggesting that constraining factor loadings, intercepts and factor variances improved model fit in the strong factorial model, compared with the configural and weak factorial models (see Table 5). Factor loadings in the final strong factorial model were all statistically significant (p<0.05) and ranged from 0.26 to 1.18 (see Table 6).
Positive and significant factor loadings suggest that the construct of adherence significantly and positively influenced all the measures generated by the six items of SMAQ.
Measurement invariance test
A chi-square difference test between configural and weak factorial models was significant (chi-square difference = 34.79 (DF=15) p<0.01). Other model fit indices were comparable between the two models, which suggested that the weak factorial model had a better fit for the data. The next chi-square test between the weak and strong factorial models found no significant difference (chi-square difference = 13.36 (DF=15) p=0.57). In addition, the RMSEA statistic reduced by 0.01 to 0.06 and other model fit indices were comparable with those of the weak factorial invariance model. Therefore, the strong factorial invariance model had the best fit for the data and was accepted as the final model (see Tables 5 and 6). The final estimated measurement models for the strong factorial invariance are presented in Figs 1-4. Factor loading estimates for the models are shown in Table 6.

In addition, we documented strong factorial invariance across the four independent study groups, suggesting that the SMAQ questions/items were being interpreted in an equivalent manner across groups. This finding suggests that the SMAQ performs equally well across samples and operationalizes group-specific differences in an invariant manner across time points. An important implication of this finding is that adherence scores obtained using SMAQ from the four study groups can be compared pre- and post-intervention for policy or intervention purposes [37]. Taken together, these findings affirm that the six-item SMAQ is a valid measure of adherence to ART in this sample of women with HIV/AIDS in Ethiopia. Our findings add confidence for researchers and interventionists interested in using the SMAQ to assess adherence to ART in this setting.
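The p-values attached to the chi-square difference statistics in the measurement invariance results can be rechecked with a small sketch. It only converts a reported difference value and its degrees of freedom into an upper-tail probability, using the closed-form series for integer degrees of freedom; it does not reproduce how the difference statistic itself is obtained under WLSMV, which in Mplus relies on a dedicated difference-testing procedure rather than simple subtraction. The function name is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail at sqrt(x) plus a finite series.
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2))
    pdf = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term = root                      # r = 1 term: sqrt(x) / 1
    total = term if df >= 3 else 0.0
    for r in range(2, (df - 1) // 2 + 1):
        term *= x / (2 * r - 1)      # builds sqrt(x)^(2r-1) / (1*3*...*(2r-1))
        total += term
    return tail + 2 * pdf * total


# Difference statistics as reported in the text, each with 15 degrees of freedom.
p_configural_vs_weak = chi2_sf(34.79, 15)
p_weak_vs_strong = chi2_sf(13.36, 15)
print(p_configural_vs_weak < 0.01, round(p_weak_vs_strong, 2))
```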

Figures for strong measurement invariance testing
We found one negative but nonsignificant correlation between the six indicators of the SMAQ, suggesting that a five-item version might be more efficient [53]. However, our findings showed no differences in measurement invariance tests when question five was included or excluded. It is possible that the lack of correlation was caused by a data entry error, but we were unable to verify this possibility. More likely, the magnitude of question five's correlation with question three was too small to affect the results. In addition, question five was strongly and positively correlated with other items of the scale, and all its factor loadings were positive and significant.

[7,64,65]. Improving adherence may be challenging or impossible without the ability to measure it reliably, validly and consistently across groups of individuals, which makes efforts to improve measurement methods and tools an important contribution to public health. A study of strategies to improve adherence to ART in low-resource settings reported that adherence measurement was required for optimal targeting and tailoring of interventions [66]. The present study moves the field forward by presenting reliability, validity and invariance test statistics for SMAQ from a sub-Saharan Africa setting, where such HIV research is scant even though the region bears the greatest HIV/AIDS burden and therefore has the greatest potential need for such measurement. According to the WHO, nearly one in every 25 adults is living with HIV in Africa, accounting for nearly two-thirds of the global total [67]. Providing evidence on the measurement properties of SMAQ may increase practitioners' confidence in using it and, in turn, its adoption in assessment of adherence.
Although we found strong invariance for the six items of the SMAQ, it is worthwhile to note that adherence is a dynamic behavior which may change over time, even without intervention. Thus, invariance can be expected for SMAQ items that assess intentional non-adherence across time, such as question three of the SMAQ: "Sometimes if you feel worse, do you stop taking your medicines?", because such items are embedded in a patient's beliefs and self-construct and therefore, are more robust to behavior change.
Conversely, the SMAQ also has a component of unintentional non-adherence due to forgetfulness, assessed by questions one and two: "Do you forget to take your medicine?" and "Are you careless at times about taking your medicine?" The unintentional non-adherence component may be prone to random variability, which may not be captured by invariance testing of the six items of the SMAQ together, but by testing invariance for each item using longitudinal data. Thus, attribution of changes in adherence to specific components of the SMAQ as intentional or unintentional was not possible in the present study, because of the independent cross-section design. This is an important area for future studies in which researchers may be able to identify modifiable items of non-adherence measured by the SMAQ so as to appropriately intervene to improve adherence, as was demonstrated by Mora et al. (2011) in their assessment of non-adherence among asthma patients using the Medication Adherence Report Scale (MARS-A10) [68].
Several limitations of our study should be noted. Although we treated the samples as independent, they may not be truly independent because some participants may have participated in both the T1 and T2 interviews. This limitation may manifest in repeated questions where social desirability bias is also a limitation. However, the cross-sectional design of the parent study mitigated this tendency. The design limited the use of multilevel multigroup CFA, as suggested by Kim and colleagues [69]. Ethical considerations and operational logistics were also considered in the design. The taxonomy for describing adherence to medications now suggests that results from baseline and follow-up can only be compared if the patient was already on treatment at least 3 months prior to baseline [70]. However, the taxonomy was not in place at the time of data collection. Challenges associated with diagnosis and treatment initiation records in the study settings would also limit application of the taxonomy. Further, the lack of data on clinical methods of measuring adherence, such as an HIV RNA test (a test that checks for RNA genetic material from the virus in a sample of blood) [71,72] or CD4 counts (the number of CD4 T lymphocytes, a type of white blood cell, in a sample of blood, which is used to monitor an individual's response to ART) [73], limited our ability to assess the predictive validity of the SMAQ with these data. This is an important agenda for future research.

Conclusions
This is the first study to assess reliability, validity and measurement invariance of SMAQ in the sub-Saharan Africa region, using pre- and post-intervention data from two treatment referral networks. The findings show that the SMAQ is sufficiently reliable and valid to be used for HIV-positive Ethiopian women of reproductive age who are on ART.

Competing interests
The authors declare no conflict of interest.

Funding
The primary data which were used to produce this manuscript were from a quasi-experimental study funded by the United States Agency for International Development. The design, data analysis, interpretation of data and writing of the manuscript were not funded by any agency or organization. The authors received no funding to produce this manuscript as it is part of the first author's dissertation research.

Authors' contributions
CBA conceptualized the study, analyzed the data and wrote the manuscript.
JCT and HWR designed the parent study and collected the data.
BJF supervised and provided feedback for conceptualization, analysis and writing of manuscript.
CZ reviewed methodology and analysis.
JM reviewed methodology and technical writing.
KHL and KW reviewed content and edited the manuscript.
All authors reviewed and edited the manuscript.

Figure captions
Unidimensional measurement model for adherence with mean adherence score, factor loadings and measurement errors for T1 control group
Unidimensional measurement model for adherence with mean adherence score, factor loadings and measurement errors for T1 intervention group
Unidimensional measurement model for adherence with mean adherence score, factor loadings and measurement errors for T2 control group
Unidimensional measurement model for adherence with mean adherence score, factor loadings and measurement errors for T2 intervention group
