
Cross-model validation of public health microsimulation models; comparing two models on estimated effects of a weight management intervention

Abstract

Background

Health economic modelling indicates that referral to a behavioural weight management programme is cost saving and generates QALY gains compared with a brief intervention. The aim of this study was to conduct a cross-model validation comparing outcomes from this cost-effectiveness analysis to those of a comparator model, to understand how differences in model structure contribute to outcomes.

Methods

The outcomes produced by two models, the School for Public Health Research diabetes prevention (SPHR) and Health Checks (HC) models, were compared for three weight-management programme strategies: Weight Watchers (WW) for 12 weeks, WW for 52 weeks, and a brief intervention, plus a simulated no intervention scenario. Model inputs were standardised, and iterative adjustments were made to each model to identify drivers of differences in key outcomes.

Results

The total QALYs estimated by the HC model were higher in all treatment groups than those estimated by the SPHR model, and there was a large difference in incremental QALYs between the models. SPHR simulated greater QALY gains for 12-week WW and 52-week WW relative to the Brief Intervention. Comparisons across socioeconomic groups found a stronger socioeconomic gradient in the SPHR model. Removing the impact of treatment on HbA1c from the SPHR model, running both models only with the conditions that the models have in common and, to a lesser extent, changing the data used to estimate risk factor trajectories, resulted in more consistent model outcomes.

Conclusions

The key driver of difference between the models was the inclusion of extra evidence-based detail in SPHR on the impacts of treatments on HbA1c. The conclusions were less sensitive to the dataset used to inform the risk factor trajectories. These findings strengthen the original cost-effectiveness analyses of the weight management interventions and provide an increased understanding of what is structurally important in the models.


Background

Behavioural weight management programmes have been shown to result in weight loss and reductions in glycaemia [1,2,3]. The WRAP trial showed that referral to a commercial open-group behavioural programme (WW, formerly Weight Watchers) for 12 weeks or 52 weeks resulted in greater weight loss during a 2-year follow-up than a brief intervention (a booklet of self-help weight-management strategies) [1]. At the five-year follow-up of the trial, there was no significant difference in weight between the groups, but there was some weight loss maintenance [4]. Health economic modelling over the lifetime estimated that the 12-week and 52-week programmes were cost saving and generated QALY gains compared with the brief intervention. Relative to natural history, the 52-week programme generated greater benefits than the 12-week programme. To improve model credibility and align with the International Society for Pharmacoeconomics and Outcomes Research–Society for Medical Decision Making (ISPOR-SMDM) international best practice recommendations, cross-model validation is needed [5].

Cross-model validation, also known as cross validation, comparative modelling or convergent validation, is the process of simulating the same decision problem with two or more models and comparing the predictions and outcomes [6], and is recommended by modelling good research practices and guidelines [5, 7, 8]. Existing guidance is summarised in Supplementary Material, Table S1. Although it cannot be used as evidence that a model predicts outcomes accurately, cross-model validation can give more confidence in the outcomes, and credibility to the model, if different models result in similar outcomes or the same decision. To maximise the value of the cross-model validation, models should be developed independently, and modellers should collaborate in the comparison process [5]. Cross-model validation is particularly important for validating the long-term impact of an intervention. External validation (comparison to historical event data) is a useful form of validation for health trajectories estimated in the absence of an intervention (i.e., standard care). However, novel intervention outcomes cannot be compared with historical data. Furthermore, it can be challenging to find, and access, representative datasets to validate model outcomes against, particularly for utilities (a measure of the preference or value assigned to a particular health state), quality-adjusted life years (QALYs) and aggregated costs. Cross-model validation of the long-term impact of treatment is likely to be particularly important for public health interventions, in which the long-term (and even short-term) impacts of an intervention on health are unknown [9].

Although cross-model validation is recommended, there is no clear guidance on how to carry out the validation, and there is variation in how it is conducted [10]. Few studies provide a detailed explanation of the validation framework used, and reporting varies [6]. Examples of cross-model validation most commonly include a comparison of model outcomes against previously published results [11]. In many published cross-model validations, individual teams run scenarios on their own models, and the results are compared [8, 12,13,14,15,16]. Although this enables many models to be compared at once across multiple scenarios, it can be more difficult to establish how model structure contributes to the differences: each team has a less detailed understanding of the other models, and the process requires an extensive time commitment from all participants, which can act as a barrier to engagement. In one example, the authors had access to both models, which had been developed independently [6]. Expansion in open-source modelling provides opportunities to develop cross-model validation processes to benefit decision-makers and modellers.

Comparisons of public health models that estimate the impact of behavioural interventions are challenging. A single intervention may impact several risk factors and the risk of many different health outcomes, which may result in large differences between model structures and assumptions even when examining a single health behaviour (e.g., eating healthily) [9]. Furthermore, microsimulation models are often used for evaluating behavioural interventions because they allow flexibility in structure and can therefore represent the various pathways from intervention to risk factors to outcomes. This may contribute to greater variation between model structures. Observations from a cross-model validation of public health interventions suggest that there are large variations in costs per QALY between models [10]. Differences in model structure are likely to make differences in model outcomes difficult to interpret and risk reducing decision-makers’ trust in the findings. This highlights the need for publicly available, detailed cross-model validations of public health models, not only because they enable greater understanding of the impact of model structure on outcomes for the decision problem of interest, but also because they can inform the development of existing and new models.

The aim of this study was to conduct a cross-model validation to compare estimations of long-term effectiveness generated by the SPHR model, as part of an existing cost-effectiveness analysis, with those of a comparator model that also quantified long-term effectiveness of weight management, to help understand how differences in model structure contribute to outcomes.

Method

Decision problem

The extended and standard duration weight-loss programme referrals for adults in primary care (WRAP) trial was a randomised controlled trial in which participants were assigned to a brief intervention (brief advice and self-help materials) or a 12- or 52-week weight-management programme (WW) [4]. The 5-year follow-up was registered with Current Controlled Trials (ISRCTN64986150) on 01/02/2018. Participants’ weight was measured at baseline and at 3, 12, 24 and 60 months, and cholesterol, HbA1c and blood pressure were measured at baseline and at 12 and 60 months. The decision problem was to determine whether the 12- and 52-week interventions were effective over the lifetime at reducing the incidence of disease and increasing QALYs for adults with a BMI ≥ 28, compared with a brief intervention and a simulated no intervention scenario.

Model comparison

Models compared

The Health Checks model was chosen for comparison with the SPHR model. The model code was freely available, allowing an in-depth understanding of the models, and the authors of the original model code agreed to support the model comparison throughout the process.

School for Public Health Research diabetes prevention model

The School for Public Health Research (SPHR) diabetes prevention model is a microsimulation health economic model that describes individuals’ risk of type 2 diabetes, microvascular outcomes, cardiovascular disease (CVD), congestive heart failure, cancer, osteoarthritis, depression, dementia, and mortality in England. The model includes correlated trajectories of risk factors, including body mass index (BMI), systolic blood pressure, cholesterol and blood glucose, and annual changes in these risk factors impact the risk of health conditions in the model. Benefits are measured in QALYs, and the model uses a National Health Service (NHS)/personal social services perspective. The model was developed in R to examine the cost-effectiveness of type 2 diabetes prevention programmes; detailed information about the model can be found in prior studies [17, 18] and in Table S1.

Health Checks model

The HC microsimulation model was developed to examine the impact of the NHS Health Checks cardiovascular disease prevention programme in England on the risk of ischaemic heart disease, stroke, dementia, and lung cancer. The health checks intervention involved eligible simulated patients being invited for a health check, potentially followed by referral to one or more of statin medication, antihypertensive medication, smoking cessation, and/or weight management, depending on health status. This referral had the potential to impact the trajectories of risk factors in the model, including BMI, blood pressure, smoking status, and cholesterol. Benefits are measured in QALYs. No costs were included. Detailed information about the model, which was programmed in Python, can be found in a prior study [19] and in Table S1.
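Although the two models were developed for different purposes, both descriptions share the same computational skeleton: each simulated individual's risk factors are updated annually, condition risks are derived from the updated risk factors, and utility is accrued year by year. The Python sketch below (Python being the HC model's language) is a minimal illustration of that shared annual cycle; the function names, the toy risk equation and all numbers are our own assumptions, not code or parameters from either model.

```python
import random

random.seed(1)

def update_risk_factors(person):
    """Advance BMI and systolic blood pressure by one year (toy trajectories)."""
    person["bmi"] += random.gauss(0.1, 0.2)
    person["sbp"] += random.gauss(0.5, 1.0)
    return person

def annual_cvd_risk(person):
    """Toy risk equation: higher BMI and blood pressure raise CVD risk."""
    return max(0.0, min(1.0, 0.002
                        + 0.0005 * (person["bmi"] - 25)
                        + 0.0002 * (person["sbp"] - 120)))

def simulate(person, years=40, discount_rate=0.035):
    """One individual's lifetime QALYs under the shared annual cycle."""
    qalys = 0.0
    for t in range(years):
        person = update_risk_factors(person)
        if not person["cvd"] and random.random() < annual_cvd_risk(person):
            person["cvd"] = True                     # incident CVD event
        utility = person["baseline_utility"]
        if person["cvd"]:
            utility *= 0.9                           # illustrative decrement
        qalys += utility / (1 + discount_rate) ** t  # discounted QALY accrual
    return qalys

print(simulate({"bmi": 32.0, "sbp": 135.0, "cvd": False,
                "baseline_utility": 0.85}))
```

In both real models this loop runs over thousands of simulated individuals drawn from Health Survey for England data, with multivariate risk equations in place of the toy one above.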

The models were compared qualitatively to identify structural differences (Table S1 of the supplementary material). Existing publications, technical descriptions and the model code were reviewed to document the model structures. Developers of both models contributed to the protocol and identified methodological challenges to the validation process.

Standardising model setup

Models were standardised to ensure that the model setup was the same across models. Supplementary material A and Table S3 provide a summary of the standardisation steps. In contrast to previous cross-model validations [6], we found that using a common set of parameter inputs was not feasible, as both models used multivariate risk equations that are conditional on multiple characteristics, and different methodologies for applying utility decrements.

The HC model was adapted in the following ways to reflect the model setup and baseline population characteristics used in the SPHR model.

  a. Baseline population was HSE 2014 (instead of 2012).

  b. Discount rate of 3.5% applied (instead of no discount).

  c. Baseline utility from HSE 2014 (instead of 1).

All standardisation steps were made to the HC model. Each standardisation step was conducted independently, and then all steps were applied simultaneously.
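To make step b concrete: discounting converts a stream of annual utilities into present-value QALYs, and at 3.5% per year it compounds substantially over a lifetime horizon. A minimal sketch with an assumed constant utility stream (the function and values are ours, not model code):

```python
def discounted_qalys(annual_utilities, rate=0.035):
    """Present value of a stream of annual utilities."""
    return sum(u / (1 + rate) ** t for t, u in enumerate(annual_utilities))

lifetime = [0.85] * 40  # toy example: constant utility of 0.85 over 40 years

print(discounted_qalys(lifetime, rate=0.0))    # ~34.0 QALYs, undiscounted (original HC setup)
print(discounted_qalys(lifetime, rate=0.035))  # ~18.8 QALYs, discounted at 3.5% (SPHR setup)
```

The roughly 45% gap between the two totals in this toy example is consistent with discounting having the largest independent impact on absolute QALYs, as reported in the Results.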

Outcomes compared

To understand the differences between the models, we conducted an in-depth assessment of consistency across multiple model outcomes, including uncertainty analyses and subgroup analyses. We examined outcomes that will impact decision-makers (QALYs), as well as aggregate outcomes, incremental values, intermediate outcomes (including diagnoses of health conditions) and outcomes for relevant subgroups. Previous cross-model validation examples [6] have suggested that an acceptable magnitude of difference between model outcomes should be identified in advance of the model comparison. However, in the absence of cost estimates it was not possible to calculate the acceptable magnitude of differences based on decision impacts.
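For clarity, the incremental values reported below are simple differences between totals: incremental QALYs for a strategy are its total QALYs minus the comparator's. A trivial illustration with made-up totals (none of these numbers come from either model):

```python
# Made-up lifetime QALY totals per person, for illustration only.
totals = {"natural_history": 14.20, "brief": 14.25, "ww_12wk": 14.31, "ww_52wk": 14.36}

incremental = {arm: round(q - totals["natural_history"], 3)
               for arm, q in totals.items() if arm != "natural_history"}
print(incremental)  # {'brief': 0.05, 'ww_12wk': 0.11, 'ww_52wk': 0.16}
```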

Identifying aspects of model structure that explain differences in outcomes

We identified adaptations that could be made to each model to make the model structures more similar. Because of the complexity of both models, it was impractical to change every aspect of one model incrementally to become like the other, so we focussed on (a) the aspects of the models that could feasibly be modified and (b) those we expected to make a difference to the outcomes of interest. Running iterations of the models in which the model structures are made more similar can increase our understanding of how differences in model structure contribute to differences in model outcomes. The options for model adaptations were pre-specified in the protocol (Table S5 of the supplementary material); three adaptations were selected in which the SPHR and/or the HC model were altered to make the models more similar, and four scenarios were explored (a configuration sketch follows the list):

  A. Remove impact of treatment on HbA1c from SPHR. The HC model was not designed to model the impact of changing HbA1c, so the HbA1c trajectory did not affect the risk of diabetes or cardiovascular disease as it does in the SPHR model. Therefore, the SPHR model was run without modelling the impact of the treatment on HbA1c trajectories to replicate the HC model structure.

  B. Same health conditions. In addition to change A, the models were adapted so that they had the same health conditions. Both models were run with only the conditions that the models had in common (diabetes, dementia, and CVD), removing microvascular complications, congestive heart failure, osteoarthritis and cancer from the SPHR model and lung cancer from the HC model. Other-cause mortality was not adjusted. Although this will result in slightly inflated life year estimates, adjusting mortality would have required a large model adjustment and the expected impact on mortality would be very small. Given the adjustment required and the small expected effect, we decided to compare model results without a full mortality adjustment.

  C. Use ELSA trajectories in the SPHR model. The SPHR model was run with the trajectories of risk factors based on an analysis of the ELSA dataset. This differs from the base case, in which trajectories from age 18 to 59 were based on analysis of the Whitehall II dataset and trajectories for age 60 and over were based on ELSA. The modelling methods for using the ELSA data remained different.

  D. All structural changes applied. Sensitivity analyses A, B and C were applied together to explore the impact of all adjustments combined.
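Since scenario B includes adjustment A, and scenario D combines everything, the composition is easiest to see as a set of toggles. A hypothetical configuration sketch (neither model exposes this interface; the flag names are ours):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    hba1c_treatment_effect: bool  # False = adjustment A (remove HbA1c effect from SPHR)
    common_conditions_only: bool  # True  = adjustment B (diabetes, dementia, CVD only)
    elsa_trajectories: bool       # True  = adjustment C (ELSA data for all ages in SPHR)

SCENARIOS = {
    "A": Scenario(hba1c_treatment_effect=False, common_conditions_only=False, elsa_trajectories=False),
    "B": Scenario(hba1c_treatment_effect=False, common_conditions_only=True,  elsa_trajectories=False),
    "C": Scenario(hba1c_treatment_effect=True,  common_conditions_only=False, elsa_trajectories=True),
    "D": Scenario(hba1c_treatment_effect=False, common_conditions_only=True,  elsa_trajectories=True),
}
```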

Results

Impact of standardising the model setup

Three changes were made to the HC model to reflect the model setup used in the SPHR model.

  a. Baseline population was HSE 2014 (instead of 2012).

  b. Discount rate of 3.5% applied (instead of no discount).

  c. Baseline utility from HSE 2014 (instead of 1).

The results of each change applied independently, and then of all adaptations applied simultaneously, are shown in Table 1 alongside outcomes generated by the SPHR model. Discounting had the largest independent impact on absolute QALYs. All subsequent model comparisons were conducted with all standardisation steps applied.

Table 1 Absolute and incremental QALYs generated by the SPHR and Health Checks models, including standardisation steps

Comparison of model outcomes

Table 2 shows the estimated QALYs and health conditions (per 1000) when no treatment effect is applied, for the original models and for the HC model with standardisation steps applied. There were fewer strokes, CVD events, and cases of dementia in the HC model; dementia cases showed the largest difference in simulated health outcomes between the models. Between the original HC model and the standardised version, there were small differences in the number of events owing to differences in the baseline population. The utility decrement for each condition is also shown in the table. Direct comparison of utility decrements is challenging because the HC model applies the decrement additively, whereas the SPHR model applies it multiplicatively. As a consequence, the health decrements in the SPHR model have a greater impact on QALYs in an otherwise healthy population.
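The difference between the two conventions matters numerically: an additive decrement removes the same absolute amount of utility regardless of an individual's current utility, whereas a multiplicative decrement scales with it, so it removes more utility from a healthier (higher-utility) individual. A sketch with illustrative values only (the 0.10 decrement is not taken from either model):

```python
DECREMENT = 0.10

def additive(utility, d=DECREMENT):
    """HC-style: subtract a fixed decrement."""
    return utility - d

def multiplicative(utility, d=DECREMENT):
    """SPHR-style: scale utility down by a multiplier."""
    return utility * (1 - d)

for u in (0.9, 0.6):  # healthier vs less healthy baseline utility
    print(f"baseline {u:.1f}: additive -> {additive(u):.2f}, "
          f"multiplicative -> {multiplicative(u):.2f}")
# baseline 0.9: additive -> 0.80, multiplicative -> 0.81 (removes 0.09)
# baseline 0.6: additive -> 0.50, multiplicative -> 0.54 (removes 0.06)
```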

Table 2 QALYs, cases of, and utility decrements associated with cardiovascular disease, dementia, and diabetes

Table 3 shows the absolute and incremental QALYs for each simulated intervention group and the natural history control. The total QALYs estimated by the HC model were higher in all treatment groups, with no overlap in the probabilistic sensitivity analysis (PSA) credible intervals (Table 3). There was a large difference in incremental QALYs, although the credible intervals for the SPHR model overlapped those of the HC model. SPHR simulated greater QALY gains for 12-week WW and 52-week WW relative to the brief intervention. Comparisons across socioeconomic groups found a stronger socioeconomic gradient in the SPHR model; the results are reported in Table S6 in Supplementary Appendix B.

Table 3 Absolute QALYs and incremental QALYs versus simulated natural history in the SPHR and HC models

Impact of iterative adjustments to identify drivers of differences in outcomes

Remove the impact of treatment on HbA1c from the SPHR model

Making this adjustment resulted in more similar incremental QALYs between the two models, but the SPHR model still predicted a greater difference between the brief intervention and the 12-week and 52-week interventions than the HC model (Fig. 1, Table S7 of the supplementary material).

Fig. 1 Incremental QALYs between each treatment scenario (brief intervention, 12-week intervention and 52-week intervention) and the simulated no intervention scenario for each model (HC: Health Checks model; SPHR: School for Public Health Research diabetes prevention model), for the main analysis and for four scenarios where adjustments were made to one or both models (A. impact of treatment on HbA1c removed from the SPHR model; B. models run only with the conditions represented in both models (diabetes, dementia, and CVD), plus scenario A; C. trajectories of risk factors estimated from the ELSA dataset in the SPHR model; D. scenarios B and C combined so that all structural changes described were applied)

Same health conditions

The analysis was run on models adapted to include only the conditions that the two models have in common (diabetes, dementia, and CVD) removing microvascular complications, congestive heart failure, osteoarthritis and cancer from the SPHR model and lung cancer from the HC model. This was combined with the previous adaptation (i.e., the treatment effect on HbA1c was removed from the SPHR model). Absolute and incremental QALYs became more similar across models (Fig. 1, Table S7 of the supplementary material).

Use ELSA trajectories in the SPHR model

When the SPHR model was run with the trajectories of risk factors based on an analysis of the ELSA dataset rather than the Whitehall II dataset, there were very small changes to the incremental QALYs. There was a negative incremental QALY when comparing the brief intervention to natural history, due to higher simulated BMI in the brief intervention arm compared with the simulated natural history under the ELSA trajectories (Fig. 1, Table S7 of the supplementary material).

All structural changes applied

Sensitivity analyses A, B and C were applied together. As above, the brief intervention arm produces a higher BMI over time than the simulated natural history, which leads to a loss of QALYs in the brief intervention arm (Fig. 1, Table S7 of the supplementary material).


Discussion

Estimates of long-term effectiveness of the weight management programme were similar using the SPHR and HC models after adjusting for explainable differences, namely, removing effects of weight management on HbA1c, limiting the models to a common set of health outcomes, and using a common dataset to derive metabolic trajectories. Since the SPHR model includes stronger evidence on the effect of weight management on HbA1c, the results of this cross-model validation support the existing cost-effectiveness analysis [4] and provide additional confidence in those estimates.

These findings reflect the suitability of the SPHR model for the decision problem; the model's original purpose, to evaluate diabetes prevention strategies, resulted in a focus on diabetes-related risk factors (including HbA1c) and outcomes. There was a significant change in HbA1c in the intervention evaluated, and the SPHR model enabled long-term modelling of this effect. In contrast, the HC model was developed to represent the impact of health checks for adults between the ages of 40 and 74, where a health check could result in up to four treatments: statins, antihypertensives, smoking cessation and weight management. This is reflected in the outcomes modelled (e.g., lung cancer), and diabetes was not a focus of the model. Therefore, one of the impacts of the intervention (the change in HbA1c) could not be modelled in the HC model.

The process of cross-model validation resulted in a deeper understanding of the characteristics and sensitivities of the models. By identifying the features of the models that explain differences, it is possible to reflect on whether they should be included in the final analysis. The process we conducted highlighted that the inclusion of HbA1c was the most important driver of differences in incremental QALYs, and that the inclusion of diabetes-related complications was most important for differences in total QALYs. These observations highlight the importance of modelling the interdependence between obesity and diabetes outcomes. The model comparison process can also inform future decisions around the development of both the existing models and new models. We conclude that cost-effectiveness analyses of weight management interventions should consider representing the intervention effects on HbA1c to avoid potentially underestimating cost-effectiveness.

This analysis has several strengths. First, collaboration between model developers throughout the process enabled greater understanding of the models, enhanced error checking, discussion of results and setting priorities for comparison. This collaboration, along with access to all model code, facilitated a model comparison process in which models could be incrementally adapted to isolate and understand drivers of differences in outcomes. Increasing use of open-source health economic models will support cross-model validation [20], as there will be more models for which full model code is available. Second, we have demonstrated how a process of cross-model validation can be used to compare complex public health microsimulation models. Existing validation methods and recommendations [6, 10] are not always practical for all model types. For example, the potential for differences in model boundaries, model structures, and specification of input parameters increases as the number of health outcomes increases, and public health models are more likely to adopt complex model structures [21]. This makes it difficult to draw conclusions about what constitutes an acceptable difference between model outcomes, and there are likely to be a wide range of outcomes of interest, including prevalence of health conditions, costs and QALYs, which may differ between models. Finally, the in-depth comparison can be used to inform future model development. Squires et al. recommend that when developing and justifying the structure of a new model, any existing health economic models should be reviewed [22]. Cross-model validation has the potential to provide information for this stage of development, but it is often reported very briefly as part of model development, if at all, and this is a missed opportunity to inform future model development. A greater emphasis on reporting cross-model validation in detail, either as a separate publication or within supplementary material, can enable the results to have an impact beyond the decision problem. Our method can be a starting point to be used and tested in other model comparisons.

While conducting the model comparison, we identified challenges that are likely to be common to cross-model validation, so we discuss the justification for the approach taken. First, each model was developed for a different purpose and then adapted to examine the long-term effectiveness of the weight management intervention. They were therefore structurally different, which made it challenging to decide the extent to which the models could be standardised, and which changes to make when conducting iterative adjustments to the models. Broadly, in standardisation we intended to ensure that the model setup was consistent in terms of the eligible population, time horizon, discount rate, and perspective. This deviates from other cross-model validation approaches that aim to align model inputs prior to examining structural differences [6]. We found that using a common set of parameter inputs was not feasible, particularly for multivariate risk equations that are conditional on multiple characteristics and for the different methodologies used to apply utility decrements. These differences are more likely to arise in microsimulation models, which allow complex relationships to be represented. We then intended to interrogate the structural assumptions and data sources informing long-term health outcomes. Although there were multiple potential iterations we could have conducted, resource constraints meant that our choice of which changes to make was informed by the outcomes of the standardisation steps, the initial model comparison, and discussions with the model developers.

Second, it was difficult to decide which of the multiple potential outcomes to examine and report. Public health economic evaluations often need to consider a broad range of outcomes, time horizons, and cost perspectives [22]. As a consequence, we decided that comparisons should be made across outcomes that would inform the validation process or impact decision-making. In public health microsimulation models, this may include subgroup analyses, as these results may impact conclusions regarding equity impacts of an intervention. Detailed cross-model validation that provides an in-depth assessment of consistency across multiple model outcomes, including uncertainty analyses and subgroup analyses, has the potential to provide a much greater understanding of the differences between the models, as demonstrated. While the validation should include outcomes that will impact decision-makers (costs and QALYs), it should also extend to other intermediate outcomes to help understand what may be driving the differences. However, steps taken to understand drivers of difference may involve changing isolated parts of the model, and caution should be taken when interpreting these results. For example, when we examined the impact of including only the same conditions by removing health conditions from each model, we decided not to adjust other-cause mortality because of the extent of structural changes required. These results inform our understanding of the drivers of differences between the models, but the outcomes of this scenario could not be used to inform decision-making. Caution should be taken when changing individual parts of a model, to consider the unintended consequences for other outcomes. Finally, previous cross-model validation examples [6] have suggested that an acceptable magnitude of difference between model outcomes should be identified in advance of the model comparison. However, in the absence of cost estimates, it was not possible to calculate the acceptable magnitude of differences based on decision impacts.

While conducting the cross-model validation, we developed some proposed recommendations (detailed in Supplementary Material, Appendix D). However, the current cross-model validation is based on a comparison of two broadly similar microsimulation models, which provides a useful but limited overview of opportunities and challenges for model comparison. A different set of challenges may arise if validating a detailed microsimulation model against a different modelling structure, for example, a multistate lifetable model structure often used in public health evaluations. It is possible that future cross-validation will build on this, and other existing detailed model comparisons, to help to develop and refine a set of recommendations that can be formalised into a standardised best practice guide for modellers.

Conclusions

The cross-model validation process supported the findings of the initial cost-effectiveness analysis of this weight management programme and identified which structural differences contributed most to differences in outcomes. The process we have reported addresses the need for more examples of cross-validation approaches for public health economic models and is a preliminary step in the development of cross-model validation guidelines specific to complex interventions.

Data availability

No new datasets were generated or analysed during the current study. The WRAP trial data are not publicly available. Participant consent allows for data to be shared for future analyses with appropriate ethical approval. Non-identifiable data and analysis code can be made available to bona fide researchers on submission of a reasonable request to datasharing@mrc-epid.cam.ac.uk. The principles and processes for accessing and sharing data are outlined on the UK Medical Research Council Epidemiology Unit Data Sharing Portal: epi-meta.mrc-epid.cam.ac.uk. Metadata for the WRAP study are available at https://epidata-ext.mrc-epid.cam.ac.uk/ddic/overview/WRAP/. The source code for the Health Checks model is available under licence from a GitHub repository (https://github.com/chjackson/healthchecks).

Abbreviations

SPHR:

School for Public Health Research

HC:

Health Checks

WW:

Weight Watchers

ISPOR-SMDM:

International Society for Pharmacoeconomics and Outcomes Research–Society for Medical Decision Making

QALY:

Quality-Adjusted Life Year

WRAP:

The extended and standard duration Weight-loss programme Referrals for Adults in Primary care

CVD:

Cardiovascular disease

BMI:

Body Mass Index

NHS:

National Health Service

HSE:

Health Survey for England

References

  1. Ahern AL, Wheeler GM, Aveyard P, Boyland EJ, Halford JC, Mander AP, Woolston J, Thomson AM, Tsiountsioura M, Cole D. Extended and standard duration weight-loss programme referrals for adults in primary care (WRAP): a randomised controlled trial. Lancet. 2017;389(10085):2214–25.

  2. Aveyard P, Lewis A, Tearne S, Hood K, Christian-Brown A, Adab P, Begh R, Jolly K, Daley A, Farley A. Screening and brief intervention for obesity in primary care: a parallel, two-arm, randomised trial. Lancet. 2016;388(10059):2492–500.

  3. Jebb SA, Ahern AL, Olson AD, Aston LM, Holzapfel C, Stoll J, Amann-Gassner U, Simpson AE, Fuller NR, Pearson S. Primary care referral to a commercial provider for weight loss treatment versus standard care: a randomised controlled trial. Lancet. 2011;378(9801):1485–92.

  4. Ahern A, Breeze P, Fusco F, Sharp S, Islam N, Wheeler G, Hill A, Hughes C, Duschinsky R, Thomas C. Effectiveness and cost-effectiveness of referral to a commercial open group behavioural weight management programme in adults with overweight and obesity: 5-year follow-up of the WRAP trial. 2022.

  5. Eddy DM, Hollingworth W, Caro JJ, Tsevat J, McDonald KM, Wong JB. Model transparency and validation: a report of the ISPOR-SMDM Modeling Good Research Practices Task Force-7. Value Health. 2012;15(6):843–50.

  6. Jahn B, Rochau U, Kurzthaler C, Paulden M, Kluibenschädl M, Arvandi M, Kühne F, Goehler A, Krahn MD, Siebert U. Lessons learned from a cross-model validation between a discrete event simulation model and a cohort state-transition model for personalized breast cancer treatment. Med Decis Making. 2016;36(3):375–90.

  7. Vemer P, Ramos IC, Van Voorn G, Al M, Feenstra T. AdViSHE: a validation-assessment tool of health-economic models for decision makers and model users. PharmacoEconomics. 2016;34(4):349–61.

  8. Brown JB, Palmer AJ, Bisgaard P, Chan W, Pedula K, Russell A. The Mt. Hood challenge: cross-testing two diabetes simulation models. Diabetes Res Clin Pract. 2000;50:S57–S64.

  9. Squires H, Chilcott J, Akehurst R, Burr J, Kelly MP. A systematic literature review of the key challenges for developing the structure of public health economic models. Int J Public Health. 2016;61(3):289–98.

  10. Briggs AD, Wolstenholme J, Scarborough P. Estimating the cost-effectiveness of salt reformulation and increasing access to leisure centres in England, with PRIMEtime CE model validation using the AdViSHE tool. BMC Health Serv Res. 2019;19(1):1–14.

  11. Drolet M, Bénard É, Jit M, Hutubessy R, Brisson M. Model comparisons of the effectiveness and cost-effectiveness of vaccination: a systematic review of the literature. Value Health. 2018;21(10):1250–8.

  12. Si L, Willis MS, Asseburg C, Nilsson A, Tew M, Clarke PM, Lamotte M, Ramos M, Shao H, Shi L. Evaluating the ability of economic models of diabetes to simulate new cardiovascular outcomes trials: a report on the Ninth Mount Hood Diabetes Challenge. Value Health. 2020;23(9):1163–70.

  13. Hoogendoorn M, Feenstra TL, Asukai Y, Borg S, Hansen RN, Jansson S-A, Samyshkin Y, Wacker M, Briggs AH, Lloyd A. Cost-effectiveness models for chronic obstructive pulmonary disease: cross-model comparison of hypothetical treatment scenarios. Value Health. 2014;17(5):525–36.

  14. Brisson M, Bénard É, Drolet M, Bogaards JA, Baussano I, Vänskä S, Jit M, Boily M-C, Smith MA, Berkhof J. Population-level impact, herd immunity, and elimination after human papillomavirus vaccination: a systematic review and meta-analysis of predictions from transmission-dynamic models. Lancet Public Health. 2016;1(1):e8–e17.

  15. Handels RL, Green C, Gustavsson A, Herring WL, Winblad B, Wimo A, Sköldunger A, Karlsson A, Anderson R, Belger M. Cost-effectiveness models for Alzheimer’s disease and related dementias: IPECAD modeling workshop cross-comparison challenge. Alzheimer’s Dement. 2023;19(5):1800–20.

  16. Tew M, Willis M, Asseburg C, Bennett H, Brennan A, Feenstra T, Gahn J, Gray A, Heathcote L, Herman WH. Exploring structural uncertainty and impact of health state utility values on lifetime outcomes in diabetes economic simulation models: findings from the Ninth Mount Hood Diabetes Quality-of-Life Challenge. Med Decis Making. 2022;42(5):599–611.

  17. Thomas C, Sadler S, Breeze P, Squires H, Gillett M, Brennan A. Assessing the potential return on investment of the proposed UK NHS diabetes prevention programme in different population subgroups: an economic evaluation. BMJ Open. 2017;7(8):e014953.

  18. Breeze P, Thomas C, Squires H, Brennan A, Greaves C, Diggle P, Brunner E, Tabak A, Preston L, Chilcott J. Impact of type 2 diabetes prevention programmes based on risk identification and lifestyle intervention intensity strategies: a cost-effectiveness analysis. 2015.

  19. Mytton OT, Jackson C, Steinacher A, Goodman A, Langenberg C, Griffin S, Wareham N, Woodcock J. The current and potential health benefits of the National Health Service Health Check cardiovascular disease prevention programme in England: a microsimulation study. PLoS Med. 2018;15(3):e1002517.

  20. Dunlop WC, Mason N, Kenworthy J, Akehurst RL. Benefits, challenges and potential strategies of open source health economic models. PharmacoEconomics. 2017;35(1):125–8.

  21. Breeze PR, Squires H, Ennis K, Meier P, Hayes K, Lomax N, Shiell A, Kee F, de Vocht F, O’Flaherty M. Guidance on the use of complex systems models for economic evaluations of public health interventions. Health Econ. 2023.

  22. Squires H, Chilcott J, Akehurst R, Burr J, Kelly MP. A framework for developing the structure of public health economic models. Value Health. 2016;19(5):588–601.


Acknowledgements

We thank Dr Amy Ahern and Dr Fiona Whittle for their support in this study.

Funding

This study is funded by the NIHR Programme Grants for Applied Research (RP-PG-0216-20010). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. CJ and OC were funded by the UK Medical Research Council programme MRC_MC_UU_00002/11.

Author information


Contributions

SB, PB, CT, CJ, OC, AB were involved in developing the concept and design of the study. SB and PB conducted the model comparison and analysed results. SB and PB drafted the manuscript which was critically revised by CT, CJ, OC and AB. PB, CT and AB obtained funding for the work.

Corresponding author

Correspondence to Sarah Bates.

Ethics declarations

Ethics approval and consent to participate

Ethical approval for the WRAP-UP trial was obtained from the West Midlands-Coventry and Warwickshire Research Ethics Committee on Dec 8, 2017 (17/WM/0432). All participants gave written informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Rights Retention Statement

For the purpose of open access, the author has applied a Creative Commons Attribution (CC BY) licence to any Author Accepted Manuscript version arising.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

12889_2024_18134_MOESM1_ESM.docx

Supplementary material: Supplementary Appendix A: Published guidance on cross-model validation. Supplementary Appendix B: Protocol to compare and cross-validate models. Supplementary Appendix C: Summary report of cross-model Validation. Supplementary Appendix D: Proposed recommendation for Cross-model validation

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Bates, S., Breeze, P., Thomas, C. et al. Cross-model validation of public health microsimulation models; comparing two models on estimated effects of a weight management intervention. BMC Public Health 24, 764 (2024). https://doi.org/10.1186/s12889-024-18134-4
