
Data collection for outbreak investigations: process for defining a minimal data set using a Delphi approach

Abstract

Background

Timely and accurate data collection is needed during health emergencies to inform public health responses. Often, an abundance of data is collected but not used. When outbreaks and other health events occur in remote and complex settings, operatives on the ground are often required to cover multiple tasks whilst working with limited resources. Tools that facilitate the collection of essential data during the early investigation of a potential public health event can support effective public health decision-making. We proposed to define the minimum set of quantitative information to be collected, whether or not electronic devices are used. Here we present the process used to select the minimum information required to describe an outbreak of any cause occurring in a remote setting during its initial stages.

Methods

A working group of epidemiologists took part in two rounds of a Delphi process to categorise the variables to be included in an initial outbreak investigation form. This took place between January and June 2019 using an online survey.

Results

At a threshold of 75%, consensus was reached for nineteen (23.2%) variables, all of which were classified as ‘essential’. This increased to twenty-six (31.7%) variables when the threshold was reduced to 60%, with all but one classified as ‘essential’. Twenty-five of these variables were included in the ‘Time zero initial case investigation’ (T0) form, which was shared with members of the Rapid Response Team Knowledge Network for field testing and feedback. The form has been available online from WHO since September 2019.

Conclusions

This is the first known Delphi process used to determine the minimum variables needed for an outbreak investigation. The subsequent development of the T0 form should help to improve the efficiency and standardisation of data collection during emergencies and ultimately the quality of the data collected during field investigation.

Background

The rapid investigation of potential public health events¹ is key to mitigating their harmful effects. Operatives on the ground during the early phases of a public health event are often required to cover multiple tasks whilst working with limited resources. When outbreaks and other health events occur in remote and complex settings, these problems are compounded. Tools that facilitate the accurate, rapid and efficient collection of essential data during the early investigation of a potential public health event can support effective public health decision-making, whilst adding minimal burden to the work of field investigators.

Whilst much attention has been given to the development of analytic tools which aim to facilitate timely analysis and prediction during complex outbreaks, there have been fewer reports which emphasise the need for quality, completeness and timeliness of primary data collection at the field level [1].

At present, the quality of epidemiological information generated during outbreak investigations varies considerably. During the 2014–2015 West African Ebola virus disease outbreak, a number of inadequacies in data management were reported, including incomplete case investigation forms, the late arrival of data [2] and difficulties in properly labelling and classifying samples sent to laboratories during the investigation. The absence of a central database linking different sources of data was highlighted as a crucial problem [2], as were the variability of formats from different sources, which prevented the merging of information, and the surfeit of detailed data that were collected but never used in analyses [3]. Studies conducted in South-East Asia and in Europe have also found that a number of data sets collected for outbreak reports were incomplete and/or inaccurate [4, 5]. Combined, incomplete, delayed, incorrect, excessively detailed and disorganised data sets are the key factors behind the inability to rapidly assess and report critical indicators on the nature and risks of a potential health event. These observations have been reported for several years and from all over the world.

In resource-limited settings, it is imperative to maximise efficiency by focussing on the primary objectives of data acquisition and to include only variables which may directly inform the immediate response. Alongside this, the methods of data collection and management need to be considered, with particular focus on the need to link multiple data sources and standardise variables for downstream data processing.
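To make the linkage requirement concrete, the following is a minimal Python sketch, assuming that case forms and laboratory results share a standardised case identifier. The field names (`case_id`, `test`, `result`) and the function `link_by_case_id` are hypothetical illustrations, not part of the WHO-OT. The point is that a common identifier lets records from different sources be joined, and that unmatched records can be flagged for follow-up rather than silently lost.

```python
def link_by_case_id(cases, lab_results):
    """Attach laboratory results to case records that share the same case_id.

    Returns (linked, unmatched_lab):
      linked        - one dict per case, copied from the input and extended with
                      a "lab" list holding that case's laboratory results
      unmatched_lab - laboratory results whose case_id matches no case record
    If two case records share a case_id, the later one replaces the earlier.
    """
    by_id = {case["case_id"]: {**case, "lab": []} for case in cases}
    unmatched_lab = []
    for result in lab_results:
        case = by_id.get(result["case_id"])
        if case is None:
            unmatched_lab.append(result)  # flag for follow-up instead of dropping
        else:
            case["lab"].append(result)
    return list(by_id.values()), unmatched_lab


# Hypothetical records from two sources using the same identifier scheme.
cases = [
    {"case_id": "C001", "age_years": 34},
    {"case_id": "C002", "age_years": 7},
]
lab_results = [
    {"case_id": "C001", "test": "PCR", "result": "positive"},
    {"case_id": "C003", "test": "PCR", "result": "negative"},
]

linked, unmatched = link_by_case_id(cases, lab_results)
print(linked)
print(unmatched)
```

Here the result for C003 is reported back as unmatched, which is exactly the kind of discrepancy that goes unnoticed when sources use different formats or identifiers.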

In response to the need for standardised data collection instruments for the early investigation of public health events, the Health Emergencies Program (WHE) of the World Health Organization (WHO) proposed in 2018 to develop the WHO Outbreak Toolkit (WHO-OT), a novel resource which aims to improve the quality, timeliness and use of data during public health events through a web-based application for field staff. The WHO-OT includes a set of documents providing information about diseases, standardised case definitions, case report forms, laboratory guidelines and standards for data collection [6]. The WHO-OT project adheres to the key epidemiological principle of outbreak investigation, which is to provide an initial description of the situation with regard to time, place and person. This should allow for 1) an informed assessment of the severity and risk of progression of the outbreak, 2) the generation of hypotheses regarding the potential sources of the hazard(s) and mode of transmission, and 3) a decision on whether further laboratory and epidemiological investigations are needed. The WHO-OT project will therefore comprise a set of tools enabling field staff to correctly collect the minimal necessary data for outbreaks (both infectious and non-infectious) of known and unknown cause.

The Delphi process is used to develop consensus amongst a group of experts [7], particularly where there may be little existing evidence for a specific issue [8]. It uses structured group communication through an iterative multistage process [9]. WHO has successfully used the Delphi process in multiple contexts, including recently to define the priority diseases for the WHO Research and Development Blueprint [10].

This paper describes the process used to identify and refine the minimum variables needed during the initial stages of any public health event investigation.

Methods

In June 2018, WHO convened a working group of 36 experts in epidemiology, clinical medicine, environmental sciences, data management and data sciences from organisations including WHO, the United States Centers for Disease Control (US-CDC), Médecins Sans Frontières (MSF), the European Centre for Disease Prevention and Control (ECDC), the London School of Hygiene & Tropical Medicine (LSHTM), the Helmholtz Centre for Infection Research, and Oxford University. These participants belonged to international organisations involved in field outbreak investigations and were considered experts in working on epidemics in low- and middle-income countries (LMICs). The list of participants was drawn up from a list of WHO experts in health event investigations, existing working relationships with WHO external partners in data collection, recommendations from experts consulted during recent work on the investigation of outbreaks of unknown origin, and partner-institution experts involved in field data collection for all hazards. Terms of reference for the working group were circulated and agreed during a first teleconference in June 2018.

The working group was invited to participate in selecting a set of minimum variables (defined as ‘Epi Core Variables’) for event investigations under the WHO-OT project. This process began with a review of thirty-one case investigation forms from a range of sources, including WHO, US-CDC, MSF and the International Federation of Red Cross (IFRC), which had been used in previous health events and covered both infectious and non-infectious hazards. From these, the participants selected variables which were common to a generic outbreak situation rather than specific to any pathogen or disease aetiology. These variables, called the “initial list”, comprised 82 variables grouped into four categories: 1) Notification interview, 2) Case information, 3) Clinical information and 4) Exposure (Additional file 1). Questions for each variable were phrased so that they could be used in any type of health event, whilst also ensuring that the data would be comparable across situations. Variables describing clinical signs and symptoms were grouped into two sub-groups: (a) those characterising the severity of any disease and (b) those covering signs and symptoms. Only variables characterising the severity of any disease (sub-group a) were included in the prioritisation exercise.

In order to reach consensus on which of the variables should be classified as ‘Epi Core’, the working group decided to use a Delphi process.

Prioritisation of variables was restricted to a sub-group of 26 members of the working group in order to keep a balance among members from different institutions and to restrict the group to experts who regularly participate in field outbreak investigations. We conducted the Delphi process in two rounds between January 2019 and June 2019, using an online survey built with Enketo web-forms [11]. Delphi participants were sent a guidance document outlining the purpose of the selection of variables. The variables were ordered by category and accompanied by a description, allowing the experts to rate the importance of including each one as an ‘Epi Core variable’ as defined in Table 1. For each variable, participants in this sub-group were asked to determine whether the variable should be retained and, if so, to assign its priority for data collection during an outbreak as essential, high, medium, low or unknown (Table 1). The priority classification allows data collection to be adapted to the resources available in the field at the time of the outbreak investigation: if resources are limited, the field officer should collect, at a minimum, the essential variables. Experts could also provide additional explanatory comments on their categorisations.
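The priority classification can be read as an ordered scale that a field officer cuts at whatever level the available resources allow. The minimal Python sketch below assumes the ordering essential > high > medium > low and treats ‘unknown’ as unranked; the variable names and their priorities are hypothetical and are not results of this study.

```python
from enum import IntEnum


class Priority(IntEnum):
    """Assumed ordering of the Delphi priority categories ('unknown' is unranked)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    ESSENTIAL = 4


def variables_to_collect(classified, minimum):
    """Return the variables whose priority is at or above `minimum`.

    `classified` maps variable name -> Priority, or None for 'unknown'.
    Variables classified as 'unknown' are never selected.
    """
    return [name for name, p in classified.items()
            if p is not None and p >= minimum]


# Hypothetical variable names and priorities, for illustration only.
classified = {
    "date_of_notification": Priority.ESSENTIAL,
    "age": Priority.ESSENTIAL,
    "occupation": Priority.HIGH,
    "travel_history": Priority.MEDIUM,
    "pet_ownership": None,  # classified as 'unknown'
}

# With very limited resources, collect only the essential variables.
print(variables_to_collect(classified, Priority.ESSENTIAL))
# With more resources, extend collection to high-priority variables as well.
print(variables_to_collect(classified, Priority.HIGH))
```

Using an integer-valued enum keeps the "collect at least the essential variables" rule a single comparison, so lowering the cut-off simply adds the next tier of variables.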

Table 1 The priority classification and definition of the categorisation of the variables

In January 2019, all 26 members of the sub-group were sent email invitations to participate in the first round of the Delphi process. Participants were given 55 days to complete the survey. Automated reports of the aggregated results and any additional comments from the first round were generated using R statistical software (version 3.5.3) and sent to all participants in March 2019. Twenty-five members of the sub-group were then invited to participate in the second round, which used the same methods. Two additional participants were added to the second round to account for the loss to follow-up of three participants after the first round.

Variables for which a priority classification reached at least 75% consensus during the first round were excluded from the second round. In a systematic review of English-language Delphi studies, the most common definition of consensus was percentage agreement, with 75% being the median threshold [12]. As in the first round, a threshold of 75% was used to define consensus in the second round.
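Percentage agreement can be computed per variable as the share of respondents choosing the most frequently selected priority category. The Python sketch below applies the 75% threshold to invented ratings from 16 hypothetical respondents and lists the variables that would go forward to a second round. It illustrates the rule described above; it is not the R code used in the study, and the function names are hypothetical.

```python
from collections import Counter


def consensus(ratings, threshold=0.75):
    """Percentage-agreement consensus for a single variable.

    `ratings` holds the priority category chosen by each respondent.
    Returns (modal_category, share, reached), where `share` is the fraction of
    respondents who chose the most frequent category and `reached` tells
    whether that share meets the threshold. On a tie, Counter.most_common
    returns the category that was encountered first.
    """
    category, count = Counter(ratings).most_common(1)[0]
    share = count / len(ratings)
    return category, share, share >= threshold


def second_round_variables(round1, threshold=0.75):
    """Variables that did not reach consensus in round 1 go on to round 2."""
    return [name for name, ratings in round1.items()
            if not consensus(ratings, threshold)[2]]


# Invented first-round ratings from 16 hypothetical respondents.
round1 = {
    "age": ["essential"] * 13 + ["high"] * 3,                          # 13/16
    "occupation": ["essential"] * 10 + ["high"] * 4 + ["medium"] * 2,  # 10/16
    "pet_ownership": ["low"] * 5 + ["medium"] * 4 + ["high"] * 4 + ["unknown"] * 3,  # 5/16
}

print(consensus(round1["age"]))                         # ('essential', 0.8125, True)
print(second_round_variables(round1))                   # ['occupation', 'pet_ownership']
print(consensus(round1["occupation"], threshold=0.60))  # ('essential', 0.625, True)
```

The last line shows why lowering the threshold to 60% changes the outcome: a variable with 10 of 16 respondents in agreement fails at 75% but passes at 60%.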

Participants were given 26 days to respond, after which reports of the results from both the first and second rounds were disseminated. All participants were then invited to a teleconference to discuss the results, which 13 attended. Given the limited number of variables that reached 75% consensus and the wide spread of the results, the group decided to also consider variables that reached 60% consensus. We therefore also report the variables that reached at least 60% consensus.

The minimum set of variables served to develop a generic data collection form for outbreak investigation, nominally titled the ‘T0 initial case investigation form’ (“T” for time, “0” for first data collection), to collect the minimum set of data needed to describe an outbreak of any origin and to build initial hypotheses regarding its origin/source, transmission, aetiology or syndrome. Because not all members of the working group had a clinical background, the Delphi process was used, for clinical variables, only to prioritise signs of disease severity. A sub-group of medical experts then selected pathogen- or syndrome-specific variables from the medical variables in the initial list of 82 variables (Additional file 1) for inclusion in the T0 form. WHO laboratory experts were consulted to define the variables for laboratory diagnosis to be included in the form.

The T0 form was field-tested by members of the National Rapid Response Teams Knowledge Network (RRT KN) in French- and English-speaking countries. Feedback was received from three countries (Tunisia, Egypt and Morocco) and from the Eastern Mediterranean Public Health Network (EMPHNET), which tested the T0 form during training of Rapid Response Teams. Following recommendations from the field practitioners involved in testing, additional variables, mainly pertaining to exposure, were added to the T0 form and classified as essential. One variable, ‘neighbourhood/camp/settlement’, was deleted following the results of field testing.

Results

Eighty-two variables were selected for inclusion in the Delphi survey by the sub-group with epidemiological outbreak expertise. Of these 82, nine (11.0%) related to the notification of a case, such as the date and location; 31 (37.8%) to the demographics of the case; 17 (20.7%) to clinical information; and 25 (30.5%) to exposure to the hazard. The list of variables by category is given in the supplementary data (Additional file 1).
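As an arithmetic check, the category percentages follow directly from the reported counts and their total of 82 variables. A minimal Python sketch, rounding to one decimal place as in the text:

```python
# Number of Delphi variables per category, as reported in the text.
counts = {
    "Notification interview": 9,
    "Case information": 31,
    "Clinical information": 17,
    "Exposure": 25,
}

total = sum(counts.values())
shares = {category: round(100 * n / total, 1) for category, n in counts.items()}

print(total)   # 82
print(shares)  # {'Notification interview': 11.0, 'Case information': 37.8, 'Clinical information': 20.7, 'Exposure': 30.5}
```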

Sixteen (61.5%) members of the sub-group responded to the first round of the Delphi survey and 17 (68%) to the second round. The distribution of participants and responses by organisational group is shown in Table 2. In both rounds, the majority of participants were from WHO (61.5% and 60%, respectively).

Table 2 Number of participants and response rates in the first and second rounds of the Delphi survey, by organisation

Consensus for the classification into the categories was reached for nineteen variables (23.2%) using a threshold of at least 75% (Table 3). All nineteen variables were classified as essential (Tables 4, 5, 6 and 7). On reduction of the threshold for consensus to at least 60%, the number of variables increased to 26 (31.7%) of which 25 variables were classified as essential and one variable was classified as high (Tables 3, 4, 5, 6 and 7). Consensus was highest amongst those related to case information: 35.5% at a threshold of 75% consensus and 45.2% at a threshold of 60% consensus; and lowest for those related to exposure: 8.0% at a threshold of 75% consensus and 12.0% at a threshold of 60% consensus (Table 3).
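The consensus percentages can be reproduced in the same way. The overall counts (19 and 26 of 82 variables) are reported above; the per-group counts in this Python sketch (11 and 14 of the 31 case-information variables, 2 and 3 of the 25 exposure variables) are back-calculated from the reported percentages and group sizes, so they should be treated as inferred rather than reported.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place, as reported in the text."""
    return round(100 * part / whole, 1)


# Overall counts reported in the text (82 variables in the survey).
overall = {"75%": pct(19, 82), "60%": pct(26, 82)}
# Per-group counts back-calculated from the reported percentages (inferred).
case_information = {"75%": pct(11, 31), "60%": pct(14, 31)}
exposure = {"75%": pct(2, 25), "60%": pct(3, 25)}

print(overall)           # {'75%': 23.2, '60%': 31.7}
print(case_information)  # {'75%': 35.5, '60%': 45.2}
print(exposure)          # {'75%': 8.0, '60%': 12.0}
```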

Table 3 The number of variables and the number and percentage of variables which reached consensus by group

Across variables, the highest percentage agreement on a single priority category ranged from 31.3% (n = 5) to 100% (n = 1) (Tables 4, 5, 6, 7 and Additional file 2).

The field-tested version of the T0 initial investigation form comprises 43 (52.4%) of the variables from the Delphi process, including all of those which reached consensus at 75% and all but one of those which reached consensus at 60%, as indicated above (Tables 4, 5, 6 and 7). In addition to the epidemiological variables, eleven clinical variables from the initial list, describing the syndrome and the severity of the patient's condition, were added. Quantitative criteria for these clinical variables were also added to the T0 form following recommendations from the medical sub-group of experts. Tables 4, 5, 6 and 7 show that most of the variables selected via the Delphi process relate to socio-demographic and epidemiological criteria, whereas those related to exposure to the agent or mode of transmission were selected during the field-testing phase.

Table 4 List of variables included in the Delphi process as part of the ‘Notification Interview’, with the consensus score obtained in the two rounds of the survey, the final priority category, the decisions of the medical sub-group and of field testing, and the final decision on inclusion in the T0 form

The highest score refers to the highest percentage given for any priority category. The final category refers to the priority assigned on the basis of reaching consensus at 60%. An "X" indicates whether the variable was selected by the medical sub-group, whether it was included as a result of field testing of the T0 form, and whether it was included in the final T0 form.

Table 5 List of variables included in the Delphi process as part of the ‘Case Information’
Table 6 List of variables included in the Delphi process as part of the ‘Clinical Information’
Table 7 List of variables included in the Delphi process as part of the ‘Exposure’

Discussion

The importance of developing minimum datasets for emerging infectious diseases has previously been described in relation to the design of clinical trials [13]; however, this is the first known attempt to determine the core epidemiological information to be collected during outbreaks based on an all-hazards approach. We describe here an approach that allowed a working group from various countries and organisations to participate remotely and flexibly.

At a threshold of 75%, only 28% of the variables included in the Delphi survey reached consensus, and even at a threshold of 60% only 39% did so. Our consensus levels were lower than those in a systematic review of one hundred manuscripts, in which close to 88% reached consensus [12]. A greater percentage of variables might have reached consensus with a less granular categorisation, for example one limited to ‘essential’ and ‘non-essential’. Additionally, the backgrounds of the working group members varied, particularly in their level of clinical expertise and the pathogens with which they were familiar; this may have influenced some results and potentially introduced response bias.

The final output of this work is the T0 form, a generic data collection form for outbreak investigations, developed to describe an outbreak and to build hypotheses regarding its origin/source, transmission, cause and/or clinical syndrome. To facilitate data collection in a variety of settings, including areas lacking data connectivity, the T0 form has been converted into formats enabling electronic data capture, including KoBoCollect, Go.Data and the Early Warning, Alert and Response System (EWARS) [14]. To standardise the data and optimise sharing and analysis, a data dictionary for the T0 form was also developed and made available on the website of the Outbreak Toolkit Project. Our objective is to encourage the use of electronic data capture tools in the field.
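A data dictionary standardises data by fixing, for every variable, its name, type and permitted values, so that records from different investigations can be checked and merged mechanically. The Python sketch below illustrates this idea with a handful of hypothetical entries; the field names, types and ranges are assumptions for illustration and are not taken from the published T0 data dictionary.

```python
from datetime import date

# Hypothetical data-dictionary entries for illustration; they are not taken
# from the published T0 data dictionary.
DATA_DICTIONARY = {
    "case_id": {"type": str, "required": True},
    "sex": {"type": str, "required": True, "allowed": {"male", "female", "unknown"}},
    "age_years": {"type": int, "required": False, "min": 0, "max": 120},
    "date_onset": {"type": date, "required": False},
}


def validate(record):
    """Return a list of problems found when checking one record against DATA_DICTIONARY."""
    problems = []
    for field, spec in DATA_DICTIONARY.items():
        value = record.get(field)
        if value is None:
            if spec["required"]:
                problems.append(f"{field}: missing")
            continue
        if not isinstance(value, spec["type"]):
            problems.append(f"{field}: expected {spec['type'].__name__}")
            continue
        if "allowed" in spec and value not in spec["allowed"]:
            problems.append(f"{field}: {value!r} is not an allowed value")
        if "min" in spec and value < spec["min"]:
            problems.append(f"{field}: below minimum")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{field}: above maximum")
    # Fields outside the dictionary break comparability across investigations.
    for field in record:
        if field not in DATA_DICTIONARY:
            problems.append(f"{field}: not in data dictionary")
    return problems


ok = {"case_id": "C001", "sex": "female", "age_years": 34, "date_onset": date(2019, 9, 2)}
bad = {"case_id": "C002", "sex": "F", "age_years": 150, "village": "X"}

print(validate(ok))   # []
print(validate(bad))  # ["sex: 'F' is not an allowed value", 'age_years: above maximum', 'village: not in data dictionary']
```

Reporting every problem rather than stopping at the first keeps the check useful for field data, where one record may carry several independent errors.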

The T0 form served as the basis for the urgent development of a coronavirus disease 2019 (COVID-19) case-based surveillance form that was shared with all countries and allowed the rapid collection of the minimum information needed to monitor the COVID-19 pandemic [15, 16]. The variables of the T0 form have also served as the backbone of a Time 1 (T1) questionnaire for the investigation of unknown diseases, which focuses more on clinical symptoms and laboratory results and is available on request. The T1 questionnaire was developed separately from the Delphi process.

We hope that this first attempt to develop a standard questionnaire for use at the start of any outbreak investigation will be continuously revised by those using it. Moreover, as a generic questionnaire developed to support field epidemiologists, it should be adapted to the specific context of each outbreak at the start of the investigation.

Conclusions

This is the first reported systematic Delphi process which aimed to define a common set of core variables to be included in an initial case investigation form for outbreak investigations. This tool will support the global standardisation of data collection during the early stage of public health events, allowing for improved efficiency and timeliness of data capture in challenging settings.

Availability of data and materials

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.


  1. A public health event is any event that may have negative consequences for human health. The term includes events that have not yet led to disease in humans but have the potential to cause human disease through exposure to infected or contaminated food, water, animals, manufactured products or environments. In contrast, an outbreak denotes that the event has already caused disease (WHO definition).

Abbreviations

ECDC: The European Centre for Disease Prevention and Control
EMPHNET: Eastern Mediterranean Public Health Network
EWARS: Early Warning, Alert and Response System
IFRC: International Federation of the Red Cross
LMICs: Low- and middle-income countries
LSHTM: The London School of Hygiene & Tropical Medicine
MSF: Médecins Sans Frontières
RRT: Rapid Response Team
US-CDC: The United States Centers for Disease Control
WHO-OT: World Health Organization Outbreak Toolkit
WHO: World Health Organization

References

  1. Polonsky JA, Baidjoe A, Kamvar ZN, Cori A, Durski K, Edmunds WJ, et al. Outbreak analytics: a developing data science for informing the response to emerging pathogens. Philos Trans R Soc Lond Ser B Biol Sci. 2019;374(1776):20180276.

  2. Owada K, Eckmanns T, Kamara KB, Olu OO. Epidemiological data management during an outbreak of Ebola virus disease: key issues and observations from Sierra Leone. Front Public Health. 2016;4:163.

  3. Cori A, Donnelly CA, Dorigatti I, Ferguson NM, Fraser C, Garske T, et al. Key data for outbreak evaluation: building on the Ebola experience. Philos Trans R Soc Lond B Biol Sci. 2017;372(1721):20160371.

  4. Lawpoolsri S, Kaewkungwal J, Khamsiriwatchara A, Sovann L, Sreng B, Phommasack B, et al. Data quality and timeliness of outbreak reporting system among countries in greater Mekong subregion: challenges for international data sharing. PLoS Negl Trop Dis. 2018;12(4):e0006425.

  5. Kroneman A, Harris J, Vennema H, Duizer E, van Duynhoven Y, Gray J, et al. Data quality of 5 years of central norovirus outbreak reporting in the European network for food-borne viruses. J Public Health (Oxf). 2008;30(1):82–90.

  6. World Health Organization. WHO Outbreak Toolkit. 2020. Available from:.

  7. Dalkey N, Helmer O. An experimental application of the Delphi method to the use of experts. Manag Sci. 1963;9(3):458–67.

  8. Thangaratinam S, Redman CWE. The Delphi technique. Obstet Gynaecol. 2005;7:120–5.

  9. Okoli C, Pawlowski SD. The Delphi method as a research tool: an example, design considerations and applications. Inf Manag. 2004;42(1):15–29.

  10. Mehand MS, Millett P, Al-Shorbaji F, Roth C, Kieny MP, Murgue B. World Health Organization Methodology to Prioritize Emerging Infectious Diseases in Need of Research and Development. Emerg Infect Dis. 2018;24(9):e171427.

  11. Enketo. 2020. Available from:.

  12. Diamond IR, Grant RC, Feldman BM, Pencharz PB, Ling SC, Moore AM, et al. Defining consensus: a systematic review recommends methodologic criteria for reporting of Delphi studies. J Clin Epidemiol. 2014;67(4):401–9.

  13. Rojek AM, Moran J, Horby PW. Core minimal datasets to advance clinical research for priority epidemic diseases. Clin Infect Dis. 2020;70(4):696–7.

  14. T0 Initial case investigation form. Available from:.

  15. World Health Organization. Interim case reporting form for 2019 Novel Coronavirus (2019-nCoV) of confirmed and probable cases: WHO minimum data set report form, 21 January 2020. Available from:.

  16. World Health Organization. Case-based reporting form. 2020. Available from:.

Acknowledgements

The authors thank the Rapid Response Team Knowledge Network Secretariat and the Rapid Response Team Knowledge Network Steering Committee for their support in the testing of the questionnaire. The authors thank the members of the “Minimum Variables for Outbreak Investigation Working Group of the WHO Outbreak Toolkit”: Rawan Araj, Brett Archer, Isabel Bergeri, Souha Bougatef, Rocio Escobar, Esther Hamblion, Christopher Haskew, Yvan Hutin, Yurie Izawa, Laura Merson, Jonathan Polonsky, Ana Riviere, Emmanuel Robesyn, Maria Jose Sagrado, Ravi Shankar Santhana Gopala Krishan, and Bernard Silenou, for their participation in the preliminary work, the survey and/or the field testing of the questionnaire.

Funding

HB, CR and MM are funded by the Department of Health and Social Care as part of the UK Vaccine Network (UKVN), a UK Aid programme to develop vaccines for diseases with epidemic potential in low- and middle-income countries. The research is managed by the National Institute for Health Research (PR-OD-1017-20001).

Author information


The study was designed by AP, HB, and CR. Data were analysed by HB. The manuscript was written by AP and HB. SM, SS, AICM, and AP contributed to the selection of medical variables of the T0 questionnaire. All authors contributed to the interpretation of the data and to the editing of the content of the manuscript. All members of the working group responded to the survey and contributed to the editing of the manuscript. All authors have approved the submitted version (and any substantially modified version that involves the author’s contribution to the study); and have agreed both to be personally accountable for the author’s own contributions and to ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.

Authors’ information

AP is a medical doctor and epidemiologist. She developed the WHO Outbreak Toolkit Project for the WHO Emergency program.

HB is a research fellow with the Emergency and Epidemic Data Kit, Global Health Analytics, LSHTM.

Corresponding author

Correspondence to Anne Perrocheau.

Ethics declarations

Ethics approval and consent to participate

This project does not fall within the scope of application of the Swiss Federal Act on Research involving Human Beings (Human Research Act, HRA) of 30 September 2011 (Status as of 26 May 2021) and therefore no ethics approval is required.

The participating experts received a full written description of the scope and goals of this study. Participation was voluntary. The experts gave their written consent in the course of completing Round 1 or 2 of this Delphi survey. Electronic consent was obtained from all participants to use their personal information for the purpose of the Delphi process. All methods were carried out in accordance with relevant guidelines and regulations.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

The authors alone are responsible for the views expressed in this article and they do not necessarily represent the views, decisions or policies of the institutions with which they are affiliated.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

  Selected Variables and Definitions by Category.

Additional file 2: Fig. S1.

 The percentage of participants who categorised each variable as essential, high, medium, low or unknown (‘nil’). The red bars refer to the responses from the first round and the blue bars, the second. A line is given at the 75% threshold used to determine consensus following the first round.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Perrocheau, A., Brindle, H., Roberts, C. et al. Data collection for outbreak investigations: process for defining a minimal data set using a Delphi approach. BMC Public Health 21, 2269 (2021).
