Etiology of acute febrile illness in the peruvian amazon as determined by modular formatted quantitative PCR: a protocol for RIVERA, a health facility-based case-control study

Background The study of the etiology of acute febrile illness (AFI) has historically been designed as a prevalence of pathogens detected from a case series. This strategy has an inherent unrealistic assumption that all pathogen detection allows for causal attribution, despite known asymptomatic carriage of the principal causes of acute febrile illness in most low- and middle-income countries (LMICs). We designed a semi-quantitative PCR in a modular format to detect bloodborne agents of acute febrile illness that encompassed common etiologies of AFI in the region, etiologies of recent epidemics, etiologies that require an immediate public health response and additional pathogens of unknown endemicity. We then designed a study that would delineate background levels of transmission in the community in the absence of symptoms to provide corrected estimates of attribution for the principal determinants of AFI. Methods A case-control study of acute febrile illness in patients ten years or older seeking health care in Iquitos, Loreto, Peru, was planned. Upon enrollment, we will obtain blood, saliva, and mid-turbinate nasal swabs at enrollment with a follow-up visit on day 21–28 following enrollment to attain vital status and convalescent saliva and blood samples, as well as a questionnaire including clinical, socio-demographic, occupational, travel, and animal contact information for each participant. Whole blood samples are to be simultaneously tested for 32 pathogens using TaqMan array cards. Mid-turbinate samples will be tested for SARS-CoV-2, Influenza A and Influenza B. Conditional logistic regression models will be fitted treating case/control status as the outcome and with pathogen-specific sample positivity as predictors to attain estimates of attributable pathogen fractions for AFI. Discussion The modular PCR platforms will allow for reporting of all primary results of respiratory samples within 72 h and blood samples within one week, allowing for results to influence local medical practice and enable timely public health responses. The inclusion of controls will allow for a more accurate estimate of the importance of specific prevalent pathogens as a cause of acute illness. Study Registration Project 1791, Registro de Proyectos de Investigación en Salud Pública (PRISA), Instituto Nacional de Salud, Perú. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-023-15619-6.

In recent years, TaqMan Array Cards (TACs) -modular semi-quantitative polymerase chain reaction (PCR) diagnostic devices -have demonstrated the capacity to enhance disease surveillance through their efficiency and ease of use [28,29]. However, this technology has only been limited to AFIs as part of a sustained febrile surveillance program [30].
While blood is generally sterile, microbial nucleic acid can be detected in a certain fraction of healthy individuals even with aseptic collection techniques. Therefore, as molecular diagnostics are applied to AFI patients, the possibility of subclinical detections in immune hosts must be evaluated. Rates of subclinical detection may vary by pathogens, such that some pathogens are never found in asymptomatic individuals and are always considered the attributable etiologic agent when detected. In contrast, detections of other pathogens need to be reconsidered carefully and compared to detections among controls, such as Plasmodium, which has a community prevalence of 2-5% in communities around Iquitos [31,32]. Asymptomatic prevalence of dengue virus (DENV) can reach 5-7.5% in endemic populations [33], while for Zika virus (ZIKV), rates of 60% have been observed [34], and asymptomatic human infection with Leptospira has also been documented [13]. Additionally, estimating the prevalence and microbial load in cases of asymptomatic carriage is essential in understanding the transmission dynamics of the high-burden AFI etiologies.
Given the lack of combined rigorous case definitions and poor understanding of the relative performance of accepted AFI gold standards against emerging and promising diagnostics, this study, named RIVERA, aims to fill a critical knowledge gap to attain more accurate regional and estimates for AFI in the Peruvian Amazon.

Study aim
The aim of this study will be to implement year-round, active, etiology-specific surveillance of AFI cases and their age-matched controls in Loreto, Peru, using Taq-Man array cards for the detection of bloodborne pathogens. The evaluation of respiratory testing by RT-PCR of Influenza and SARS-CoV-2 was added given the heavy burden of SARS-CoV-2 in the region [35].

Study design
This study has a prospective health facility-based casecontrol design with active surveillance of AFI in patients (cases) aged ten years or older seeking care at selected facilities in Iquitos, Loreto, Peru, and a nearby rural community. The study will recruit 1,200 cases and controls per year over a four-year period.

Study setting
This study will be carried out at seven health centers distributed in the city of Iquitos, in the province of Maynas, Loreto Region, Peru. These include two tertiary level hospitals (Hospital Regional de Loreto and Hospital Apoyo Iquitos) and four primary care centers (Centros de Salud -C.S.) -C.S. San Juan, C.S. Moronacocha, C.S. Americas, and C.S. Santa Clara -in Iquitos, with an additional fifth health center located some distance downriver in the riverine community of Mazán (C.S. Mazán). Figure 1 shows the locations of the seven recruitment centers. Socioeconomic and health indicators of the region of Loreto lag behind those of the rest of the country, including access to water and sanitation, per capita income, and critical health indicators, including infant mortality (29.5 vs. 12.0 per 1000 in Lima) [36] and rates of childhood stunting of 25.2 (national average 12.1%). In 2021 Loreto reported 85.70% of the total number of cases of malaria in Peru (18,074), and 11.42% of all reported cases of dengue (44,791) [37].

Participant characteristics and enrollment
Enrollment will occur at primary (5) and referral (2) health care centers where the participant is seeking care for an illness which includes documented fever of 38 o C or more at the time of enrollment as measured by temporal thermometry. Enrollment is to be restricted to individuals greater than ten years of age as it has previously been shown that respiratory tract infections cause > 70% of acute febrile illness in children ten years and under, [38] and will exclude illness of greater than 14 days of duration. Enrollment also excluded participants with clinical evidence of a focal cause of fever (such as symptoms of urinary tract infection, monoarticular septic arthritis, dental infection, wound or soft tissue infections, or a history of surgery in the past 30 days). Participation in recruitment areas where inpatient care occurs is restricted to the first 24 h of admission. Health facility-based controls will be recruited within 14 days of the matched case. Those identified as controls will be enrolled within the health establishment where the study is being performed to attempt to match environmental and demographic risk exposure. Controls can be patients who do not have a measured or reported fever in the last ten days or have family members of cases in the AFI study. They will be age matched within 5 years of the age of the index case.

Study procedures
Enrolled participants will complete a questionnaire that collects information on the medical evaluation and clinical background of the patient, demographic, socioeconomic, occupational travel, and animal contact information. Additionally, biological samples, including whole blood, saliva, and mid-turbinate swabs, will be collected.
A follow-up visit on days 21-28 after enrollment will be scheduled for both cases and controls in which participants provide additional clinical and household information as well as biological specimens (blood and saliva). Details of the procedures participants will undergo are shown in Table 1.

Laboratory procedures
i) Biosafety. Respiratory samples will be inactivated at 60°C for 30 minutes. Samples arriving at the laboratory, including saliva, blood, and mid-turbinate swabs, will be handled in a biosafety cabinet, and all laboratory personnel will wear personal protective equipment, including gloves, disposable aprons, and KN95 masks. Specimens not processed immediately will be stored at -80°C. ii) TaqMan Array Cards. TaqMan array cards (Thermo Fisher Scientific, Waltham, MA) are a modular quantitative PCR diagnostic device that enables the detection of up to 32 pathogen targets in a single biological sample (Fig. 2). The card is essentially a microfluidic card that allows for the parallel processing of multiple (24) wells in which specific reagents for a single or multiplex assay are deposited in each individual well, and template and PCR common agents are loaded in a single step. Targets include: (1) the principal identified causes of acute febrile illness in the tropics; Plasmodium falciparum, Plasmodium vivax, dengue virus detection and serotyping assays, leptospirosis, tuberculosis, Salmonella typhi, Streptococcus pneumonia, and Brucella; (2) (5) chronic viral infections; HIV, CMV and EBV. In this study, total nucleic acids will be extracted from whole blood samples using the High Pure Viral Nucleic Acid Large Volume Kit (Roche Life Science, Indianapolis, IN, USA) as instructed by manufacturers. TaqMan Fast Virus 1 -Step Master Mix (Thermo Fisher Scientific, Waltham, MA) will be used for the qPCR reactions performed in a QuantStudio 7 Flex. To increase the sensitivity of the assay, duplicate wells were run for all bacterial etiologies. Reactions with cycle thresholds (Ct) of less than or equal to 35 are considered positive, as supported by prior work to establish this cutoff as validated by sequencing [28,30,39]. Validation reactions include an internal control (MS2) to detect reaction inhibition and extraction blanks to detect contamination. The customized TAC utilized in this study has the potential to be redesigned and it is expected that individual targets may be added during the study period. Assay targets for the initial iteration are included in Supplementary Table 1.
Results will be reviewed by two independent reviewers to ensure that interpretation is consistent. PCR targets with Ct's of greater than 35 but less than 40 will be confirmed by single plex PCR and/or sequenced by Sanger sequencing as per Liu et al. [28].
iii) SARS-CoV-2 and Influenza. Mid-turbinate swabs from both cases and controls will be collected on Day 0 and on Day 21-28 and placed in 1.5 mL of T.E. buffer (pH 8.0). Mid-turbinate swabs obtained at Day 0 will be processed using the CDC Influenza SARS-CoV-2 (Flu SC2) multiplex assay that simultaneously detects SARS-CoV-2, influenza A, and Influenza B. Specifically, RNA will be extracted from 140 µL of the sample using the Qiaamp Viral RNA Mini Kit (Qiagen, Germantown, MD), as instructed by manufacturers. Primers and probes are detailed in Supplementary   Table 3.

SARS-CoV-2 Variant Sequencing
All samples that are determined to be positive for SARS-CoV-2 will be sequenced to determine the variant in the population using the Illumina COVIDSeq Assay Kit per the manufacturer's instructions. Briefly, each RNA sample will be converted to cDNA using 1st strand synthesis, and then the SARS-CoV-2 genome will be amplified with pre-designed primers that generate overlapping amplicon products of the genome. Next, Illumina adapters will be added to all PCR amplicons, and then all amplicons for a single sample will be pooled together and barcoded for multiplex sequencing. Next, all barcoded Illumina libraries will be pooled together, the libraries cleaned up and quantified using a Qubit 4 Fluorometer and normalized to 4 nM. Finally, the normalized Illumina sequence library was diluted to 75 pM and sequenced on an Illumina iSeq 100 using the v2 (300-cycle) reagent kit. Sequence reads were de-muliplexed according to the samples barcode, FASTQ files for the forward and reverse reads of each sample uploaded into Illumina's Basespace (www.basespace.illumina.com), and the SARS-CoV-2 variant called using the DRAGEN COVID Lineage application (v3.5.7).

Dengue virus sequencing
All samples positive for dengue virus with Ct values less than 30 will be sequenced for targeted full-genome amplification and sequencing as per Cruz et al. Sequencing will be done using the MinION (Oxford Nanopore, Oxford, UK).

Additional testing
Acute and convalescent sera allow for confirmatory testing using serological methods or targeted substudies to focus on pathogens where infectious load makes direct detection challenging (Coxiella burnetti, Rickettsiae spp).

Statistical analysis and power calculation
A primary aim of RIVERA is to quantify the burden of AFI attributable to each pathogen in a diagnostic panel, or the etiology-specific population attributable fraction (AFp). The inclusion of controls is necessary to estimate the associations between pathogen detection and AFI by etiology while adjusting for rates of subclinical infections. While crude, unadjusted AFps can be derived from standardly calculated odds ratios (ORs), maximum likelihood methods allow for adjustment for confounders and the presence of multiple pathogens, as well as the comparison of etiological and non-etiological (risk factor) exposures [40]. For analysis of RIVERA data, conditional logistic regression models will be fitted to the matched pairs data, treating AFI (case/control) status as the outcome and with pathogen-specific sample positivity as exposures to estimate the strength of association between pathogen detection and AFI while accounting for asymptomatic infections. Models will be further adjusted for sex, and other confounders, as well as other pathogens, to account for co-infections that are documented in this context (33,34). Adjusted attributable pathogen fractions (AFe) will be calculated from OR estimated by the models using the method proposed by Bruzzi and colleagues [41], which have since been adapted for applications to case-control data from studies of infectious syndromes of diverse etiology [42]. Assuming the absence of interactions between pathogens, the adjusted attributable fraction AF a for a pathogen A is given by: where Pr(A|AFI) represents the proportion of AFI cases in which A is present, and OR a is the adjusted odds ratio for A estimated by the conditional logistic regression. Figure 3 graphs the relationship between the prevalence of a given pathogen in controls (the subclinical detection rate) and the minimum effect size detectable in the study overall (6,000 case/control pairs assuming 1,500 recruited per year for 4 years) and for the sub-group of site-specific estimates in 2000 pairs. With those numbers of subjects and considering, for example, a 5% Plasmodium prevalence in controls, we will be powered to detect a minimum odds ratio for AFI of approximately 1.25 in the study overall or 1.4 in sub-groups with 2,000 case/control pairs. An alternative approach, proposed by Deloria Knoll and colleagues [43], will be used in parallel.
Despite the relative simplicity and ease of interpretability of the attributable fraction methods, the PERCH Integrated Analysis method, uses Bayesian nested partially latent class analysis to provide estimates at the individual-and population-level for each pathogen. These analytic methods are highly flexible and can be adapted to fit the intricacies of particular diagnostic test characteristics [43]. This approach has several advantages which include the ability to incorporate results from more than one test for a single pathogen into estimates (i.e. PCR and serology), for the adjustment of sensitivity and specificity of each test into the estimate, and for inclusion of pathogens in estimates which have population based odds ratio ≤ 1 [44].
In addition to etiological exposures (pathogens) we will also quantify population level attributable risk factors for exposure variables. Variables such as employment, history of travel by river, housing construction material, water storage and treatment, livestock husbandry, bush Peñataro Yori et al. BMC Public Health (2023) 23:674 meat exposure, and domestic ectoparasite sightings. These will be tested for their unadjusted associations with AFI in univariable conditional logistic regression models and those for which the effect estimate is significant at the α < 0.05 level will be included alongside pathogen infection statuses in a final multivariable model, allowing both adjustments for confounding, and direct comparison of effect magnitudes between etiological and risk factor exposures to estimate the population attributable fractions for multiple risk factors. The association of clinical and laboratory features of the illness will also be examined to determine if these components can be associated with specific etiologies of AFI.

Discussion
The results of this study will inform clinicians and public health practitioners as to the relative importance of a broad range of pathogens causing acute febrile illness in the Northern Amazon, a region of known importance in the identification and characterization of agents of novel and emerging infectious diseases. There are several commercial nucleic acid amplification assays designed for use in acute febrile illness etiology determination that aim to identify a broad set of etiologies. These include the Bio-Fire FilmArray Global Fever Panel (19 pathogens) [45], Verigene, VerePLEX (26 pathogens) [46], and FeverDisk (12 pathogens) [47]. We selected custom-made cards as the ability to select pathogens of regional interest was seen as an important asset in responding to local clinicians' needs assessment, and because acute febrile illness surveillance and emerging infectious diseases surveillance are overlapping tasks in our laboratory. We also selected a diagnostic method that would allow for the direct detection of the four major serotypes of dengue, as serotype switching is an important component of disease surveillance at the population level for epidemic response preparedness.
Nucleic acid-based methodologies yield certain clear advantages over older approaches that were based on aggregated results from culture, microscopy, lateral flow assays, and serology. First, the nucleic acid tests are in most cases more sensitive. Secondly, the streamlined approach in which most assays result from the testing of a single card facilitates reporting, permits on-site testing, and facilitates the clinical application of results. Lastly, nucleic acid isolated from the sample and from PCR amplification are each available for sequence-based confirmation and molecular epidemiology. This accelerates not only clinical care but reporting or results of notifiable diseases at the local, regional, national, and international level and is likely the only strategy that is truly feasible for emerging diseases surveillance and response in the modern era.
Independent of the diagnostic testing approach, the case control component will allow for the application of the attributable fraction approach for acute febrile illness Fig. 3 Power calculations -minimum effect size (odds ratio for acute febrile illness in cases compared to controls) detectable in the study overall (6,000 cases over 4 years) and for site-specific effects (2,000 cases over 4 years) given the prevalence of pathogen positivity in controls, 80% power, an alpha of 0.05 and 1:1 matching. Horizontal lines show documented prevalence of subclinical detection for 4 key pathogens for the primary analysis for disease burden estimates. In a planned secondary analysis, we will repeat estimates using a Bayesian approach. Advantages of this alternative approach allow for the integration of testing results from greater than one test per etiology, the integration of the sensitivity and specificity of the test into the estimate, and makes makes estimates at the individual rather than population level. As a result, etiologies that have fractions of < 1 may appear in the etiologic distribution, as would be expected when predictions are individual rather than population level. Analysis of risk factors may also yield more efficient pathways for the improved clinical management of patients presenting with acute febrile illness in this region.
The team that has designed this study is composed of key stakeholders in the surveillance of infectious diseases in the region and clinicians overseeing the care of patients to enable the inclusion of appropriate populations maximally enable technical and de-identified data transfer at all levels of the project.
Supplementary Material 1