A spatial-temporal statistical analysis of health seasonality: explaining HFMD infections within a children population along the Vietnamese south central coast

Background Various neglected tropical diseases show spatially changing seasonality at small areas. This phenomenon has received little scientific attention so far. Our study contributes to advancing the understanding of its drivers. This study focuses on the effects of the seasonality of increasing social contacts on the incidence proportions at multiple district level of the childhood hand-foot-mouth disease in Da Nang city, Viet Nam from 2012 to 2016. Methods We decomposed the nonstationary time series of the incidence proportions for the nine spatial-temporal (S-T) strata in the study area, where S indicates the spatial and T the temporal stratum. The long-term trends and the seasonality are presented by the Fourier series. To study the effects of the monthly average ambient temperature and the period of preschooling, we developed a spatial-temporal autoregressive model. Results Seasonality of childhood hand-foot-mouth disease incidence proportions shows two peaks in all spatial strata annually: large peaks synchronously in April and small ones asynchronously during the preschooling period. The peaks of the average temperature are asynchronous with the seasonal peaks of the childhood hand-foot-mouth disease incidence proportions in the period between January and May, with the negative values of the regression coefficients for all spatial strata, respectively: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\beta}_{{\mathrm{T}}_11}^{S_1}=-0.18\pm 0.07;{\beta}_{{\mathrm{T}}_11}^{S_2}=-0.25\pm 0.09;{\beta}_{{\mathrm{T}}_11}^{S_3}=-0.14\pm 0.05 $$\end{document}βT11S1=−0.18±0.07;βT11S2=−0.25±0.09;βT11S3=−0.14±0.05. The increasingly cumulative preschooling period and the seasonal component of the incidence proportions are negatively correlated in the period between August and December, with the negative values of the regression coefficients for all temporal strata, respectively: \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$ {\beta}_{{\mathrm{T}}_32}^{S_1}=-0.40\pm 0.01;{\beta}_{{\mathrm{T}}_32}^{S_2}=-0.29\pm 0.00;{\beta}_{{\mathrm{T}}_32}^{S_3}=-0.25\pm 0.01 $$\end{document}βT32S1=−0.40±0.01;βT32S2=−0.29±0.00;βT32S3=−0.25±0.01. Conclusions The study shows that social contact amongst children under five years of age is the important driving factor of the dynamics of the childhood hand-foot-mouth disease outbreaks in the study area. The preschooling season when children’s contact with each other increases stimulates the geographical variation of the seasonality of childhood hand-foot-mouth disease infections at small areas in the study area.


Background
Neglected tropical diseases (NTDs) have shown evidently seasonal patterns with various timings, amplitudes and types [1][2][3]. NTDs remain socio-economic burdens in developing countries, especially in low-income communities in Asia and Africa. The considerable seasonality of these NTDs is affected by seasonal changes of both environmental and social risk factors [4,5]. Understanding the regularity or irregularity of NTDs' outbreaks is necessary to obtain optimal control over the disease spread [6], to reduce their burdens and to reach the UN sustainable development goals for health [7].
Amongst others, the infections of Coxsackievirus A16 (CA16) and human Enterovirus 71 (EV71) that mainly cause hand-foot-mouth disease (HFMD) in children under 5 years of age (U5s) have been recognized as an emerging public health problem in North East and South East Asian countries [8,9]. So far, studies of the seasonal variations of childhood HFMD infections have mainly focused on climatic driving factors such as temperature, humidity and rainfall [9]. The effects of seasonally changing social contacts on the dynamics of childhood HFMD for small areas have been overlooked, even though HFMD viruses are mainly transmitted by directly physical contacts between infected and noninfected children [10]. Additional research has been done for northeastern Asian countries [11][12][13], whereas some studies were also done for southeastern Asian regions [14,15]. Despite the increasing application of mathematical models to understand the epidemiology of infectious diseases, spatial-temporal statistical methods (STS) have not yet been well recognized and leveraged for studying seasonality of NTDs, i.e. to quantitatively explain the spatial-temporal dynamics of the disease outbreaks [16], especially for small geographical areas.
In this research, we study the seasonality of childhood HFMD in Viet Nam and its geographical variations at small areas as affected by seasonally changing weather and the social contacts. The study of Horby et al. in Viet Nam in 2011 [17] shows that the physical contact amongst U5s themselves is the most intensive, especially during the period at preschool. Hence, the social contact amongst U5s is measured by the annually cumulated period of time they spend at preschool since their first day at preschool. We distinguish between the trend and the seasonality. The trend establishes the long-term changing pattern of the mean level, whereas the seasonality represents yearly periodic variations. We propose a two stage STS analysis to extract the seasonal patterns and to identify the driving factors of the incidences of childhood HFMD infections in Da Nang city, Viet Nam as an application. In so doing, we aimed to identify and understand the effects of seasonally changing social contacts of U5s on the dynamics of HFMD outbreaks at multiple district level.

Study area and data
Da Nang city (Fig. 1), the biggest city on the south central coast of Viet Nam has an area of 1,285 km 2 and a population of more than 1 million people in 2016 [18]. Approximately 8% of Da Nang city's population consists of U5s [18]. According to the guideline from the Vietnamese Ministry of Health on diagnosis and treatment of HFMD, a confirmed HFMD case is defined when a child has a positive RT-PCR assay for CA16 or EV71. Weekly counts of new cases, i.e. the incidences from seven mainland districts of the city from 2012 to 2016 were obtained from the published reports of Da Nang city Preventive medicine center (https://ksbtdanang.vn/). Because these weekly counts have many small number of cases (< 5 cases per week), statistics based upon them are not reliable. To avoid this small number problem, the monthly count of new cases that is the sum of the weekly counts of new cases from this month was analyzed in this study. Note that the first and last weeks of the months can include 1 to 3 days of the previous or the next months. The demographical data were obtained from the demographic statistics of Da Nang city. The monthly population of U5s was interpolated from the yearly data with the assumption of a constant monthly growth rate. The monthly meteorological data from 2012 to 2016 were obtained from the Viet Nam institute of meteorology, hydrology and climate change (http://www.imh.ac.vn).

Spatial-temporal heterogeneity analysis
To analyze the spatial-temporal heterogeneity and identify the predominant risk factors of HFMD, we compared the qstatistic of the nine spatial-temporal stratification to that of without stratification: [19]; where: N is the total number of HFMD cases in the study area and σ 2 is the corresponding total variance, N h and σ 2 h are those of various spatial-temporal strata that are considered in the statistical test, H is the total number of these strata. The q-statistic for heterogeneity analysis regarding covariates was provided by the geodetector library of R [19,20].

Spatial-temporal stratification
To evaluate the effects of U5s as the population at risk on the magnitude of the seasonality of the childhood HFMD incidences, we stratified the districts by the density of U5s. This classification resulted into three spatial strata ( Fig. 1): S 1 is an urban stratum with the highest density of U5s (> 500 per km 2 ), S 3 is the rural stratum having the lowest density (< 100 per km 2 ) and S 2 is located in between.
We transformed the monthly counts of HFMD incidences to the monthly HFMD incidence proportions in order to correct the effects of the internal heterogeneity of the population at risk within the spatial strata. Let i = 1:12 index the months, j = 1:5 the years, d = 1:7 the districts and s = 1:3 the spatial strata. IR s ij is used to denote the incidence proportion at spatial stratum s in month i of year j. To account for unequal numbers of days in months, the IR s ij were adjusted as: [21], where I d ij is the monthly incidence proportion, N d ij is the population of U5s, D i is the number of days per month,

Modelling seasonality
The seasonality of IR s ij was diagnosed by using the seasonal index (SI). The original value of IR s ij for each stratum s in month i in year j was represented as the percentage of the mean value of the IR s ij for stratum s in year j. SI of stratum s in month i equals the median of the percentages from all 5 years: The median allows for the inclusion of the effects of extreme values over the years. The standard error of SI was derived from the bootstrap confident interval assuming a normal distribution. A value of SI s i > 100% indicates that the incidence proportion is above the yearly average incidence proportion and vice versa.
For the analysis of the dynamics, we applied to IR s ij an additive conceptual model with three main components: a trend (μ), a seasonality (ξ) and random noises (ε): We used the Fourier series to represent different oscillations of μðIR s ij Þ and ξðIR s ij Þ in (1) by a linear combination of sinusoidal functions with different frequencies, magnitudes and phases [23]: Here P and Q are the number of the sine and cosine functions representing the trend and seasonal components, respectively, μ s 0 and ξ s 0 are the intercepts defining the baselines of the trend and seasonal components, A s μp , B s μp , A s ξq and B s ξq are the Fourier coefficients and t ij indexes time. The amplitude of the trend equals: 1=2 ; that of the seasonal component equals: where sign is the function extracting the sign of a real number [24]. Iterative weighted least square estimation was used to fit (2) and (3) by minimizing the generalized cross- Fig. 1 Spatial strata in the mainland of Da Nang city, Viet Nam: S 1 is the most densely populated districts, S 3 has the lowest population density as compared to S 1 and S 2 validation value: GCV = RSS + λ × PEN L , where RSS is the weighted residual sum of squares, λ is the smoothing parameter and PEN L is a penalty term as a measure of Fourier series roughness [25].

Identifying driving factors of seasonality
To contrast the effects of preschooling period as a proxy measure of the social contacts and the weather season on the seasonality of the disease, we further stratified the seasonal components ξðIR s ij Þ into three temporal strata: T 1 is the dry and preschooling season from January to May, T 2 is the dry and summer holiday season from June to July and T 3 is the rainy and preschooling season from August to December. Hence, for all j and s, ξðIR s ij Þ belong to T 1 for i = 1:5, to T 2 for i = 6:7 and to T 3 for i = 8:12, respectively.
To relate ξðIR s ij Þ of T 1 , T 2 and T 3 to the potential risk factors, a spatial-temporal regression model was deployed: where N is the total number of the weather covariates, β s T 0 and β s Tn are the regression parameters, X s Tnij are the weather covariates, ε s Tij are the spatial-temporal autoregressive regression residuals, P Ti are the number of months at preschools, cumulated from August current year to May next year. Notice that P Ti did not vary with s and j as they were obtained by: P Ti ¼ i þ 5 for i∈T 1 0 for i∈T 2 i−7 for i∈T 3
To account for spatial-temporal autocorrelation remaining in the residual, we applied the Spatial-Temporal AutoRegressive model (STAR) [26] for the error term ε s Tij in (4): where M is the total number of the first-order spatial neighbors of S s , γ s T 1 and γ s T 2 are the temporal autoregressive coefficients, ρ s Tm are the spatial autoregressive coefficients, ðσ s Tω Þ 2 is the variance of the i.i.d. random error ω s Tij . The model parameters were estimated by maximum likelihood (ML) [27]. Schwarz's Bayesian criterion (BIC) was used to indicate the fitness of the models. The Shapiro-Wilk test [28] was used for normality test. The analyses and modellings were executed in R. 3.4.4, using mainly stat4 library [29] and fda library [30].

Seasonality at multiple district level
The seasonality of IR s ij was evidenced by the variations of the corresponding SI s i ( Table 1). For all spatial strata, the maximum SI s i > 100% fell into two periods, between April and May and between September and October. These indicate that, for example for S 2 in April, the median incidence proportion was 1.66 times higher than the average incidence proportion of the year. The minimum SI s i < 60% were in the period between December and January. These show that, for example for S 2 in January, the median incidence proportion was 2.5 times lower than the average incidence proportion of the year.
As outcomes from the iterative fitting processes, the optimally fitted trends of S 1 and S 2 had two sine and two cosine functions (P = 2) and the optimal logλ = 4 corresponding to the minimum values of the GCV equal to 2.28 and 3.33, respectively. For S 3 , the fitted model included four of each of the trigonometric functions (P = 4) with logλ = 2 and the minimum GCV equal to 6.1. The fitted models for the seasonality (3) have five of each sine and cosine functions (Q = 5) with logλ = 0 and the minimum GCV equal to 0.87, 2.6 and 4.1 for the S 1 , S 2 and S 3 , respectively. The trends show a gradual decrease from 2012 to 2015, whereas from the end of 2015 onwards, they increase (Fig. 2). The magnitude is highest in S 2 , being approximately double to those of S 1 and S 3 . Fig 3 presents the seasonal patterns of IR s ij . In T 1 , all the peaks fell within April as indicated by the maximum SI s i in Table 1. In T 3 , the peaks occurred in August (S 3 ), September (S 2 ) and October (S 1 ). All the deepest troughs were in January, whereas the small troughs occurred in June. The maximum amplitude of seasonality was equal to 4.56 infected U5s per 1,000 U5s at risk per month.

Driving factors of seasonality
The monthly relative humidity and sunny hours in Da Nang city in the study period were significantly correlated at lag zero with the average temperature (r lag0 = − 0.82 and 0.84, respectively). Meanwhile, ξðIR s ij Þ from all three spatial strata had significantly lower correlation with the rainfall than with the average temperature at lag zero (r lag0 = − 0.13, − 0.10, − 0.23 versus 0.74, 0.74, 0.63 for S 1 , S 2 , S 3 , respectively). The average temperature was apparently the sole explanatory weather variable included or   (4). The maximum average temperature of over 30°C occurred in May and June; the minimum of approximately 22°C occurred in January every year (Fig. 4). The ξ T ðIR s ij Þ all satisfied the Shapiro-Wilk normality test (pvalue < 0.05), thus satisfying the specifications of (4) and (5) without any need for transformation.
The results of the q-statistic show that the spatialtemporal stratification into nine strata reflecting the influences of the density of U5s and the monthly average temperature yielded a significant homogeneity within the strata. The q-statistic of the nine spatial-temporal strata was equal to 0.21 (p-value = 0.00), while that of the combinations of the months versus the districts was not significant, q-statistic = 0.26 (p-value = 0.24).
To compare the effects of the average temperature and the preschooling period (Fig 4), the parameters in (4) and (5) were consequently estimated for two sub-models: 1) with preschooling period or β s T ðNþ1Þ ¼ β s T 2 and 2) without preschooling period or β s T ðNþ1Þ ¼ 0 . Accordingly, every spatial-temporal stratum had two sets of estimated parameters, except for those in T 2 because children did not attend preschools in this period. The estimated results are presented in Tables 2 and 3. The effects of the spatial-temporal autocorrelations were taken into account in these estimations, resulting in estimated parameters as presented in Tables 4 and 5 in Appendix. The corresponding BIC values of the selected models were all minimized. T 2 is the transitional period between the dry and rainy season and also the period of summer holidays. Thus, the interest of studying the driving factor of the seasonality laid in T 1 and T 3 . Table 2 shows that in the sub-model one, the average temperature was negatively correlated with ξðIR s ij Þ in T 1 . This negative β s T 1 indicates a mismatch between the peak of the seasonality of the disease incidences and the average temperature. In sub-model two as can be seen in Table 3, the β s T 2 had negative values for all strata in T 3 ; whereas, the β s T 1 were minimal. The average temperature therefore had no significant effect in T 3 .

Discussion
In this section, we discuss our main findings, the consistency of the seasonal patterns extracted from IR s ij using our methods compared to other methods and we also highlight the limitations of this study.
In this study, we have decomposed the monthly time series of the adjusted incidence proportions of childhood HFMD infections for the spatial-temporal strata of the mainland of Da Nang city to reveal the trend and the seasonality. The yearly larger outbreaks simultaneously happened in all three spatial strata in April. The variation of the seasonality of the second smaller outbreaks in T 3 is intriguing. This variation suggests the effects of the season of increasing social contacts among U5s when they go to preschool on the onset of HFMD infection.
We have shown statistical evidences of these effects during the preschooling period. The positive values of the spatialtemporal regression coefficients imply the simultaneity of the peaks and vice versa [21]. The mismatch between the maximum incidence proportions and the maximum average temperature indicates that the increase of the average temperature was the driving factor, not its maximum values. In the last two quarters of all years, the average temperature was not the predominant risk factor. The contrary was evident in T 1 . At small areas of multiple districts in the study area, the increasing social contact has been shown to be the important driving factor of the geographical variation of the seasonality of the childhood HFMD outbreaks.
The estimated spatial-temporal regression parameters of (4) in T 3 are interestingly more informative than those in T 1 because all the smaller peaks occurred in T 3 . As the Fig. 4 Temporal variation of the driving factors of the seasonality of HFMD incidences: the monthly average ambient temperature (cyan line) and the cumulative preschooling period (red line) period at the preschools in T 3 cumulatively increased from August to December, the negative values of β s T 2 indicate the occurrence of the smaller peaks at the beginning of the preschooling period. Recall from the study of Horby et al. [17] that the physical contacts of U5s are most intense within this age group in Viet Nam. Given that the physical contact is the prominent means of passing HFMD viruses, the nature, frequency and duration of the contacts of those children during the preschooling period explain the occurrence and the geographical dynamics of these smaller outbreaks.
The trends show the occurrences of the large outbreaks at the beginning and at the end of the observed period. Study of a longer period of time would show a long-term oscillation of the large outbreaks with the frequency of every two to 3 years in this city. The possible reasons can be the required lapse of time to cumulate the critical susceptible population [31] or the existence of multiple viral strains in the study area [32,33].
The decomposed trends and seasonal components were highly correlated with those fitted by the method of seasonal and trend decomposition using Loess [34]. The correlation coefficients of the trends from both methods at lag zero varied from 0.85 to 0.90. Those of the seasonality varied from 0.83 to 0.97. The similar results from both decomposition methods imply the consistency of the seasonality derived from the Fourier decomposition method. This method provides a semiparametric approach to extract different components of a time series including the long-term and short-term irregular and regular fluctuations, taking into account effects of possible confounders [35,36]. Using this approach instead of the common seasonal ARIMA, the drawbacks of stationarization by differencing and statistical transformation can be avoided [37]. Moreover, Fourier decomposition in combination with the spatial-temporal autoregressive model allow both spatial and temporal autocorrelation existing in the data to be included into the model calibration. This reduces the biases in the estimation of the associations between the incidence proportions and the risk factors. Notwithstanding, from a statistical point of view, the small number of spatial strata in the study area places one of the limitations of the study to include the effects of the spatial auto-correlation. In addition, the separation the temporal trend from the time series was based mainly upon expert judgments. In other words, the efficiency of the Fourier decomposition in many cases relies on the expert's familiarity with the understudied phenomena.

Conclusions
HFMD has been becoming one of the most important pediatric NTDs in the Northeastern and Southeastern Asian countries. By applying spatial-temporal statistical analyses, the results have shown that at multiple district level, the social contact is the important driving factor of the spatial variation of the disease's outbreaks. This study provides statistical evidences of the effects of the seasonality of increasing social contact amongst U5s on geographical dynamics of HFMD infection outbreaks. Our findings contribute to extend the understanding of the underlying driving factors of the disease dynamics at small areas. This contribution is necessary to inform the next insightful research into the spatial-temporal dynamics of the transmission of the NTDs within the local population.