Lagging effects and prediction of pollutants and their interaction modifiers on influenza in northeastern China

Background: Previous studies have typically explored the daily lagged relationship between influenza and meteorology, but few have examined, on a seasonal basis, the monthly lagged relationship, interaction and multiple prediction between influenza and pollution. Our specific objectives are to evaluate the lagged and interaction effects of pollution factors and to construct models for estimating influenza incidence in a hierarchical manner.
Methods: We collected influenza case data from 2005 to 2018, together with meteorological and pollution factors, in Northeast China. We developed a generalized additive model with a maximum lag of 6 months to analyze the impact of pollution factors on influenza cases and their interaction effects. We employed LASSO regression to identify the most significant environmental factors and conducted multiple complex regression analyses. In addition, quantile regression was used to model the relationship between influenza morbidity and specific percentiles (or quantiles) of meteorological factors.
Results: Influenza incidence in Northeast China showed an upward trend year by year. The excess incidence of influenza in Northeast China may be attributed to NO2, the suspected primary air pollutant, whose levels were low overall during January, March, and June. In the 15–24 age group, the relative risk of influenza increased with PM2.5 concentration at a lag of 0–6 months (ERR 1.08, 95% CI 0.10–2.07). In the quantitative analysis of the interaction model, PM10 at 100–120 μg/m3, PM2.5 at 60–80 μg/m3, and NO2 at 60 μg/m3 or more had the greatest effect on the onset of influenza. Among the prediction models, the GPR model performed better.
Conclusions: Exposure to the air pollutant NO2 is associated with an increased risk of influenza, with a cumulative lag effect. Pollution monitoring and influenza prediction modeling in winter and spring should be prioritized.
Supplementary Information The online version contains supplementary material available at 10.1186/s12889-023-16712-6.


Introduction
Influenza is an acute viral respiratory illness in humans, usually characterized by fever, headache, muscle pain, weakness, nasal congestion, sore throat and cough. Seasonal epidemics of influenza viruses can spread rapidly and cause significant morbidity and mortality worldwide [1][2][3]. Globally, influenza is estimated to cause 3 to 5 million severe cases and 290,000 to 650,000 deaths related to respiratory infections every year [4]. Although certain non-pharmaceutical interventions can effectively control influenza in its early stages, such as the use of masks, hand washing and other hygiene measures, or even school closures, influenza remains a major threat to human health [5]. As evidenced by the trends of influenza incidence in Europe, the United States and Japan [6,7], the situation remains grim, and global preventive measures have had little impact on the trend of influenza outbreaks. Currently, few models can effectively predict influenza outbreaks. Once the environmental factors that tend to predict outbreaks have been identified and modelled, daily monitoring of influenza morbidity indicators could allow outbreaks to be predicted in advance, reducing or even preventing influenza and its associated costs.
Early studies suggest that air pollution may contribute to influenza-induced morbidity. An epidemiological investigation suggests that particulate matter ≤ 10 μm (PM10) and ozone (O3) should be considered when forecasting the incidence of influenza [8,9]. According to some studies, influenza viruses have been detected in polluted waters, possibly originating from the excretions of birds carrying the virus [10]. As shown in the 2002-2003 SARS pandemic and the 2009 H1N1 influenza pandemic, respiratory viruses such as influenza viruses are mainly transmitted through respiratory droplets. Air pollutants such as particulate matter (PM) and carbon monoxide (CO) may therefore influence the transmission and prevalence of influenza viruses [11,12]. In addition, secondary human-to-human transmission may occur, and an outbreak may lead to the closure of schools and workplaces. Previous studies have investigated how meteorological factors facilitate the transmission of influenza among regions worldwide. A study in the UK showed that influenza viruses prefer low temperatures in temperate regions [13], while researchers in Canada observed that increases in influenza viruses are associated with low temperatures and high relative humidity [14]. The subtypes of influenza viruses that have infected humans in recent decades include H10N8, H5N6 and H9N2, most of which were first reported in China [15].
Numerous studies, both domestic and international, have investigated the correlation between the seasonal distribution patterns of influenza and meteorological and pollution factors. In China, Yuzhou Zhang et al. [16] explored the effect of different meteorological factors on influenza incidence in Shanghai by developing a distributed lag nonlinear model (DLNM). Other research indicated that, in a multi-day lag model, there was a statistically significant correlation between SO2, NO2 and O3 concentrations and influenza risk between lags 0 and 1 [17]. Similarly, in Nanjing, Lei Huang et al. [18] found that PM2.5 and NO2 were associated with an increase in influenza cases. Air pollutants significantly affect the susceptibility of human respiratory epithelial cells to influenza virus infection by increasing virus attachment and entry.
The overall objective of this study is to explore the epidemic characteristics of influenza and the lagged effects of pollutants, and to develop models suitable for predicting influenza virus outbreaks. Our specific objectives are to: a) screen environmental predictors of influenza outbreaks; b) evaluate the lagged and interaction effects of pollution factors; c) construct models for estimating influenza incidence in a hierarchical manner, selecting appropriate models for different characteristics.

Materials and methods
Figure 1 shows the geographical location of the study area: Heilongjiang, Jilin and Liaoning provinces, which lie between 120° and 135° E longitude and 40° and 53° N latitude in China. These three provinces are located in Northeast China and have a medium level of economic development and population density.
We collected influenza case surveillance data from the National Public Health Data Centre of China between 2005 and 2018. All patients were diagnosed according to the criteria of influenza management issued by the Ministry of Health of the People's Republic of China. We obtained the corresponding daily weather data, including air temperature, dew point temperature etc., from the China Meteorological Data Sharing Service. Pollutant information, including CO, NO2, O3 etc., is originally from the National Oceanic and Atmospheric Administration (NOAA).

Statistical methods
To address missing values in the influenza epidemic data and the meteorological and pollution data, we use multiple imputation to fill them. LASSO regression analysis is used for feature selection among the meteorological and pollution factors. We develop quantile regression models to assess the extreme effects of pollution and meteorological factors on influenza cases, and generalized additive models [13] with a maximum lag of 6 months to assess lagged effects and interactions between pollutants. Finally, we make predictions using complex regression models. All analyses in our study are performed in R software (version 4.1.3).
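To illustrate the idea behind the quantile regression step, the following is a minimal sketch in Python (not the R code used in this study) of the quantile, or pinball, loss that quantile regression minimises. The function names and the monthly case counts are hypothetical and serve only as an illustration; the sketch fits a constant rather than a full regression model.

```python
def pinball_loss(y_true, y_pred, tau):
    """Average quantile (pinball) loss at quantile level tau in (0, 1)."""
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction is weighted by tau, over-prediction by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(y_true)


def best_constant_quantile(y, tau):
    """Return the observed value that minimises the pinball loss as a constant fit."""
    candidates = sorted(set(y))
    return min(candidates, key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Hypothetical monthly case counts (illustration only, not study data).
monthly_cases = [12, 15, 9, 40, 22, 18, 95, 30, 14, 11, 25]
median_fit = best_constant_quantile(monthly_cases, 0.5)
upper_fit = best_constant_quantile(monthly_cases, 0.9)
print(median_fit, upper_fit)
```

Minimising this loss at a high quantile level targets the upper tail of the response, which is why quantile regression is suited to studying extreme effects rather than average effects.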

The screening of lasso regression with environmental variables
There are $N$ sets of observations, each consisting of a response variable $y_i$ and $p$ associated feature variables $x_i = (x_{i1}, \ldots, x_{ip})^T$. A linear regression model can be set up as follows:

$$ y_i = \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j + e_i, $$

where $\beta_0$ and $\beta = (\beta_1, \beta_2, \ldots, \beta_p)$ are unknown parameters and $e_i$ is the error term. In practical problems, the introduction of some variables not only complicates the calculation but also risks increasing collinearity in the data, thus affecting the model fit. We can use lasso regression to estimate the parameters by solving the following problem:

$$ \min_{\beta_0,\,\beta} \; \frac{1}{2N}\sum_{i=1}^{N}\Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 \quad \text{subject to} \quad \|\beta\|_1 \le t, $$

where $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$ is the $\ell_1$ norm of $\beta$ and $t$ is the specified tuning parameter. Overall, the lasso method improves prediction accuracy, and the constraint term shrinks the coefficients of some of the feature variables in the model to exactly zero, thus enabling the selection of important variables among the many candidates.
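The variable-selection behaviour of the lasso can be sketched as follows. This is a minimal illustration in Python, not the R implementation used in this study. It assumes the equivalent penalised form of the problem, in which a penalty weight `lam` (a hypothetical name) plays the role of the tuning parameter, and solves it by cyclic coordinate descent with soft-thresholding. The toy data are invented for the example.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2N) * sum((y - b0 - X b)^2) + lam * ||b||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta0 = sum(y) / n
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                others = beta0 + sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
                z += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (z / n)
        # The intercept is not penalised: set it to the mean residual.
        beta0 = sum(y[i] - sum(X[i][k] * beta[k] for k in range(p)) for i in range(n)) / n
    return beta0, beta


# Toy data (illustration only): y depends strongly on feature 0, weakly on feature 1.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [-5.9, -3.1, 0.0, 2.9, 6.1]
intercept, coefs = lasso_coordinate_descent(X, y, lam=0.5)
print(intercept, coefs)
```

In this toy example the weak predictor is removed from the model (its coefficient is exactly zero) while the strong predictor is retained with a shrunken coefficient, which is the mechanism by which the lasso screens variables.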

The lagging and interaction effect of generalized additive model [13]
The lagged-effect model is referred to as Model 1. In this model, $Y_t$ is the monthly count of influenza cases in month $t$; $\alpha_1$ is the intercept of the whole model; $S(\cdot)$ is a smoothing function, for which the penalized spline method is often used; $M$ represents the estimated environmental variable related to influenza; and $\beta$ denotes the regression coefficients. The optimal degrees of freedom (df) for the spline function are estimated by the Akaike information criterion (AIC) for Poisson models and the minimum partial autocorrelation function (PACF min) criterion. Subsequently, we explore the interaction of pollutants on the prevalence of influenza with a second model (Model 2), in which $\alpha_2$ is the intercept; $X_1$ and $X_2$ denote the two interacting factors, whereas $X_3$ denotes the remaining one; $S(X_1, X_2)$ is a spline function of the interaction between $X_1$ and $X_2$; and $M$ denotes the meteorological factors.
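The construction of lagged exposure terms with a maximum lag of 6 months can be sketched as follows. This is an illustrative Python sketch rather than the R code used in this study. The helper names, the NO2 values and the 10-unit exposure increment are hypothetical, and the conversion from a log-link coefficient to a percentage excess risk, (exp(β × increment) − 1) × 100, is a commonly used definition that is assumed here rather than taken from the text.

```python
import math


def lagged_series(values, lag):
    """Shift a monthly series by `lag` months; months without a lagged value are None."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = len(values)
    k = min(lag, n)
    return [None] * k + list(values[: n - k])


def build_lag_table(values, max_lag=6):
    """Map each lag 0..max_lag to the exposure series shifted by that many months."""
    return {lag: lagged_series(values, lag) for lag in range(max_lag + 1)}


def excess_risk_percent(beta, increment=10.0):
    """Percentage excess risk for an `increment` rise in exposure under a log link (assumed definition)."""
    return (math.exp(beta * increment) - 1.0) * 100.0


# Hypothetical monthly NO2 concentrations (illustration only, not study data).
no2 = [41.0, 38.5, 35.2, 30.1, 27.4, 25.0, 24.3, 26.8]
lag_table = build_lag_table(no2, max_lag=6)
print(lag_table[3])
print(excess_risk_percent(0.001))
```

Each lagged series would enter the additive model as a separate exposure term, so that the coefficient at lag k describes the association between the exposure k months earlier and the current monthly case count.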

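The establishment of the gradient boosting regression tree (GBRT) and Gaussian process regression (GPR) models
If the input training set is $\mathcal{T} = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$ with training samples $i = 1, 2, \ldots, N$, the iteration rounds are $t = 1, 2, \ldots, T$, and the loss function is $L$, then the GBRT algorithm is divided into the following three steps. First, initialize the weak learner:

$$ f_0(x) = \arg\min_{c} \sum_{i=1}^{N} L(y_i, c). $$

Next, for each round $t$, calculate the negative gradient $r_{ti}$,

$$ r_{ti} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{t-1}}, $$

fit a regression tree to $(x_i, r_{ti})$ to obtain the leaf node regions $R_{tm}$, compute the output value $c_{tm}$ of each leaf node region,

$$ c_{tm} = \arg\min_{c} \sum_{x_i \in R_{tm}} L\big(y_i, f_{t-1}(x_i) + c\big), $$

and update the strong learner:

$$ f_t(x) = f_{t-1}(x) + \sum_{m} c_{tm}\, I(x \in R_{tm}). $$

Finally, obtain the strong learner:

$$ f(x) = f_T(x) = f_0(x) + \sum_{t=1}^{T} \sum_{m} c_{tm}\, I(x \in R_{tm}). $$

From a function space perspective, a Gaussian process (GP) [19] is defined to describe the distribution of the function $f(x)$. A GP is a collection of random variables, any finite number of which have a joint Gaussian distribution, and its properties are determined entirely by the mean and covariance functions, that is:

$$ m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big], $$

where $x, x' \in R$ are arbitrary random variables. Thus the GP can be defined as $f(x) \sim GP(m(x), k(x, x'))$, and the mean function is generally taken to be 0 ($m(x) = 0$). The regression model is as follows:

$$ y = f(x) + \varepsilon, $$

where $x$ is the input value, $f$ is the function value, and $y$ is the observation affected by noise. If the noise is $\varepsilon \sim N(0, \sigma_n^2)$, the prior distribution of the observations $y$ is as follows:

$$ y \sim N\big(0, K(X, X) + \sigma_n^2 I_n\big). $$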
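The three GBRT steps described above can be sketched as follows. This is a minimal Python illustration, not the implementation used in this study. It assumes a squared-error loss, a single input feature, and two-leaf regression trees (stumps) as weak learners. The `learning_rate` argument is an added shrinkage factor, and its default of 1.0 reproduces the plain update. All names and data are hypothetical.

```python
def fit_stump(x, residuals):
    """Fit a two-leaf regression tree to the residuals by minimising squared error."""
    xs = sorted(set(x))
    if len(xs) < 2:
        raise ValueError("need at least two distinct input values to split")
    best = None
    for lo, hi in zip(xs, xs[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        c_left = sum(left) / len(left)     # leaf output: mean residual in the region
        c_right = sum(right) / len(right)
        sse = sum((r - c_left) ** 2 for r in left) + sum((r - c_right) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, c_left, c_right)
    return best[1], best[2], best[3]


def fit_gbrt(x, y, n_rounds=20, learning_rate=1.0):
    """Gradient boosting with stumps under the loss L = (y - f)^2 / 2."""
    f0 = sum(y) / len(y)  # step 1: the constant minimising squared error is the mean
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Step 2: for squared error the negative gradient is the ordinary residual.
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        threshold, c_left, c_right = fit_stump(x, residuals)
        stumps.append((threshold, c_left, c_right))
        pred = [pi + learning_rate * (c_left if xi <= threshold else c_right)
                for xi, pi in zip(x, pred)]
    return f0, stumps, learning_rate


def predict_gbrt(model, xi):
    """Step 3: the strong learner is the initial constant plus all leaf outputs."""
    f0, stumps, learning_rate = model
    return f0 + sum(learning_rate * (cl if xi <= th else cr) for th, cl, cr in stumps)


# Toy step-shaped data (illustration only).
x_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y_train = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
gbrt_model = fit_gbrt(x_train, y_train, n_rounds=10)
print(predict_gbrt(gbrt_model, 2.0), predict_gbrt(gbrt_model, 5.0))
```

With a squared-error loss the leaf output that minimises the loss is simply the mean residual in each region, which is why no separate line search appears in this sketch.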

Influenza surveillance in Northeast China
From 2005 to 2018, a total of 32,989 influenza cases were reported in the three northeastern provinces of China, showing an increasing trend every year (Table 1 and Fig. 1). Heilongjiang province exhibited a significantly high epidemic level in the first few years, followed by Liaoning province, which has consistently been the main epidemic area since then and had 14,921 reported cases of influenza by 2018. Children aged 5-14 years and adults aged 25-59 years had the highest incidence of influenza, accounting for 55.35% of all reported cases (Table 1). Significant differences in the incidence of influenza were observed in terms of seasonality, age, and region (P < 0.05).

Screening and extreme effects of meteorological and pollutant factors on influenza prevalence
We apply fivefold cross-validation to select a model with small and stable error fluctuations, which gives a parameter λ of 28.3327. The results of the runs are shown in Fig. 2 and Table S1. After the initial selection by the lasso method, six variables are selected, namely air temperature (AT), dew point temperature (DPT), sea level pressure (SLP), NO2, PM10 and PM2.5, and the coefficients of the other independent variables are shrunk to zero.
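The fivefold cross-validation used to choose λ can be sketched as follows. This is an illustrative Python sketch with a single predictor, for which the lasso has a closed-form solution. It is not the R procedure used in this study. The selection rule shown here (smallest mean held-out squared error over a hypothetical grid) is an assumption made for the example, as are all names and the simulated data.

```python
def soft_threshold(z, gamma):
    """Shrink z towards zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_univariate_lasso(x, y, lam):
    """Closed-form lasso with one predictor and an unpenalised intercept (x must vary)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x) / n
    beta = soft_threshold(sxy, lam) / sxx
    return y_bar - beta * x_bar, beta


def k_fold_indices(n, k=5):
    """Deterministic interleaved folds: fold f holds indices f, f + k, f + 2k, ..."""
    return [list(range(fold, n, k)) for fold in range(k)]


def cv_error(x, y, lam, k=5):
    """Mean held-out squared error of the lasso fit across k folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(x), k):
        held_out = set(test_idx)
        x_tr = [x[i] for i in range(len(x)) if i not in held_out]
        y_tr = [y[i] for i in range(len(y)) if i not in held_out]
        b0, b1 = fit_univariate_lasso(x_tr, y_tr, lam)
        mse = sum((y[i] - (b0 + b1 * x[i])) ** 2 for i in test_idx) / len(test_idx)
        fold_errors.append(mse)
    return sum(fold_errors) / k


def select_lambda(x, y, grid, k=5):
    """Pick the grid value with the smallest cross-validated error."""
    return min(grid, key=lambda lam: cv_error(x, y, lam, k))


# Simulated data (illustration only): a linear trend with alternating noise.
x_demo = [float(i) for i in range(20)]
y_demo = [2.0 * xi + (0.5 if i % 2 == 0 else -0.5) for i, xi in enumerate(x_demo)]
lambda_grid = [0.0, 1.0, 10.0, 100.0]
best_lam = select_lambda(x_demo, y_demo, lambda_grid)
print(best_lam)
```

A very large penalty removes the predictor altogether and the held-out error rises sharply, whereas small penalties keep the predictor, which is the trade-off that cross-validation is used to resolve.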

Exposure-response relationships for pollutants with different lag times
The line graphs of the regression coefficients in Figure S1 show that the effect of SLP does not fluctuate across levels, whereas the effect of PM2.5 fluctuates at different concentrations but increases overall with the quantile. In addition, PM10, NO2 and AT have a negative effect on influenza incidence at different levels, and the overall effect also increases with the quantile. Significant differences were observed between the age groups of 25-59 years and 60+ years. In Table 2 and Fig. 3, in the single-pollutant model, we find a negative association between short-term exposure to NO2 (within 1 month) and monthly influenza incidence (ERR -2.68% (-4.72%, -0.60%)), which implies that low levels of NO2 may be the air pollution factor most responsible for excess influenza incidence. The 25-59 years age group is the most susceptible to NO2, followed by the 0-4 years age group, and the ERR increases with lag in both groups, with essentially no lag in the 60+ years age group. NO2 showed a positive correlation with influenza incidence in patients aged 15-24 and 25-59 years at a 3-month lag. At a 5-month lag, there was a positive correlation between influenza incidence and NO2 in patients aged 0-4, 15-24, and 25-59 years. At the maximum lag of 6 months, influenza incidence in patients aged 15-24 years shows a positive correlation with PM2.5.

Interaction and comparison of multiple-pollutant model
We develop a multi-pollutant model using, for each pollutant, the single lag with the maximum ERR that passes the significance test, which yields a lag of 5 days for PM10 and PM2.5 and 4 days for NO2. Figure S2 reveals that PM10 has interactive effects with PM2.5 and NO2 on influenza incidence. Additionally, there is a weak positive correlation between pollutants and the risk of incidence. Moreover, ambient temperature (AT) exhibits a positive correlation with the risk of incidence at low temperatures and an inverse correlation at high temperatures, while dew point temperature (DPT) shows the opposite trend to AT. Figure 4 indicates a non-linear effect of pollutants on influenza onset, with PM10 levels of 100-120 μg/m3, PM2.5 levels of 60-80 μg/m3, and NO2 levels above 60 μg/m3 exhibiting the greatest impact. Results from the statistical test presented in Table 3 suggest that the PM10 and PM2.5 interaction model is better (R2 = 99.1%). These findings demonstrate the significant impact of air pollutants on influenza onset.
From the comparison of the parameters of the two modelling approaches in Table 4, the model fit is best in Liaoning Province among the regions (R2 > 70%) and in the 25-59 years group among the age groups, while the GPR model shows the same fit as the GBRT model.

Discussion
This study shows that influenza epidemics in the northeastern region of China exhibit pronounced winter-spring seasonality and an upward trend year by year. Prior to 2018, Liaoning Province remained the primary epicenter of influenza outbreaks within the northeastern region, with a higher prevalence of cases observed among young children and adolescents, which could be attributed to their relatively weaker immune systems rendering them more susceptible to influenza viral infections [20].
Our single-pollutant lagged modeling yields intriguing results, indicating that lower concentrations of NO2 in January, March, and June may be the primary contributing factor to excess influenza incidence. Exposure to NO2, whether it occurs before or after respiratory virus infection, can lead to reduced virus-specific immunity and increased cellular inflammation, potentially triggering the onset of influenza. The relative risk of influenza associated with NO2 exposure increased with higher NO2 concentrations in the age groups of 0-4, 15-24, and 25-59 years. This suggests that young adults may be more susceptible to influenza under NO2 exposure, possibly because the rapid and relatively greater release of immune cells stimulated by the virus disrupts immune cell homeostasis in this age group [21]. In the 15-24 years age group, the relative risk of influenza associated with PM2.5 exposure increased with higher PM2.5 concentrations at lag 0-6 months, which suggests that long-term exposure to high levels of PM2.5 may increase the risk of influenza. This is similar to the results of a monitoring study, which found that those aged 0-4 years were significantly susceptible to PM10 and NO2, those aged 5-14 years to PM2.5 and PM10, and those aged 15-24 years to all air pollutants analyzed [18]. During the remaining lag months with low concentrations of PM2.5, PM10 and NO2, the onset of influenza is mainly attributed to factors other than air pollution, as indicated by ERR < 0.
This may be because low levels of outdoor pollution increase population activity and gatherings during the high influenza season (spring and winter), thereby enhancing the risk of influenza.
The longitudinal study shows that the highest overall risk (ERR) of influenza onset is observed for pollutants with a lag of 5-6 months, indicating that long-term exposure to pollutants may primarily promote influenza onset. However, in the age groups of 15-24 and 60+ years, NO2 exposure is associated with a higher risk of influenza onset at an early stage. The lagged effects of PM2.5, PM10 and NO2 are characterized by a bimodal distribution, with a significant decrease in the risk of influenza onset during the first 2 months of exposure and a higher risk during the 2nd-3rd and 4th-5th months of exposure. Several studies based on experimental findings have suggested that influenza viruses exhibit higher sensitivity and pathogenicity during the winter season [22]. This might also be consistent with the findings of some surveys: the correlation between air pollutants and influenza varies by season and region, with higher effects estimated for the cold season, eastern and central regions, and provinces with wetter conditions and larger populations [23]. In the multi-pollutant lagged interaction model, ambient temperature (AT) is found to be positively correlated with the risk of influenza onset at low temperatures, while pollutants are only weakly positively correlated with the risk of morbidity. This suggests that lower temperatures may facilitate the spread of pollutants, thereby exacerbating the spread of influenza viruses and causing the onset of influenza. These findings are consistent with the seasonal characteristics of influenza onset, which occurs predominantly during the winter and spring seasons in this study. The quantitative analysis of the interaction models reveals that the interaction between PM10 and PM2.5 has a more significant effect on influenza onset. Several studies have suggested that particulate matter (PM) may stimulate macrophage apoptosis in lung tissue, which could exacerbate the damage caused by influenza viruses to the respiratory tract. This may explain why the combined effect of PM10 and PM2.5 on influenza onset is stronger [24]. During heavy pollution, reduced outdoor activities and increased indoor activities can heighten the risk of influenza [25]. Hence, the significance of the current study lies in the investigation of the link between pollutants and the development of influenza, which is in line with various domestic and international studies [17,26,27].
Temperature and sea level pressure are the most relevant meteorological factors in our study, as they can affect the transmission of pathogens and impact human immune function, leading to respiratory disease development [28][29][30]. In our final GAM, we did not find a significant association between meteorological factors and influenza incidence. This may be due to the potential confounding effect of regional environment and lifestyle habits on the transmission of influenza.
Our analysis involves multiple models exploring the relationship between environmental factors and influenza incidence, together with subsequent subgroup analyses. It is notable that our study quantifies the impact of various factors on influenza incidence. The use of the GAM allows us to control for important confounding factors and to examine the long-term monthly lagged effects of co-exposure to PM10 and PM2.5. However, it should be noted that our study is conducted at an aggregate level and does not involve individual-level analysis. Moreover, the data were collected only from the Northeast region, so caution is needed when extending our findings to other regions. Nevertheless, our study provides a starting point for future population epidemiology studies with larger samples and broader geographic coverage.

Fig. 1 The geographical location of Northeast China. The map was created with ArcGIS 10.3 (Environmental Systems Research Institute; Redlands, CA, USA). The base map was acquired from the data center for geographic sciences and natural resources research, CAS (http://www.resdc.cn/data.aspx?DATAID=201)

Fig. 2 The process of lasso regression variable screening

Fig. 3 The associations between ambient air pollution and monthly influenza prevalence for the total population and all age groups

Fig. 4

Table 1
Distribution of the influenza cases by age, region and season group in northeast China, 2005-2018

Table 3
Test of interaction model of multiple pollution factors

Table 4
Comparison of the prediction results of the gradient boosting regression tree (GBRT) and Gaussian process regression (GPR) models

The subgroup analyses demonstrate significant differences in the predictions made by region and age group. Interestingly, the Gaussian process regression (GPR) model outperforms the gradient boosting regression tree (GBRT) model in terms of predictive accuracy. In conclusion, the study suggests that the models for Liaoning Province perform well in predicting influenza outbreaks and accounting for environmental factors. Moreover, Liaoning serves as a suitable representative region for the northeastern part of China. Our study also finds that the model's fit and validation are satisfactory for the age group of 25-59 years, who are susceptible to influenza outbreaks. However, the predictive stability is suboptimal for the age group of 5-14 years, possibly due to the clustering of young individuals during the cold season.