Particulate matter (PM10) prediction based on multiple linear regression: a case study in Chiang Rai Province, Thailand

Background The northern regions of Thailand face haze episodes and transboundary air pollution every year, during which particulate matter, particularly PM10, accumulates in the air and harms human health. Chiang Rai province is one of the country's most popular tourist destinations as well as an important economic hub. This study aims to develop and compare models for predicting PM10 in different seasons from meteorological factors and to identify the best-fitting model. Method Hourly air pollution and weather data for 2011 to 2018 were obtained from the Pollution Control Department (PCD) for two stations. Four stepwise multiple linear regression (MLR) models were then developed to predict PM10 concentrations: an annual model and summer, rainy-season, and winter models. Results The maximum daily PM10 concentration was observed in the summer season at both stations, and the minimum daily concentration in the rainy season. PM10 differed significantly among seasons at both stations. CO was moderately related to PM10 in the summer season. The summer model was the best MLR model for predicting PM10 during haze episodes, with R² values of 0.73 and 0.61 at stations 65 and 73, respectively. In the summer and rainy seasons, relative humidity and atmospheric pressure show negative relationships with PM10 concentrations, whereas temperature is positively correlated with them. In the winter season, by contrast, pressure shows a positive relationship with PM10. Conclusions The MLR models are effective at estimating PM10 concentrations at the local level for each season. The annual MLR model performs well at both stations, with R² values of 0.61 and 0.52 for stations 65 and 73, respectively. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-021-12217-2.

events [2]. Pollution in Southeast Asia results from both natural factors and human activity. Anthropogenic sources include transportation, industrial processes, household activities, and agricultural burning, while forest fires release pollutants naturally. ASEAN countries share tropical climatic conditions, which can bring extreme temperatures and rainfall as well as high relative humidity. In addition, biomass burning is a major regional source of atmospheric particulate matter, most notably during the dry season [3]. These features make haze characteristics highly variable across the region. Almost a decade ago, the upper north of Thailand began experiencing air quality problems caused by annual haze episodes [4,5]. Almost all of the eight provinces in the upper north of Thailand consist of mountain ranges and valleys. Identifying transboundary haze in tropical mountain cities will contribute to a growing body of knowledge being developed in different parts of the world [6]. Particulate matter (PM) is an important atmospheric pollutant that can penetrate the respiratory system and is a health hazard. High concentrations of particulate matter disturb the environment, for example by degrading atmospheric visibility, and harm human health, causing acute or chronic respiratory diseases [7][8][9].
Thailand is one of many countries in this region with environmental concerns. Every year during the dry season, northern Thailand experiences haze episodes. PM10 is one of the key parameters monitored by the Pollution Control Department (PCD), Ministry of Natural Resources and Environment, Thailand. Haze is identified when the average daily concentration exceeds 120 μg/m³ (National Ambient Air Quality Standard) [10]. Chiang Rai, a popular tourist destination, is the northernmost province of Thailand, bordered by the Shan State of Myanmar and Bokeo province of Laos. It covers 11,678.37 km² and has a population of 1.28 million. The province suffers from several sources of air pollution, such as transboundary haze, biomass burning, and forest fires [11]. A study of the PM10 monitoring station in Chiang Rai province from March 2014 to 2016 found that 51%, 28%, and 21% of the hotspots were in Myanmar, Lao PDR, and Thailand, respectively, and that the haze primarily moved across the province's south-western border. Haze emerges every year during the transition between the cold and dry seasons. Haze episodes cause not only air pollution but also socioeconomic impacts in the province: tourist activities and related services have been cancelled because of haze. All related sectors could therefore benefit from being able to prepare for these unpredictable events. This study aims to support local organizations in forecasting haze episodes using available monitoring data, focusing on the correlation between PM10 and meteorological parameters. Statistical studies using meteorological and air pollution monitoring data have confirmed that meteorological conditions affect atmospheric pollution in numerous ways [1].
The most important role of meteorology, however, is its effect on the dispersion, transformation, and removal of pollutants from the atmosphere, which ultimately shapes their spatial-temporal characteristics and concentration levels. Previous studies have reported meteorological factors that influence PM10, such as wind direction and speed, pressure, and relative humidity. This study therefore investigated these relationships in different scenarios, throughout the year and by season, since some meteorological factors may influence PM10 only in certain seasons. The study focuses on (1) investigating the temporal variations of PM10 in Chiang Rai, Thailand, between 2011 and 2018; (2) examining the effect of meteorological and air pollution factors on the seasonal variation of PM10 concentrations; and (3) establishing MLR models for the whole year and for each of the three seasons in Chiang Rai province. The outcomes give insight into the sources of pollutants in Chiang Rai and into how interrelated factors influence pollutant concentrations. The results can be disseminated to local communities so that people can respond and prepare. In addition, our findings support the Sustainable Development Goals (SDGs), particularly Goals 13 (Climate Action), 3 (Good Health and Well-Being), 12 (Sustainable Consumption and Production), and 17 (Partnership). Regarding Goal 13, climate action may provide the driver or pressure to reduce fossil fuel use and greenhouse gas (GHG) emissions. As stated in Goal 12, air pollution and GHG emissions are linked to fossil fuel consumption and human activities. Goal 3 concerns the consequences of human activities: good health and well-being are directly linked to environmental conditions, such as air quality, and to socio-economic status.
Achieving each goal requires collaboration among organizations in both national and international networks.

Study area and data collection
Transboundary haze events are caused by large-scale biomass combustion in the northern parts of Thailand and usually occur from mid-February to mid-May (dry season) every year. Figure 1 shows the location of the affected area and of the PCD observation stations from which the air pollution data were obtained. Most of the available PM data, including those from the air quality monitoring stations in Chiang Rai province, have been collected using beta-ray absorption (beta-gauge attenuation) or Tapered Element Oscillating Microbalance (TEOM) techniques. Daily PM10 concentration data were collected at two stations, from January 1, 2011, to December 31, 2018 (station 65) and from April 1, 2011, to December 31, 2018 (station 73).
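The PCD data were recorded hourly, while the analysis works with daily PM10 concentrations. The step from hourly readings to 24-h means can be sketched as below. The completeness rule (`MIN_VALID_HOURS`, 18 valid hours per day) and the function name `daily_means` are illustrative assumptions, not the study's documented procedure.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import mean
from typing import Iterable, Optional

# Hypothetical completeness rule (an assumption, not taken from the study):
# a day needs at least this many valid hourly readings to get a 24-h mean.
MIN_VALID_HOURS = 18


def daily_means(
    hourly: Iterable[tuple[datetime, Optional[float]]],
    min_valid: int = MIN_VALID_HOURS,
) -> dict[date, float]:
    """Average hourly PM10 readings (µg/m³) into 24-h means keyed by date.

    Hours whose value is None (no valid measurement) are skipped.
    """
    by_day = defaultdict(list)  # date -> list of valid hourly values
    for timestamp, value in hourly:
        if value is not None:
            by_day[timestamp.date()].append(value)
    # Report only the days that satisfy the completeness rule.
    return {
        day: mean(values)
        for day, values in sorted(by_day.items())
        if len(values) >= min_valid
    }
```

With the default rule, a day with 24 valid readings is averaged, while a day with only 10 valid readings is dropped rather than reported from partial data.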

Statistical and temporal analysis
This section presents an annual analysis of daily PM10 from 2011 to 2018 at the Chiang Rai stations (65 and 73). The data were tabulated in Microsoft Excel®, and the analysis was carried out in R (RStudio) with the openair package. Bonferroni-corrected multiple comparison tests were used to assess differences in mean PM10 concentrations among seasons at the 5% significance level, and Spearman's rank correlation coefficient was used to determine the relationships between PM10 and meteorological factors.
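Spearman's coefficient is the Pearson correlation computed on ranks, so it captures monotonic rather than strictly linear relationships between PM10 and a meteorological variable. The study used R with openair; purely as an illustration, the plain-Python sketch below computes the coefficient with average ranks for ties. The function names are hypothetical; in practice a library routine such as `scipy.stats.spearmanr` would normally be used.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (spread_x * spread_y)
```

For instance, hypothetical PM10 values 120, 85, 60, 40, 30 paired with RH values 45, 55, 70, 80, 90 give a rho of -1 (up to floating-point rounding), a perfectly monotonic inverse relationship.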
The MLR model is used to determine how meteorological factors affect air pollutant concentrations: the PM10 concentration is treated as the response and the meteorological variables as predictors. The model is given by Equation (1) [12]:
$$y = b_0 + \sum_{i=1}^{n} b_i x_i + \varepsilon \qquad (1)$$
where y is the dependent variable, b_0 is the regression intercept (constant term), b_i is the regression coefficient of the i-th independent variable, x_i is the i-th explanatory variable, n is the number of explanatory variables, and ε is the stochastic error associated with the regression. Multicollinearity among the independent variables, which comprised both air quality and meteorological data, was assessed by calculating the variance inflation factor (VIF) for each predictor in these models; predictors with VIF values below 10 were assumed to be free of multicollinearity [13,14].
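The regression model and the VIF check can be illustrated with a short NumPy sketch. It fits the model by ordinary least squares with `numpy.linalg.lstsq` and computes each predictor's VIF as 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the other predictors. The function names `fit_mlr` and `vif` are hypothetical, and the sketch fits a fixed set of predictors; it does not reproduce the stepwise variable selection used in the study.

```python
import numpy as np


def fit_mlr(X, y):
    """Least-squares fit of y = b0 + sum_i b_i * x_i + e.

    X has shape (n_samples, n_predictors); returns [b0, b1, ..., bn].
    """
    X = np.asarray(X, dtype=float)
    design = np.column_stack([np.ones(X.shape[0]), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return coef


def vif(X):
    """VIF of each predictor: 1 / (1 - R_j^2) from an auxiliary regression
    of predictor j on all the other predictors."""
    X = np.asarray(X, dtype=float)
    factors = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        coef = fit_mlr(others, target)
        fitted = coef[0] + others @ coef[1:]
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        r2 = 1.0 - ss_res / ss_tot
        factors.append(1.0 / (1.0 - r2))
    return np.array(factors)
```

With noise-free synthetic data generated as y = 5 + 2x1 - 3x2, `fit_mlr` recovers coefficients close to [5, 2, -3], whereas two nearly identical predictors produce VIF values far above 10.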

Trajectory models
The HYSPLIT (Hybrid Single-Particle Lagrangian Integrated Trajectory) model [15] has been widely applied in previous studies. Air masses transport pollutants into and out of the country and neighboring areas [16][17][18]. This study focused on the back trajectories of air parcels arriving at the two air quality monitoring stations in Chiang Rai province. The backward trajectories covered a 24-h period and were computed for the dates with the highest PM10 concentrations in each year.

Descriptive statistics
The characteristics of the PM10 data from 2011 to 2018 in Chiang Rai province are summarized in Table 1. The maximum daily PM10 concentrations exceeded the national ambient air quality standard (NAAQS) of 120 μg/m³ at both stations: the maximum 24-h concentrations were 371.1 and 129.6 μg/m³ at stations 65 and 73, respectively. The annual average concentration at station 65 (41.9 μg/m³) was slightly higher than at station 73 (37.4 μg/m³). However, the maximum concentration can occur at any time of the day. Figure 2 shows that daily average PM10 concentrations followed a similar pattern in every year from 2011 to 2018, with the maximum (summer) and minimum (rainy season) concentrations occurring at different times of the year. Seasonally, PM10 was higher in summer than in the other seasons. At both stations, PM10 concentrations were higher in 2012, 2013, 2014, 2016, and 2017 than in the other years. Seasonal fluctuations of the pollutants are caused not only by the seasons themselves but also by meteorological variables [19,20].
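Summary statistics such as the maximum, the mean, and the days above the 120 μg/m³ standard can be computed from a daily series in a few lines. This sketch assumes the daily means are held in a {date: value} mapping and counts an exceedance when the 24-h mean is strictly above the standard; the function and constant names are hypothetical.

```python
from datetime import date
from statistics import mean

# 24-h PM10 standard quoted in the text (Thai NAAQS), in µg/m³.
NAAQS_PM10_24H = 120.0


def summarize_daily_pm10(daily: dict[date, float]) -> dict:
    """Maximum, mean, and exceedance days for a {date: 24-h mean PM10} series."""
    values = list(daily.values())
    exceedance_days = sorted(
        day for day, value in daily.items() if value > NAAQS_PM10_24H
    )
    return {
        "max": max(values),
        "mean": mean(values),
        "exceedance_days": exceedance_days,
    }
```

Grouping the input by calendar year before calling the function would give per-year figures comparable to the annual averages reported above.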

Seasonal meteorological variables
The variation of meteorological parameters among seasons differed depending on the parameter. In general, Thailand has three seasons: the dry (summer) season from mid-February to mid-May, the rainy season from mid-May to mid-October, and the winter season from mid-October to mid-February. In this study, differences in the measured climatic parameters among seasons were analyzed at both monitoring stations.
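The three-season calendar above can be encoded as a small lookup for labelling daily records. This sketch assumes that "mid-month" means the 15th and that each season begins on that day; the study does not state the exact boundary dates, and the function name `thai_season` is hypothetical.

```python
from datetime import date


def thai_season(d: date) -> str:
    """Label a date with its season, assuming each boundary falls on the 15th."""
    month_day = (d.month, d.day)  # tuples compare month first, then day
    if (2, 15) <= month_day < (5, 15):
        return "summer"  # dry season: mid-February to mid-May
    if (5, 15) <= month_day < (10, 15):
        return "rainy"   # mid-May to mid-October
    return "winter"      # mid-October to mid-February, wrapping the new year
```

Because winter spans the turn of the year, it is handled as the fall-through case rather than as a single date range.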
The differences were tested by ANOVA at each station, as shown in Table 2. Pressure did not differ between the rainy and winter seasons at either station. There was a difference in temperature at station 65 between the rainy and winter seasons. The other climatic parameters differed among seasons at both stations. The variation in PM10 concentrations among seasons, based on the Bonferroni multiple comparison test, is shown in Table 3. Because high PM10 concentrations were observed in summer at both stations, mean PM10 concentrations were compared between seasons using the Bonferroni method. The mean PM10 concentration was significantly higher in summer than in the winter and rainy seasons, and the Bonferroni comparison showed that average PM10 concentrations vary with the shifting seasons [21]. Similarly, Cichowicz et al. reported that air pollution varies with the seasons [22]. The differences were significant at both stations, as shown in Table 3 (p < 0.001).
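The Bonferroni method controls the family-wise error rate by multiplying each raw p-value by the number of comparisons m (equivalently, testing each comparison at alpha/m). With three seasons there are m = 3 pairwise comparisons. The sketch below assumes the raw pairwise p-values have already been computed by some pairwise test (not shown); the function name is hypothetical, and any p-values used with it here are made up for illustration, not taken from Table 3.

```python
from itertools import combinations

SEASONS = ("summer", "rainy", "winter")

# The three seasons give m = 3 pairwise comparisons.
PAIRS = list(combinations(SEASONS, 2))


def bonferroni(raw_p: dict, alpha: float = 0.05) -> dict:
    """Bonferroni-adjust raw p-values from m pairwise comparisons.

    Each p-value is multiplied by m (capped at 1); a comparison is flagged
    significant when its adjusted p-value is below alpha.
    """
    m = len(raw_p)
    adjusted = {}
    for pair, p in raw_p.items():
        p_adj = min(1.0, p * m)
        adjusted[pair] = (p_adj, p_adj < alpha)
    return adjusted
```

For example, hypothetical raw p-values of 0.0001, 0.004, and 0.03 become 0.0003, 0.012, and 0.09 after adjustment, so only the first two comparisons remain significant at the 5% level.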

Comparison of MLR models
The MLR results were first obtained using the annual data for Chiang Rai province. Because PM10 varies by season, models were also fitted for each season to examine their respective regression performance. The coefficients of the different seasonal models are shown in Table 4. In the obtained models, CO had by far the largest coefficient and was the dominant parameter for PM10 concentration. For example, in the annual model for station 65, the coefficient of CO was 56.6, compared with 1.3 for temperature, 0.3 for humidity, and 0.7 for pressure. This indicates that a change of 1 unit in CO corresponds to a change in PM10 concentration of 56.6 μg/m³. The rainy-season (Y_r) and winter-season (Y_w) models reported in Table 4 are:
$$Y_r = 67.0 + 9.3x_1 + 0.4x_2 + 0.03x_3 - 0.3x_4 - 0.03x_5$$
$$Y_r = -124.0 - 6.11x_1 + 4.4x_3 + 0.5x_4$$
$$Y_w = 11.4 + 10.7x_1 + 1.5x_2 + 0.7x_3 - 0.2x_4 + 0.03x_5$$
$$Y_w = -33.9 + 58.4x_1 + 1.9x_2$$
Figures 3 and 4 show scatter plots of the model fit for Chiang Rai's PM10 data from 2011 to 2018. The fitted line was generated in Excel using the least-squares method, which finds the linear trend that best fits the scattered points. For the annual MLR model at station 65 (Fig. 3), R² and RMSE were 0.61 and 22.15 μg/m³, respectively; for the summer model, they were 0.73 and 27.95 μg/m³. At station 73 (Fig. 4), R² and RMSE were 0.52 and 15.83 μg/m³ for the annual model and 0.61 and 16.45 μg/m³ for the summer model. The VIF values of the independent variables ranged from 1.07 to 2.47, below the threshold of 10 [12], indicating no multicollinearity among the variables.
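R² and RMSE summarize how well each model reproduces the observed PM10 values: R² is the share of variance explained, and RMSE is the typical prediction error in μg/m³. A minimal plain-Python sketch of both follows; the function names are hypothetical, and there is no claim that it mirrors the exact Excel or R computation used in the study.

```python
from math import sqrt


def rmse(observed: list[float], predicted: list[float]) -> float:
    """Root-mean-square error, in the units of the data (µg/m³ for PM10)."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed: list[float], predicted: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

For observed values 40, 60, 80 and predictions 50, 60, 70, R² is 0.75 and RMSE is about 8.16 μg/m³.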
In addition, the Durbin-Watson (DW) test [12] was applied to each model. At station 65, the DW values were 0.63, 0.41, 0.85, and 0.64 for the annual, summer, rainy, and winter PM10 models, respectively; at station 73, they were 0.67, 0.98, 0.64, and 1.18. The DW statistic always lies between 0 and 4, and values near 2 indicate no first-order autocorrelation, so values this far below 2 point to positive first-order autocorrelation in the residuals. Chiang Rai lies in the tropical zone and has a monsoon climate characterized by hot summers and other distinct seasonal characteristics. The PM10 monitoring data were further classified into three seasons: summer (mid-February to mid-May), rainy (mid-May to mid-October), and winter (mid-October to mid-February). As can be seen in Table 4, the mean PM10 concentrations in summer and winter exceeded those in the rainy season. As Table 5 shows, the seasonal PM10 regressions fit well for summer and winter, both with R² of at least 0.40, whereas the rainy-season models fit poorly, with R² of only 0.12 to 0.24.
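The Durbin-Watson statistic tests regression residuals for first-order autocorrelation. It always lies between 0 and 4: values near 2 indicate no first-order autocorrelation, values toward 0 indicate positive autocorrelation, and values toward 4 indicate negative autocorrelation. A minimal sketch, assuming the residuals are supplied in time order (the function name is hypothetical):

```python
def durbin_watson(residuals: list[float]) -> float:
    """DW = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2 for time-ordered residuals."""
    # Squared differences between consecutive residuals.
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator
```

Residuals that alternate in sign, such as 1, -1, 1, -1, give DW = 3.0, while slowly drifting residuals such as 1.0, 1.1, 1.2, 1.3 give a value close to 0, the signature of positive autocorrelation.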
The correlations between PM10 and the other parameters are shown in Table 6. During the study period, the mean PM10 concentration in the summer season was moderately to strongly correlated with CO (r = 0.7, 0.5) and O3 (r = 0.5, 0.6). In Chiang Rai province, PM10 concentrations were negatively correlated with RH (r = −0.6, −0.6) in all seasons, suggesting that high humidity promotes PM10 removal. Increased rainfall can be accompanied by in-cloud scavenging [6], and relative humidity influences particle movement and can cause PM10 to settle at ground level [20]. In contrast, correlations with temperature were strongly positive in all seasons except winter, reflecting the significant role of temperature in particulate matter levels. High PM10 concentrations on warm days may be related to enhanced photochemical activity under high solar intensity and to the possible formation of secondary particulate matter [6,23].

PM10 dispersion and backward air mass trajectory analysis
The PM10 concentration peaks (Fig. 2) recorded at Chiang Rai station occurred in March in 2012 to 2016 and in April in 2011 and 2018. The meteorological data used for the trajectory analysis were obtained from the National Oceanic and Atmospheric Administration (NOAA) website for the locations of both sites. The trajectory maps indicated that on 13 of the 24 days analyzed at Chiang Rai station, air masses came from neighboring countries (supplement 1). At Mae Sai District station (station 73), air masses came from neighboring countries on 20 days [17,18]. Air quality in Mae Sai district is therefore likely to be affected by PM10 emitted in neighboring countries to a greater extent than at Chiang Rai station (station 65).

Conclusion
PM10 concentrations and meteorological data for Chiang Rai province were collected from 1 January 2011 to 31 December 2018 (station 65) and from 1 July 2011 to 31 December 2018 (station 73). Higher PM10 levels were observed at station 73, with values ranging from 3.0 μg/m³ to 479.1 μg/m³ and a mean concentration of 52.3 μg/m³. Temperature, relative humidity, and pressure had the greatest influence on PM10 concentration levels. Relative humidity and pressure were inversely related to PM10, whereas temperature showed a positive association with PM10 concentrations. The difference in PM10 concentrations between the dry and wet seasons can be attributed to rain scavenging during the wet season. According to the MLR models, the influences of CO, O3, RH, temperature, and pressure on PM10 concentrations are significant in the annual, summer, and winter models. The R² values for the annual, summer, and winter models are 0.61, 0.73, and 0.40 (station 65) and 0.52, 0.61, and 0.67 (station 73), respectively. This research considered only a limited set of meteorological factors (temperature, relative humidity, pressure, and others); because the effects of other parameters are well documented, future studies should include additional variables to address the issue more effectively.