
Table 6 Summary of studies forecasting air pollution distributions and related variables using data mining methods

From: A systematic review of data mining and machine learning for air pollution epidemiology

Author

Year

Sub-field

Environmental agent

Data mining techniques

Objective

Kolehmainen et al. [60]

2001

Outdoor air pollution

NO2

ANN

Comparing two Neural Nets for their suitability in forecasting Air Quality

Kukkonen et al. [33]

2003

Outdoor air pollution

PM, NO2

ANN

Machine Learning Model comparison for forecasting NO2 and PM10 concentrations

Niska et al. [59]

2004

Outdoor air pollution

NO2

Genetic Algorithms, ANN

Investigate the use of GA to find a better ANN model to forecast air quality

Ghanem et al. [69]

2004

Outdoor air pollution

SO2, C6H6, NO, NO2, O3

Hierarchical clustering

Monitor chemicals and outline challenges related to collection and processing.

Corani [68]

2005

Outdoor air pollution

Ozone, PM10

ANN, Lazy Learning

Predict levels of air pollutants from meteorological and other local variables.

Dominici et al. [67]

2006

Outdoor air pollution

PM2.5

Bayesian Hierarchical Models

Assess the association of air pollution levels with the number of deaths per day

Ma et al. [58]

2008

Outdoor air pollution

SO2, O3, NOx, C6H6

k-means

Develop a distributed air pollution monitoring system and use data mining to find patterns of pollutant distribution

Pegoretti et al. [62]

2009

Indoor air pollution

Rn

Geostatistical Models, KNN

Forecasting the indoor Radon concentrations

Aquilina et al. [39]

2010

Outdoor air pollution

particle-associated PAH

DT, ANN

Predict personal exposure to particle-associated polycyclic aromatic hydrocarbons (PAH)

Padula et al. [57]

2012

Outdoor air pollution

Traffic-related pollution

Targeted maximum likelihood estimation

Estimate the probability of low birth weight among full-term infants based on the mother’s exposure to traffic-related air pollution

Zhu et al. [35]

2012

Urban outdoor air pollution

SO2, NO2, PM10, Respiratory diseases

ARM, GMDH

Forecasting the number of respiratory patients based on the seasonal effects of air pollution

Singh et al. [24]

2013

Outdoor air pollution

AQI

PCA, Ensemble DT, SVM

Predicting the Air Quality and identifying major sources of air pollution

Beckerman et al. [66]

2013

Outdoor air pollution

NO2, PM2.5

GLM

Develop a better land use regression model using machine learning methods

Pandey et al. [38]

2013

Outdoor air pollution

UFP, PM

DT, RF, etc.

Test machine learning classifiers for predicting air quality and assess the impact of weather and traffic related variables on UFP and PM.

Philibert et al. [56]

2013

Setting

N2O

RF

Predict N2O emissions using variables related to chemical fertilizer treatments applied to agricultural plots.

Chen et al. [54]

2014

Outdoor air pollution

Smog

ANN, Social Network Analysis

Predicting regions where smog poses a health hazard

Dias et al. [55]

2014

Outdoor air pollution

PM2.5

Density-based Clustering

Quantification of human exposure to traffic related air pollution

Lary et al. [4]

2014

Outdoor air pollution

PM2.5

Ensemble Algorithms RF, SVM, ANN

Estimating the daily distributions of PM2.5

Jiang et al. [26]

2015

Outdoor air quality

AQI

Correlation Analysis

Monitoring the dynamics of air quality in large cities based on social media

Wang et al. [27]

2015

Outdoor air pollution

Generic

Topic models (LDA), NLP

Evaluating the use of social media data to estimate air pollution and public response

Reid et al. [34]

2015

Outdoor air pollution

PM2.5

Generalized boosting model, GAM, RF, SVM, KNN Regression, etc.

Predicting PM2.5 during wildfire

Lary et al. [52]

2015

Outdoor air pollution

PM2.5

Ensemble regression models

Estimating PM2.5 distribution and relationship of such air pollutants with mental health

Lewis et al. [49]

2016

Outdoor air quality

NOx, O3, SO2, CO, VOCs, PM

Boosted regression DT, gaussian process emulation

Improve the accuracy of common low cost air pollution sensors

Hu et al. [65]

2016

Indoor/outdoor air pollution

Generic

RF

Understanding exposure to air pollution by predicting the time-activity patterns of individuals

Challoner et al. [61]

2015

Indoor air pollution

PM, NO2

ANN

Predicting the indoor air quality from outdoor monitors

Mirto et al. [48]

2016

Outdoor air pollution and climate

Generic

Spatial data mining, hot spot analysis

Finding correlations between diseases and air pollution due to climatic factors

Xu et al. [30]

2017

Outdoor air pollution

PM, CO, O3, SO2, NO2

SVM, Fuzzy Evaluation, Empirical Mode Decomposition

Air quality forecasting and evaluation

Min et al. [43]

2017

Outdoor air pollution

PM2.5

K-Means

Apply K-Means to identify potential new monitoring sites, considering a larger set of 313 variables in the models. Traffic and urbanicity are found to be useful for guiding site selection

Keller et al. [44]

2017

Outdoor air pollution

PM2.5

Modified K-Means

A clustering method for assessing exposure to air pollution in health-related studies. It accounts for the multivariate nature of the exposure and for the spatial misalignment that is likely when data come from central monitoring stations rather than the actual locations of the cases

Liu et al. [47]

2017

Outdoor air pollution

PM, SO2, CO, NO2, O3

SVM Regression

Apply support vector regression for air pollution forecasting using six criteria pollutants, five meteorological conditions and the Air Quality Index

  1. Chemical abbreviations: NO nitric oxide, NO2 nitrogen dioxide, NOx nitrogen oxides, UFP ultrafine particulate matter, PM particulate matter, SO2 sulfur dioxide, CO carbon monoxide, C6H6 benzene, O3 ozone, Rn radon, PAH polycyclic aromatic hydrocarbons, AQI air quality index, VOCs volatile organic compounds. Data mining abbreviations: ANN artificial neural network, GA genetic algorithms, DT decision trees, ARM association rule mining, GMDH group method of data handling, PCA principal component analysis, SVM support vector machine, GLM generalized linear model, RF random forest, LDA latent Dirichlet allocation, NLP natural language processing, GAM generalized additive models, KNN k-nearest neighbors. Note: k is a constant specifying the number of nearest neighbors in KNN and the number of clusters in k-means
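The two roles of k noted in the footnote can be illustrated with a minimal Python sketch. This is not code from any of the studies in the table: the function names `knn_predict` and `k_means`, the one-dimensional data, and the deterministic choice of starting centres are all assumptions made for illustration. In the first function, k is the number of neighbors averaged to make a prediction; in the second, k is the number of clusters to find.

```python
# Toy 1-D readings (made-up values, for illustration only).
readings = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

# Toy (feature, target) pairs for the KNN example (also made up).
train = [(0.0, 10.0), (1.0, 12.0), (2.0, 20.0), (3.0, 30.0)]


def knn_predict(pairs, query, k):
    """KNN regression: average the targets of the k points nearest to query."""
    nearest = sorted(pairs, key=lambda pair: abs(pair[0] - query))[:k]
    return sum(target for _, target in nearest) / k


def k_means(values, k, iterations=10):
    """Lloyd's algorithm in one dimension; returns the k centres, sorted."""
    ordered = sorted(values)
    # Assumed deterministic start: k evenly spaced values from the sorted data.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centres = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each value joins the cluster of its closest centre.
        clusters = [[] for _ in range(k)]
        for v in values:
            closest = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[closest].append(v)
        # Update step: move each centre to its cluster mean
        # (an empty cluster keeps its previous centre).
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)
```

With these toy values, `knn_predict(train, 0.9, k=1)` uses only the single closest point, while `k=3` averages three neighbors; `k_means(readings, k=2)` separates the low readings from the high ones.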