Table 7 Summary of hypothesis-generating studies that use data mining methods to better understand the relationship between air pollution and health conditions

From: A systematic review of data mining and machine learning for air pollution epidemiology

| Author | Year | Sub-field | Environmental agents | Data mining techniques | Objective |
| --- | --- | --- | --- | --- | --- |
| Chen et al. [22] | 2010 | Outdoor air pollution | Inorganic acids & basic air pollutants | Hierarchical clustering | Explore the relationship between climate and air pollutants |
| Zhu et al. [35] | 2012 | Urban outdoor air pollution | SO2, NO2, PM10, respiratory diseases | ARM, GMDH | Forecast the number of respiratory patients based on the seasonal effects of air pollution |
| Pandy et al. [38] | 2013 | Outdoor air pollution | UFP, PM | DT, RF | Test machine learning classifiers for predicting air quality and assess the impact of weather and traffic-related variables on UFP and PM |
| Payus et al. [32] | 2013 | Outdoor air pollution | SO2, NO2, PM10, CO, O3 | ARM | Find associations between combinations of air pollutants and respiratory illness |
| Bobb et al. [31] | 2014 | Mixture of chemicals | Multiple chemicals, neurodevelopment, hemodynamics | Bayesian kernel machine regression (BKMR) | Identify mixtures (e.g., metals) and components responsible for various health effects (e.g., neurodevelopment) |
| Gass et al. [20] | 2014 | Outdoor air pollution | CO, NO2, O3, PM | Classification and regression trees | Apply classification and regression trees to generate hypotheses about exposure to mixtures of pollutants and health effects, using children’s asthma emergency visits |
| Fernández-Camacho et al. [51] | 2015 | Urban air and noise pollution by traffic | NOx, O3, SO2, black carbon | Fuzzy clustering | Find the relationship between noise and traffic emissions |
| Bell et al. [63] | 2015 | General chemical exposure | 219 chemicals | ARM | Find relationships between chemicals and health biomarkers or diseases |
| Qin et al. [53] | 2015 | Outdoor air pollution | PM | ARM | Explore spatial-temporal variations in PM and how cities influence each other |
| Reid et al. [50] | 2016 | Outdoor air quality with wildfire | PM2.5, respiratory diseases | Generalized estimating equation and generalized boosting model | Determine how wildfire-associated increases in PM2.5 affect people with respiratory diseases |
| Toti et al. [36] | 2016 | Outdoor air pollution, pediatric asthma | SO2, NO, PM, NO2 | ARM | Explore the relationship between air pollution exposure and asthma |
| Mirto et al. [48] | 2016 | Outdoor air pollution & climate changes | Generic | Spatial data mining, hot spot analysis | Find correlations between diseases (e.g., respiratory and cardiovascular diseases, cancer, male human infertility) and air pollution due to climatic factors |
| Li et al. [45] | 2017 | Outdoor air pollution | PM | Trajectory clustering | Apply clustering to identify transport pathways, sources and seasonal variations of particulate matter (PM2.5 and PM10) in Beijing for regulation purposes |
| Stingone et al. [46] | 2017 | Outdoor air pollution | National air toxics assessment | DT | Apply machine learning to identify air pollutant exposure profiles across multiple pollutants (104 ambient air toxics), then estimate the magnitude of each profile’s effect on math scores in kindergarten children |
| Ghanem et al. [69] | 2004 | Outdoor air pollution | SO2, C6H6, NO, NO2, O3 | Hierarchical clustering | Monitor chemicals and outline challenges related to collection and processing |

  1. Chemical abbreviations: SO2, sulfur dioxide; NO, nitric oxide; NOx, nitrogen oxides; NO2, nitrogen dioxide; CO, carbon monoxide; UFP, ultrafine particulate matter; PM, particulate matter; O3, ozone; C6H6, benzene. Data mining abbreviations: ARM, association rule mining; GMDH, group method of data handling; DT, decision tree; RF, random forest