Spatial patterns of the congenital heart disease prevalence among 0- to 14-year-old children in Sichuan Basin, P. R China, from 2004 to 2009

Background Congenital heart disease (CHD) is the most common type of major birth defects in Sichuan, the most populous province in China. The detailed etiology of CHD is unknown but some environmental factors are suspected as the cause of this disease. However, the geographical variations in CHD prevalence would be highly valuable in providing a clue on the role of the environment in CHD etiology. Here, we investigate the spatial patterns and geographic differences in CHD prevalence among 0- to 14-year-old children, discuss the possible environmental risk factors that might be associated with CHD prevalence in Sichuan Basin from 2004 to 2009. Methods The hierarchical Bayesian model was used to estimate CHD prevalence at the township level. Spatial autocorrelation statistics were performed, and a hot-spot analysis with different distance thresholds was used to identify the spatial pattern of CHD prevalence. Distribution and clustering maps were drawn using geographic information system tools. Results CHD prevalence was significantly clustered in Sichuan Basin in different spatial scale. Typical hot/cold clusters were identified, and possible CHD causes were discussed. The association between selected hypothetical environmental factors of maternal exposure and CHD prevalence was evaluated. Conclusions The largest hot-spot clustering phenomena and the CHD prevalence clustering trend among 0- to 14-year-old children in the study area showed a plausibly close similarity with those observed in the Tuojiang River Basin. The high ecological risk of heavy metal(Cd, As, and Pb)sediments in the middle and lower streams of the Tuojiang River watershed and ammonia–nitrogen pollution may have contribution to the high prevalence of CHD in this area.


Background
Congenital heart disease(CHD) refers to a malformation of the cardiovascular system and accounts for nearly one-third of all major congenital anomalies [1]. Heart malformations are the most common form of birth defects, occurring in approximately 8 per 1000 live births [2]. Surveillance data shows that CHD has the highest prevalence among other birth defects in Sichuan Province in recent years [3][4][5][6]. The proportion of birth defects related to infant mortality has recently increased, and CHD is now the most common cause of infant mortality and the leading cause of disability in young children [7], thereby increasing healthcare costs each year [8,9].
The pathogenesis of CHD is complicated and its underlying mechanism remains unknown. A group of CHD lesions with unknown etiology follows a multifactorial inheritance model, approximately 90% CHD cases are multifactorial [10][11][12], which implicates both genetic and environmental factors in disease development. Approximately 80% CHD cases are multifactorial and arise through various combinations of genetic and environmental factors [1,13].
Environmental factors contribute to 10% birth defects, but most birth defects are presumed to be caused by the combination or interaction of genetic and environmental factors [14]. Epidemiological research has yet to focus on the demographic, familial, social, genetic, and ethnic factors associated with the prevalence of CHD.
From the spatial epidemiology perspective, significant geographic differences occur in CHD prevalence [1]. Some studies have focused on geographical variations in CHD prevalence [1,[15][16][17][18][19], and CHD prevalence has been demonstrated to be closely related with elevation and latitude [20][21][22]. In addition, there is a clear and seasonal variation in CHD prevalence [23][24][25][26][27]. Maternal exposure to environmental factors such as ambient air pollution [28][29][30][31][32][33], heavy metals, and micronutrients are positively related to CHD prevalence because elements in the soil, water, and air affect human beings directly or indirectly [1,[34][35][36]. The physical environment such as solar radiation and magnetic fields also have influence on CHD prevalence [37,38]. Furthermore, socio-economic and lifestyle habits affect CHD prevalence. However, the extent of the contribution of these factors to CHD prevalence in the study area is unknown.
The purpose of this study was to detect spatial patterns of CHD prevalence at various geographical scales and to explore the possible links between CHD and environmental changes.

Study design
In this study, we mapped the prevalence of CHD among 0-to 14-year-old children at the township level in our study area firstly. In order to eliminate the dependence of the sampling variance on population size and the CHD prevalence, the hierarchical Bayesian model(HB) was employed to address the problem of a small population during explorative mapping of prevalence and to stabilize local estimates of CHD prevalence. Subsequently, global Moran's I statistic and local indicator of spatial association (LISA) statistic [39] were used to detect regions with high prevalence of CHD and the local Getis 0 sG Ã i method [40] was used to draw a prevalence map of CHD using the geographic information system(GIS). Finally, we try to explore the association between the high-prevalence clustering pattern of CHD and potential environmental risk factors.
All data analysis, including data processing, mapping, and spatial statistics were conducted using the ArcGIS and GeoDa 0.9.5-i software. The hierarchical Bayesian model and the publically available Winbugs 1.4 software [41] were used along with the Markov Chain Monte Carlo method.

Study area
The study area is situated in the eastern part of Sichuan Province in southwestern China, located in the Sichuan Basin with distinct geographic environment. The study area includes 13 municipalities comprising of 105 counties and 685 townships, due to its relative flatness and fertile ground, it's the most populous region in China with a population of near 70 million, and the population density is approximately 500-700 persons/km 2 .
Sichuan Basin is bordered by mountains and consists of low hills and alluvial plains with an elevation of 250-700 m. The Yangtze River passes through the southern part of the basin. Several major rivers such as the Minjiang River in central Sichuan and the Jialing River are tributaries of the upper Yangtze River.
Due to the unique prominent geology, geomorphology, and climatic characteristics, the study area is mostly covered by farms and cities. The basin with high population density and the cultivated land in the basin account for 85% of the total cultivated land in Sichuan Province. The basin is the central distribution of Chinese Mesozoic continental red beds, with plentiful mineral resources, various land use types, and well-developed industries, which generates a peculiar basin environment [42]. A map of the study area with the major cities, highways, rivers, and township boundaries is shown in Figure 1.
The basic geographical data, such as township boundaries, rivers, highways of the study area were provided in the form of shapefile by the State Key Laboratory of Resources and Environmental Information Systems (LREIS) of the Institute of Geographic Sciences and Natural Resources Research (IGSNRR), Chinese Academy of Sciences.

Data sources and data process
From 2004 to 2009, 2365 CHD cases among 0-to 14-year-old children have been reported in the Sichuan Province birth defect register system of Sichuan province, including 1,224 boys and 1,141 girls, most of the cases are belonging Han Chinese. The informed consent was obtained from CHD cases' parents or guardians. The study protocol conforms to the ethical guidelines of the 1975 Declaration of Helsinki and was approved by the Ethics Committee of the National Research Institute for Family Planning.
Each CHD case was classified and coded according to the International Classification of Diseases version 10 and belonged to the code range from Q20 to Q24.9. The classification showed that there were more than10 types of defects in our study area.
The CHD cases were distributed in 673 townships and were identified to the village level using Google Earth. Each CHD case was expressed as a point object and each point belongs to a particular township which expressed as a polygon object. The CHD cases in each township added together to calculated the CHD prevalence. The CHD cases with demographical and epidemiological information were also geocoded. Population data among the 0-to 14-year-old children every year since 2000 for each township in our study area were retrieved from the National Bureau of Statistics of China.

HB Model
The HB model [43] is simply an extension of traditional Bayesian models in which prior distributions have some form of conditional dependence. The following simple probabilistic model was postulated. It is assumed that O (i) denotes the number of CHD cases in towhship i which is independent and identically Poisson distributed with the intensity parameter λ(i) = E(i) × r(i), where E(i) denotes the expected number of CHD cases in township i, which is proportional to the corresponding population n i . r(i) is the positive township-specific rela- where the expectation and variance are defined by a linear function of a common value, α, and two independent random effects. a heterogeneous component e(i), which does not depend on the geographical location of townships and an autocorrelated component, v(i), which reflects the local spatial structure by incorporating the influence of neighboring townships. The model is Prior distributions are then assigned to these linear terms and consequent hyperprior distributions are assigned to the variance terms; thus, creating a hierarchical model as follows: where Aj(i) denotes neighborhood i, w(i,j) is a weight matrix element, and w * (i,j) is the standardized form of the weight matrix that defines the relationship between the township i and its neighbor township j. The weight is defined simply as w(i,j) = 1 if two townships are adjacent(means two townships' boundries share a common border or vertex) and w(i,j) = 0 otherwise.  model originally proposed by Besag and Newell [44], where the random effect associated with spatial autocorrelation, v(i) is defined according to the conditional auto-regressive model(CAR) [45], the hyperprior distributions for 1/κ 2 and 1/σ 2 were specified at Gamma(0.5, 0.0005) in this study.
Following the Bayesian inference technique, the observed number of cases in each township was treated as a binomial random variable with parameter P i in our analysis [7]. P i is the probability of a live birth with CHD in township i. The standard prevalence is the maximum likelihood estimation of P i . As the environment are similar, P i is assumed to be constant within the same township. P i is modeled through a logit transformation, logit(P i ), expressed as: where α is the intercept term(mean) used to calculate CHD prevalence, v i is the spatially structured autoregression, and ε i is the spatially unstructured random effect.
A single chain sampler with number i of 4000 iterations were run, followed by 1000 iterations during which values were stored in the form of P i .

Spatial cluster test
The first law of geography is summarized as: "Everything is related to everything else, but near things are more related than distant things" [46]. Spatial autocorrelation statistics analyzes the degree of dependency among observations in a geographical space. The fundamental goal of spatial analysis is to identify patterns in spatial data that lead to identifying a spatial autocorrelation or association and identify peculiarities in the data set in one or more regions [47]. Global spatial autocorrelation was used to test spatial correlation in the entire study area by assuming that the spatial process was the same everywhere. Spatial autocorrelation indicates that adjacent observations of the same phenomenon are correlated. Moran's I statistic [48] is one of the most commonly used test for areal cluster analysis.
The Global Moran's I statistic is a measure of spatial autocorrelation developed by Patrick Moran [8], the goal of which is to identify statistically significant hot-spots or clusters in data presented in spatial objects on two-dimensional surfaces. Moran's I statistics were applied to explore the spatial clustering pattern of the birth prevalence of CHD in a quantitative way, helping researchers gain a deeper understanding of the phenomena of high CHD prevalence. The formula for global Moran's I statistic is: Where w(i,j) is the weight between observations i and j, The values of Moran's Index range from −1 to +1. A Moran's Index value near +1.0 indicates clustering, whereas an index value near −1.0 indicates dispersion. A zero values indicates a random spatial pattern, Negative (positive) values indicate negative (positive) spatial autocorrelation. In general, Moran's I values can be transformed to Z-scores in which values >1.96 or < −1.96 indicate spatial autocorrelation significant at the 95% confidence level. The Z-score is used to evaluate the significance of the index value and is a measure of the standard deviation associated with a standard normal distribution.
Local spatial autocorrelation statistics provide estimates disaggregated to the level of spatial analysis units, allowing an assessment of the dependency relationship across space. Local clustering statistics are used to test the statistical significance of local clusters and map the extent of the clusters of the feature. The local indicators of spatial association (LISA) statistic, which is usually applied when studying local spatial clustering, is interpreted as an indicator of pockets of nonstationarity and is also used to assess the influence of individual locations on the magnitude of the global statistic as well as to identify "outliers" [39]. The LISA statistic is used to evaluate clustering of individual units by calculating the local Moran's I statistic for each spatial unit and evaluating the statistical significance. The equation can be written as follows: where N is the number of observations(units). The Getis 0 sG Ã i statistic [40], developed by Getis and Ord, is used as a method for detecting hot-spots that measure the overall spatial association of values falling  within a critical distance of each other. It can be expressed as follows: where S is the standard variance of CHD prevalence and w ij is the spatial distance weight matrix between townships i and j. When the distance from township j to i is within distance d, w ij (d) = 1; otherwise w ij (d) = 0, and S Ã 1i , W Ã i .

CHD Prevalence mapping
Because CHD is a low probability event, the number of CHD cases among 0-to 14-year-old children from 2004 to 2009 was geocoded and aggregated by geographical units at the township level. CHD prevalence before and after adjusted by HB model among 0-to 14-year-old children were calculated. Table 1 presents the statistical characteristics of the CHD proportion estimates in the 673 townships.
The prevalence Themerange map of CHD in the 673 townships was drawn and shown in Figure 2, which illustrates the distribution of CHD prevalence before adjustment by the HB model and the CHD prevalence estimates after adjustment by the HB model in the study area respectively.

Spatial autocorrelation
We constructed a first-order queen polygon contiguity weight matrix file of the 673 townships based on connectivity of the smallest administrative units. The global spatial autocorrelation statistic and corresponding p-values were estimated by Moran's I statistic to HB model smoothed CHD prevalence. The computation was implemented using the Geoda0.9.5-i software. The level of spatial autocorrelation was 0.3746 (p = 0.001), suggesting non-randomness in the overall spatial pattern. The positive value indicates that the prevalence of CHD in our study area had a significant clustering pattern at the township level.
The township scale LISA statistics based on the adjusted CHD prevalence using queen's case adjacency weight matrix were calculated, and the significant clusters (HH, LL, HL, and LH) are illustrated in Figure 3.
The results of this analysis yielded to five categories of spatial units. These categories were defined as "high-high (HH)," "low-low (LL)," "high-low (HL)," "low-high (LH)," and "not significant (NS)". The HH category indicates clustering of high adjusted CHD prevalence, whereas the LL category indicates clustering of low adjusted CHD prevalence. Three HH areas and two LL spot areas were detected. The largest HH area was detected in the central portion of NeiJiang City, and the two smaller hot-spots were found in Ya' An and MianYang. Two LL areas were observed in or just east of ChengDu city and south of LuZhou and YiBin. These outcomes were equivalent to a positive spatial autocorrelation.
In addition, the HL category indicates that high CHD prevalence values were adjacent to low values, whereas the LH category indicates that low values were adjacent to high values of adjusted CHD prevalence. These outcomes are equivalent to a negative spatial autocorrelation. Lastly, the NS category indicates that there is no statistically significant spatial autocorrelation.

Cluster pattern
The distance value is a critical threshold of Getis 0 sG Ã i statistics. The phenomena of hot-spot distribution with different distance thresholds (The average distance between townships is 12.27 km, the shortest distance between countries is 3.71 km, the average distance between countries is 22.70 km and the shortest distance between 13 cities at prefectural level is 35.49 km in our study area) are shown in Figure 4, and the hot-spot number at different distance thresholds among 1-to 35-km is shown in Table 2.
The largest hot-spot area was located on the left side of the Yangtze River and downstreams of the Tuojiang River watershed, including the NeiJiang, ZiGong, MeiShan, and LeShan areas. One of the cold-spot areas was located on the right side of the Yangtze River and was symmetrically distributed with the largest hot-spot area. Within the distance from 1 km to 9 km, the variations in the hot-spot cluster did not change. The hotspot cluster's aggregate began at the distance threshold value of 10 km. The center of the hot-spot area was located in NeiJiang city, and the standard deviation ellipse statistics showed that the hot-spot region extended along the flow direction of the TuoJiang River and was present as a zonal distribution trend, which is shown in Figure 5.

Discussion
CHD is the most frequent group of congenital anomalies and is the leading cause of infant death due to congenital anomalies and is associated with a considerable burden on public and private resources [13].
In this study, the HB model was used to adjust the prevalence of CHD. The geographical distribution of CHD prevalence at the township level was investigated and mapped, both global and local spatial clustering methods were used to quantify the spatial pattern of CHD prevalence. Moran's I statistic was used as a measure of global clustering and was assessed by testing the null hypothesis that the spatial pattern of these data were random. LISA is an indicator of local spatial association that measures whether CHD prevalence for a particular spatial unit at the township scale is closer to the values of a neighboring unit or to the average of the study area.
We found significant spatial variability in the prevalence of CHD in 0-to 14-year-old children in Sichuan Basin. In addition, the significant positive spatial autocorrelation and the significant local clusters confirmed the spatial variances of CHD prevalence. The spatial pattern and clustering of events provide important information for developing and refining geographical-and population-specific prevention programs to reduce CHD risk. In addition, this information will be useful to healthy planners because many current policies and health initiatives are principally based on assumptions of spatial homogeneity.

Explanatory phenomena
Identifying CHD clusters provides clues to causality. The largest CHD hot-spot region was located in the TuoJiang  River watershed, as shown in Figure 6. Thus, a stronger relationship was observed between CHD prevalence and the environment around the TuoJiang River.
The Tuojiang River is one of the largest tributaries of the upper Yangtze River, industries and agriculture are well developed along the coastal area of the TuoJiang River. Pollution from chemical plants, machinery and paper industries as well as non-point source pollution from rural regions is a very serious issue. A literature review showed that heavy metal pollution in sediments increases from up to downstream of the Tuojiang River. Mining activities are the most important sources of heavy metals, and heavy metal contents clearly increase at the convergence region of the Tuojiang River. The potential ecological risk from cadmium is the highest, followed by that of arsenic and lead [49,50]. In addition, the average total nitrogen and total phosphorus concentrations in the Tuojiang River exceed the standard acceptable value by more than 3-and 1.2-folds, respectively [51,52]. All of these characteristics are potential risk factors for a high prevalence of CHD.

Limitations
Three major limitations of this study should be discussed. First, the calculation of CHD prevalence was a key step in the study. The reported prevalence of CHD at birth varies widely worldwide. In our study, the newborn rate in Sichuan Province was 8.93% in 2010, which was cited from the Sixth National Population Census from the National Bureau of Statistics, China. The prevalence of CHD in our study area was lower than this value because we only considered surviving children with CHD in each family as per the current family planning policy in China.
The second limitation was that spatial patterns of CHD prevalence may change dependence on the spatial scales and units used in analysis, which is commonly known as a modifiable areal unit problem or ecological fallacy [53]. The importance of location, spatial interaction, spatial structure, and spatial processes has been well established in public health literature. The utility of exploratory spatial data analysis tools allows researchers to map spatial patterns, identify local variability in CHD prevalence, and assess the efficacy of spatial models. The objectives of this study were to help generate working hypotheses and design a more sophisticated research protocol for future research efforts. Studying different distributions and spatial patterns (point or lattice) at different spatial scales (country or village level) deserves further research.
The third limitation was CHD defects include abnormal chromosomes, single-gene disorders, and polygenic disorders. But the prevalence of CHD differs in different areas within a limited region. Geographical variations in CHD prevalence can be explained by variations in socioeconomic status, education, urbanization, climatological factors, ethnicity, and patient-related factors such as comorbidity, lifestyle, and healthcare-seeking behavior. More insight into the epidemiology of CHD is needed. Exploring the environmental risk factors for CHD is also a difficult problem. Maternal factors, maternal health, and diseases such as diabetes mellitus, phenylketonuria, febrile illness, rubella,stress, and obesity have significant relationship with CHD. Maternal lifestyle, drug and medical use, and environmental toxic exposure lead to CHD.

Conclusions
It is very intriguing that the high prevalence of CHD was associated with watershed environmental pollution and specific environmental factors in specific areas. Potential risk factors contribute to CHD, and the mechanism of the environmental risk factor effects deserves special attention. In addition, considering more potential risk factors from the epidemiology perspective and applying different spatial statistical methods are important strategies in CHD studies.
Exploring the spatial and temporal changes in CHD prevalence, reducing the recurrence of CHD, and preparing prevention strategies are new challenges for subsequent studies. We hope that information is gleaned from this study and that more in-depth studies are based on this research. Identifying causal agents of CHD using geographical analysis technology and tools to provide public health professionals and policy makers within areas of elevated risk are important for designing effective intervention programs.