Study area
The study area comprises the main island of Taiwan only (excluding all islets), which in 2000 had more than 22 million inhabitants living in an area of 36,000 km². There are a total of 349 local administrative government areas, which include 5 main urban areas, 2 secondary urban areas, 187 rural townships, and 29 aboriginal townships (Figure 1). According to a bulletin issued by the Ministry of Interior in 1996, urban areas are regions that have at least one metropolitan center and can include neighboring cities and townships that share socioeconomic activities. Main urban areas are defined as those with a population larger than one million, specifically Taipei-Keelung, Kaohsiung, Taichung-Changhua, Jhongli-Taoyuan and Tainan. Secondary urban areas are defined as those with a residential population ranging from 0.3 to 1 million (for example, Hsinchu and Chiayi).
Data collection and management
The data were collected from contractual medical care institutions, which in this study means institutions where the NHI covers the costs of prescription medicines and treatment at outpatient clinics. Such facilities accumulate detailed databases of medical costs for inpatient care. The number of outpatient cases was classified by disease code, as defined in the 1975 edition of "The International Classification of Diseases, 9th Revision, Clinical Modification" (hereafter, ICD 9 CM). Criteria for refining the data were established first, and some records were excluded from the final statistical data set, for example, cases with diseases that defy code classification or with mismatched ID numbers. Disease codes were classified by gender and age. Cases with the same ID number but different diseases were counted as different instances [1].
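The refinement and counting rules above can be illustrated with a minimal sketch. The record layout, the field names and the sample values are hypothetical, and counting repeat visits by the same patient for the same disease as a single case is an assumption that the text does not state explicitly.

```python
from collections import Counter

# Hypothetical outpatient records: (patient_id, id_is_valid, disease_code, gender, age).
# A disease_code of None stands for a diagnosis that defies code classification.
records = [
    ("A01", True, "250", "F", 63),
    ("A01", True, "401", "F", 63),   # same ID, different disease: a separate case
    ("A01", True, "250", "F", 63),   # same ID, same disease: assumed to count once
    ("B02", True, None, "M", 41),    # unclassifiable disease: excluded
    ("C03", False, "486", "M", 70),  # mismatched ID number: excluded
    ("D04", True, "486", "M", 8),
]

def count_cases(rows):
    """Count distinct (patient, disease) pairs by disease code and gender."""
    seen = set()
    counts = Counter()
    for pid, id_ok, code, gender, _age in rows:
        if not id_ok or code is None:
            continue  # refinement criteria: drop unusable records
        key = (pid, code)
        if key in seen:
            continue  # assumption: one case per patient and disease
        seen.add(key)
        counts[(code, gender)] += 1
    return counts

case_counts = count_cases(records)
for (code, gender), n_cases in sorted(case_counts.items()):
    print(code, gender, n_cases)
```

In this toy data set three cases survive the refinement: two for patient A01 (one per disease) and one for patient D04.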
Medical care data obtained from the 2006 NHI report were examined, and the prevalence rates of the 20 leading causes of death were calculated. Disease classifications (made according to the International Classification of Disease, Injuries, and Causes of Death, 1975) are indicated in parentheses. They include the following: malignant neoplasms (ICD 08-14); cerebrovascular disease (ICD 29); heart disease (ICD 250, 251, 27, and 28* which includes a partial listing of ICD 420-429); diabetes mellitus (ICD 181); accidents and adverse side effects (ICD E47-E53); pneumonia (ICD 321); chronic liver disease and cirrhosis (ICD 347); nephritis, nephritic syndrome and nephrosis (ICD 350); suicide (ICD E54); hypertensive disease (ICD 26); bronchitis, emphysema and asthma (ICD 323); septicaemia (ICD 038); tuberculosis (ICD 02); ulcers of the stomach and duodenum (ICD 341); certain conditions originating in the perinatal period (ICD 45); congenital anomalies (ICD 44); anaemias (ICD 200); homicide (ICD E55); meningitis (ICD 220); and protein-calorie malnutrition (ICD 192).
Demographic information was provided by the Ministry of Interior [15]. The smallest administrative units coded for examination of the various disease cases or health care events were precincts and townships. Age-adjusted standardized prevalence rates were then calculated by direct adjustment, using the world population in 2000 as the standard population [16]. The results showed the leading causes of death for males and females in each township.
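Direct age adjustment weights each age-specific rate by the share of that age band in the standard population. The sketch below shows the arithmetic for one township and one gender; the age bands, case counts, populations and standard weights are all hypothetical and are not the values of the world standard population used in the study.

```python
# Hypothetical inputs for one township and one gender.
age_bands = ["0-14", "15-44", "45-64", "65+"]
cases = [12, 40, 90, 150]
population = [3000, 8000, 5000, 2000]
# Hypothetical shares of the standard population in each age band.
std_weights = [0.26, 0.46, 0.19, 0.09]

def direct_age_adjusted_rate(cases, population, std_weights, per=100_000):
    """Weighted sum of age-specific rates, using standard-population shares."""
    total_weight = sum(std_weights)
    rate = 0.0
    for c, n, w in zip(cases, population, std_weights):
        rate += (c / n) * (w / total_weight)
    return rate * per

crude_rate = sum(cases) / sum(population) * 100_000
adjusted_rate = direct_age_adjusted_rate(cases, population, std_weights)
print(round(crude_rate, 1), round(adjusted_rate, 1))
```

Here the adjusted rate is lower than the crude rate because the hypothetical township is older than the hypothetical standard population.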
Global Moran's I statistic
The global spatial autocorrelation statistical method was used to measure the correlation among neighboring observations and to find the patterns and levels of spatial clustering among neighboring districts [17]. The Moran's I statistic, which is similar to the Pearson correlation coefficient [18], is calculated by

$$I = \frac{N}{S_0}\,\frac{\sum_{i}\sum_{j} w_{ij}\,(x_i - u)(x_j - u)}{\sum_{i}(x_i - u)^2}$$

where $N$ is the number of districts; $w_{ij}$ is the element in the spatial weight matrix corresponding to the observation pair $i$, $j$; and $x_i$ and $x_j$ are observations for areas $i$ and $j$ with mean $u$ and $S_0 = \sum_{i}\sum_{j} w_{ij}$.

Since the weights are row-standardized, $\sum_{j} w_{ij} = 1$. The first step in the spatial autocorrelation analysis is to construct a spatial weight matrix that contains information about the neighborhood structure for each location. Adjacency is defined as immediately neighboring administrative districts, inclusive of the district itself. Non-neighboring administrative districts are given a weight of zero.
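A minimal sketch of the global Moran's I calculation with row-standardized weights follows. The four districts arranged in a line and their values are hypothetical. The toy adjacency matrix leaves the diagonal at zero, which is the usual convention for Moran's I; this is an assumption of the sketch and differs from the self-inclusive adjacency described above.

```python
def row_standardize(w):
    """Scale each row of a weight matrix so that it sums to 1."""
    out = []
    for row in w:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def morans_i(x, w):
    """Global Moran's I for values x and spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)  # sum of all weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

# Hypothetical chain of four districts: 0-1, 1-2, 2-3 are neighbors.
adjacency = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
weights = row_standardize(adjacency)

i_trend = morans_i([1, 2, 3, 4], weights)        # similar values adjoin
i_alternating = morans_i([1, 4, 1, 4], weights)  # dissimilar values adjoin
print(i_trend, i_alternating)
```

With row-standardized weights $S_0 = N$, so the leading factor $N / S_0$ equals 1. The smoothly increasing pattern gives a positive value and the alternating pattern a negative one.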
Determining spatial weights/connectivity matrices
Spatial contiguity for polygons is the property of sharing a common boundary or vertex. Contiguity analysis is an important method for assessing unusual features in the connectivity distribution [13, 19]. The Queen's measure of contiguity captures both kinds of adjacency by incorporating the Rook (shared boundary) and Bishop (shared vertex) relationships into a single measure [19].
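The relationship between the three contiguity measures is easiest to see on a regular grid, where rook neighbors share an edge, bishop neighbors share only a corner, and queen neighbors are the union of the two. The sketch below is illustrative only: the grid and the function name `grid_neighbors` are hypothetical, and real administrative districts are irregular polygons rather than grid cells.

```python
def grid_neighbors(rows, cols, kind="queen"):
    """Neighbor lists for cells of a rows x cols grid, numbered row by row."""
    rook = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # shared edge
    bishop = [(-1, -1), (-1, 1), (1, -1), (1, 1)]    # shared corner only
    offsets = {"rook": rook, "bishop": bishop, "queen": rook + bishop}[kind]
    neighbors = {}
    for r in range(rows):
        for c in range(cols):
            cell = r * cols + c
            neighbors[cell] = sorted(
                (r + dr) * cols + (c + dc)
                for dr, dc in offsets
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            )
    return neighbors

rook_nb = grid_neighbors(3, 3, "rook")
bishop_nb = grid_neighbors(3, 3, "bishop")
queen_nb = grid_neighbors(3, 3, "queen")
print(rook_nb[4], bishop_nb[4], queen_nb[4])
```

For the center cell of the 3 x 3 grid, the rook and bishop lists each hold four cells and the queen list holds all eight surrounding cells.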
The administrative districts considered in this study are highly irregular in both shape and size. We compare the first order queen polygon contiguity method and a distance-based method in order to choose the most appropriate method for quantifying the spatial weights matrix used to analyze the connectivity distributions between neighbors. Figure 2 shows the results of both the distance-based and the first order Queen's contiguity analysis for the administrative district boundaries. With the distance-based method, a larger percentage of districts have many neighbors (more than 15), whereas the maximum number of neighbors under first order Queen's contiguity is 10. The differences between the distance-based contiguity and the first order Queen's contiguity methods are obvious. The connectivity distribution results obtained with the latter highlight the marked parities in connectivity. Based on the results of the connectivity distribution, we construct a first order queen polygon contiguity weight file for districts which share common boundaries and vertices. The spatial weights/connectivity matrices are utilized in the following local G*(d) calculations.
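The connectivity distribution compared above is simply the histogram of the number of neighbors per district under each definition. The sketch below computes both histograms for a hypothetical 5 x 5 grid of unit cells, using cell centers as centroids and an arbitrary distance threshold of 2 units; none of these values come from the study, and the function names are illustrative.

```python
import math
from collections import Counter

ROWS, COLS = 5, 5
cells = [(r, c) for r in range(ROWS) for c in range(COLS)]

def queen_neighbor_counts(cells):
    """Number of cells that share an edge or a corner with each cell."""
    cell_set = set(cells)
    counts = []
    for r, c in cells:
        k = sum(
            (r + dr, c + dc) in cell_set
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )
        counts.append(k)
    return counts

def distance_neighbor_counts(centroids, threshold):
    """Number of other centroids within the distance threshold of each centroid."""
    counts = []
    for i, (xi, yi) in enumerate(centroids):
        k = sum(
            1
            for j, (xj, yj) in enumerate(centroids)
            if i != j and math.hypot(xi - xj, yi - yj) <= threshold
        )
        counts.append(k)
    return counts

queen_counts = queen_neighbor_counts(cells)
distance_counts = distance_neighbor_counts(cells, threshold=2.0)

queen_hist = Counter(queen_counts)        # neighbors -> number of cells
distance_hist = Counter(distance_counts)
print(sorted(queen_hist.items()))
print(sorted(distance_hist.items()))
```

Even on this regular grid the distance-based definition produces a higher maximum number of neighbors than queen contiguity, which mirrors the direction of the difference reported for the irregular districts.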
Local $G_i^*(d)$ statistic
The local $G_i^*(d)$ statistic (local G-statistic) is used to test the statistical significance of local clusters (as related to the 20 leading causes of death), and to determine the spatial extent of these clusters [12, 14]. The local G-statistic is useful for identifying individual members of local clusters by determining the spatial dependence and relative magnitude between an observation and neighboring observations [20]. In its standard form, the local G-statistic can be written as follows [12, 21, 22]:

$$G_i^*(d) = \frac{\sum_{j} w_{ij}(d)\,x_j - W_i\,\bar{x}}{s\,\left\{\left[\,n\,S_{1i} - W_i^{2}\,\right] / (n-1)\right\}^{1/2}}$$

where $x$ is a measure of the prevalence rate of each leading cause of death event within a given polygon (i.e., each administrative district); $n$ is the number of administrative districts; $w_{ij}$ is a spatial weight that defines neighboring administrative districts $j$ to $i$; $W_i$ is the sum of the weights $w_{ij}$, $W_i = \sum_{j} w_{ij}(d)$; $S_{1i} = \sum_{j} w_{ij}^{2}$; $\bar{x} = \frac{1}{n}\sum_{j} x_j$; and $s^{2} = \frac{1}{n}\sum_{j} x_j^{2} - \bar{x}^{2}$.
Developing the spatial weights $w_{ij}$ is the first step in calculating $G_i^*(d)$. The spatial weight matrix includes $w_{ij} = 1$. In this study, adjacency is defined using a first order queen polygon contiguity weight file, which has been constructed based on the districts that share common boundaries and vertices.
Non-neighboring administrative districts are given a weight of zero. The neighbors of an administrative district are defined as those with which the administrative district shares a boundary or vertex. A simple 0/1 matrix is formed, where 1 indicates that the municipalities have a common border or vertex, and 0 indicates otherwise [21, 23].
The local G-statistic includes the value at $i$ itself in the calculation. Assuming that $G_i^*(d)$ is approximately normally distributed [12], the output of $G_i^*(d)$ can be treated as a standard normal variate with an associated probability from the z-score distribution [24]. Clusters significant at the 95 percent level under a two-tailed normal distribution indicate significant spatial clustering, but only positively significant clusters (z-score greater than +1.96) are mapped.
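A minimal sketch of the local G-statistic and the +1.96 mapping rule is given below, using the standard z-score form of $G_i^*(d)$ with 0/1 weights that include the district itself. The ten districts arranged in a line and their prevalence values are hypothetical, and the function name `local_g_star` is illustrative.

```python
import math

def local_g_star(x, w):
    """z-scores of the local G* statistic for values x and 0/1 weights w.

    The weight matrix is expected to include the self weight (w[i][i] = 1).
    """
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum(v * v for v in x) / n - mean ** 2)
    z_scores = []
    for i in range(n):
        w_i = sum(w[i])                    # sum of weights for district i
        s_1i = sum(v * v for v in w[i])    # sum of squared weights
        num = sum(w[i][j] * x[j] for j in range(n)) - w_i * mean
        den = s * math.sqrt((n * s_1i - w_i ** 2) / (n - 1))
        z_scores.append(num / den)
    return z_scores

# Hypothetical chain of ten districts; each neighbors the previous and next one.
n = 10
w = [[1 if abs(i - j) <= 1 else 0 for j in range(n)] for i in range(n)]
# Hypothetical prevalence rates with high values at one end of the chain.
x = [1, 1, 1, 1, 1, 1, 1, 1, 10, 10]

z = local_g_star(x, w)
hot_spots = [i for i, zi in enumerate(z) if zi > 1.96]  # only positive clusters
print([round(zi, 2) for zi in z])
print(hot_spots)
```

Only the two districts at the high-valued end of the chain exceed +1.96. Negative z-scores, however large in magnitude, are not selected, which matches the mapping rule stated above.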
Logistic regression analysis
Similarities between the spatial distribution patterns for males and females are displayed on maps. In addition to mapping, logistic regression is performed. The binary response indicates whether there is significant autocorrelation between administrative districts or areas: the correlation is classed as higher if the absolute value of the z-score of the local G-statistic is larger than 1.96, and as lower otherwise. Gender is the explanatory variable in the logistic regression model. Thus the model is expressed as

$$\log\frac{\Pr(\text{Higher correlation})}{\Pr(\text{Lower correlation})} = \beta_0 + \beta_1 \times \text{Gender}$$

where $\beta_0$ and $\beta_1$ are the logistic regression coefficients of the model. Pr(Higher correlation) and Pr(Lower correlation) denote the "Higher" and "Lower" correlation probabilities, respectively. Computation is performed with the R language (R 2.8.1).
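With a single binary explanatory variable, the maximum likelihood estimates of this model have a closed form: $\beta_0$ is the log odds of "Higher" in the reference group and $\beta_1$ is the log odds ratio between the two groups. The sketch below illustrates this with hypothetical counts; coding Gender as 0 for females and 1 for males is an assumption, and the study itself fitted the model in R rather than by this shortcut.

```python
import math

def fit_binary_logit(higher_ref, lower_ref, higher_alt, lower_alt):
    """Closed-form logistic fit for one binary covariate.

    ref is the group coded 0 and alt is the group coded 1.
    Returns (beta0, beta1).
    """
    beta0 = math.log(higher_ref / lower_ref)           # log odds in group 0
    beta1 = math.log(higher_alt / lower_alt) - beta0   # log odds ratio
    return beta0, beta1

def predict(beta0, beta1, gender):
    """Pr(Higher correlation) for a given gender code (0 or 1)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * gender)))

# Hypothetical counts of districts classed as higher or lower correlation.
female_higher, female_lower = 30, 319   # Gender = 0 (assumed coding)
male_higher, male_lower = 45, 304       # Gender = 1 (assumed coding)

beta0, beta1 = fit_binary_logit(female_higher, female_lower, male_higher, male_lower)
odds_ratio = math.exp(beta1)
print(round(beta0, 4), round(beta1, 4), round(odds_ratio, 4))
```

Because the model has as many parameters as groups, the fitted probabilities reproduce the observed proportions in each group, and $\exp(\beta_1)$ is the odds ratio of "Higher" correlation for males relative to females under the assumed coding.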