Overall forecasting model structure
To generate predictions from 2017 to 2040, we used data from 1990 to 2016 and modelled three disease groups that the GBD has found to be associated with a high-sodium diet: CVDs and CKDs from level 2 and SC from level 3 of the GBD hierarchy causal structure. The GBD’s latest study (GBD 2017) covered 195 countries and territories (including Japan) and estimated DALYs and other health metrics for 359 diseases and injuries for each year from 1990 to 2017. The results of the GBD study are available by country, and the estimates have been widely used by researchers for scientific studies and by policy-makers and other stakeholders to inform decision making, prioritization, and strategic resource allocation [9,10,11,12,13,14,15,16]. The GBD hierarchy causal structure ranges from level 1 to level 4. The three cause groups at level 1 are communicable, maternal and neonatal conditions, and nutritional deficiencies; non-communicable diseases; and injuries. These are broken down into 22 disease and injury categories at level 2, which are further disaggregated into level 3 and finally into level 4, the most detailed level, with 293 disease groups. Ischaemic heart disease, for example, is classified under non-communicable diseases (level 1), cardiovascular diseases (level 2), cerebrovascular diseases (level 3), and ischaemic heart disease (level 4).
Following the GBD’s forecasting study methodology [17], we developed a three-component model of disease-specific DALYs for the three diseases associated with high salt intake. The model consists of (1) a component on changes in major behavioural and metabolic risk predictors, including salt intake as the main risk predictor of interest in this study; (2) a component on income per capita, educational attainment, and total fertility rate under 25 years, which were combined into a socio-demographic index (SDI) expressed on a scale from 0 to 1, and time; and (3) an autoregressive integrated moving average (ARIMA) model that captures the unexplained component correlated over time. Further details, including data sources and model formulae, are described below.
DALYs and SDI data, 1990–2016
We used the estimates of DALY rate per 100,000 population for CVDs, CKDs, and SC as well as SDI in Japan for the years 1990–2016 published in GBD 2017 [18]. The detailed methodologies for estimating DALYs and SDI are provided in the GBD 2017 summary publications [18, 19]. Data extraction and analysis were performed by sex (men, women, and both sexes combined) and age group (20–49, 50–69, ≥70 years, and all ages). The 0–19 years age group was not considered due to a lack of risk predictor data (see below).
Behavioural and metabolic risk predictor data, 1990–2016
We considered the population-level average of salt intake (grams per day) and the prevalence of current smokers, current alcohol drinkers, and obesity for each sex and age group, based on the availability of data. These were obtained from Japan’s NHNS for 1990–2016. The NHNS is a nationally representative household survey conducted annually by the Japanese Ministry of Health, Labour and Welfare to clarify dietary habits, nutrition intake, and lifestyle at the population level in Japan [20]. The NHNS consists of three parts: 1) physical tests, such as a blood test, performed by physicians at community centres; 2) an in-person survey of a (weighted) single-day dietary record of households; and 3) a self-reported lifestyle questionnaire (including smoking status and alcohol consumption) accompanying the dietary survey. The NHNS did not measure urinary sodium. Detailed descriptions of the NHNS survey procedures are available elsewhere [20, 21]. Briefly, the dietary intake survey was conducted on a designated day, excluding Sundays and public holidays. Trained interviewers (mainly registered dietitians) instructed household representatives (usually those responsible for food preparation) on how to measure the food and beverage quantities consumed by household members using an open-ended recording form. The allocation of shared dishes among household members, food waste, leftovers, and foods eaten away from home were also recorded, as well as the portion size consumed or the quantity of foods when weighing was not possible. The trained interviewers visited each household to check the participants’ survey compliance and, if necessary, confirmed portion sizes and converted estimates of portion sizes or quantities of foods. Each food item was coded according to the dietary record and the corresponding food composition list in the sixth edition of the Japanese Standard Tables of Food Composition [22].
In this study, salt intake (grams per day) was calculated as sodium (mg) × 2.54/1000. Obesity was defined as a BMI of ≥25 kg/m² according to the Japan Society for the Study of Obesity [23, 24]. Only those aged 20 years or older were considered in this study because the lifestyle questionnaire was not administered to those younger than 20 years. Analytic sample sizes ranged from 6149 to 26,594 between 1990 and 2016. A total of 275,468 participants (127,571 men and 147,897 women) who were aged 20 years or older and who completed the salt intake assessment were used to calculate the average daily salt intake and the prevalence of the other predictors from the individual surveys.
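The conversion and the obesity definition above can be illustrated with a minimal sketch. The respondent values are hypothetical and the function names are introduced here for illustration only; they are not taken from the NHNS or from the study's code.

```python
# Conversion factor from the text: salt (g) = sodium (mg) x 2.54 / 1000.
SODIUM_TO_SALT = 2.54
OBESITY_BMI_THRESHOLD = 25.0  # kg/m^2, threshold stated in the text


def salt_grams_per_day(sodium_mg: float) -> float:
    """Convert daily sodium intake in mg to salt intake in grams."""
    return sodium_mg * SODIUM_TO_SALT / 1000


def is_obese(weight_kg: float, height_m: float) -> bool:
    """Return True when BMI (kg/m^2) is at or above the threshold of 25."""
    bmi = weight_kg / height_m ** 2
    return bmi >= OBESITY_BMI_THRESHOLD


# Hypothetical respondents (NOT survey data): sodium in mg/day, weight in kg, height in m.
respondents = [(4000.0, 70.0, 1.70), (3500.0, 80.0, 1.65), (4500.0, 55.0, 1.60)]

# Population-level summaries of the kind used as predictors.
mean_salt = sum(salt_grams_per_day(s) for s, _, _ in respondents) / len(respondents)
obesity_prevalence = 100 * sum(is_obese(w, h) for _, w, h in respondents) / len(respondents)
```

In the study these summaries were computed by sex and age group for each survey year; the sketch omits the grouping for brevity.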
ARIMA model for forecast
The ARIMA model was used to forecast future DALY rates with adjustment for several risk predictors. It produces forecasts from the past values of the time series itself (the autoregressive, or AR, term) and from the errors made by previous predictions (the moving average, or MA, term), using shifts and lags of historical information. The integrated (I) term represents the differencing of the raw observed data to make the time series stationary; that is, data values are replaced by their differences from the previous values.
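The differencing step described above can be sketched in a few lines. The series is hypothetical, and the helper name `difference` is an illustrative choice rather than part of the study's code.

```python
def difference(series, d=1):
    """Apply first differencing d times: each value becomes its change from the previous value."""
    out = list(series)
    for _ in range(d):
        out = [curr - prev for prev, curr in zip(out, out[1:])]
    return out


# Hypothetical series with a steady downward trend (NOT study data).
rates = [100.0, 97.0, 94.0, 91.0, 88.0]

first_diff = difference(rates, d=1)   # constant change per step
second_diff = difference(rates, d=2)  # zero everywhere for a linear trend
```

Each round of differencing shortens the series by one observation, which is one reason to keep d as small as possible.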
The standard notation is ARIMA (p, d, q), where integer values are substituted for the parameters to indicate the type of model. The parameters are defined as follows: p is the order of the autoregressive part; d is the degree of differencing involved; and q is the order of the moving average part. A value of zero indicates that the corresponding component is not used in the model. For example, ARIMA (1, 0, 2) indicates no differencing, one AR term, and two MA terms. The ARIMA model is given by
$$ \left(1-\sum \limits_{i=1}^p{\alpha}_i{L}^i\right){\left(1-L\right)}^d{y}_t=\left(1+\sum \limits_{i=1}^q{\beta}_i{L}^i\right){\varepsilon}_t, $$
(1)
where $y_t$ is the outcome of interest; $\varepsilon_t$ is a (white noise) error term, which is the residual defined as the difference between an observed and a predicted value at time t; L is the time lag operator defined as $L^k y_t = y_{t-k}$; and $\alpha_i$ and $\beta_i$ are the ith coefficient parameters of the AR part (of order p) and the MA part (of order q) [25]. The key point is that the time series model should capture the serial correlation in the observed data, so that the residuals themselves are independent and identically distributed with zero mean and zero covariance. If the left-hand side of Eq. (1) contains the differenced value, appropriate adjustments were also applied on the right-hand side. Before fitting the models, the stationarity of the observed series, meaning that the statistical properties of the data are constant over time, was examined using the Dickey-Fuller test [25]. If non-stationarity was plausible, the data were transformed into a stationary time series by differencing with a suitable order d. The autocorrelation function and partial autocorrelation function were used to assess stationarity and to decide the range of the grid search for the model orders. Model parameters were estimated by maximum likelihood. Akaike’s Information Criterion (AIC) was calculated to select the optimal model orders.
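As a worked illustration of the notation, consider the ARIMA (1, 0, 2) example mentioned earlier. Assuming the standard ARIMA form, in which the ith moving average coefficient multiplies the ith lag of the error term, setting p = 1, d = 0, and q = 2 gives

$$ \left(1-{\alpha}_1L\right){y}_t=\left(1+{\beta}_1L+{\beta}_2{L}^2\right){\varepsilon}_t, $$

which, after applying the lag operator, reads

$$ {y}_t={\alpha}_1{y}_{t-1}+{\varepsilon}_t+{\beta}_1{\varepsilon}_{t-1}+{\beta}_2{\varepsilon}_{t-2}. $$

With d = 1 instead, the same recursion would hold for the differenced series, that is, with $y_t - y_{t-1}$ in place of $y_t$ and $y_{t-1} - y_{t-2}$ in place of $y_{t-1}$.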
A two-step approach was used to forecast the future DALY rates. The first step was to independently forecast the values of each predictor at the population level (i.e. SDI, the average salt intake (grams per day), and the prevalence of obesity (%), current smokers (%), and current alcohol drinkers (%)) from 2017 until 2040 using Equation (1). The second step was to forecast the log-scaled DALY rates $y_t$ using the following Equation (2) after plugging the predicted values of the above predictors into $x_{tj}$:
$$ \left(1-\sum \limits_{i=1}^p{\alpha}_i{L}^i\right){\left(1-L\right)}^d{y}_t=\sum \limits_{j=1}^4{\gamma}_j{L}^d{x}_{tj}+\left(1+\sum \limits_{i=1}^q{\beta}_i{L}^i\right){\varepsilon}_t, $$
(2)
where $x_{tj}$ is the value of the jth predictor at time t, and $\gamma_j$ is the coefficient parameter for the jth predictor. Equation (2) is the general form of the so-called ARIMAX model (ARIMA with eXogenous inputs), which captures the influence of external factors [26]. As widely adopted in epidemiological time series studies [27,28,29], ARIMAX can generate predictions while identifying the underlying patterns of change of both internal and external nature. The sets of parameters in Equation (2) were estimated separately by age and sex category. All analyses were conducted using R version 3.6.1.
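The two-step logic can be sketched as follows, under strong simplifying assumptions that differ from the study's models. Step 1 uses a random walk with drift, that is, ARIMA (0, 1, 0) with a constant, in place of a selected ARIMA model. Step 2 uses a plain log-linear regression on a single predictor, with no AR or MA terms, in place of the full ARIMAX model. All data values and function names are hypothetical.

```python
import math


def drift_forecast(history, horizon):
    """Step 1 stand-in: random walk with drift, i.e. ARIMA(0, 1, 0) with a constant."""
    drift = (history[-1] - history[0]) / (len(history) - 1)  # mean first difference
    return [history[-1] + drift * h for h in range(1, horizon + 1)]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + gamma * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    gamma = sxy / sxx
    return mean_y - gamma * mean_x, gamma


# Hypothetical observed series (NOT study data): salt in g/day, DALY rate per 100,000.
salt = [12.0, 11.6, 11.3, 10.9, 10.6, 10.2]
daly_rate = [5000.0, 4900.0, 4820.0, 4730.0, 4650.0, 4560.0]

# Step 1: forecast the predictor on its own.
salt_future = drift_forecast(salt, horizon=3)

# Step 2: fit the outcome model on the log scale, then plug in the forecast predictor values.
intercept, gamma = fit_line(salt, [math.log(v) for v in daly_rate])
daly_future = [math.exp(intercept + gamma * s) for s in salt_future]
```

Modelling the log of the DALY rate keeps the back-transformed forecasts positive. In the study itself, step 2 also carries AR and MA terms and several predictors at once, so this sketch conveys only the plug-in structure.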
Future scenarios for salt intake
We assumed several future scenarios to evaluate the impact of salt intake on the DALY rates for the three diseases (CVDs, CKDs, and SC) for 2017–2040: a reference forecast and three alternative scenarios (best, moderate, and worse). The reference forecast assumed that the current trend is maintained; i.e. future salt intake during 2017–2040 was predicted using the ARIMA model defined in Equation (1). In the best scenario, the target daily salt intake (8 g per day) is achieved in 2023 as per the Health Japan 21 (the second term) targets [5], and intake continues to decline to reach 5 g per day in 2040 as per the WHO’s guideline [6]. This scenario assumed a constant monotonic decrease from 2017 to 2023, when 8 g per day is achieved, and a further monotonic decrease from 2024 to 2040, when 5 g per day is achieved. The moderate scenario assumed that less than the 8.0 g per day set in the best scenario is achieved in 2040 rather than in 2023, with a monotonically decreasing function. In the worse scenario, the most recent salt intake (i.e. the value in 2016) remains constant from 2017 to 2040. By entering these assumed scenario values into Equation (2) as a predictor, we obtained the final predicted DALY rates until 2040 for these alternative scenarios. Note that the salt intake as of 2040 is the same by definition in the best and moderate scenarios, and the predicted DALY rates converge mathematically to the same values in 2040.
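The best and worse scenario paths can be sketched as below. Two things are assumptions made here for illustration: the decrease is taken to be linear between the stated end points, and the 2016 starting value is a placeholder rather than the NHNS estimate. The helper name `linear_path` is likewise illustrative.

```python
def linear_path(start_year, start_value, end_year, end_value):
    """Yearly values on a straight line between two (year, value) end points, inclusive."""
    span = end_year - start_year
    return {
        year: start_value + (end_value - start_value) * (year - start_year) / span
        for year in range(start_year, end_year + 1)
    }


SALT_2016 = 10.0  # hypothetical 2016 salt intake in g/day (placeholder, NOT the NHNS value)

# Best scenario: 8 g/day reached in 2023, then 5 g/day reached in 2040.
best = {**linear_path(2016, SALT_2016, 2023, 8.0), **linear_path(2023, 8.0, 2040, 5.0)}
best.pop(2016)  # 2016 is observed; the scenarios cover 2017 to 2040

# Worse scenario: the 2016 value is held constant through 2040.
worse = {year: SALT_2016 for year in range(2017, 2041)}
```

A moderate path could be built with the same helper by changing the end points. Each resulting series would then take the place of the forecast salt intake values entered into Equation (2).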