Modeling road traffic fatalities in Iran’s six most populous provinces, 2015–2016

Background Prevention of road traffic injuries (RTIs), a critical public health issue, requires coordinated efforts. We aimed to model influential factors related to traffic safety. Methods In this cross-sectional study, 384,614 observations recorded in the Integrated Road Traffic Injury Registry System (IRTIRS) over a one-year period (March 2015–March 2016) were analyzed. All registered crashes from Tehran, Isfahan, Fars, Razavi Khorasan, Khuzestan, and East Azerbaijan provinces, the six most populated provinces in Iran, were included. The variables significantly associated with road traffic fatality in the univariate analysis were included in the multiple logistic regression. Results According to the multiple logistic regression, thirty-two of seventy-one variables were significantly associated with road traffic fatality. The significant crash-scene factors were passenger presence (OR = 4.95, 95% CI = (4.54–5.40)), pedestrian presence (OR = 2.60, 95% CI = (1.75–3.86)), night-time crashes (OR = 1.64, 95% CI = (1.52–1.76)), rainy weather (OR = 1.32, 95% CI = (1.06–1.64)), no intersection control (OR = 1.40, 95% CI = (1.29–1.51)), double solid line (OR = 2.21, 95% CI = (1.31–3.74)), asphalt roads (OR = 1.95, 95% CI = (1.39–2.73)), nonresidential areas (OR = 2.15, 95% CI = (1.93–2.40)), vulnerable-user presence (OR = 1.70, 95% CI = (1.50–1.92)), human factor (OR = 1.13, 95% CI = (1.03–1.23)), multiple first causes (OR = 2.81, 95% CI = (2.04–3.87)), fatigue as prior cause (OR = 1.48, 95% CI = (1.27–1.72)), irregulation as direct cause (OR = 1.35, 95% CI = (1.20–1.51)), head-on collision (OR = 3.35, 95% CI = (2.85–3.93)), tourist destination (OR = 1.95, 95% CI = (1.69–2.24)), suburban areas (OR = 3.26, 95% CI = (2.65–4.01)), expressway (OR = 1.84, 95% CI = (1.59–2.13)), unpaved shoulders (OR = 1.84, 95% CI = (1.63–2.07)), unseparated roads (OR = 1.40, 95% CI = (1.26–1.56)), and multiple road defects (OR = 2.00, 95% CI = (1.67–2.39)). The vehicle-related factors were heavy vehicle (OR = 1.40, 95% CI = (1.26–1.56)), dark color (OR = 1.26, 95% CI = (1.17–1.35)), old vehicle (OR = 1.46, 95% CI = (1.27–1.67)), non-personal-regional plaques (OR = 2.73, 95% CI = (2.42–3.08)), and illegal maneuver (OR = 3.84, 95% CI = (2.72–5.43)). The driver-related factors were non-academic education (OR = 1.58, 95% CI = (1.33–1.88)), low income (OR = 2.48, 95% CI = (1.95–3.15)), old age (OR = 1.67, 95% CI = (1.44–1.94)), unlicensed driving (OR = 3.93, 95% CI = (2.51–6.15)), not wearing a seat belt (OR = 1.55, 95% CI = (1.44–1.67)), unconsciousness (OR = 1.67, 95% CI = (1.44–1.94)), and driver misconduct (OR = 2.51, 95% CI = (2.29–2.76)). Conclusion This study shows that driving behavior, infrastructure design, and geometric road factors must be considered to avoid fatal crashes. The above-mentioned factors had higher odds of a deadly outcome than their counterparts. Addressing these risk factors, with attention to their odds ratios, would help policy makers and road safety stakeholders support compulsory interventions to reduce the severity of RTIs.


Introduction
Iran has a serious problem with above-average traffic levels, owing to several factors, including transportation strategies and sociocultural and economic features. According to World Health Organization (WHO) data published in 2020, the number of deaths due to road traffic injuries (RTIs) exceeded deaths from heart diseases in Iran [1]. It has also been reported that traffic accidents caused approximately 100,000 fatalities and more than 2 million serious injuries over the 8-year period from 2013 to 2020 [2].
So far, global initiatives have sought to better understand and address the underlying mechanisms of road safety, many of which are aligned with the worldwide program of the Decade of Action for Road Safety 2011-2020 prepared by the United Nations Road Safety Collaboration (UNRSC) [3]. However, despite the increase in road injuries in Iran, the main reasons for this important issue have not been appropriately identified. According to reports by the head of the traffic information and control center of the Iranian traffic police, driver fault was the primary factor in traffic accidents [4]. Although driver fault tops the list of causes in Iran, other elements cannot be neglected. Traffic accidents have further causes, such as environmental, road-related, road-user, vehicle, and driver-related factors. Valuable existing studies have identified only part of the risk factors for RTIs, and there is no comprehensive study in this field yet. For example, Lankarani et al. (2014) aimed to address environmental factors in road traffic crashes. They used data from a cross-sectional study of the traffic police department between March 2010 and December 2010. The results indicated that daytime, dusty weather, oily road surfaces, ominous traffic signs, road narrowing, and downhill roads were highly correlated with road crash-related deaths [5]. A study by Sherafati et al. (2017) showed that crash severity and length of admission time were the leading causes of inequity in fatality rates between urban and rural regions [6]. Hasani et al. (2018) conducted a study to identify the risk associated with age, gender, time, pedestrian position, accident location, and vehicle type for pedestrian fatality in urban and suburban traffic collisions in Tehran and Alborz provinces. They found that, on urban roads, age over 35 years, male sex, daytime, two-way undivided roads, holidays, four-wheel vehicles, and crossing the road at an unauthorized point were significantly associated with pedestrian fatality. However, only road design (two-way divided roads) was found to correlate with pedestrian fatalities in suburban crashes [7]. In another study, Bakhtiyari et al. (2019) evaluated human risk factors of RTIs using data from a cross-sectional study in Iran. They included all road crash data of five main suburban roads from August to February 2015. Over speeding, not wearing a seat belt, reckless overtaking, fatigue and drowsiness, and exceeding the speed limit were determined to be the most important human factors affecting traffic-related deaths [8].
All the studies mentioned above indicate that information about risk factors related to crash severity in Iran is sparse. Furthermore, we are at the beginning of the United Nations Decade of Action for Road Safety 2021-2030, which emphasizes the importance of taking a holistic approach to road safety [9]. It is therefore crucial to know where the field currently stands and to identify what research will be essential for further progress. A comprehensive investigation of the epidemiological features of RTIs across all categories of possible risk factors, namely crash scene, vehicle, driver, passenger, and pedestrian characteristics, is thus a vital concern. In this regard, the primary objective of the present study is to conduct integrated analyses to identify the main factors that affect road crash severity. To accomplish this goal and address questions on the effects of crash scene, vehicle, driver, passenger, and pedestrian characteristics, data from a comprehensive national-level study were used. The findings of this study will provide information that road safety stakeholders can use to create effective countermeasures against severe and fatal crashes.
In this study, logistic regression was used to identify the statistically significant risk factors for fatal traffic accidents. Logistic regression has been shown to be an effective and trustworthy way to identify the relationship between the dependent and independent variables in traffic accidents. Fiorentini et al. (2020) used the random under-sampling of the majority class (RUMC) resampling technique to deal with imbalanced crash databases. The authors claimed that, because classification problems are usually imbalanced, this technique allows a useful prediction to be made for the minority class. To create crash severity models, four different techniques were used: Logistic Regression, Random Tree, Random Forest, and K-Nearest Neighbor. Eight separate models were developed, each combining one of the four machine learning techniques with or without RUMC. F1-score, true positive rate (recall), true negative rate, false positive rate, accuracy, precision, and the confusion matrix were calculated to evaluate the efficacy of the various models. The study examined a dataset of 6,515 crashes that occurred on roads and at crossings in Great Britain from 2005 to 2018. In terms of predictive power, the RUMC-based methods outperformed the algorithms built on the unbalanced dataset. Concerning overall accuracy, the RUMC-Logistic Regression (62.53%) outperformed the RUMC-Random Forest (56.14%), Random Tree (50.97%), and RUMC-K-Nearest Neighbor (48.47%) [10]. In a case study, Olayode et al. (2021) found that, in predicting the traffic flow at a four-way road intersection, an artificial neural network trained by a particle swarm optimization model performed better than a heuristic artificial neural network model. Moreover, judging by their testing results, both models were sufficiently reliable in predicting traffic flow [11].
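The resampling and evaluation steps described above can be sketched in a few lines. This is a minimal illustration, not the procedure of Fiorentini et al.: the function names, the toy data, and the confusion-matrix counts are hypothetical, and only the metric definitions themselves are standard.

```python
import random


def random_undersample(rows, label_of, seed=0):
    """Randomly drop rows from larger classes until all classes match the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    n_min = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))
    rng.shuffle(balanced)
    return balanced


def classification_metrics(tp, fp, tn, fn):
    """Standard metrics computed from the four confusion-matrix counts."""
    recall = tp / (tp + fn)        # true positive rate
    tnr = tn / (tn + fp)           # true negative rate
    fpr = fp / (fp + tn)           # false positive rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"recall": recall, "tnr": tnr, "fpr": fpr,
            "precision": precision, "accuracy": accuracy, "f1": f1}


# Hypothetical imbalanced data: 95 non-fatal (label 0) and 5 fatal (label 1) crashes.
toy_rows = [(i, 0) for i in range(95)] + [(i, 1) for i in range(5)]
balanced_rows = random_undersample(toy_rows, label_of=lambda row: row[1])

# Hypothetical confusion-matrix counts for a fitted classifier.
metrics = classification_metrics(tp=40, fp=20, tn=120, fn=20)
```

Under-sampling discards majority-class records, so it trades sample size for class balance; that trade-off is one reason accuracy alone is reported alongside recall and precision.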
In a recent study, Mohanty et al. (2022) examined the use of an artificial neural network and binary logistic regression for modeling crash severity by looking at the role of cars (both as perpetrator and victim). With a cut-off value of 0.5, the binary logistic regression correctly predicted about 75% of outcomes. The number of crashes involving a particular offender-victim pair, the type of validation method used, and the hidden layer configuration, considering different sigmoid activation functions, all had a substantial impact on the accuracy of the artificial neural network method. ROC curves showed that the artificial neural network could correctly forecast 75% of the outcomes. This percentage could be increased by removing any vehicle pairs that are present only infrequently. Based on a comparison of the two approaches' advantages and disadvantages, binary logistic regression was found to be superior overall. Its only drawback was a lack of applicability when there was a weak correlation between the dependent variable and its predictors. The artificial neural network approach, however, is not constrained by these restrictions because of its machine-learning nature, and it delivers more precise predictions when given more input data [12].
The paper is organized as follows. After this overview of the relevant studies, the data and research variables are described. The management of missing data and the descriptive statistics for these variables are then presented, and the findings of both the simple and multiple logistic models are analyzed and explained. A few closing remarks are then offered.

Data collection and description of variables
Reliable and extensive data collection is crucial to derive sound conclusions. In Iran, the Integrated Road Traffic Injury Registry System (IRTIRS) [13] is a comprehensive reference crash database. This multi-method study is supported by the World Health Organization, the Iranian Ministry of Health, the Iranian Traffic Police, and the Iranian Forensic Medicine Organization. The development of IRTIRS is a national research project that started in 2017 with the aim of developing an integrated registration of traffic accidents in Iran. In cooperation with other interested organizations, the Ministry of Health and Medical Education (MOHME) and the Road Traffic Injury Research Center of Tabriz University of Medical Sciences decided to develop IRTIRS to create an integrated data recording system. Experts fill in reports in five main sections: crash scene (crash type, time, lighting status, weather, etc.), vehicle (vehicle type, color, maneuver, etc.), driver (age, gender, license, etc.), passenger (age, gender, injured organ, etc.), and pedestrian (age, gender, injured organ, etc.). This study covers all accidents in one year (March 2015-March 2016), in which 384,614 road traffic crashes were recorded on all roads in Tehran, Isfahan, Fars, Razavi Khorasan, Khuzestan, and East Azerbaijan provinces, the six most populated provinces in Iran. Figure 1 presents a flowchart of dataset preparation for modeling the contributing factors of fatal crashes.
The IRTIRS provides information in four different categories, stored in separate files. The crash database contained details of 208,828 crashes. Crash severity in this database has three categories: property damage, injury, and fatality. For the purpose of this study, severity was reclassified into two categories: (1) damage or injury and (2) fatality. The severity variable in this study is therefore binary.
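The recoding of the three severity categories into a binary outcome can be written as a simple lookup. The sketch below is illustrative only: the string labels and the function name are assumptions, since the registry's actual category codes are not given here; the 0/1 coding follows the convention used in the statistical analysis (fatal crashes coded as 1).

```python
# Assumed text labels for the three registry categories (hypothetical spelling).
SEVERITY_TO_BINARY = {
    "property damage": 0,  # non-fatal
    "injury": 0,           # non-fatal
    "fatality": 1,         # fatal
}


def recode_severity(severity):
    """Map a three-level severity label to the binary outcome (1 = fatal)."""
    return SEVERITY_TO_BINARY[severity.strip().lower()]


# Hypothetical example records.
example_labels = ["Injury", "property damage", "Fatality", "injury"]
binary_outcome = [recode_severity(label) for label in example_labels]
```

Raising a `KeyError` on an unknown label is deliberate here: silently mapping unexpected values to 0 would hide data-entry problems.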
Additionally, the IRTIRS crash database contains the name of the road where the crash occurred and its type (alley, main street, side street, main road, side road, rural road, freeway, and expressway). Road names were searched manually in Google Maps to verify that road types were recorded correctly. In case of any doubt, traffic police officers were asked about that particular road. The vehicle-driver database had 370,214 observations; dropping six repeated cases led to the final target database with 370,208 observations. The remaining two databases contained 27,499 passengers and 27,027 pedestrians, respectively.
After screening all four databases individually, they were combined to make a final master database. Initially, the vehicle-driver and passenger databases were combined using crash seri, serial, vehicle type, model, system, color, and plaque number to make a vehicle-driver-passenger database. Subsequently, crash and pedestrian information was added to the vehicle-driver-passenger database using crash seri and serial number. In the pedestrian combining phase, the data of 94 pedestrians were removed since these pedestrians could not be matched with the driver who struck them. The final database comprised 384,614 observations: 323,884 with crash-vehicle information, 26,337 with crash-vehicle-passenger, 26,264 with crash-vehicle-pedestrian, 153 with crash-vehicle-passenger-pedestrian, 10 with crash-passenger-pedestrian, 531 with crash-passenger, 458 with crash-pedestrian, one with pedestrian-passenger, 6,427 with crash, 25 with vehicle, 477 with passenger, and 47 with pedestrian information. All variable descriptions and categories are detailed in Table 1, which also presents the original categories of each variable along with the modified ones.
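As a quick consistency check on the counts reported above, the twelve record types can be summed and compared with the stated total, and the de-duplication of the vehicle-driver database can be verified in the same way. The dictionary keys and variable names below are illustrative labels, not registry field names.

```python
# Record counts by the kind of information each observation carries,
# copied from the description of the final master database.
record_counts = {
    "crash-vehicle": 323_884,
    "crash-vehicle-passenger": 26_337,
    "crash-vehicle-pedestrian": 26_264,
    "crash-vehicle-passenger-pedestrian": 153,
    "crash-passenger-pedestrian": 10,
    "crash-passenger": 531,
    "crash-pedestrian": 458,
    "pedestrian-passenger": 1,
    "crash": 6_427,
    "vehicle": 25,
    "passenger": 477,
    "pedestrian": 47,
}

total_observations = sum(record_counts.values())

# Vehicle-driver database: 370,214 raw observations minus six repeated cases.
vehicle_driver_final = 370_214 - 6

print(total_observations)    # 384614
print(vehicle_driver_final)  # 370208
```

Both figures agree with the totals given in the text (384,614 and 370,208).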

Missing data management
Numerous statistical methods have been proposed to manage missing data [14]. In epidemiological studies, complete case analysis (CCA) and multiple imputation are standard approaches. Complete case analysis, which includes only respondents with values on all variables needed for the intended analysis, is more common because of its simplicity and because it is the default in most statistical software. Despite these advantages, its major pitfall is that it uses a reduced and possibly unrepresentative sample, leading to lower power and possibly biased results [15]. Moreover, the accuracy of this method relies strongly on assumptions about the missing-data mechanism, frequently requiring the strict missing completely at random (MCAR) assumption. Under this assumption, the probability of a variable being missing for a given subject is unrelated to both the observed and unobserved variables for that subject [16].
An alternative approach to managing missing data is imputation [17]. Generally, there are two imputation methods: single imputation (SI) and multiple imputation (MI). In SI, the imputed value is determined using a specific rule. SI takes several forms, including using the last observed value, the mean, or the most frequent value. In general, SI is not recommended because it requires assumptions that are often unrealistic and leads to underestimation or overestimation of the P value [15]. The other imputation method, MI, which has gained popularity in the past few years, was developed to address the drawbacks of CCA and SI. MI is a three-stage statistical process that accounts for uncertainty about missing values by calculating several possible values, or imputations [15].
In multiple imputation, all participants can be included in the analysis, which may increase the accuracy of parameter estimation while reducing bias [18,19]. MI is used in this paper, with each variable with missing values imputed ten times.
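The final (pooling) stage of multiple imputation is commonly carried out with Rubin's rules, which combine the estimates from the imputed datasets into one estimate and one standard error. The paper does not state how the ten sets of results were combined, so the sketch below is an assumption about the standard procedure rather than a description of the authors' analysis; the function name and all numeric values are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool per-imputation estimates (e.g. log odds ratios) with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within_var = mean([se ** 2 for se in std_errors])  # average sampling variance
    between_var = variance(estimates)                  # variance across imputations (m - 1 denominator)
    total_var = within_var + (1 + 1 / m) * between_var
    return pooled_estimate, math.sqrt(total_var)


# Hypothetical log odds ratios and standard errors from ten imputed datasets.
log_ors = [0.49, 0.51, 0.50, 0.52, 0.48, 0.50, 0.51, 0.49, 0.50, 0.50]
ses = [0.10] * 10
pooled_log_or, pooled_se = pool_rubin(log_ors, ses)
```

The between-imputation term is what distinguishes MI from single imputation: it widens the standard error to reflect uncertainty about the missing values themselves.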

Statistical analysis
In a preliminary descriptive analysis, the data were described as frequencies (percentages) for categorical variables and mean ± SD (standard deviation) for continuous ones. Simple logistic regression models were fitted in the individual databases to identify the potential explanatory variables affecting the likelihood of a fatal crash, with crash severity as the dependent variable (injury or property-damage-only crashes, Y = 0; fatal crashes, Y = 1). Both CCA and MI, in separate and combined databases, were considered in the simple logistic regression analysis. Since there was no significant difference in the magnitude and direction of the estimated odds ratios, the multiply imputed and combined database was taken as the final database. Given the relatively low number of passenger and pedestrian crashes, the present study focuses only on the explanatory variables at the crash and vehicle levels. Although descriptive statistics were generated for the passenger and pedestrian databases, only two binary variables were included in the multiple analysis, indicating the presence or absence of a passenger or pedestrian in the crash. Effect sizes are reported with 95% confidence intervals for all variables. All analyses used Stata software (version 14.0; StataCorp, College Station, Texas, USA).
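For readers less familiar with how logistic regression output maps to the reported effect sizes, an odds ratio is the exponential of a regression coefficient, and a Wald 95% confidence interval is obtained by exponentiating the coefficient plus or minus 1.96 standard errors. The sketch below illustrates this relationship and back-calculates an approximate standard error from a reported interval; it assumes Wald-type intervals, which the paper does not state explicitly, and the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic regression coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def se_from_ci(lower, upper, z=Z_95):
    """Approximate the coefficient's SE from a reported Wald confidence interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# Reported result for passenger presence: OR = 4.95, 95% CI (4.54-5.40).
beta_passenger = math.log(4.95)
se_passenger = se_from_ci(4.54, 5.40)
or_passenger, ci_low, ci_high = odds_ratio_ci(beta_passenger, se_passenger)
```

Because the interval is symmetric on the log scale, the reported bounds are reproduced to within rounding, which is consistent with (but does not prove) a Wald-type interval.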

Results and discussion
Of all the variables in Table 1, those affecting the incidence of fatal traffic accidents were considered explanatory variables. Because the data on distance from the nearest police station, crash longitude coordinate, crash latitude coordinate, road shoulder width, road length, and road width were largely invalid, these variables were not considered in further analyses. Table 1 also contains some identifying variables, namely crash seri, crash serial, officer code, road name, police station, road beginning, road end, vehicle system, vehicle system ID, vehicle company, vehicle parent company, vehicle plaque number, vehicle plaque serial, driver first name, driver last name, driver national ID, driving license ID, passenger first name, passenger last name, passenger national ID, pedestrian first name, pedestrian last name, and pedestrian national ID, which were used only to produce a unique linkage ID for combining the databases or to check whether the databases had been combined correctly. In addition, scene status, driver reaction, and driver injury type were removed from the simple and multiple analyses because these variables are outcomes. Table 2 summarizes the explanatory variables. Of the 208,828 crashes recorded in the crash database, 2,237 (1.07%) were fatal. Details of all other explanatory variables are presented in Table 2. Where modified levels were defined for a variable, the statistics are reported for the modified levels.

Analysis of the General Model
Table 3 shows the odds ratios of each factor, estimated on the final dataset with simple and multiple (adjusted) logistic regression models.

Passenger and pedestrian involved in a crash
As Table 3 shows, crashes involving passengers had 4.95 times, and crashes involving pedestrians 2.60 times, the odds of being fatal. The presence of passengers may reduce attention to the driving task and exert direct or indirect psychological pressure to drive on less safe roads. In the same vein, the presence of a passenger may lead to increased stress and, thus, reduced driving performance [20]. In addition, pedestrians are more vulnerable than other road users because they are less protected than the occupants of closed vehicles. The relatively high vulnerability of pedestrians to traffic accidents in metropolitan areas is consistent with the results of international research [21].

Crash-level variables
The odds ratios of the day factor (limited to the weekend and weekday categories), zone type, view obstacles, crash position, road surface, road geometric design, vehicle factor, and road repairing status were not significantly associated with fatal crashes (all P > 0.05).
Nighttime, followed by twilight/dawn, was riskier than daytime (the odds of a fatal crash being at least 1.48 times greater). This may be explained by the high traffic volume during the daytime, which prevents drivers from driving at high speeds. In addition, driving during the day provides better visual perception and more time to distinguish barriers and react. These conditions make drivers more cautious and better prepared to take the necessary measures to reduce the risk of a severe crash. Compared to clear/cloudy weather, the odds of fatal crashes were 1.32 times higher during rainy weather. Meanwhile, crashes in snowy weather were 65% less prone to being fatal. Foggy/stormy/dusty and clear/cloudy weather conditions were equally likely to lead to fatal crashes. Although the number of road collisions on snowy and rainy days is inevitably higher than on clear and cloudy days, and driving on these days is more dangerous due to limited visibility and tire adhesion, drivers drive more carefully and at lower speeds. In addition, most people avoid unnecessary travel or postpone it to another time. For these reasons, the available evidence suggests that less severe traffic accidents (property damage or injury) increase on snowy days, whereas more severe ones (fatality) decrease.
As in previous research [22], the results indicate that roads without specific traffic control carry high odds of fatality if a collision occurs. The absence of intersection control was associated with a higher possibility of fatal crashes (1.40 times more). Intersection control can compel drivers to comply with traffic rules. As a salient example, after detecting a vehicle entering the intersection without yielding the right of way, an officer can stop the noncompliant driver and issue a citation. Such targeted enforcement increases legitimacy among offenders and others who observe or hear about these activities.
Line marking showed a significant effect. Locations with broken lines were 1.36 times more likely to see fatal crashes than crash locations with no line marks. Single and double solid lines were even more critical than broken ones: they were 1.54 and 2.21 times more likely to lead to fatal crashes, respectively. This could be because double solid lines mark the boundary between directions on two-way roads, where the risk of a head-on collision and, consequently, death is much higher.
Regarding road material, asphalt roads were about 2 times more likely to result in fatal crashes than sand/clay roads. Drivers are less cautious and less alert to their performance, especially regarding speed control, when driving on asphalt roads, since these offer better driving conditions than other road types. This finding is in line with the existing literature [23].
Consistent with existing studies [24,25], findings from the present study indicate that crashes in nonresidential areas have worse outcomes than crashes in other areas (2.15 times the odds of fatality). While driving in nonresidential areas, drivers are more likely to engage in risky driving behaviors since they usually do not perceive the situation there as critical.
Crash severity analysis based on collision mechanisms revealed that the involvement of vulnerable road users was associated with more severe crashes. Because these users are directly exposed to the impact, they are more likely to die, which increases the chance of a fatality.
The results also revealed that the odds of death were 13% higher when human factors were involved in the causation of a road traffic crash. Similar studies have shown that human factors (namely hasty driving, ignoring traffic regulations, fatigue, drowsiness, etc.) were the sole cause of many accidents [26,27].
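Statements such as "13% more likely" and "65% less prone" are restatements of odds ratios as percentage changes in odds, computed as (OR - 1) × 100. The short sketch below makes that conversion explicit. The function name is hypothetical, and the value 0.35 for snowy weather is inferred from the "65% less" wording rather than quoted from Table 3.

```python
def percent_change_in_odds(odds_ratio):
    """Convert an odds ratio to the percentage change in odds versus the reference level."""
    return (odds_ratio - 1.0) * 100.0


# Human factor: reported OR = 1.13, described as "13% more likely".
human_factor_change = percent_change_in_odds(1.13)

# Snowy weather: an OR of about 0.35 (inferred) corresponds to "65% less".
snowy_change = percent_change_in_odds(0.35)

# Rainy weather: reported OR = 1.32, i.e. about 32% higher odds.
rainy_change = percent_change_in_odds(1.32)
```

Note that these are changes in odds, not in probability; the two are close only when the outcome is rare, as fatal crashes are in this dataset (about 1% of recorded crashes).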
Regarding the judicial causation factors, in terms of the first cause, every category except needing more training alone, namely needing more training combined with irresponsibility and other multiple factors, played a critical role in increasing the odds of a fatal crash compared with irresponsibility alone. Failure of organs was almost the same as irresponsibility in resulting in deadly crashes. Policymakers have applied numerous measures to alleviate the severity of traffic crashes, the most typical of which are speed cameras and police surveillance. In addition, educational approaches designed for drivers are another way to cultivate safer drivers by teaching them about traffic safety and improving their driving skills.
Analysis of prior causes showed that other factors, namely fatigue and drowsiness, lack of skill in assessing traffic situations, slippery or tarred road surfaces, etc., increased the odds of dying in a road traffic crash by approximately 50%. To elaborate, it is believed that, after drunken driving, drowsiness is the most prominent cause of vehicle accidents. However, many experts believe this is a conservative estimate, and the actual contribution of fatigue and sleepiness to vehicle accidents may be higher [28]. Sleepiness is part of the circadian rhythm of sleep and wakefulness. Drowsiness leads to automobile accidents because it impairs performance and can ultimately result in falling asleep behind the wheel. Although sleeping is the most effective way to reduce drowsiness, it is sometimes unavoidable, particularly for professional drivers, to continue driving for reasons such as shift work [29]. Accidents caused by fatigue and drowsiness are often severe and carry a significant financial burden and catastrophic personal consequences. Researchers have therefore proposed solutions to reduce this problem, including educational activities, behavior changes, and environmental changes [28].
Regarding direct causes, delay in sighting was the only cause that significantly increased (1.35 times) the odds of a fatal crash compared to irregulation. Delayed sighting can be due to drivers' health disorders, particularly adult attention-deficit/hyperactivity disorder (ADHD), or to visual impairment. Symptoms of ADHD, namely lack of focus, hyperfocus, and disorganization, have been shown to be related to crash severity [30]. Studies have proposed that impulsiveness and visual inattentiveness are the main contributors to the severity of car accidents in patients with ADHD. In addition, therapies that mitigate ADHD symptoms translate into safer driving behavior and, accordingly, lower rates of severe crashes [31]. Furthermore, it has been shown that drivers with poor visual acuity are more prone to road traffic crashes [32]. The estimated number of crashes attributed to visual field defects has been reported to be 36% higher. With regard to protanopic color vision defects, people with these defects are not allowed to obtain a commercial license since they cannot recognize red traffic lights [33]. In light of the above, the mental stability and visual functioning of drivers appear to have a substantial bearing on road traffic crashes and are a fundamental issue that deserves more consideration.
Compared to a side-swipe collision, the odds of a fatal crash were significantly higher in a head-on collision (3.35 times), followed by a fixed-object collision (2.36 times). Meanwhile, rear-end and T-bone collisions were almost the same as side-swipe collisions regarding crash severity. A head-on collision can injure the driver directly in numerous ways, worsening the crash outcome and even leading to death. Head-on collisions are the most severe type of crash and often lead to injuries and fatalities [34].
Compared to Tehran, the capital of Iran, Fars, Khuzestan, and then Isfahan were the riskiest provinces, together accounting for about 62% of all fatal crashes. Isfahan, Iran's top tourist destination, is a classic stop on travel itineraries from the northern cities of Iran to the southern tourist city of Shiraz in Fars province. In addition, these two provinces are attractive destinations for visitors from abroad. Accordingly, these two provinces, with high traffic volume and varied driving characteristics (risky driving behaviors, drowsy driving, high speed, etc.), exhibit higher rates of crash severity and fatality. In Khuzestan, a border province, drivers tend to use foreign cars, which leads them to drive faster. Other studies show that high speed is crucial in causing severe crashes. Furthermore, the greater fatality rate in these three provinces could be attributed to the following issues: (1) emergency medical services performance, for which the numbers of at-scene, on-transfer, and in-hospital deaths should be considered; (2) unsafe roads; and (3) higher rates of heavy vehicles and of pedestrian and motorcycle crashes.
The results also showed that commuting areas contributed to crash severity. In detail, crashes in suburban regions were at least three times more likely to be fatal than those in urban areas. It has been reported that crashes occurring on rural roads show a clear trend toward more severe and even fatal outcomes. It is believed that features of rural highways, such as rural drivers' typical behaviors (being less likely to wear a seat belt or stop at stop-sign intersections, higher driving speeds, etc.) and their characteristics (more older drivers, or the difficulty of reaching medical assistance in time after a crash), are leading factors in the more frequent fatal crashes on rural roads [35,36].
Compared to a main street, a crash was more likely to involve a fatality on an expressway, main road, side road, freeway, and rural road, in that order. These road types carry commercial traffic and lie in suburban areas. In addition, the line marking on these roads is typically a double solid line. As already shown in the previous results of this study, the crash outcome is more severe in suburban areas and on roads with double solid lines.
In addition, considering the shoulder condition and design of the roads, roads with unpaved shoulders and unseparated two-way roads contributed to higher risk. The road shoulder provides a necessary stopping lane and a recovery area for errant vehicles before a potential crash occurs. Its omission could hence lead to more severe collisions. Furthermore, unseparated two-way roads, like roads with double solid lines, are more likely to have head-on collisions, which are more prone to fatality.
To complete the crash-level variables, it should be mentioned that, in addition to the factors discussed above, the coincidence of multiple road defects, such as sign defects, geometric defects, etc., implied a higher risk of fatal crashes (Table 3). Road defects are cases where a road design element conveys ambiguous information to drivers, resulting in driver error, or where a change in the road could have reduced the likelihood of a road accident. It has previously been reported that road environments that encourage risky driver behavior (e.g., by inviting high traffic speeds) or fail to consider safety in all conditions (e.g., at night or in adverse weather) indirectly increase the probability of a road accident and its severity. Hence, a road that is designed and regularly maintained according to operational and functional requirements is critical in influencing drivers' perceptions and making roads safer for all users [37]. It has been shown that the road environment is in poor condition in developing countries due to worse road design and maintenance. In addition, the differing infrastructure needs of mixed traffic, such as high-speed vehicles, heavy commercial traffic, bicyclists, pedestrians, and motorcycle users, are commonly not accommodated on these roads [38]. Moreover, the growing number of motorized vehicles in developing countries is outstripping the capacity of current transportation infrastructure, leading to increased accident rates and severity levels.

Vehicle-level variables
The following rows in Table 3 provide results regarding vehicle factors. Vehicle safety equipment and moving direction were not associated with the occurrence of a fatal crash. On the other hand, the categories most strongly related to crashes involving a fatality were: heavy vehicle type, vehicles of risky colors, vehicles aged fifteen years or more, vehicles without personal-regional plates, and vehicles performing maneuvers such as stopping outside of the road, sudden starting, sudden stopping, and spiral movement.
Although heavy vehicle crashes are less frequent, they are more severe, to such an extent that approximately 18 percent of all fatal crashes in 2019 involved heavy vehicles [39]. Intense exposure is the leading cause of severe injury or even death in heavy vehicle accidents. It is also noteworthy that these accidents often lead to the death of the occupants of the other vehicle [40].
The findings about silver cars, in particular, clearly contrast with the results of a case-control study which concluded that silver vehicles were approximately 50% less prone to serious crashes compared with colored cars. The results of that study are likely biased because several critical confounding factors, such as vehicle type and drivers' personality traits, were not considered. First, commercial vehicles, which are more likely to be involved in severe crashes, are predominantly white. Second, there might be a relationship between driving behavior and color choice; for instance, more careful drivers may prefer silver. In another, paired case-control study, the authors concluded that light-colored vehicles were associated with less dangerous collisions. Although that study tried to account for particular driver and vehicle features and to consider many confounders, it could not account for unmeasured or unmeasurable confounders. It has also been stated that white, black, blue, grey, green, silver, and red were associated with more serious crashes. This association was even stronger during daylight than in the dark or at twilight [41].
Consistent results exist regarding vehicle age. Studies assessing the impact of vehicle age on car collisions have found that older vehicles are more prone to be involved in severe crashes. It has been shown that older vehicles, compared with new ones, are more likely to develop safety-related defects, such as brake and tire failure. In addition, older vehicles are less likely to have safety features. Missing or defective safety equipment can cause a crash and may increase its severity [42].
The difference between personal and commercial vehicles could be attributed to the fact that commercial vehicles are heavy, and their drivers suffer from sleepiness and fatigue more than drivers of private vehicles. Moreover, commercial vehicles usually travel on highways, expressways, and main roads, where severe crashes are more likely. Finally, unusual maneuvers such as stopping outside the lane, sudden starting, sudden stopping, and spiral movement are categorized as risky driving behaviors, which are strongly linked with crash severity.

Driver-level variables
Table 3 also presents results for driver-level factors. Driver fault status and gender were not significant in predicting a fatal crash. The categories with the highest odds ratios for fatal crashes were: drivers with non-academic education and middle-income status, old age, no driving license, not using the seat belt, driver unconsciousness, lack of driving skills, or violation of the law, and driver misconduct other than spiral movement or over speeding.
Driver education and income, as well as overall socioeconomic status (SES), play a crucial role in improving traffic safety. Cognitive perception, which shapes how a person interprets and understands different situations and whether they obey the rules, is closely related to SES. A driver with a high SES level would rarely be overcome by extreme fear or excessive boldness and would act more reasonably in a critical situation [43]. Beyond behavioral factors, vehicle-related and contextual features can contribute to the elevated risk among individuals with low SES. These people often face financial hardship to such an extent that they can barely cope with their day-to-day expenses. An inevitable result is that they tend to own vehicles with no advanced safety equipment and a lower crash-test rating. Area properties might also be relevant, as there is a striking difference in accessibility to hospital trauma centers between common and high-property areas. Limited access to trauma centers and specialists may increase crash severity and the subsequent mortality rate [44].
Increased traffic crashes among the elderly have become common in many nations around the globe [45]. Because they suffer from musculoskeletal disorders and slowed physical responses, older people experience more severe crash outcomes than the middle-aged group in traffic collisions. Furthermore, chronic diseases, osteoporosis being a prime example, increase the rate of bone fracture and consequently extend the hospitalization period and raise the mortality rate in this age group. This imposes a heavy financial burden on the healthcare system by increasing medical costs. So, to address this problem and enhance traffic safety, policies targeting this age group should receive greater emphasis.
Consistent results on driving licenses show that unlicensed drivers are more likely to be involved in a severe crash and to engage in illegal behaviors such as red light running, speeding, drunken driving, and not using a seat belt. These drivers are also more prone to be at fault than licensed drivers. Since the number of unlicensed drivers is on the rise, measures such as increasing patrol enforcement, expanding the applied penalties, impounding vehicles, and promoting public knowledge about the dangers of driving without a license need to be taken for this population [46].
Turning to seat belt use, it is evident that seat belts can considerably decrease non-fatal and fatal injuries in both front and rear seat occupants. One study found that people in metropolitan and urban areas are more likely than those in rural areas to use seat belts. In addition, gross provincial product, educational level, and legislation were reported to be related to the use of seat belts [47].

Key results and insights
Table 4 summarizes the significant factors contributing to fatal crashes, their level, and the safest categories, based on the multiple logistic regression model. For a clearer graphical presentation, the identified risk factors and corresponding odds ratios are illustrated in Fig. 2.
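
To illustrate how an odds ratio of the kind reported here relates to the underlying logistic regression coefficient, the following minimal Python sketch converts a coefficient and its standard error into an odds ratio with a 95% confidence interval, and approximately recovers the coefficient and standard error from a reported estimate. The function names are illustrative, and the Wald-type interval is an assumption, since the interval method is not stated in this section. The example uses the passenger-presence estimate given in the abstract.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical value of the standard normal (about 1.96).
Z_95 = NormalDist().inv_cdf(0.975)


def odds_ratio_with_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a logistic coefficient.

    Assumes a Wald interval computed on the log-odds scale.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_or(odds_ratio, lower, upper, z=Z_95):
    """Approximate (beta, se) from a reported OR and its confidence limits."""
    beta = math.log(odds_ratio)
    # The interval width on the log scale equals 2 * z * se under the Wald assumption.
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Passenger presence as reported in the abstract: OR = 4.95, 95% CI 4.54-5.40.
beta, se = coefficient_from_or(4.95, 4.54, 5.40)
or_, lo, hi = odds_ratio_with_ci(beta, se)
print(f"beta={beta:.3f}, se={se:.3f}, OR={or_:.2f}, CI=({lo:.2f}, {hi:.2f})")
```

Because the interval is symmetric on the log-odds scale rather than on the odds-ratio scale, the reported limits sit unevenly around the point estimate; the round trip above reproduces them only up to rounding.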

Strength and limitation
All registry system variables are presented in Table 1, whether or not they were used as explanatory variables. Since most categories are based on international classifications, they can serve as a reference for developing traffic crash registry systems in other countries. In this study, we evaluated unknown segments. Addressing some limitations, such as defining upper and lower limits when recording the quantitative variables, is suggested when designing, developing, or revising traffic crash registry systems. In addition, the registry system should offer advanced features such as automatic completion of road length and width or shoulder width upon selection of the road name and type. Researchers could then use this critical information in more complex and specialized analyses. Furthermore, it is worth noting that, although comparing factors in an overall analysis is sound, subgroup analyses sometimes provide more specific information; for example, modeling the factors affecting road traffic injuries in head-on collisions on expressways may lead to more targeted decisions for given areas. Subgroup analyses of all the significant aspects identified are therefore suggested for further investigation. Since machine learning (ML) methods can be used for prediction purposes and can overcome the limitations associated with traditional statistical models, applying ML approaches in the recommended subgroup analyses would be highly practical. Beyond this, more specific analyses of passenger and pedestrian fatalities are also suggested.

Conclusions
In this study, the effect of seventy-one different features relating to the crash scene, vehicle, driver, passenger, and pedestrian was assessed to find their connection with crash outcome in 384,614 collision crashes in the six most populous provinces of a developing country, Iran, from 2015 to 2016. Thirty-two variables were found to be significantly associated with fatal crash occurrence. Although road traffic injuries are a global problem, they are more challenging in low- and middle-income countries, to such an extent that more than 90% of world fatalities due to collision crashes occur in these countries [48]. Information regarding road collisions was available in separate crash scene, vehicle, driver, passenger, and pedestrian databases, which are now combined. This provides an opportunity to compare all the factors in an overall analysis that may lead to comprehensive decisions. According to the multiple binary logistic regression model, many variables included in the analysis played a significant role in crash severity. The top factors, with an odds ratio of at least two, that contribute to fatal crashes are the presence of a passenger, unlicensed driving, illegal driving maneuvers, head-on collision, crashes in suburban areas, the occurrence of multiple causes for a collision, vehicles without personal-regional plates, presence of pedestrians, drivers with low-income jobs, driver misconduct, roads with double solid lines, non-residential areas, and multiple road defects. Looking more closely at the most significant factors reveals that they primarily relate to driving behavior (presence of a passenger, unlicensed driving, illegal driving maneuvers, occurrence of multiple causes for a collision, vehicles without personal-regional plates, presence of pedestrians, drivers with low-income jobs, driver misconduct, head-on collision), infrastructure design (roads with double solid lines and multiple road defects), and geometric road factors (crashes in suburban areas and non-residential areas). The quantitative estimates of the impact of the significant features obtained in this study can provide unique guidance for road managers and policymakers in prioritizing measures to prevent fatal crashes. We believe that the results of this study can inform the design of such preventive measures.

Fig. 1 Flowchart of dataset preparation for modeling the contributing factors of fatal crashes

lack of attention to the front, failure to yield right-of-way, failure to maintain vehicle control, changing direction abruptly, moving backward in reverse gear, failure to longitudinal distance control, deviation to the left, driving in the wrong direction, making incorrect left turn, violation of safe speed, failure to latitudinal distance control, Sudden opening of the vehicle door, deviation to the left due to overtaking, running red light, passing the prohibited place, turning in the prohibited place, vehicle technical defect, over speeding, lack of skills in driving, persistent technical defects of the vehicle, deviation to the right, violation of load regulations, violation of item 4 of the road safety, towing incorrectly, pedestrian fault, other NA lack of attention to the front, failure to yield right-of-way, failure to maintain vehicle control, changing direction abruptly, moving backward in reverse gear, failure to longitudinal distance controlright-of-way, failure to yield right-of-way, over speeding, failure to distance control while overtaking, running red light, passing the prohibited place, illegal overtaking, turning left or right in the prohibited place, turning in the prohibited place, drunken driving, lack of safety equipment for the season, not turning on the lights from sunset to sunrise, not using glasses while driving, defective vehicle lighting system at night, demonstrative movement, spiral movement, crossing the sidewalk NA spiral movement, over speeding, other
Passenger first name, , elementary, cycle, middle school, diploma, B.Sc., A.Sc., M.Sc
Passenger education based on ISCED isced0, isced1, isced2, isced3, isced5
Passenger education based on previous studies illiterate, primary, non-academic, academic

Fig. 2 Explored risk factors and odds ratios in predicting road traffic fatalities in Iran, 2015-2016

Table 1
Variables description in Iranian Integrated Road Traffic Injury Registry System (2015-2016) - crash database

Original variable name | Original variable levels | Modified variable name | Modified variable levels
 | illiterate, literacy, elementary, cycle, middle school, diploma, B.Sc., A.Sc., M.Sc., PhD, hozavi | Driver education based on ISCED | isced0, isced1, isced2, isced3, isced5, isced6
 | | Driver education based on previous studies | illiterate, primary, non-academic, academic
Driver job | self-employed, employee, jobless, housewife, military man, laborer, student, university student, driver, soldier, other | NA | jobs with high economic status, jobs with middle economic status, jobs with low economic status
Driver age (yrs), cont | NA | NA | child, adult, old
Type of driving license | class A, class B, class C, B1, B1-new, B2, B1-temporary, B2-temporary, military, special, foreign, international, motorcycle, tractor, not seen, no license | NA | class A, class B, class C, motorcycle, no license
Driver license ID, cont

Stage 1. Multiple copies of the database are created in which the missing values are replaced. The imputed values are drawn based on the observed values, from the appropriate statistical models, and from the prior distribution. Each completely imputed database differs from the others.
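
The idea of this stage can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not the imputation model used in the study: each missing value is filled with a random draw from the observed values of the same variable, whereas the stage described above draws from fitted statistical models. The function name `impute_copies`, the parameter `m`, and the toy records are all hypothetical.

```python
import random


def impute_copies(rows, m=5, seed=0):
    """Create m completed copies of rows, filling None values.

    Simplification: each missing value is replaced by a random draw from the
    observed values of the same column, not from a fitted model.
    """
    rng = random.Random(seed)
    columns = list(rows[0].keys())
    # Pool of observed (non-missing) values per column. A column with no
    # observed values at all is not handled and would raise an error below.
    observed = {c: [r[c] for r in rows if r[c] is not None] for c in columns}
    copies = []
    for _ in range(m):
        completed = [
            {c: (r[c] if r[c] is not None else rng.choice(observed[c])) for c in columns}
            for r in rows
        ]
        copies.append(completed)
    return copies


# Hypothetical toy records, not taken from the registry.
records = [
    {"lighting": "day", "weather": "clear"},
    {"lighting": "night", "weather": None},
    {"lighting": None, "weather": "rainy"},
    {"lighting": "day", "weather": "clear"},
]
copies = impute_copies(records, m=3, seed=42)
```

Each element of `copies` is a complete dataset; because the draws are random, the copies can differ from one another in the imputed cells while the observed cells stay identical.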
Dotted lines indicate there are several types of classifications. The underlined phrases with equal styles or fonts represent the same classification of original variables and the modified ones
str. String, cont. Continuous, NA Not applicable, yrs Years, P Person, T Tone
a Phrases without underline considered as missing

Table 2
Explanatory variables summary in Iranian Integrated Road Traffic Injury Registry System (2015-2016) - crash database

Table 2 (continued)
Freq. Frequency, Per. Percentage, yrs Years, SD Standard deviation, NA Not applicable

Table 3
Simple and Multiple logistic regression models in predicting fatality based on Iranian-Integrated Road Traffic Injury Registry System (2015-2016)

Table 3 (continued)
OR Odds ratio, CI Confidence interval