
Completeness and accuracy of crash outcome data in a cohort of cyclists: a validation study



Bicycling, despite its health and other benefits, raises safety concerns for many people. However, reliable information on bicycle crash injury is scarce as current statistics rely on a single official database of limited quality. This paper evaluated the completeness and accuracy of crash data collected from multiple sources in a prospective cohort study involving cyclists.


The study recruited 2438 adult cyclists from New Zealand’s largest mass cycling event in November 2006 and another 190 in 2008, and obtained data regarding bicycle crashes that were attended by medical personnel or the police and occurred between the date of recruitment and 30 June 2011, through linkage to insurance claims, hospital discharges, mortality records and police reports. The quality of the linked data was assessed by capture-recapture methods and by comparison with self-reported injury data collected in a follow-up survey.


Of the 2590 cyclists who were resident in New Zealand at recruitment, 855 experienced 1336 crashes, of which 755 occurred on public roads and 120 involved a collision with a motor vehicle, during a median follow-up of 4.6 years. Log-linear models estimated that the linked data were 73.7% (95% CI: 68.0%-78.7%) complete with negligible differences between on- and off-road crashes. The data were 83.3% (95% CI: 78.9%-87.6%) complete for collisions. Agreement with the self-reported data was moderate (kappa: 0.55) and varied by personal factors, cycling exposure and confidence in recalling crash events. If self-reports were considered as the gold standard, the linked data had 63.1% sensitivity and 93.5% specificity for all crashes and 40.0% sensitivity and 99.9% specificity for collisions.


Routinely collected databases substantially underestimate the frequency of bicycle crashes. Self-reported crash data are also incomplete and inconsistent. It is necessary to improve the quality of individual data sources as well as record linkage techniques so that all available data sources can be used reliably.



Regular cycling provides health and other benefits [1–4]. However, in New Zealand, using a bicycle is not an attractive mode of travel for many people [5] and accounts for only 2% of total travel time [6]. Cycling is becoming more popular as a sport, but only just over one-fifth of adults reported participating in either road cycling or mountain biking at least once over twelve months in a recent national survey [7].

For many people, safety concerns are one of the major barriers to riding a bicycle [8, 9]. For each million hours that were spent cycling on New Zealand roads, according to the official statistics, 29 deaths or injuries resulted from collisions with a motor vehicle [10] and 31 injuries resulted in death or hospital inpatient treatment [11]. Furthermore, almost as many bicycle crashes occurred off-road [12].

However, current statistics typically refer to a single official data source, most commonly police crash reports and less frequently hospital records. These data sources are known to undercount bicycle crashes disproportionately [13–15]. This is not surprising, as many bicycle crashes do not come to the attention of the police or medical personnel; in overseas studies, such crashes amounted to 70% or more of self-reported crashes [16, 17]. While self-reports can provide information on unreported crashes, their validity may also be questionable, for example, due to nonresponse [18], failure to recall [19] and the influence of socially desirable responses [20]. For all these reasons, it has been proposed that “unattended” bicycle crash injuries be excluded when developing indicators of injury incidence [21].

Even for crashes that were attended medically or by the police, routinely collected databases may be neither complete [13, 15] nor accurate [22]. Moreover, as crash data are usually collected for specific administrative purposes, each data source typically captures only a fraction of all crashes [14]. Therefore, using multiple data sources through record linkage may provide a broader, more complete and more accurate picture of injury at relatively low cost.

This paper evaluated the completeness and accuracy of bicycle crash data collected by self-report and by record linkage drawing on four national, routinely collected databases.


Design, setting and participants

The Taupo Bicycle Study is a prospective cohort study of cyclists designed to examine factors associated with regular cycling and injury risk. The sampling frame comprised cyclists, aged 16 years and over, who enrolled online in the Lake Taupo Cycle Challenge. This is New Zealand’s largest mass cycling event, which is held each November and attracts about 10,000 cyclists. Participants have varying degrees of cycling experience and range from competitive sports cyclists and experienced social riders to relative novices of all ages.

Recruitment was undertaken at the time of the 2006 event for the majority of participants, as described in detail elsewhere [23]. In brief, email invitations, containing a hyperlink to an information page describing the study, were sent to 5653 participants who provided their email addresses at registration for the event. Those who agreed to take part in the study were taken to a page containing a web questionnaire and asked about demographic characteristics, general cycling activity, previous crash experience and use of injury preventive measures. The questionnaire was completed and submitted by 2438 cyclists (43.1% response rate). Another 190 cyclists were recruited from the 2008 event by including a short description of the study in the event newsletter. Ethical approval was obtained from the University of Auckland Human Participants’ Ethics Committee.

Crash outcome data

Crash outcome data were collected through record linkage to insurance claims, hospital discharges, mortality records and police reports, covering the period from the date of recruitment to 30 June 2011. Record linkage was undertaken by the data custodians using name, gender, date of birth and address as identifiers. All participants consented to the linkage of their data with these databases. In addition, a follow-up survey was conducted in December 2009.

Insurance claims

In New Zealand, the Accident Compensation Corporation (ACC) provides personal injury cover for all residents and temporary visitors to New Zealand no matter who is at fault. The claims database is a major source of information on relatively minor injuries with over 80% of the claims related to primary care (e.g., GPs, emergency room treatment) only [24].

Approval for record linkage was obtained from the ACC Research Ethics Committee. A probabilistic linkage followed by a clerical review was undertaken, and all claims for bicycle crashes were extracted. The extracted data contain the nature and mechanism of injury, health service utilisation and out-of-hospital costs. Crashes that occurred on public roads and crashes that involved a collision with a motor vehicle were identified from relevant variables as well as from the free-text field describing the crash.
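How ACC's probabilistic linkage scored candidate record pairs is not described here. One widely used framework is the Fellegi-Sunter model, in which each identifier contributes a log likelihood ratio built from an m-probability (the field agrees given a true match) and a u-probability (the field agrees given a non-match), and two thresholds split candidate pairs into links, non-links and a middle band sent for clerical review. The sketch below assumes that model; the field names, probabilities and thresholds are hypothetical and are not ACC's.

```python
import math

# Hypothetical m-probabilities P(field agrees | true match) and
# u-probabilities P(field agrees | non-match); NOT ACC's actual parameters.
M_PROB = {"name": 0.95, "gender": 0.98, "dob": 0.97, "address": 0.80}
U_PROB = {"name": 0.01, "gender": 0.50, "dob": 0.003, "address": 0.05}


def match_weight(agreements):
    """Fellegi-Sunter composite weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = M_PROB[field], U_PROB[field]
        if agrees:
            total += math.log2(m / u)  # agreement is evidence for a match
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement is evidence against
    return total


def classify(weight, lower=0.0, upper=10.0):
    """Hypothetical thresholds: high weights link, low weights reject, the rest go to review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "clerical review"


# Name and gender agree, but date of birth and address do not.
w = match_weight({"name": True, "gender": True, "dob": False, "address": False})
print(round(w, 2), classify(w))
```

Pairs falling between the two thresholds are the ones a clerical review would examine by hand.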

Hospital discharge and mortality data

These databases are maintained by the Ministry of Health’s Information Directorate. The National Minimum Dataset (NMDS) contains information about inpatients and day patients discharged after a minimum stay of three hours from all public hospitals and over 90% of private hospitals in New Zealand [25, 26]. The Mortality Collection contains information about all deaths registered in the country [27].

Participant data were matched to a National Health Index (NHI) number, a unique identifier assigned to every person who uses health and disability support services in New Zealand. An electronic match was made where possible, followed by two stages of manual matching for participants who could not be linked electronically. Of 2590 participants who were resident in New Zealand at recruitment, 99.0% were successfully matched. All hospital discharges and deaths due to injuries or other health conditions were extracted.

The hospital discharge data contain diagnoses and diagnostic and therapeutic procedures undertaken in each hospital visit, coded under ICD-10-AM. Cycle crashes were identified using the E codes V10–V19; those that occurred on public roads were identified using codes V10–V18 with fourth character .3–.9, V19.4–V19.6 and V19.9; and those that involved a collision with a motor vehicle were identified using codes V12–V14, V19.0–V19.2 and V19.4–V19.6. Readmissions were identified as described previously [28] and excluded.
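The code ranges above amount to a small rule set over the three-character category and the fourth character. A minimal sketch that applies exactly those ranges, assuming codes arrive as strings such as "V13.4" or "V134"; any characters after the fourth are ignored, and the function name is hypothetical.

```python
def classify_external_cause(code):
    """Flag a pedal-cyclist external-cause code using the ranges given in the text.

    Assumes input like "V13.4" or "V134" (category plus fourth character);
    any further characters are ignored. Returns None for non-cyclist codes.
    """
    code = code.strip().upper().replace(".", "")
    if len(code) < 4 or code[0] != "V" or not code[1:4].isdigit():
        return None
    block, fourth = int(code[1:3]), int(code[3])
    if not 10 <= block <= 19:
        return None
    if block <= 18:
        on_road = 3 <= fourth <= 9  # V10-V18 with fourth character .3-.9
    else:
        on_road = fourth in (4, 5, 6, 9)  # V19.4-V19.6 and V19.9
    # V12-V14 (any fourth character), V19.0-V19.2 and V19.4-V19.6
    collision = 12 <= block <= 14 or (block == 19 and fourth in (0, 1, 2, 4, 5, 6))
    return {"cycle_crash": True, "on_road": on_road, "collision": collision}


for c in ["V13.4", "V18.0", "V19.1", "V19.9", "W01.0"]:
    print(c, classify_external_cause(c))
```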

The mortality data contain the underlying cause of death which is coded under ICD-10-AM and is also described in free text fields. However, the coroners’ reports on the cause of injury death were available only up to 31 December 2008. All deaths due to a bicycle crash were identified from the available data.

Police reports

In New Zealand, it is mandatory that any fatal or injury crash involving a collision with a motor vehicle on a public road be reported to the police. A Traffic Crash Report is then completed and sent to the New Zealand Transport Agency, where the data are entered into the Crash Analysis System (CAS) database.

A deterministic linkage followed by a clerical review was undertaken and all bicycle collisions were extracted. The linked data contain location, time and circumstances of the crash, and severity of injury.
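Deterministic linkage, by contrast, accepts a pair only when identifiers agree exactly after cleaning. A minimal sketch, assuming hypothetical field names (surname, first_name, gender, dob) drawn from the identifiers listed earlier; address is omitted for brevity, and keys shared by more than one cohort member are routed to clerical review rather than linked. The actual rules used for the CAS linkage are not described in the text.

```python
import unicodedata
from datetime import date


def normalise(text):
    """Lower-case, strip accents/macrons and drop non-letters ("Tāne O'Brien" -> "taneobrien")."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if c.isalpha()).lower()


def link_key(rec):
    """Hypothetical exact-match key built from name, gender and date of birth."""
    return (normalise(rec["surname"]), normalise(rec["first_name"]),
            rec["gender"].strip().upper()[:1], rec["dob"])


def deterministic_link(cohort, reports):
    """Return (linked (person_id, crash_id) pairs, crash ids needing clerical review)."""
    index = {}
    for person in cohort:
        index.setdefault(link_key(person), []).append(person["id"])
    linked, review = [], []
    for report in reports:
        ids = index.get(link_key(report), [])
        if len(ids) == 1:
            linked.append((ids[0], report["crash_id"]))
        elif len(ids) > 1:
            review.append(report["crash_id"])  # ambiguous key: send to a human
    return linked, review


cohort = [{"id": 1, "surname": "Smith", "first_name": "Ana", "gender": "F", "dob": date(1970, 1, 2)}]
reports = [{"crash_id": "c1", "surname": "SMITH", "first_name": "ana",
            "gender": "female", "dob": date(1970, 1, 2)}]
print(deterministic_link(cohort, reports))  # ([(1, 'c1')], [])
```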

Follow-up survey

The survey was conducted in December 2009 using a web questionnaire. The questions asked included: the total number of bicycle crashes experienced during the preceding year, the number of crashes for which claims were lodged with ACC, the number of crashes requiring hospital admission, and the number of crashes that were reported to the police. The participants were also asked to indicate the degree of confidence they had regarding the accuracy of their answers to each question using a five-point scale (very unsure, quite unsure, about 50/50, quite sure, very sure). This confidence rating has been shown to be a useful indicator of recall accuracy for physical activity measures [29].

A total of 1537 participants (58.5%) completed the questionnaire, of whom 70 reported not cycling in the preceding year.


A capture-recapture analysis was undertaken to estimate the number of crashes that had occurred which were not identified through record linkage. In addition, the linked data were compared with the self-reported data collected in the follow-up survey.

Capture-recapture analysis

Capture-recapture methods were originally developed to estimate the size of an animal population, based on the proportions of animals that were captured, marked, released and recaptured in two or more random samples. The procedure assumes closure of the population, mark integrity, independence of the samples and an equal probability of being captured in each sample [30]. Similar methods have since been applied in epidemiological studies [31].
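The underlying idea is clearest with only two sources. If source 1 captures n1 cases, source 2 captures n2 and m cases appear in both, then under the assumptions above the total is estimated as N = n1 × n2 / m (the Lincoln-Petersen estimator), and N minus the n1 + n2 − m distinct cases actually seen is the estimated number missed by both. The sketch below shows this simpler two-source case, together with Chapman's bias-corrected variant, purely for illustration; the study itself used three sources and log-linear models, and the counts in the example are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Estimated total N = n1 * n2 / m for two independent captures."""
    if m == 0:
        raise ValueError("no overlap between sources: estimator undefined")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's nearly unbiased variant, defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def missing(n1, n2, m, estimator=lincoln_petersen):
    """Estimated cases seen by neither source: N minus the distinct cases observed."""
    return estimator(n1, n2, m) - (n1 + n2 - m)


# Hypothetical: 100 crashes in source A, 80 in source B, 20 in both.
print(lincoln_petersen(100, 80, 20), missing(100, 80, 20))
```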

For this analysis, the study sample was restricted to the 2590 participants who were resident in New Zealand at recruitment. For each participant, bicycle crashes identified from the different databases were matched based on the date of crash allowing for a two-day difference. Log-linear models were used to estimate missing crashes, taking into account possible associations across the databases. The models were fitted to the incomplete multiway contingency table with one missing cell corresponding to absence in all databases. The strength of evidence for each model was assessed using Akaike’s Information Criterion (AIC) and its weight. Based on the model averaged estimate and unconditional standard error, the frequency for the missing cell and its 95% confidence interval (CI) were calculated. Analyses were undertaken for bicycle crashes in general, and also for the specific categories of on-road crashes and crashes involving a collision with a motor vehicle.
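The modelling step can be sketched in plain Python. Each crash has a capture history (present or absent in ACC, NMDS and CAS), giving a 2 × 2 × 2 table whose (0, 0, 0) cell is unobservable. A Poisson log-linear model with main effects and a chosen set of two-way interactions is fitted to the seven observed cells, here by iteratively reweighted least squares; because the missing cell's design row contains only the intercept, its estimate is exp(intercept). Models are then combined with Akaike weights, and the unconditional standard error takes the form commonly attributed to Burnham and Anderson. This is a sketch under those assumptions, not the SAS code the authors used: the variance is a delta-method, model-based approximation, the confidence-interval construction is not reproduced, and all function names are hypothetical.

```python
import itertools
import math

# Capture histories as (ACC, NMDS, CAS) indicators; (0, 0, 0) is never observed.
HISTORIES = [h for h in itertools.product((0, 1), repeat=3) if h != (0, 0, 0)]
PAIRS = [(0, 1), (0, 2), (1, 2)]  # ACC*NMDS, ACC*CAS, NMDS*CAS


def design_row(h, interactions):
    """Intercept, three main effects, then the chosen two-way interactions."""
    return [1.0] + [float(x) for x in h] + [float(h[i] * h[j]) for i, j in interactions]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b[i])] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def crossprod(X, w):
    """X' W X for diagonal weights w."""
    p = len(X[0])
    return [[sum(wk * row[i] * row[j] for row, wk in zip(X, w)) for j in range(p)]
            for i in range(p)]


def fit_model(counts, interactions, max_iter=100, tol=1e-10):
    """Poisson log-linear fit to the seven observed cells by IRLS."""
    X = [design_row(h, interactions) for h in HISTORIES]
    y = [float(counts[h]) for h in HISTORIES]
    p = len(X[0])
    mu = [yi + 0.5 for yi in y]  # usual GLM starting values
    eta = [math.log(m) for m in mu]
    for _ in range(max_iter):
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        rhs = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        beta = solve(crossprod(X, mu), rhs)
        new_eta = [sum(bk * xk for bk, xk in zip(beta, row)) for row in X]
        converged = max(abs(a - b) for a, b in zip(new_eta, eta)) < tol
        eta, mu = new_eta, [math.exp(e) for e in new_eta]
        if converged:
            break
    # [(X'WX)^-1]_00 is the model-based variance of the intercept.
    var_b0 = solve(crossprod(X, mu), [1.0] + [0.0] * (p - 1))[0]
    f0 = math.exp(beta[0])  # the (0, 0, 0) cell has only the intercept
    loglik = sum(yi * math.log(m) - m - math.lgamma(yi + 1) for yi, m in zip(y, mu))
    return {"f0": f0, "var": f0 ** 2 * var_b0, "aic": 2 * p - 2 * loglik}


def candidate_models(excluded_combo=((0, 1), (0, 2))):
    """All subsets of two-way terms, minus any containing every term in excluded_combo."""
    models = []
    for r in range(len(PAIRS) + 1):
        for combo in itertools.combinations(PAIRS, r):
            if excluded_combo and all(t in combo for t in excluded_combo):
                continue
            models.append(list(combo))
    return models


def model_average(fits):
    """Akaike weights, averaged missing-cell estimate and unconditional SE."""
    best = min(f["aic"] for f in fits)
    raw = [math.exp(-(f["aic"] - best) / 2) for f in fits]
    weights = [r / sum(raw) for r in raw]
    f_avg = sum(w * f["f0"] for w, f in zip(weights, fits))
    se = sum(w * math.sqrt(f["var"] + (f["f0"] - f_avg) ** 2) for w, f in zip(weights, fits))
    return f_avg, se, weights
```

With all seven cells positive, the model with every two-way interaction fits the observed cells exactly and reproduces the closed form n111·n100·n010·n001 / (n110·n101·n011). The default exclusion in candidate_models mirrors the reason given in the Results: both excluded models (ACC*NMDS plus ACC*CAS, with or without NMDS*CAS) yield a missing-cell estimate with n011 in the denominator, which is undefined when no crash recorded in both NMDS and CAS is absent from ACC.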

Comparison with self-reports

This analysis was based on the 1456 participants who completed the follow-up questionnaire and reported cycling in the preceding year. Because some participants may have experienced more than one crash during the specified period, the questionnaire did not ask for exact crash dates. It was therefore not possible to match the linked and self-reported data for each crash identified in the source databases. Instead, agreement was assessed on a person-by-person basis for each database as well as for the combined data. Agreement was established (1) if a participant reported at least one bicycle crash in the preceding year that required medical attention (that is, involved a claim lodged with ACC or required an admission to hospital) or was reported to the police, and the linked data also showed at least one bicycle crash during the same period, or (2) if neither the self-reported nor the linked data showed such a crash in the preceding year.
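The person-level rule above places each participant in one cell of a 2 × 2 table. A minimal sketch, assuming that "the preceding year" means the 365 days up to and including each participant's survey completion date and that linked crash dates are available per participant; the dictionary keys and function names are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def agreement_cell(self_reported, linked_dates, survey_date, window_days=365):
    """Place one participant in the 2x2 table (self-report treated as gold standard).

    a: both sources show a crash, b: linked data only,
    c: self-report only, d: neither.
    """
    start = survey_date - timedelta(days=window_days)
    linked = any(start <= d <= survey_date for d in linked_dates)
    if self_reported and linked:
        return "a"
    if linked:
        return "b"
    if self_reported:
        return "c"
    return "d"


def tabulate(participants):
    """Count participants per cell; each participant contributes exactly once."""
    return Counter(
        agreement_cell(p["self_reported"], p["linked_dates"], p["survey_date"])
        for p in participants
    )


survey = date(2009, 12, 10)
people = [
    {"self_reported": True, "linked_dates": [date(2009, 6, 1)], "survey_date": survey},
    {"self_reported": False, "linked_dates": [date(2009, 3, 1)], "survey_date": survey},
    {"self_reported": True, "linked_dates": [date(2007, 1, 1)], "survey_date": survey},
    {"self_reported": False, "linked_dates": [], "survey_date": survey},
]
print(tabulate(people))
```

The third participant lands in cell c because the only linked crash falls outside the recall window, which is one way a date-window assumption can shift agreement.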

Cohen’s kappa coefficients were used to determine the degree of agreement. In addition, the sensitivity, specificity and predictive values of the linked data were calculated, assuming that self-reports were the gold standard. Analyses were undertaken for all crashes as well as those involving a collision with a motor vehicle. In addition, subgroup analyses were performed for all crashes to examine differences in agreement by participants’ demographic characteristics, amount of cycling, pre-existing medical conditions (heart attack, stroke, cancer, diabetes or high blood pressure) and confidence in recall.
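Given the 2 × 2 counts, the statistics named above follow directly. Cohen's kappa compares observed agreement with the agreement expected by chance from the marginal totals, while sensitivity, specificity and the predictive values treat self-report as the reference standard, as in the text. The cell labels and the example counts below are hypothetical and are not the study's Table 5.

```python
def agreement_stats(a, b, c, d):
    """2x2 summary with self-report as the reference standard.

    a: linked+ self+, b: linked+ self-, c: linked- self+, d: linked- self-.
    """
    n = a + b + c + d
    p_obs = (a + d) / n
    # chance agreement from the marginal totals
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return {
        "kappa": (p_obs - p_exp) / (1 - p_exp),
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
    }


# Hypothetical counts, not the study's data.
print(agreement_stats(60, 20, 30, 390))
```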

SAS 9.2 (SAS Institute, Cary, North Carolina) and Microsoft Office Excel 2010 (Microsoft Corporation, Redmond, Washington) were used for all analyses.


The average age of the participants was 44.0 years (SD 10.4) and 72.4% were male (Table 1). About half the sample were university graduates (53.9%) and lived in the least deprived neighbourhoods (49.9%), and 77.7% lived in main urban areas. On average, participants cycled 5.7 hours a week (SD 3.7; interquartile range 5).

Table 1 Participants’ demographic characteristics

Bicycle crashes reported at the follow-up survey

Of the 1456 participants who completed the follow-up questionnaire and reported cycling in the preceding year, 432 reported experiencing one or more crashes in the preceding year (Table 2). There were a total of 784 self-reported crashes, of which 57.4% occurred on the road and 17.9% involved a collision with a motor vehicle. Based on the respondent reports, 29.1% of all crashes involved a claim lodged with ACC, 3.7% required hospital admission and 6.5% were reported to the police. A higher proportion of collisions involved medical or police attention with 35.0% resulting in claims to ACC, 7.1% requiring hospital admission and 32.9% being reported to the police.

Table 2 Bicycle crashes reported by participants at the follow-up survey

Bicycle crashes identified through record linkage

During a median follow-up of 4.6 years, only one death occurred due to a bicycle crash. As this fatal crash was recorded in both the Mortality Collection and NMDS databases, the Mortality Collection was excluded from further analyses.

Of the 2590 participants, 855 experienced 1336 bicycle crashes recorded in one or more databases, of which 755 (56.5%) occurred on public roads and 120 (9.0%) involved a collision with a motor vehicle. Only 18 crashes that involved a collision with a motor vehicle were identified from all databases (Table 3).

Table 3 Bicycle crashes matched across different data sources
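Counts such as those in Table 3 come from grouping each participant's records into crash episodes and recording which databases contain each episode. A minimal sketch, assuming records arrive as hypothetical (person_id, source, crash_date) tuples and that a record joins an episode when its date is within two days of the episode's first date; the text does not say how chains of nearby dates or repeat records from the same source were handled, so those choices are assumptions here.

```python
from collections import Counter
from datetime import date

SOURCES = ("ACC", "NMDS", "CAS")


def build_episodes(records, tolerance_days=2):
    """Group records into per-person crash episodes.

    records: iterable of (person_id, source, crash_date). A record joins the
    current episode if its date is within tolerance_days of the episode's
    first date; otherwise it starts a new episode.
    """
    by_person = {}
    for pid, source, crash_date in records:
        by_person.setdefault(pid, []).append((crash_date, source))
    episodes = []
    for pid, items in by_person.items():
        current = None
        for crash_date, source in sorted(items):
            if current is not None and (crash_date - current["start"]).days <= tolerance_days:
                current["sources"].add(source)
            else:
                current = {"person": pid, "start": crash_date, "sources": {source}}
                episodes.append(current)
    return episodes


def capture_histories(episodes):
    """Count episodes by (ACC, NMDS, CAS) presence pattern, e.g. (1, 0, 1)."""
    return Counter(tuple(int(s in ep["sources"]) for s in SOURCES) for ep in episodes)


records = [
    (1, "ACC", date(2008, 3, 1)), (1, "NMDS", date(2008, 3, 2)),
    (1, "CAS", date(2008, 3, 3)), (1, "ACC", date(2009, 5, 10)),
    (2, "ACC", date(2008, 7, 7)), (2, "CAS", date(2008, 7, 12)),
]
print(capture_histories(build_episodes(records)))
```

Participant 2's ACC and CAS records are five days apart, so under this rule they count as two separate crashes; a wider tolerance would merge them.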

Completeness of the linked data

As no crashes identified in both the NMDS and CAS databases were found to be missing in the ACC database, the models containing both interaction terms ACC*NMDS and ACC*CAS were excluded. Table 4 shows model-based estimates and unconditional standard errors from the remaining six models. From these data, it was estimated that 477 crashes in general (95% CI: 362–629), 258 on-road crashes (95% CI: 197–338) and 24 collisions (95% CI: 17–32) were missing from all databases. That is, the completeness of the linked data was 73.7% (95% CI: 68.0–78.7%) for all crashes, 74.5% (95% CI: 69.1–79.3%) for on-road crashes, and 83.3% (95% CI: 78.9–87.6%) for collisions.
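The completeness figures follow from the observed counts and the estimated missing crashes: completeness = observed / (observed + estimated missing). The short check below uses only numbers reported in the text (1336 crashes, 755 on-road crashes and 120 collisions, with the missing-cell estimates and their 95% CIs); plugging the CI endpoints of the missing cell into the same ratio reproduces the reported intervals, although the authors' exact procedure is not stated.

```python
def completeness(observed, missing):
    """Share of all crashes (observed + estimated missing) captured by the linked data."""
    return observed / (observed + missing)


def completeness_interval(observed, missing_low, missing_high):
    """More missing crashes means lower completeness, so the CI bounds swap."""
    return completeness(observed, missing_high), completeness(observed, missing_low)


# Observed counts and missing-cell estimates (95% CI) as reported in the text.
REPORTED = {
    "all crashes": (1336, 477, 362, 629),
    "on-road": (755, 258, 197, 338),
    "collisions": (120, 24, 17, 32),
}

for label, (obs, miss, lo, hi) in REPORTED.items():
    low, high = completeness_interval(obs, lo, hi)
    print(f"{label}: {100 * completeness(obs, miss):.1f}% ({100 * low:.1f}-{100 * high:.1f}%)")
```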

Table 4 Capture-recapture models estimating missing crashes

Agreement between the linked and self-reported data

There was moderate agreement (kappa 0.55) between the linked and self-reported data for all crashes as well as for crashes involving collisions, with the highest level of agreement observed for the claims data (Table 5). Overall, 4.7% of participants reported at least one crash in the preceding year (that required medical attention or was reported to the police) but had no crash record in the linked data. In contrast, 5.6% of participants did not report such a crash but had one or more crashes recorded in the linked data. This disagreement was less pronounced for collisions.

Table 5 Agreement between linked and self-reported data

When self-reports were considered as the gold standard, the linked data for all crashes had 63.1% sensitivity, 93.5% specificity, 59.0% positive predictive value (PPV) and 94.5% negative predictive value (NPV). The sensitivity was counter-intuitively lower but the specificity and predictive values were higher for collisions.

There were variations in agreement by participants’ demographic characteristics, amount of cycling, pre-existing health conditions and confidence in recalling crash events (Table 6). A higher level of agreement was associated with being younger, male and Māori, having a higher level of education, spending less time cycling, not having pre-existing medical conditions, being more socioeconomically deprived and having a higher degree of confidence regarding the accuracy of recall.

Table 6 Agreement between linked and self-reported data by participant characteristics


Main findings

Our findings revealed a substantial underestimation of bicycle crashes in administrative databases. The capture-recapture models estimated that the linked data were 73.7% complete for all crashes, with negligible differences between on- and off-road crashes. The linked data were 83.3% complete for collisions. In comparison with self-reports, the linked data had 63.1% sensitivity, 93.5% specificity, 59.0% PPV and 94.5% NPV for all crashes, and 40.0% sensitivity, 99.9% specificity, 91.7% PPV and 97.7% NPV for collisions. Agreement between the linked and self-reported data varied across individual data sources and by participants’ demographic characteristics, amount of cycling, pre-existing medical conditions and recall confidence.

Strengths and limitations

The bicycle crash data collected in this prospective cohort study were obtained through record linkage to four routinely collected databases. This resource-efficient method of data collection was designed to minimise potential biases associated with loss to follow-up [32]. It also provided a unique opportunity to evaluate the completeness of bicycle crash records across the spectrum of severity. To the best of our knowledge, this is the first study to compare official vs. self-reported data on bicycle crashes. However, some limitations need attention.

In our capture-recapture analysis, not all of the underlying assumptions may be fully satisfied. First, the assumption that the study population is closed may be violated by the death or emigration of some participants, thereby underestimating the findings [33]. However, such underestimation may not be substantial, as only six deaths were identified from the Mortality Collection database and only 23 participants provided an overseas address at the follow-up survey. Moreover, ACC support is available to New Zealand residents if they return home with an injury sustained during an overseas trip of up to six months (or longer if they are travelling on business and paying income tax).

Second, the assumption that each individual has an equal probability of being captured in each database may be violated if that probability differs by crash, personal, social and health service factors [21, 34].

Third, the assumption that there are no lost marks between databases (mark integrity) may be violated if ascertainment of relevant cases is affected by inaccuracies in coding of bicycle crash data in each data source [22, 25, 35]. Miscoding may have resulted in failure to identify some bicycle crashes, thereby underestimating the capture-recapture counts [36]. This may account for the counter-intuitive finding of a lower sensitivity for collision crashes compared to all crashes. It is possible that some collisions were miscoded as ‘cyclist only’ crashes as observed previously in the UK [37]. Case ascertainment may also be affected by the quality of record linkage. Although the match rate by NHI was high (99%), mistakes may have occurred during extraction of bicycle crashes from each data source as a conservative approach was used to minimise false matches. While this served as a sensible strategy to estimate unbiased risk ratios in our subsequent analyses [38, 39], it may have underestimated the capture-recapture counts [36].

In addition, the self-reported data, although used as the gold standard in this study, may not be accurate. Inaccuracies in recall or provision of socially desirable responses may have resulted in under- or over-reporting of bicycle crashes. Cyclists generally experience frequent minor crashes, which could make recall of crash experiences during a specified period difficult. In previous research, the injury rates were significantly underestimated if the recall periods were two months or more [19] and the ability to recall was influenced by number, type and severity of injuries, and time elapsed since the injury event [40, 41]. Over-reporting, as observed in relation to motor vehicle crashes [42], is also likely as some reported crashes may have occurred prior to the specified recall period. Moreover, near misses or evasion crashes may have been reported as collisions with a motor vehicle. This may be another explanation for the counter-intuitively lower sensitivity for collisions compared to all crashes. While previous studies reported negative associations between self-reported motor vehicle crashes and social desirability scales [20, 43], little is known about how this bias might impact self-reported bicycle crashes.


Our findings extend the existing literature and inform future attempts to estimate the burden and risk of bicycle-related injuries. As in previous research [16, 17], our findings show that at most 30% of self-reported bicycle crashes were attended by medical personnel or the police. Even for this category of crashes, traditionally used databases may not be complete. Overseas research has mainly assessed the completeness of hospital and police databases, with varying results [44–46]. A New Zealand study found that only 22% of hospital-reported bicycle crashes and 54% of those involving a collision with a motor vehicle appeared in police reports [14]. In this study, 13% of hospital-reported crashes and 64% of collisions were linkable to police records, whereas 39% of police-reported crashes and 43% of collisions were linkable to hospital records.

Very few studies have estimated the completeness of combined databases. In a US study, hospital and police records, if combined, were 80% complete for automobile vs. child bicyclist collisions [44]. However, this level of completeness could be much lower if minor injuries were also considered. In this study, only 12% of bicycle crashes and 43% of collisions extracted from the linked data were recorded in hospital or police databases. To our knowledge, no other studies have assessed the completeness of individual or combined databases for relatively minor injuries.

Even though multiple data sources were used to capture a spectrum of injuries, our capture-recapture counts may still be underestimates given the limitations mentioned above. This is evident in comparisons with the self-reported data where the sensitivity of the linked data was lower than the completeness of data as estimated from the capture-recapture methods. If potential over-reporting is taken into account, however, the actual completeness of the linked data may lie between the two extremes, that is, between 63% and 74% for all crashes and between 40% and 83% for collisions.

In this study, agreement between the self-reported and official data was at most moderate, whereas higher levels of agreement have been reported for motor vehicle crashes and unintentional injuries [47]. This may be because, compared with motor vehicle crashes, bicycle crashes occur more frequently and many are less severe, making them less likely to be recalled or coded properly. Our findings suggest that confidence ratings may be a useful tool for assessing the quality of recalled crash data, as observed in previous research [29]. There were also variations in agreement by participants’ personal factors, in accordance with earlier research on motor vehicle crashes [48].


Bicycle crash data collected from different sources were both incomplete and inaccurate. This underscores the need to consider and account for potential biases due to outcome misclassification in our subsequent analyses as well as in other similar studies. Our findings also emphasise the need to improve the quality of individual data sources, to develop comprehensive record linkage techniques, and to enhance the validity and reliability of self-reported information so that all available data sources can be used reliably in our future attempts to capture a complete picture of important injuries.



Accident Compensation Corporation


Crash Analysis System


Linked data


National Health Index


National Minimum Dataset


Negative predictive value


Positive predictive value


Standard deviation


Standard error




  1. Andersen LB, Schnohr P, Schroll M, Hein HO: All-cause mortality associated with physical activity during leisure time, work, sports, and cycling to work. Arch Intern Med. 2000, 160 (11): 1621-1628. 10.1001/archinte.160.11.1621.

  2. Higgins PAT: Exercise-based transportation reduces oil dependence, carbon emissions and obesity. Environ Conserv. 2005, 32 (03): 197-202. 10.1017/S037689290500247X.

  3. Bassett DR, Pucher J, Buehler R: Walking, cycling, and obesity rates in Europe, North America, and Australia. J Phys Act Health. 2008, 5: 795-814.

  4. Oja P, Titze S, Bauman A, de Geus B, Krenn P, Reger-Nash B, Kohlberger T: Health benefits of cycling: a systematic review. Scand J Med Sci Sports. 2011, 21 (4): 496-509. 10.1111/j.1600-0838.2011.01299.x.

  5. Tin Tin S, Woodward A, Thornley S, Ameratunga S: Cycling and walking to work in New Zealand, 1991–2006: regional and individual differences, and pointers to effective interventions. Int J Behav Nutr Phys Act. 2009, 6 (1): 64. 10.1186/1479-5868-6-64.

  6. Ministry of Transport: Comparing travel modes. 2012, Wellington: Ministry of Transport

  7. Sport New Zealand: Sport and Recreation Profile: Cycling - Findings from the 2007/08 Active New Zealand Survey. 2009, Wellington: Sport New Zealand

  8. Kingham S, Koorey G, Taylor K: Attracting the next 10% of cyclists with the right infrastructure. New Zealand Cycling Conference: 12–13 November 2009; New Plymouth. 2009

  9. Mackie H: 'I want to ride my bike': overcoming barriers to cycling to intermediate schools. New Zealand Transport Agency Research Report No. 380. 2009, New Zealand Transport Agency: Wellington

  10. Ministry of Transport: Risk on the road. Pedestrians, cyclists and motorcyclists. 2012, Wellington: Ministry of Transport

  11. Tin Tin S, Woodward A, Ameratunga SN: Injuries to pedal cyclists on New Zealand roads, 1988–2007. BMC Public Health. 2010, 10: 655. 10.1186/1471-2458-10-655.

  12. Munster D, Koorey G, Walton D: Role of road features in cycle-only crashes in New Zealand. Transfund New Zealand Research Report No. 211. 2001, Transfund New Zealand: Wellington

  13. Elvik R, Mysen AB: Incomplete accident reporting: Meta-analysis of studies made in 13 countries. Transp Res Rec. 1999, 1665: 133-140. 10.3141/1665-18.

  14. Langley JD, Dow N, Stephenson S, Kypri K: Missing cyclists. Inj Prev. 2003, 9 (4): 376-379. 10.1136/ip.9.4.376.

  15. Tercero F, Andersson R: Measuring transport injuries in a developing country: an application of the capture–recapture method. Accid Anal Prev. 2004, 36 (1): 13-20. 10.1016/S0001-4575(02)00109-4.

  16. de Geus B, Vandenbulcke G, Int Panis L, Thomas I, Degraeuwe B, Cumps E, Aertsens J, Torfs R, Meeusen R: A prospective cohort study on minor accidents involving commuter cyclists in Belgium. Accid Anal Prev. 2012, 45: 683-693.

  17. Hoffman MR, Lambert WE, Peck EG, Mayberry JC: Bicycle commuter injury prevention: It is time to focus on the environment. J Trauma. 2010, 69 (5): 1112-1119. 10.1097/TA.0b013e3181f990a1.

  18. Tivesten E, Jonsson S, Jakobsson L, Norin H: Nonresponse analysis and adjustment in a mail survey on car accidents. Accid Anal Prev. 2012, 48: 401-415.

  19. Jenkins P, Earle-Richardson G, Slingerland DT, May J: Time dependent memory decay. Am J Ind Med. 2002, 41 (2): 98-101. 10.1002/ajim.10035.

  20. af Wåhlberg AE, Dorn L, Kline T: The effect of social desirability on self reported and recorded road traffic accidents. Transp Res Part F Traffic Psychol Behav. 2010, 13 (2): 106-114. 10.1016/j.trf.2009.11.004.

  21. Cryer C, Langley J: Developing indicators of injury incidence that can be used to monitor global, regional and local trends. 2008, Dunedin: Injury Prevention Research Unit, University of Otago

  22. Davie G, Langley J, Samaranayaka A, Wetherspoon ME: Accuracy of injury coding under ICD-10-AM for New Zealand public hospital discharges. Inj Prev. 2008, 14 (5): 319-323. 10.1136/ip.2007.017954.

  23. Thornley SJ, Woodward A, Langley JD, Ameratunga SN, Rodgers A: Conspicuity and bicycle crashes: preliminary findings of the Taupo Bicycle Study. Inj Prev. 2008, 14 (1): 11-18. 10.1136/ip.2007.016675.

  24. Accident Compensation Corporation: Annual Report 2012. 2012, Wellington: ACC

  25. Health Outcomes International Pty Ltd: Methods and systems used to measure and monitor occupational disease and injury in New Zealand: NOHSAC Technical Report 2. 2005, Wellington: National Occupational Health and Safety Advisory Committee (NOHSAC)

  26. Ministry of Health: National Minimum Dataset (Hospital Inpatient events): Data Mart—Data Dictionary V7.5. 2012, Wellington: Ministry of Health

  27. Ministry of Health: Mortality Collection Data Dictionary Version 1.3. 2009, Wellington: Ministry of Health

  28. Davie G, Samaranayaka A, Langley JD, Barson D: Estimating person-based injury incidence: accuracy of an algorithm to identify readmissions from hospital discharge data. Inj Prev. 2011, 17 (5): 338-342. 10.1136/injuryprev-2011-040090.

  29. Cust A, Armstrong B, Smith B, Chau J, van der Ploeg H, Bauman A: Self-reported confidence in recall as a predictor of validity and repeatability of physical activity questionnaire data. Epidemiology. 2009, 20 (3): 433-441. 10.1097/EDE.0b013e3181931539.

  30. Cook LM, Brower LP, Croze HJ: The accuracy of a population estimation from multiple recapture data. J Anim Ecol. 1967, 36 (1): 57-60. 10.2307/3014.

  31. Hook EB, Regal RR: Capture-recapture methods in epidemiology: methods and limitations. Epidemiol Rev. 1995, 17 (2): 243-264.

  32. Greenland S: Response and follow-up bias in cohort studies. Am J Epidemiol. 1977, 106 (3): 184-187.

  33. Hook EB, Regal RR: Internal validity analysis: A method for adjusting capture-recapture estimates of prevalence. Am J Epidemiol. 1995, 142 (Supplement 9): S48-S52. 10.1093/aje/142.Supplement_9.S48.

  34. Hauer E, Hakkert A: Extent and some implications of incomplete accident reporting. Transp Res Rec. 1988, 1185: 1-10.

  35. McDonald G, Davie G, Langley J: Validity of police-reported information on injury severity for those hospitalized from motor vehicle traffic crashes. Traffic Inj Prev. 2009, 10 (2): 184-190. 10.1080/15389580802593699.

  36. Brenner H: Effects of misdiagnoses on disease monitoring with capture-recapture methods. J Clin Epidemiol. 1996, 49 (11): 1303-1307. 10.1016/0895-4356(95)00026-7.

  37. Ward H, Lyons RA, Thoreau R: Under-reporting of road casualties - Phase 1. Road Safety Research Report No. 69. June 2006, London: Department for Transport

  38. Howe GR: Use of computerized record linkage in cohort studies. Epidemiol Rev. 1998, 20 (1): 112-121. 10.1093/oxfordjournals.epirev.a017966.

  39. Blakely T, Salmond C: Probabilistic record linkage and a method to calculate the positive predictive value. Int J Epidemiol. 2002, 31 (6): 1246-1252. 10.1093/ije/31.6.1246.

  40. Langley J, Cecchi J, Williams S: Recall of injury events by thirteen year olds. Methods Inf Med. 1989, 28 (1): 24-27.

  41. Warner M, Schenker N, Heinen MA, Fingerhut LA: The effects of recall on reporting injury and poisoning episodes in the National Health Interview Survey. Inj Prev. 2005, 11 (5): 282-287. 10.1136/ip.2004.006965.

  42. af Wåhlberg AE: On the validity of self-reported traffic accident data. Safety on Road International Conference SORIC'02: 2002; Manama, Bahrain. 2002

  43. Lajunen T, Corry A, Summala H, Hartley L: Impression management and self-deception in traffic behaviour inventories. Pers Individ Dif. 1997, 22 (3): 341-353. 10.1016/S0191-8869(96)00221-8.

  44. Dhillon PK, Lightstone AS, Peek-Asa C, Kraus JF: Assessment of hospital and police ascertainment of automobile versus childhood pedestrian and bicyclist collisions. Accid Anal Prev. 2001, 33 (4): 529-537. 10.1016/S0001-4575(00)00066-X.

  45. Cryer PC, Westrup S, Cook AC, Ashwell V, Bridger P, Clarke C: Investigation of bias after data linkage of hospital admissions data to police road traffic crash reports. Inj Prev. 2001, 7 (3): 234-241. 10.1136/ip.7.3.234.

  46. Juhra C, Wieskötter B, Chu K, Trost L, Weiss U, Messerschmidt M, Malczyk A, Heckwolf M, Raschke M: Bicycle accidents – Do we only see the tip of the iceberg?: A prospective multi-centre study in a large German city combining medical and police data. Injury. 2012, 43 (12): 2026-2034. 10.1016/j.injury.2011.10.016.

  47. Begg DJ, Langley JD, Williams SM: Validity of self reported crashes and injuries in a longitudinal study of young adults. Inj Prev. 1999, 5 (2): 142-144. 10.1136/ip.5.2.142.

  48. McGwin G, Owsley C, Ball K: Identifying crash involvement among older drivers: agreement between self-report and state records. Accid Anal Prev. 1998, 30 (6): 781-791. 10.1016/S0001-4575(98)00031-1.

Acknowledgements

We thank the participating cyclists and organisers of the Lake Taupo Cycle Challenge for their support, and Professor John Langley, Professor Anthony Rodgers and Dr Simon Thornley for their initial contribution to the study. Our thanks also go to the Accident Compensation Corporation, Ministry of Health and New Zealand Transport Agency for provision of bicycle crash data. This work was supported by the Health Research Council of New Zealand [grant number 09/142].

Author information

Corresponding author

Correspondence to Sandar Tin Tin.

Additional information

Competing interests

The authors declare that they have no competing interests, financial or otherwise.

Authors’ contributions

STT contributed to the conception and design of the study, acquisition, analysis and interpretation of data and drafting of the manuscript. AW and SA contributed to the conception and design of the study, interpretation of data and revision of the manuscript. All authors read and approved the final manuscript.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Tin Tin, S., Woodward, A. & Ameratunga, S. Completeness and accuracy of crash outcome data in a cohort of cyclists: a validation study. BMC Public Health 13, 420 (2013).
