To our knowledge, this is the first study to validate primary care smoking data at the regional level. These results show that estimates of regional smoking prevalence from THIN are highly comparable to the corresponding estimates from the current main source of such data. In most regions, smoking prevalence based on THIN data was similar to that found by the GLF from 2006 onwards. Primary care data could therefore be used to help target tobacco control initiatives at the areas with the highest smoking prevalence and to monitor prevalence across regions.
The main limitation of our study is that we were unable to compare THIN data with corresponding data for all of the UK's regions. The GLF covers Great Britain only, and we therefore could not validate prevalence data for Northern Ireland. However, our results were generally consistent across all the regions that were included, and it is likely that THIN smoking prevalence estimates for Northern Ireland are similarly accurate. In addition, we were unable to explore the comparability of THIN and GLF prevalence estimates beyond 2008. Estimates from the two data sources were similar only in the final three years of the study, so further research is needed to establish whether this agreement is maintained in subsequent years.
A further limitation of our study is that both the GLF and THIN may underestimate smoking prevalence, as neither GLF respondents nor general practice patients have their smoking status biochemically validated. However, the high cost of such validation makes it extremely difficult to obtain for samples of this size. In addition, because the completeness of recording varies considerably between UK general practices, these results are not necessarily generalisable to all practices.
A final limitation of this study is that the much smaller sample sizes of the GLF at the regional level mean that its estimates may be subject to substantial sampling error. However, during the study period the GLF was the largest survey providing regional prevalence data for Great Britain, and we therefore believe that it is the most appropriate comparator.
Despite the smaller sample size of the THIN data at the regional level, the results of this study are broadly consistent with those of the previous validation study of these data carried out at the national level. As at the national level, prevalence estimates based on THIN for most regions were found to be similar to those based on the GLF from 2006 onwards.
The convergence in prevalence estimates from THIN and the GLF is almost certainly a result of the voluntary, pay-for-performance general practice contract introduced in 2004. The contract requires GPs to record their patients' smoking status at least every 27 months (every 15 months for patients with specified chronic diseases) and has been taken up by almost all GPs.
Convergence between the two datasets by 2006 was not observed in all regions; there was greater discrepancy between the data sources for the West Midlands, Yorkshire and the Humber, and Wales. Regional GLF data are based on small sample sizes and therefore have higher sampling error, as demonstrated by the wide confidence intervals, so any discrepancy between THIN and GLF estimates may reflect uncertainty in the GLF data rather than inadequacy of the THIN estimates. In addition, while we were able to confirm that THIN is representative regionally in terms of age and sex, we have not assessed representativeness in terms of other factors such as social class. This may also account for some of the discrepancy in these three regions. Even in these regions, however, the THIN estimates fell within the confidence intervals of the GLF estimates in two of the final three years (Yorkshire and the Humber), the final year (West Midlands) and the final two years (Wales) of the study, which suggests that estimates from the GLF and THIN for these regions may indeed be comparable. The discrepancy in the final year of data for Yorkshire and the Humber may be due to young adults being underrepresented in the THIN population of this region in that year (as shown in Additional file 1).
There are several advantages to using THIN prevalence data rather than national survey data. THIN data are routinely collected, are released 3-4 times per year, and have a lag of only 3-8 months before they become available. A further advantage of THIN is its size; the standard error of THIN's smoking prevalence estimates is significantly smaller than that of the GLF estimates at a national level. At a regional level, GLF estimates are more prone to error because of their much reduced sample sizes, and their confidence intervals are so wide, as demonstrated in Figure 1, that changes from year to year will be difficult to detect; the large sample size in THIN is therefore extremely valuable. In addition, THIN provides monthly data, which are particularly useful in evaluating the short-term impacts of tobacco control initiatives.
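The relationship between sample size and the width of a confidence interval can be illustrated with a minimal sketch. It uses the simple normal-approximation (Wald) interval for a proportion, which is an assumption made here for illustration and not necessarily the method used to produce the GLF or THIN intervals; in particular it ignores survey design effects and weighting. The prevalence value and both sample sizes are hypothetical and are not taken from either data source.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion.

    p: estimated prevalence as a fraction (0-1)
    n: number of individuals contributing to the estimate
    z: critical value (1.96 gives an approximate 95% interval)
    """
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    return p - z * se, p + z * se


# Hypothetical values, chosen only to show the effect of sample size.
prevalence = 0.21
samples = [
    ("small regional survey sample", 1_000),
    ("large regional primary care sample", 200_000),
]

for label, n in samples:
    low, high = proportion_ci(prevalence, n)
    print(f"{label}: n={n}, 95% CI {low:.3f} to {high:.3f}, "
          f"width {high - low:.3f}")
```

Because the standard error scales with the inverse square root of the sample size, a 200-fold increase in sample size narrows the interval roughly 14-fold under these assumptions, which is why small year-to-year changes are much easier to detect in the larger sample.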
Based on the THIN data, Scotland (24%), the North West (23.5%) and Northern Ireland (23.5%) had the highest smoking prevalence in the UK in 2008, while the East of England (19%), the West Midlands (19%) and South East England (19%) had the lowest. There remains substantial variation in smoking prevalence between the regions, with higher prevalence often being observed in the regions with the lowest per capita disposable income. Smoking is an important contributor to health inequalities [20, 21]. Reducing regional differences in smoking prevalence will therefore contribute to alleviating health inequalities in the UK. This study indicates that THIN may be a useful source of data for monitoring these regional differences.
To our knowledge, the current study and that by Szatkowski et al. are the first to explore the possibility of using primary care data to monitor smoking prevalence; our results indicate that primary care data are a potentially valuable source of such information. Previous research suggests that surveys which monitor smoking prevalence in EU Member States often have small sample sizes and are carried out irregularly. This suggests that the way in which smoking prevalence is monitored internationally has similar limitations to the way it is currently monitored in Britain. Future research exploring the possibility of using primary care data to monitor smoking prevalence in countries other than Britain may therefore be warranted.