Online Readability of COVID-19 Health Information

Introduction: The internet is now the first-line source of health information for many people worldwide. In the current Coronavirus Disease 2019 (COVID-19) global pandemic, health information is being produced, revised, updated and disseminated at an increasingly rapid rate. The general public are faced with a plethora of misinformation regarding COVID-19, and the readability of online information has an impact on their understanding of the disease. The accessibility of online healthcare information relating to COVID-19 is unknown. Methods: The Google® search engine was used to collate the first twenty webpage URLs for three individual searches for ‘COVID’, ‘COVID-19’, and ‘coronavirus’ from Ireland, the United Kingdom, Canada and the United States. The Gunning Fog Index (GFI), Flesch-Kincaid Grade (FKG) Score, Flesch Reading Ease Score (FRES) and Simple Measure of Gobbledygook (SMOG) score were calculated to assess readability. Results: Readability of the webpages reviewed was poor, with only 17.2% of webpages at a universally readable level. There was a significant difference in readability between webpages based on their information source (p<0.01). Public health organisations and government organisations provided the most readable COVID-19 material, while digital media sources were significantly less readable. There were no significant differences in readability between regions. Conclusion: Much of the general public has relied on online information during the pandemic. Information on COVID-19 should be made more readable, and those writing webpages and information tools should ensure universal accessibility is considered in their production. Governments and healthcare practitioners should be aware of the online sources of information available, and ensure that their own publications are at a universally readable level, which will increase understanding of and adherence to health guidelines.


Introduction
The Coronavirus Disease 2019 (COVID-19) pandemic has led to an expected increase in online searches on the condition. Internet users now frequently search for health-related information and use the internet as a tool to answer questions about symptoms, diagnoses and treatment 1. Social distancing, lockdowns and self-isolation policies worldwide have also meant that patients' access to primary care has decreased and reliance on online information has increased. This is reflected in the rise in Google® Trends searches for 'coronavirus', 'COVID' and 'COVID-19' in recent months 2.
The internet as a source of health information is unregulated, and its quality, reliability and accessibility to the reader are variable. Many webpages provide inaccurate or questionable information, and this can be harmful 3. A small number of studies have already reported on the quality of COVID-19 related health information 4, and indeed on the misinformation that has appeared on webpages, and in particular on social media, in recent months 3,5. An assessment of the quality of information relating to COVID-19 found that there are often discrepancies between health information issued by public health organisations and general information available on other digital media 6.
Several tools are available to assess the readability of information, such as the Gunning Fog Index (GFI), the Flesch Reading Ease Score (FRES), the Flesch-Kincaid Grade (FKG) and the Simple Measure of Gobbledygook (SMOG) score 7. These are established readability tools that have been validated for health information studies and for the English language 8. The readability of health information related to COVID-19 has not previously been reported. We sought to evaluate the readability of online information relating to COVID-19 in four English-speaking regions: Ireland, the United Kingdom, Canada and the United States.

Webpage Search and Identification
The Google® search engine was used to collate the first twenty webpage URLs for three individual searches for 'COVID', 'COVID-19', and 'coronavirus'. The searches were conducted using geolocation search engine settings to reflect the webpages found in Ireland, the United Kingdom, Canada and the United States. All searches were conducted on 17th April 2020. When searching for information on the internet, users will typically pick one of the first five search results, and will typically rephrase their search criteria instead of proceeding to the second page (or further) 9.

Readability Assessment Tools
Four scores were used to calculate the readability of the webpages: the Gunning Fog Index (GFI), the Flesch-Kincaid Grade (FKG), the Flesch Reading Ease Score (FRES) and the Simple Measure of Gobbledygook (SMOG) Index (Appendix 1). To ensure consistency and avoid human error, the readability tests were performed using an online readability calculator that provides FRES, FKG, GFI and SMOG scores 10. All webpages were screened by the readability tool. Hyperlinks, non-standard text, abbreviations and author names were excluded from the analysis to avoid skewing the results.
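The four scores named above are all simple functions of word, sentence and syllable counts. The sketch below uses the commonly published formulas for each score; it is an illustration only and is not the online calculator used in this study. The function name `scores_from_counts` and the example counts are hypothetical, and the sketch treats every word of three or more syllables as "complex" for the GFI, which is a simplification of the original definition.

```python
import math


def scores_from_counts(n_words, n_sentences, n_syllables, n_polysyllables):
    """Return FRES, FKG, GFI and SMOG from raw text counts.

    n_polysyllables is the number of words with three or more syllables.
    Assumption: these words stand in for GFI "complex words" without the
    usual exclusions (proper nouns, compound words, common suffixes).
    """
    words_per_sentence = n_words / n_sentences
    syllables_per_word = n_syllables / n_words
    return {
        # Flesch Reading Ease Score: higher means easier to read
        "FRES": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        # Flesch-Kincaid Grade: approximate US school grade level
        "FKG": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
        # Gunning Fog Index: sentence length plus percentage of complex words
        "GFI": 0.4 * (words_per_sentence + 100 * n_polysyllables / n_words),
        # SMOG: polysyllable count normalised to a 30-sentence sample
        "SMOG": 1.0430 * math.sqrt(n_polysyllables * 30 / n_sentences) + 3.1291,
    }


# Hypothetical passage: 100 words, 5 sentences, 150 syllables, 10 polysyllables
example = scores_from_counts(100, 5, 150, 10)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Because three of the four scores rise with difficulty while FRES falls, FRES is expected to correlate negatively with the others when scores are compared.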

Statistical Analysis
Descriptive statistics were calculated for SMOG, FRES, FKG and GFI scores. The Shapiro-Wilk test was used to determine data distribution. Mean (SD) was used for normally distributed data, while median (range) was used for skewed data. Spearman's (for skewed data) and Pearson's (for normally distributed data) correlation coefficients were used to assess the association between readability scores. ANOVA and Kruskal-Wallis tests were used to compare differences between groups in univariate analysis. A 5% level of significance was used for all statistical tests. All statistical analysis was performed using GraphPad Prism software (La Jolla, CA, 2020).
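Spearman's coefficient is Pearson's coefficient computed on the ranks of the data, with tied values sharing their average rank. The following standard-library sketch shows that relationship; it is not the GraphPad Prism procedure used in this study, the function names are hypothetical, and the example values are made up rather than taken from the study data.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson's coefficient on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up FKG and SMOG scores for five hypothetical webpages
fkg = [6.2, 9.8, 11.5, 8.1, 13.0]
smog = [8.0, 10.9, 12.2, 9.5, 14.1]
print(round(spearman(fkg, smog), 3))
```

Working on ranks makes the coefficient insensitive to the skew that led to the non-parametric choice for the FKG, GFI and SMOG scores.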

Results
The searches were performed using the keywords coronavirus, COVID and COVID-19. The first 20 webpages were collated from each search, and each search was conducted geolocated to Ireland, the United Kingdom, Canada and the United States, totalling 240 webpages (Appendix 2). Of the 240 webpages analysed, 53% (n=127) were government or public health organisation webpages, 29% (n=69) were digital or social media webpages, 5% (n=11) were from scientific or educational institutions and 14% (n=33) were from other sources (Table 1a). There was a significant difference in the regional spread of webpage sources (ANOVA, p<0.03), with a matching inverse correlation in webpage sources between countries (Spearman correlation -0.2, p<0.07) and between continents (-0.2, p<0.03).
FRES results were normally distributed, while FKG, GFI and SMOG scores were not. Only 17.2% (n=165) of all the readability scores analysed demonstrated a universally readable level. 19% (n=45) of FRES scores were at a universally readable level (>60), as were 32% (n=77) of FKG scores (target <8), 37% (n=88) of GFI scores (target <8) and 30% (n=73) of SMOG scores (target <10). The mean readability scores for webpages searched from all regions were below the standard universal readability levels, and there were no significant differences between regions (Table 1b).
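The targets quoted above (FRES >60, FKG <8, GFI <8, SMOG <10) can be applied to a set of scored webpages as in the sketch below. This is an illustration under stated assumptions: the `THRESHOLDS` table and `share_readable` function are hypothetical names, and the four example pages are made up rather than drawn from the study data.

```python
# Universal readability targets as quoted in the text
THRESHOLDS = {
    "FRES": lambda score: score > 60,  # higher FRES is easier to read
    "FKG": lambda score: score < 8,
    "GFI": lambda score: score < 8,
    "SMOG": lambda score: score < 10,
}


def share_readable(pages):
    """Return {score name: (count, percentage)} of pages meeting each target.

    pages is a list of dicts mapping each score name to its value.
    """
    result = {}
    for name, meets_target in THRESHOLDS.items():
        hits = sum(1 for page in pages if meets_target(page[name]))
        result[name] = (hits, round(100 * hits / len(pages), 1))
    return result


# Made-up scores for four hypothetical webpages
pages = [
    {"FRES": 65, "FKG": 7, "GFI": 9, "SMOG": 9},
    {"FRES": 50, "FKG": 10, "GFI": 12, "SMOG": 11},
    {"FRES": 61, "FKG": 8, "GFI": 7.5, "SMOG": 10},
    {"FRES": 40, "FKG": 12, "GFI": 14, "SMOG": 13},
]
print(share_readable(pages))
```

The comparisons are strict, so a page scoring exactly on a target (for example an FKG of 8) is not counted as universally readable.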
There were significant differences between the readability of webpages depending on the information source for all readability scores: FRES (p<0.02), FKG (p<0.01), GFI (p<0.0004) and SMOG (p<0.0001) (Table 1c). The most readable sources across the majority of the scores were webpages issued by government and public health organisations.
Readability scores and search rankings were moderately correlated, in particular for SMOG scores (Spearman correlation 0.225, p<0.05). All four readability scores (FRES, FKG, GFI and SMOG) correlated significantly with each other (Spearman's correlation 0.409 to 0.927, p<0.01). There was a positive association between source of information category and ranking of the webpage on the search engine results (Spearman's correlation 0.2, p<0.04).

Discussion
… pneumonia, and the inconsistent and sometimes dangerous information and misinformation that is occurring online, in particular on social media 11. Accessibility and readability are fundamental to understanding and engaging with health information. While there is pressure to publish information at short notice, readability should be considered when producing health literature and information 11.
The webpages analysed were mostly below a universal level for readability. That the best performing readability score found only 37% of webpages readable to a universal audience does not reflect well on the health information produced and disseminated online. This poor readability level affects understanding of the health information, resulting in poor adherence to hygiene measures, social-distancing measures and further public health recommendations 5.
Webpages most likely to be viewed are those on the first page of search results 9, making website rankings an important factor for consideration 12. The moderately positive correlation between source type and ranking of webpages on the search results is reassuring, as the majority of webpages were published by public health organisations or government bodies.
Search engines have the ability to manipulate ranking settings, and sponsored search results can often tamper with what audiences see first 12. Google® has been making corporate decisions to artificially rank high-profile health information from sources of respectable provenance, such as the World Health Organisation, since early March 2020 13. This might explain why government and public health bodies account for 53% (n=127) of search results. While this is reassuring, because information from those origins tends to be more readable, the mean readability scores in this study remain poor. The differences seen between countries and continents in the type of source information available are worth considering, given that there is a clear difference in readability between sources.
The correlations between the various readability scores were reassuring and showed that, while there are some differences, the trend in detecting poor readability was similar between tests. While much has been published in the last few weeks on the quality of health information and the misinformation relating to COVID-19, this is the first assessment of the readability of online information on COVID-19.
We acknowledge the limitations of this study. There are a number of weaknesses associated with each of the readability scores 14,15. The tests rely on numbers of words in sentences, or syllables in words, which may not always reflect the reading level. The scores do not consider layout, infographics or figures that often help accessibility and understanding of accompanying literature. As with all infodemiology research, this study is limited by the constant changing, revising and updating of online material. The results may differ if the study is repeated at another time.

Conclusion
The majority of webpages relating to COVID-19 are not at a universal reading level in four major English-speaking regions. Reassuringly, however, most webpages originated from public health organisations and government bodies. While there is an urgency in a global pandemic to publish guidance and health information, there is an onus on publishers from all information sources to publish information that is readable at all levels of comprehension, which will in turn lead to better levels of education and adherence to guidance.

Declarations
Ethical Approval or Consent: None
Funding or Sponsorship: None
Conflict of Interest: None
Data Repository: Original data can be provided

Author Contributions
APW conducted data collection, data analysis and manuscript writing, and contributed to study design. MJC conducted data analysis and paper writing. AO'N and MO'D were involved with manuscript writing. KPT did data analysis and collection. CM, SJM and EdB conceived the study design and wrote the manuscript, with final oversight.