The correlation between Google trends and salmonellosis

Background Salmonella infection (salmonellosis) is a common infectious disease that can lead to gastroenteritis, dehydration, uveitis, etc. Internet search data offer a new method to monitor outbreaks of infectious disease. A surveillance system based on internet data is logistically advantageous and economical for tracking diseases through related search terms. In this study, we tried to determine the relationship between salmonellosis and Google Trends in the USA from January 2004 to December 2017. Methods We downloaded the reported salmonellosis data for the USA from the National Outbreak Reporting System (NORS) for January 2004 to December 2017. Additionally, we downloaded the Google search terms related to salmonellosis from Google Trends for the same period. Cross-correlation analysis and multiple regression analysis were conducted. Results The results showed that 6 Google Trends search terms appeared earlier than reported salmonellosis, 26 Google Trends search terms coincided with salmonellosis, and 16 Google Trends search terms appeared after salmonellosis was reported. When the search terms preceded outbreaks, “foods” (t = 2.927, P = 0.004) was a predictor of salmonellosis. When the search terms coincided with outbreaks, “hotel” (t = 1.854, P = 0.066), “poor sanitation” (t = 2.895, P = 0.004), “blueberries” (t = 2.441, P = 0.016), and “hypovolemic shock” (t = 2.001, P = 0.047) were predictors of salmonellosis. When the search terms appeared after outbreaks, “ice cream” (t = 3.077, P = 0.002) was a predictor of salmonellosis. Finally, we identified the most important indicators among the Google Trends search terms: “hotel” (t = 1.854, P = 0.066), “poor sanitation” (t = 2.895, P = 0.004), “blueberries” (t = 2.441, P = 0.016), and “hypovolemic shock” (t = 2.001, P = 0.047). In the future, increased search activity for these terms might indicate salmonellosis.
Conclusion We evaluated the Google Trends search terms related to salmonellosis and identified the most important predictors of salmonellosis outbreaks. Supplementary Information The online version contains supplementary material available at 10.1186/s12889-021-11615-w.


Introduction
Salmonella infection (salmonellosis) is a common infectious disease caused by Salmonella bacteria. Salmonella are gram-negative bacteria of the Enterobacteriaceae family [1]. These bacteria live in the intestines and are transmitted by the faecal-oral route [2]. The foods that can be contaminated with Salmonella include raw meat, poultry, seafood, raw eggs, fruits, and vegetables [3]. Salmonella infection typically presents as gastroenteritis, with symptoms including nausea, abdominal pain, diarrhoea, vomiting, headache, fever, and blood in stool [4]. Typhoid fever is a serious intestinal infection caused by S. typhi [5]. The complications of S. typhi infection include dehydration, bacteremia, reactive arthritis, etc. A total of 2269 outbreaks and 48,178 cases of salmonellosis were reported in the USA from 2009 to 2018. Therefore, it is important to prevent Salmonella transmission with appropriate methods, such as washing hands after using the toilet and avoiding raw eggs. Because sources of Salmonella infection are widespread, the faecal-oral transmission route is difficult to break. In addition, healthy people are generally susceptible to this infection. Consequently, many people worldwide are likely to be infected.
Internet search data offer a new method to monitor outbreaks of infectious diseases [6]. An internet-based surveillance system is logistically advantageous and economical for identifying diseases through related search terms. Traditional systems to monitor infectious diseases depend on laboratory tests, physician diagnoses, and data collection by the authorities. Thus, reporting of emerging outbreaks of infectious diseases is often delayed by one to two weeks in traditional surveillance systems. In the last two decades, the internet has become an important medium for the general public, public health practitioners, and doctors to obtain health-related information [7]. Internet-based surveillance systems could forecast outbreaks of infectious diseases by following online web-based activities. Therefore, monitoring disease outbreaks based on internet search behaviour seems to be a promising strategy. Valdivia et al. used Google Flu Trends to monitor the activity of influenza in Europe [8]. They found a relationship between the disease activity peak and flu-related internet searches. Seifter et al. used Google Trends to predict the epidemiology of Lyme disease [9]. Some Google Trends terms such as "tick bite" and "cough" were considered indicators of Lyme disease epidemiology. Ayyoubzadeh et al. used Google Trends to evaluate the COVID-19 incidence in Iran [10]. The researchers focused on the first wave of COVID-19 in Iran. They found that Google search terms, including handwashing, hand sanitizer, and antiseptic topics, were related to COVID-19 prevalence in Iran. The associations between COVID-19 cases and these search terms were related to media discourse on non-pharmaceutical interventions to mitigate the pandemic. Internet-based surveillance systems could therefore be a useful method to evaluate outbreaks of infectious disease. Google Trends analyses the popularity of top search queries in Google Search.
Previous studies have not identified the relationship between salmonellosis and Internet searches. In this study, we examined the relationship between salmonellosis and Google Trends in the USA from January 2004 to December 2017. We aimed to reveal the correlation between Internet search trends and salmonellosis. In addition, we used multiple regression analysis to explore the most important Google search terms that indicate salmonellosis, that is, the predictors of salmonellosis. Furthermore, we examined the relationship between Google search terms and Shigella and E. coli infection to validate the use of Google Trends. We expected that Internet search trends could provide a useful strategy to monitor infectious disease outbreaks.

The related Google search terms
We determined the Google search terms related to Salmonella as follows. The search terms belonged to several categories. Salmonella infection is caused by ingesting foods contaminated with faeces. The terms linked to possible causes included egg, bacteria, chicken, peanut, peanut butter, butter, cookie dough, cucumber, chicken nuggets, etc. The risk factors for Salmonella infection include social activities that increase the possibility of infection. The terms related to social activities included restaurant, party, hotel, travel, epidemiology, etc. The symptoms of Salmonella infection are those of gastroenteritis. The terms related to symptoms included nausea, stomach flu, gastroenteritis, headache, vomiting, hypoxia, stomach cramps, etc. Salmonella infection might have the same symptoms as other infectious diseases, such as Shigella and E. coli infection. The terms related to such diseases included influenza, E. coli symptoms, cholera, Vibrio, etc. Because salmonellosis is a bacterial infection, antibiotics are useful as a treatment. The terms related to treatment included ciprofloxacin, ceftriaxone, antibiotics, etc. The flowchart for the selection of Google search terms is shown in Fig. 1.
In addition, all search terms used in our study are listed in Supplementary Table 1.

Google trends exploration
Google Trends is a tool provided by Google to analyse the popularity of search queries in Google. It is a public website (https://trends.google.com/trends) belonging to Google Inc. Google Trends provides keyword-related data, including a search volume index and geographical information about search engine users. It can be used for comparative keyword research and to discover event-triggered spikes in keyword search volume. Google Trends also allows the user to compare the relative search volume of two or more terms. The values in Google Trends range from 0 to 100 and represent search interest relative to the highest point for the selected region and time range. A value of 100 indicates the peak popularity of the search term. A value of 50 indicates that the search term is half as popular as at its peak. A value of 0 indicates that there were not enough search queries for this term. In this study, we defined the region as "United States", the category as "Health", and the custom time range as "1/1/2004-12/31/2017" on the Google Trends website.
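As a rough illustration of the 0 to 100 scale, the sketch below rescales a series of monthly search counts so that the peak month equals 100. The function name `to_trends_index` and the input numbers are hypothetical, and Google's actual processing (sampling, rounding, thresholds for low-volume terms) is not reproduced here.

```python
def to_trends_index(searches_per_million):
    """Rescale relative search volumes so that the peak equals 100 (illustrative only)."""
    peak = max(searches_per_million)
    if peak == 0:
        # No search interest at all in the selected period.
        return [0 for _ in searches_per_million]
    return [round(100 * s / peak) for s in searches_per_million]


# Hypothetical searches for one term per million Google queries, four consecutive months.
monthly_volume = [2000, 4000, 1000, 0]
index = to_trends_index(monthly_volume)
print(index)
```

Because the index is relative to the peak within the chosen region and time range, values downloaded with different settings are not directly comparable.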

The salmonellosis data collection
Salmonellosis data for the US and individual states were obtained from the National Outbreak Reporting System (NORS) of the Centers for Disease Control and Prevention (CDC). NORS is a web-based tool launched by the CDC to report infectious disease incidence in the United States [11]. NORS covers enteric and non-enteric disease outbreaks caused by viral, bacterial, parasitic, toxin, chemical, and other agents, including non-enteric disease with foodborne or waterborne transmission. It is used by territorial, state, and local health departments in the US to report to the CDC all waterborne and foodborne disease outbreaks and enteric disease outbreaks transmitted by contact with environmental sources, infected persons or animals, or unknown modes of transmission. We downloaded the data on salmonellosis outbreaks from the NORS dashboard (https://wwwn.cdc.gov/norsdashboard/). Because NORS had not updated the information on infectious disease outbreaks and illnesses after 2018, we defined the time range as January 2004 to December 2017. The data on Shigella and E. coli were also obtained from NORS.

Cross correlation analysis
In this study, we used cross-correlation analysis in SPSS (version 23.0) to measure the similarity between salmonellosis and search terms in Google Trends. A cross-correlation function is used to discover the relationship between two time series. Movements in one time series may precede or follow movements in the other. Cross-correlation analysis can distinguish whether movement in one time series tends to precede or follow movement in the other. With cross-correlation analysis, we tried to determine whether Google search terms preceded or followed salmonellosis.
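The idea of lagged correlation can be sketched in a few lines of Python. This is a simplified illustration, not the SPSS procedure used in the study: it computes a Pearson correlation on the overlapping part of the two series at each lag, and the function names, the lag convention (a positive lag means the search series leads), and the toy data are all assumptions.

```python
import numpy as np


def cross_correlation(x, y, max_lag):
    """Correlation between x[t] and y[t + lag] for each lag in [-max_lag, max_lag]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            a, b = x[: len(x) - lag], y[lag:]   # x leads y by `lag` steps
        else:
            a, b = x[-lag:], y[: len(y) + lag]  # x trails y by `-lag` steps
        result[lag] = float(np.corrcoef(a, b)[0, 1])
    return result


def classify_term(search, cases, max_lag=3):
    """Label a search term by the lag at which |correlation| is largest."""
    ccf = cross_correlation(search, cases, max_lag)
    best_lag = max(ccf, key=lambda k: abs(ccf[k]))
    if best_lag > 0:
        return "preceded"
    if best_lag < 0:
        return "followed"
    return "coincided"


# Toy monthly series: the search series is the case series shifted one month earlier.
signal = [3, 8, 2, 9, 4, 7, 1, 6, 5, 10, 2, 8, 3, 9, 1, 7, 4, 6, 12, 5, 2, 11, 3, 8]
search = signal[1:]
cases = signal[:-1]
print(classify_term(search, cases))
```

In practice, one would also check that the correlation at the selected lag is statistically significant before assigning a term to a group.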

Multiple regression analysis
After performing cross-correlation analysis, we conducted multiple regression analysis to identify predictor variables. Multiple regression analysis is an extension of simple linear regression to several explanatory variables. It also provides the overall model fit and the relative contribution of each predictor to the total variance explained. In this study, we evaluated which search terms were significant predictors of salmonella infection illnesses; a search term was considered a predictor if it was significant (P < 0.05) in the multiple regression analysis. We entered the search terms that were correlated in the cross-correlation analysis together into the regression, so the predictor variables identified depend on the set of correlated search terms. In addition, we used the scikit-learn (sklearn) package in Python to conduct machine learning. The data were divided into a training dataset (2004-2016) and a test dataset (2017), and the data from 2004 to 2016 were used to predict salmonella outbreaks in 2017.
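The t statistics and P values reported for each search term come from this kind of model. A minimal sketch follows, assuming ordinary least squares with an intercept and using synthetic data in place of the real series. The function name `ols_t_tests`, the three stand-in term series, and the simulated illness counts are hypothetical.

```python
import numpy as np
from scipy import stats


def ols_t_tests(X, y):
    """Ordinary least squares with intercept: coefficients, t statistics, two-sided P values."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])    # first column is the intercept
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = len(y) - design.shape[1]                     # residual degrees of freedom
    sigma2 = residuals @ residuals / dof               # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(design.T @ design)))
    t = beta / se
    p = 2 * stats.t.sf(np.abs(t), dof)
    return beta, t, p


# Synthetic stand-in data: 168 months (January 2004 to December 2017), three search terms.
rng = np.random.default_rng(0)
n_months = 168
terms = rng.uniform(0, 100, size=(n_months, 3))
# By construction, simulated illnesses depend on the first term only.
illnesses = 50 + 2.0 * terms[:, 0] + rng.normal(0, 10, n_months)

beta, t, p = ols_t_tests(terms, illnesses)
# Index 0 of beta, t, and p is the intercept, so term i sits at position i + 1.
significant = [i for i in range(3) if p[i + 1] < 0.05]
print(significant)
```

With many candidate terms entered together, some may reach P < 0.05 by chance, which is one reason the set of identified predictors depends on which terms are included.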

Characteristics of salmonellosis in the USA
In this study, we downloaded data on salmonellosis from NORS in the CDC from January 2004 to December 2017. A total of 2636 salmonellosis outbreaks and 62,447 illnesses were recorded in NORS from January 2004 to December 2017. Of these reported cases, 8730 were hospitalised, and 101 died. Using these data, we calculated the number of salmonellosis cases in each month of the period. The monthly change in salmonellosis is shown in Fig. 2.
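The monthly counts can be obtained by summing the illnesses of all outbreaks that fall in the same month. The sketch below shows this aggregation step on a few made-up records; the tuple layout (year, month, illnesses, hospitalisations, deaths) is an assumption and does not reflect the actual column names of a NORS export.

```python
from collections import defaultdict

# Hypothetical outbreak records: (year, month, illnesses, hospitalisations, deaths).
records = [
    (2004, 1, 12, 2, 0),
    (2004, 1, 30, 5, 1),
    (2004, 2, 8, 0, 0),
]


def monthly_illnesses(rows):
    """Sum illnesses over all outbreaks reported in the same calendar month."""
    totals = defaultdict(int)
    for year, month, illnesses, _hospitalised, _deaths in rows:
        totals[(year, month)] += illnesses
    return dict(sorted(totals.items()))


monthly = monthly_illnesses(records)
print(monthly)
```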

Google trends search terms that preceded salmonellosis
After conducting cross-correlation analysis, we found that 6 Google Trends search terms showed high activity prior to salmonellosis. The 6 Google search terms were salmon, nontyphoidal, foods, beef, vegetable, and ground beef (Table 1). Except for the term nontyphoidal, the other 5 terms were food-related search terms. This is consistent with the fact that contaminated foods can transmit Salmonella bacteria to healthy people. The possibly contaminated foods included seafood such as salmon, meat such as beef, and vegetables.
Then, we applied multiple regression analysis to these 6 search terms that preceded Salmonella outbreaks to identify the predictor. The results of multiple regression analysis showed that "foods" (t = 2.927, P = 0.004) was a predictor of Salmonella outbreaks.

Google trends search terms that coincided with salmonellosis
In this study, we found that many Google Trends search terms coincided with salmonellosis. The related search terms included chicken, cucumber, restaurant, hotel, pork, bathroom, water, hypovolemic shock, dehydration, feel chilly, toxins, poor sanitation, bar, bradycardia, transmission, fruits, drinking water, avocado, salad, sandwich, sushi, tuna, cheese, blueberries, coleslaw, and mango (Table 2). These terms covered several aspects of Salmonella infection. First, Salmonella can be transmitted by contaminated water and a variety of foods. The related terms included chicken, cucumber, pork, water, drinking water, fruits, avocado, salad, sandwich, sushi, tuna, cheese, blueberries, coleslaw, and mango. Second, Salmonella might spread from person to person. The related terms included restaurant, hotel, bathroom, poor sanitation, bar, and transmission. Third, the terms related to possible symptoms of Salmonella infection included feel chilly, bradycardia, dehydration, and hypovolemic shock.

Google trends search terms that followed salmonellosis
We also discovered Google Trends search terms that followed the outbreaks of salmonellosis. The related search terms included Salmonella, poison, tomato, tomatoes, party, abdominal cramps, Salmonella symptoms, rash, melon, sea food, lettuce, ice cream, bbq, carneasada, watermelon, and chicken salad (Table 3). The possibly contaminated foods included tomato, tomatoes, melon, sea food, lettuce, ice cream, bbq, carneasada, watermelon, and chicken salad. The symptom-related terms included Salmonella symptoms, abdominal cramps, and rash.
We also applied multiple regression analysis to these 16 Google Trends search terms that followed the outbreaks of salmonellosis to identify the predictor. The results of multiple regression analysis showed that "ice cream" (t = 3.077, P = 0.002) was a predictor of salmonellosis.

Correlation between Google trends search terms and salmonellosis in Massachusetts and California
We also used a smaller geographical unit (the state) to explore the relationship. Massachusetts is one of the states with the highest population density in the USA, and California had the most cases of salmonellosis from 2004 to 2017. We therefore selected Massachusetts and California to explore the relationship between salmonellosis and Google Trends. The results for Massachusetts and California are shown in Tables 4 and 5.
In Massachusetts, we found that 6 Google Trends search terms preceded salmonellosis. These search terms were foods, bar, fruits, seafood, avocado, and outbreak. In addition, the search term "abdominal cramps" coincided with salmonellosis. Nine Google Trends search terms followed salmonellosis, including dehydration, rash, lettuce, etc. In California, 4 Google search terms were related to salmonellosis, including transmission, salmonella symptoms, uveitis, etc.

Correlation between Google trends search terms and Shigella and E. coli in the US
Furthermore, we analyzed similar infectious diseases, namely Shigella and E. coli infections, in the US. We found that several Google Trends search terms were correlated with Shigella and E. coli. The results are shown in Tables 6 and 7.
Finally, we tried to identify the most important Google search terms that correlated with salmonellosis. Multiple regression analysis was conducted. We combined the Google search terms from the previous three sections that preceded, coincided with, and followed salmonellosis. A total of 48 Google search terms were correlated with salmonellosis in cross-correlation analysis. Through multiple regression analysis, 4 search terms were identified as the most important indicators of salmonellosis outbreaks. These 4 search terms were "hotel" (t = 1.854, P = 0.066), "poor sanitation" (t = 2.895, P = 0.004), "blueberries" (t = 2.441, P = 0.016), and "hypovolemic shock" (t = 2.001, P = 0.047). Interestingly, the results in this section for all 48 Google search terms were almost the same as those for the 26 Google Trends search terms that coincided with salmonellosis. These 4 most important indicators referred to different aspects of Salmonella infection. The time trends of salmonellosis and the best-suited search terms are shown in Fig. 3. Moreover, we tried to use the data from 2004 to 2016 to predict salmonella outbreaks in 2017. We used the scikit-learn (sklearn) package in Python to conduct the machine learning. The data were divided into a training dataset (2004-2016) and a test dataset (2017), and the 4 best-suited search terms (hotel, hypovolemic shock, poor sanitation, blueberries) were used as inputs. The result is shown in Supplementary Fig. 1.
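The chronological split described here can be sketched with scikit-learn as follows. The text does not state which estimator was used, so `LinearRegression` is an assumption, and the four input columns and the outcome are synthetic stand-ins for the real search-term and salmonellosis series.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(42)
n_months = 168                                 # January 2004 to December 2017
years = 2004 + np.arange(n_months) // 12       # calendar year of each month

# Synthetic stand-ins for four search-term series on the 0 to 100 scale.
X = rng.uniform(0, 100, size=(n_months, 4))
# Synthetic monthly illness counts (hypothetical relationship plus noise).
y = 100 + 1.5 * X[:, 1] + rng.normal(0, 20, n_months)

# Chronological split: fit on 2004-2016, evaluate on 2017 only.
train = years <= 2016
test = years == 2017

model = LinearRegression().fit(X[train], y[train])
pred = model.predict(X[test])
mae = mean_absolute_error(y[test], pred)
print(f"Mean absolute error on the 12 held-out months: {mae:.1f}")
```

Splitting by time rather than at random keeps future months out of the training data, which matches the forecasting setting described in the text.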

Discussion
Salmonella is a common foodborne pathogen in the USA [12]. Foods are the source of most Salmonella infections. Contaminated foods and water that look and smell untainted can infect healthy people [13]. In addition, Salmonella can be transmitted from pets to people and from person to person [14,15]. Most Salmonella patients suffer from fever, diarrhoea, stomach cramps, nausea, vomiting, and headache [16]. Infections can be serious for infants and older adults [17,18]. Because Salmonella infection is foodborne, it is also preventable. First, people should wash their hands before and after preparing foods [19]. Second, different kinds of foods, such as raw meat, eggs, and seafood, should be kept separate.
In our study, we examined the correlation between salmonellosis and Google search terms. We found that a total of 48 Google search terms were associated with salmonellosis. These terms represented different aspects of salmonellosis. The first group of Google search terms is associated with foods and fruits. The search terms in  [20]. The results of multiple regression analysis indicated that "blueberries" was a predictor of salmonellosis outbreaks. Blueberries have a relatively short shelf life compared with other kinds of fruits, such as melon and watermelon [21]. Previous studies found that Salmonella could grow on harvested blueberries at retail display temperatures, while Salmonella did not grow on strawberries under the same conditions [22]. The complicated production chain of blueberries might contribute to Salmonella contamination. Blueberry harvesting relies heavily on manual labour, which promotes Salmonella transmission. Many reported outbreaks of Salmonella were associated with blueberries [23]. Various methods have been developed to reduce Salmonella in blueberries. These methods include antimicrobial solution with freezing, ozone, and UV light [24,25]. Apart from blueberries, "ice cream" was also a predictor of Salmonella outbreaks. Ice cream, especially homemade ice cream, contains raw eggs and milk, which might be contaminated with Salmonella. A nationwide outbreak of salmonellosis transmitted by ice cream has been reported in the USA [26]. The researchers discovered that one brand of ice cream (Schwan's) was responsible for the nationwide outbreak of Salmonella infection. The FDA also detected Salmonella in ice cream production facilities in the US [27]. To avoid the risk of Salmonella infection, people should make ice cream with pasteurized egg products or pasteurized shell eggs. Other ways to make safe ice cream are to use a cooked egg base or to prepare ice cream without eggs.
Fig. 3 The trends of salmonellosis and best-suited search terms
Apart from food-related Google search terms, we also found that salmonellosis was correlated with public places and activities that might involve Salmonella contamination. Such places and activities included hotels, bathrooms, bars, parties, and bbq. In these public places and activities, healthy people might be infected through foods and facilities. For example, it was reported that sauces and salsa prepared at hotels in Dallas and Texas were considered vehicles for salmonella outbreaks. The investigation pointed out that hotel food workers infected with Salmonella were linked to this salmonellosis outbreak, which affected 617 people from 46 states in the US [28]. In our study, the multiple regression analysis also showed that "hotel" and "poor sanitation" were predictors of salmonellosis. If public places such as hotels and bars have poor sanitation, people can easily be infected with Salmonella, leading to salmonellosis outbreaks.
Finally, we also found that "hypovolemic shock" was a predictor of salmonellosis outbreaks. Hypovolemic shock is a life-threatening complication of Salmonella infection. In typhoid fever and paratyphoid fever, severe vomiting and diarrhoea cause electrolyte and fluid loss, leading to microenvironment imbalance [29]. Fluid and electrolyte imbalance is related to a decrease in arterial pressure and circulating blood volume. Hypovolemic shock and septic shock can occur in this situation. People with Salmonella infection who suffer from severe vomiting and diarrhoea might be afraid that their condition will lead to hypovolemic shock and search for it. This may explain why the search term "hypovolemic shock" was correlated with salmonellosis.
The results for single states (Massachusetts and California) did not meet expectations well. We inferred that the way state-level data are collected in the NORS dataset might be the reason. In the state-level NORS dataset, illnesses are divided into multistate outbreaks and single-state outbreaks, and a multistate outbreak might include illnesses from other states. In Massachusetts, we found that Google search terms including bar, fruits, seafood, and avocado preceded salmonellosis. These terms might point to the causes of salmonellosis outbreaks in Massachusetts. Dehydration and rash followed salmonellosis. These terms might reflect delayed symptoms of salmonella infection.
Lastly, we tried to predict 2017 salmonellosis outbreaks from the 2004-2016 dataset with machine learning. However, the prediction of future outbreaks based on the specific search terms did not meet expectations well. Several reasons might explain this result. Firstly, salmonella is transmitted by the faecal-oral route, and humans and animals can be infected through contaminated food or water, so salmonellosis outbreaks do not occur regularly. Secondly, our study only included Google Trends data. Other factors that might influence salmonellosis outbreaks, such as weather, health status, and population density, were not included in our study.
Our study mainly focused on the use of Google Trends to monitor salmonellosis outbreaks. The purpose of Big Data utilization is now shifting from monitoring toward forecasting. Therefore, future studies should focus on accurate and precise forecasting of infectious disease outbreaks.
There are some limitations in this study. Firstly, we used only the Internet search tool Google Trends to evaluate the results. Other Internet platforms such as Twitter and Facebook were not included in our study. Secondly, we analyzed the data monthly because Google Trends provided the data monthly. Seasonal data were not evaluated in this study. In the future, the analysis could be adjusted for seasonality if seasonal data were available. Thirdly, only US data were analyzed in this study. Other regions worldwide, such as Europe and Asia, were not investigated. Lastly, the prediction of salmonellosis outbreaks did not meet expectations in our study. In the future, the prediction of infectious disease based on specific search terms could be investigated further if the raw data were expanded.

Conclusion
In this study, we examined the correlation between Google search terms and salmonellosis in the US from 2004 to 2017. We investigated Google search terms related to salmonellosis and identified the most important indicators of salmonellosis outbreaks. We found that Google Trends was a useful method to monitor salmonellosis outbreaks. We also validated the use of Google Trends with Shigella and E. coli. Thus, we consider that Google search data could be used to monitor infectious disease.