The Google search engine continues to demonstrate its value as a tool for predicting human behaviour and activity based on the frequency of topics searched by its users. Google data has been used to track flu activity and even fertility behaviour. A Banca d'Italia November 2012 working paper by Francesco D'Amuri and Juri Marcucci is the latest installment on using the predictive power of Google searches, this time applying it to forecast US unemployment. As the authors note, such an approach has already been examined for unemployment in Germany, Italy and Israel.
D'Amuri and Marcucci construct a Google Index (GI) of internet job-search intensity and use it as a leading indicator to predict the US monthly unemployment rate. Indeed, they argue it is the best leading indicator for this variable, outperforming even Initial Claims, a widely accepted leading indicator for US unemployment. They do this by estimating standard time series (ARMA) models of monthly unemployment, grouped into models that include or exclude the GI, over assorted time spans. Generally, the Google-based models deliver lower out-of-sample mean squared errors (MSEs). They conclude: "Notwithstanding its limited time availability (Google data are available since January 2004) we believe that the GI should routinely be included in time series models to predict unemployment dynamics. We fully expect that the use of internet-based data will become widespread in economic research in the near future." Of course, the rather short time range for which a GI can be constructed is an issue, given that the Initial Claims (IC) data are available going back to 1967.
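The flavour of the exercise can be sketched in a few lines. This is not the authors' specification (they estimate ARMA models on actual unemployment and Google data); it is a minimal illustration, on synthetic data, of fitting an autoregressive forecasting model with and without a lagged leading indicator and comparing out-of-sample MSEs:

```python
import numpy as np

# Illustrative sketch, NOT the paper's model: compare out-of-sample MSE of
# a simple AR(1) forecast of "unemployment" against the same model
# augmented with a lagged Google-Index-style indicator. Data are synthetic,
# generated so that the indicator genuinely leads the target series.

rng = np.random.default_rng(0)
T = 200
gi = rng.normal(size=T)            # stand-in for the Google Index
u = np.zeros(T)                    # stand-in for the unemployment rate
for t in range(1, T):
    u[t] = 0.8 * u[t - 1] + 0.5 * gi[t - 1] + rng.normal(scale=0.3)

def oos_mse(use_gi, split=150):
    """OLS fit on the first `split` observations, MSE on the remainder."""
    X = [np.ones(T - 1), u[:-1]]   # constant + lagged unemployment
    if use_gi:
        X.append(gi[:-1])          # add the lagged leading indicator
    X = np.column_stack(X)
    y = u[1:]
    beta, *_ = np.linalg.lstsq(X[:split], y[:split], rcond=None)
    resid = y[split:] - X[split:] @ beta
    return float(np.mean(resid ** 2))

print(oos_mse(use_gi=False))   # baseline AR(1)
print(oos_mse(use_gi=True))    # augmented model: lower MSE here, since
                               # the indicator is in the data-generating process
```

The paper's horse race is the same idea writ large: many specifications, rolling forecast windows, and Initial Claims as the competing indicator.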
The construction of the Google Index (GI) in this case is focused on the use of the keyword "jobs". According to the authors: "First, we found that the keyword "jobs" was the most popular among different job-search-related keywords… the second reason why we chose the keyword "jobs" is that we believe that it is widely used across the broadest range of job seekers, and as a consequence is less sensitive to the presence of demand or supply shocks specific to subgroups of workers that could bias the values of the GI". Combinations of terms such as "public jobs" are also used. In an attempt to improve precision, the authors even subtract the keyword searches for "Steve Jobs", a correction that would undoubtedly be quite important given that the GI sample is post-2004.
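The "Steve Jobs" correction amounts to netting out one keyword's volume from another's before rescaling. A hypothetical sketch with made-up numbers (Google Trends reports indices scaled so the peak period equals 100, and that convention is mimicked here):

```python
import numpy as np

# Hypothetical illustration of the correction: subtract searches
# attributable to "steve jobs" from the raw "jobs" volume, then rescale
# to a Trends-style 0-100 index. All volumes below are synthetic.

rng = np.random.default_rng(1)
jobs = 50 + 10 * rng.random(52)        # weekly "jobs" search volume
steve_jobs = 5 + 3 * rng.random(52)    # weekly "steve jobs" volume

corrected = jobs - steve_jobs          # net out the celebrity searches
gi = 100 * corrected / corrected.max() # peak week rescaled to 100

print(gi.max())                        # 100.0 by construction
```

The point of the correction is easy to see: a spike in "steve jobs" searches (most obviously around October 2011) would otherwise register as a spurious surge in job-search intensity.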
Given the authors' prediction that internet-based data will become more widespread in economic research, one cannot help but wonder whether indicators constructed by such techniques might be more susceptible to "manipulation." Frances Woolley on Twitter recently drew my attention to the "gaming" of Google Scholar citations. Would it not be possible for indicators of unemployment or retail activity to be "gamed" by computer programs set loose on the internet to input certain search terms on a massive scale, creating "trends" that can be exploited for marketing or stock market purposes? Do we want to tie changes in money supply and interest rates to expansions or contractions in economic activity picked up by Google searching if we are not sure where the activity is coming from? What safeguards can be put in place to filter out any such effects? I'm not saying older or traditional economic indicators are the best – all data sources have their issues. However, I think we need to anticipate what the issues of new indicators might be, especially in the case of internet-based indicators.