
Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data

Rangarajan, Prashant and Mody, Sandeep K and Marathe, Madhav (2019) Forecasting dengue and influenza incidences using a sparse representation of Google trends, electronic health records, and time series data. In: PLOS COMPUTATIONAL BIOLOGY, 15 (11).

Official URL: http://dx.doi.org/10.1371/journal.pcbi.1007518

Abstract

Author summary: Dengue and influenza-like illness (ILI) are leading causes of viral infection in the world, and hence it is important to develop accurate methods for forecasting their incidence. We use the Autoregressive Likelihood Ratio method, a computationally efficient implementation of the variable selection method, to obtain a sparse (non-lasso) representation of time series, Google Trends and electronic health records (for ILI) data. This method is used to forecast dengue incidence in five countries/states and ILI incidence in the USA. We show that this method outperforms existing time series methods in forecasting these diseases. The method is general and can also be used to forecast other diseases.

Dengue and influenza-like illness (ILI) are two of the leading causes of viral infection in the world, and it is estimated that more than half the world's population is at risk of developing these infections. It is therefore important to develop accurate methods for forecasting dengue and ILI incidences. Data from multiple sources (such as dengue and ILI case counts, electronic health records and the frequency of multiple internet search terms from Google Trends) can improve forecasts, but standard time series analysis methods cannot estimate all the parameter values from the limited amount of data available when multiple sources are used. In this paper, we use a computationally efficient implementation of the known variable selection method that we call the Autoregressive Likelihood Ratio (ARLR) method. This method combines a sparse representation of time series data, electronic health records data (for ILI) and Google Trends data to forecast dengue and ILI incidences. The sparse representation is obtained by an algorithm that maximizes an appropriate likelihood ratio at every step. Using numerical experiments, we demonstrate that our method recovers the underlying sparse model much more accurately than the lasso method.
We apply our method to dengue case count data from five countries/states (Brazil, Mexico, Singapore, Taiwan, and Thailand) and to ILI case count data from the United States. Numerical experiments show that our method outperforms existing time series forecasting methods in forecasting the dengue and ILI case counts. In particular, our method gives an 18 percent forecast error reduction over a leading method that also uses data from multiple sources. It also performs better than other methods in predicting the peak value of the case count and the peak time.
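
The idea of building a sparse autoregressive model by maximizing a likelihood ratio at every step can be illustrated with a generic greedy forward selection. The sketch below is not the authors' ARLR implementation. It assumes Gaussian errors, so that the likelihood ratio statistic for adding one regressor is n log(RSS_old / RSS_new), and it stops when the best statistic falls below a chi-square cutoff (3.84, one degree of freedom at the 5% level). The function names `lagged_design` and `forward_select`, the threshold, and the use of generic exogenous columns in place of Google Trends or health records series are all assumptions made for illustration.

```python
import numpy as np

def lagged_design(y, exog, max_lag):
    """Stack lags 1..max_lag of the target and of each exogenous series."""
    n = len(y)
    cols, names = [], []
    for lag in range(1, max_lag + 1):
        cols.append(y[max_lag - lag:n - lag])
        names.append(("y", lag))
        for j in range(exog.shape[1]):
            cols.append(exog[max_lag - lag:n - lag, j])
            names.append((f"x{j}", lag))
    return np.column_stack(cols), y[max_lag:], names

def rss(X, t):
    """Residual sum of squares of the least-squares fit of t on X."""
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)
    r = t - X @ beta
    return float(r @ r)

def forward_select(X, t, threshold=3.84, max_terms=10):
    """Greedy selection: add the regressor with the largest likelihood ratio
    statistic at each step; stop once that statistic is below the threshold
    (hypothetical stopping rule, assumed for this sketch)."""
    n = len(t)
    intercept = np.ones((n, 1))
    chosen = []
    current = rss(intercept, t)
    while len(chosen) < max_terms:
        best_stat, best_k, best_rss = 0.0, None, None
        for k in range(X.shape[1]):
            if k in chosen:
                continue
            cand = np.hstack([intercept, X[:, chosen + [k]]])
            r = max(rss(cand, t), 1e-300)  # guard against log of zero
            stat = n * np.log(current / r)  # Gaussian likelihood ratio statistic
            if stat > best_stat:
                best_stat, best_k, best_rss = stat, k, r
        if best_k is None or best_stat < threshold:
            break
        chosen.append(best_k)
        current = best_rss
    return chosen
```

Calling `lagged_design(y, exog, max_lag)` and then `forward_select(X, target)` returns the indices of the selected columns in the order they were added; mapping them through `names` shows which series and lags were kept. Because the cutoff is applied greedily to many candidates, a few spurious lags can still be selected.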

Item Type: Journal Article
Publication: PLOS COMPUTATIONAL BIOLOGY
Publisher: PUBLIC LIBRARY SCIENCE
Additional Information: Copyright of this article belongs to PUBLIC LIBRARY SCIENCE
Keywords: SEASONAL INFLUENZA; INFECTIONS; PREDICTION; EPIDEMICS; DYNAMICS; BURDEN
Department/Centre: Division of Physical & Mathematical Sciences > Mathematics
Date Deposited: 24 Dec 2019 07:17
Last Modified: 24 Dec 2019 07:17
URI: http://eprints.iisc.ac.in/id/eprint/64212
