Predicting COVID-19 cases using Reddit posts and other online resources

Abstract

This paper evaluates the ability to predict COVID-19 caseloads in local areas using the text of geographically specific subreddits, in conjunction with other features. The problem is framed as a binary classification task: whether or not the change in caseload exceeds a threshold. We find that including Reddit features, alongside other informative resources, improves the models’ performance in predicting COVID-19 cases. Furthermore, we show that Reddit features alone can serve as a strong alternative data source for predicting a short-term rise in caseload, because they perform well, are readily available, and update instantaneously.
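
As a rough illustration of the binary framing described in the abstract, the sketch below turns a series of case counts into 0/1 labels. The function name, the 7-day horizon, the 10% threshold, and the use of relative change are all assumptions made for this example; the abstract does not specify how the paper defines the change or the threshold.

```python
from typing import List


def label_caseload_rise(
    cases: List[float],
    horizon: int = 7,        # assumed look-ahead in days (hypothetical)
    threshold: float = 0.1,  # assumed relative-change threshold (hypothetical)
) -> List[int]:
    """Label each day 1 if cases rise by more than `threshold`
    (as a fraction of the current value) over the next `horizon` days, else 0."""
    labels = []
    for t in range(len(cases) - horizon):
        current, future = cases[t], cases[t + horizon]
        if current > 0:
            change = (future - current) / current
        else:
            # No cases today: treat any future cases as an unbounded rise.
            change = float("inf") if future > 0 else 0.0
        labels.append(1 if change > threshold else 0)
    return labels


if __name__ == "__main__":
    # Toy daily counts, not real data.
    print(label_caseload_rise([100, 100, 120, 100], horizon=1))
```

Labels of this kind could then serve as the target for any standard classifier trained on Reddit-derived text features and the other inputs the abstract mentions.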

Publication
In SwissText 2021
Felix Drinkall
Oxford PhD Student and ex-GB Athlete

My main research interest is the intersection between Natural Language Processing and Time Series Forecasting.