COVID-Dynamics-Model-Comparison

Introduction

Comparative study of the main techniques used for COVID-19 modeling when the only information available is the infection curve. The objective is to identify the univariate techniques that produce the best results and to analyze whether more complex models actually provide better predictions.

Since COVID-19 was declared a pandemic, the urgent need for accurate predictive methods to help institutions decide which measures to apply, together with the uncertainty surrounding the virus, has driven the publication and application of many different techniques. The motivation of this study is to compare them, in particular compartmental epidemiological models, linear regression models, ARIMA family models and recurrent neural networks.

Data

Daily COVID-19 cases in Spain, reported by province (see source).

EDA and Data processing

Exploratory Data Analysis was conducted to examine time series patterns (global and local trends, structural changes, seasonality), data inconsistencies, outliers, etc.

Data processing:

Aggregate data from province level to national level, since the analysis targets the country as a whole
Remove the hospitalization and ICU inpatient variables (not relevant for this analysis) and rename columns for ease of analysis
Add population (total population of Spain, held constant) and recovered cases, both required for the SIR model study (population = susceptible)
Smooth data with a 7-day moving average to remove the weekly seasonal fluctuations (period 7) caused by the communities not reporting data on weekends
Outliers were identified during the summer and Christmas seasons, but they are inherent to the series
Forecasting horizon set to up to 14 days ahead
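
The aggregation and smoothing steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the `(date, province, cases)` row layout, the function names, and the choice of a trailing (rather than centered) window are hypothetical and not taken from the notebooks.

```python
from collections import defaultdict


def aggregate_national(rows):
    """Sum province-level daily counts into one national series.

    rows: iterable of (date, province, cases) tuples (hypothetical layout).
    Returns a list of (date, total_cases) pairs sorted by date.
    """
    totals = defaultdict(int)
    for date, _province, cases in rows:
        totals[date] += cases
    return sorted(totals.items())


def moving_average(values, window=7):
    """Trailing moving average; the first window-1 points are dropped."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    out = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest one.
        running += values[i] - values[i - window]
        out.append(running / window)
    return out


# Toy data: two provinces over 14 days, with nothing reported on weekends.
rows = []
for day in range(14):
    weekend = day % 7 in (5, 6)
    rows.append((day, "A", 0 if weekend else 70))
    rows.append((day, "B", 0 if weekend else 70))

national = aggregate_national(rows)
smoothed = moving_average([cases for _, cases in national], window=7)
```

In this toy series every 7-day window contains exactly five reporting days and two empty ones, so the smoothed values are constant and the weekly pattern disappears.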

Modeling

A set of models was fitted and evaluated (MAE, RMSE, MAPE, RMSLE) on different windows using time series cross-validation (see walk-forward schema and expanding walk-forward schema). Implementation and mathematical details are given in each notebook:

Epidemiological models (SIR, SIS)
Trend extrapolation with polynomial, exponential, logistic or Gompertz curves
Linear Regression
ARIMA (and SARIMA)
Recurrent Neural Networks (RNN)
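
The evaluation scheme described above (the four error metrics computed over walk-forward windows) can be sketched as follows. This is a minimal illustration, not the notebooks' implementation: the function names, the window sizes, the synthetic series, and the naive last-value forecaster standing in for a real model are all assumptions.

```python
import math


def mae(y, f):
    return sum(abs(a - b) for a, b in zip(y, f)) / len(y)


def rmse(y, f):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, f)) / len(y))


def mape(y, f):
    # Undefined when an actual value is zero; callers must avoid that case.
    return 100 * sum(abs((a - b) / a) for a, b in zip(y, f)) / len(y)


def rmsle(y, f):
    # Requires non-negative actual and forecast values.
    sq = ((math.log1p(b) - math.log1p(a)) ** 2 for a, b in zip(y, f))
    return math.sqrt(sum(sq) / len(y))


def walk_forward_splits(n, initial_train, horizon, step, expanding=True):
    """Yield (train_indices, test_indices) ranges in time order.

    expanding=True grows the training window from index 0;
    expanding=False slides a fixed-size window of length initial_train.
    """
    end = initial_train
    while end + horizon <= n:
        start = 0 if expanding else end - initial_train
        yield range(start, end), range(end, end + horizon)
        end += step


def naive_forecast(train, horizon):
    """Placeholder model (assumption): repeat the last observed value."""
    return [train[-1]] * horizon


series = [float(t) for t in range(1, 41)]  # synthetic upward trend
horizon = 14
scores = []
abs_err_by_step = [0.0] * horizon
for train_idx, test_idx in walk_forward_splits(len(series), 12, horizon, step=7):
    train = [series[i] for i in train_idx]
    actual = [series[i] for i in test_idx]
    forecast = naive_forecast(train, horizon)
    scores.append({
        "MAE": mae(actual, forecast),
        "RMSE": rmse(actual, forecast),
        "MAPE": mape(actual, forecast),
        "RMSLE": rmsle(actual, forecast),
    })
    for h, (a, f) in enumerate(zip(actual, forecast)):
        abs_err_by_step[h] += abs(a - f)

# Mean absolute error at each forecast step, averaged over folds.
mae_by_step = [e / len(scores) for e in abs_err_by_step]
```

Any of the fitted models could take the place of `naive_forecast`; keeping the splits strictly in time order ensures that no fold is scored on data from before the end of its training window.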

Results

All error metrics increase with the time horizon for every model, which is reasonable: the farther the future point is from the known observations, the greater the uncertainty.
The best model on every metric is ARIMA(2,1,5). The RNN considered fails to capture the dynamics of the virus, which shows in its insufficiently accurate predictions. The SIS model and linear regression evolve similarly except on RMSLE, where the error of the linear regression model increases drastically from time horizon 8 onwards. This similarity may arise because the series studied does not satisfy the hypotheses of the SIS model, so the fit cannot provide parameters with epidemiological significance. Consequently, the model has no epidemiological interpretation and becomes a mere regression fit.
Finally, it should be recalled that none of the models studied satisfies its initial hypotheses. Therefore, the results could be improved by studying other types of methods.
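
For reference, the ARIMA(2,1,5) model named above combines two autoregressive terms, one order of differencing and five moving-average terms. In standard backshift notation it can be written as below. The symbols are assumptions introduced here for illustration (the source does not state them): y_t is the smoothed case series, B the backshift operator, epsilon_t white noise, and any drift term is omitted.

```latex
\left(1 - \phi_1 B - \phi_2 B^2\right)\left(1 - B\right) y_t
  = \left(1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3 + \theta_4 B^4 + \theta_5 B^5\right) \varepsilon_t
```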
More details
