Getting started with the `forecast` package for time series data in R, as quickly as possible and with minimal explanation.

Source: Forecasting: Principles and Practice

# Data Prep

Coerce your data to `ts` format:

```
library(tidyverse)
library(forecast)

myts <- ts(df, start = c(1981, 1), frequency = 12)
```

# Exploring and Plotting `ts` Data

`autoplot()`

: Useful function to plot data and forecasts

## Seasonality

`ggseasonplot()`

: Create a seasonal plot

`ggsubseriesplot()`

: Create mini plots for each season and show seasonal means
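A minimal sketch of both seasonal plots, using the built-in `AirPassengers` monthly series for illustration:

```
library(forecast)

# AirPassengers: monthly airline passenger totals, 1949-1960 (built into R)
ggseasonplot(AirPassengers)     # one line per year, plotted across the months
ggsubseriesplot(AirPassengers)  # one mini panel per month, with its seasonal mean
```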

## Lags and ACF

`gglagplot()`

: Plot the time series against lags of itself

`ggAcf()`

: Plot the autocorrelation function (ACF)
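A quick sketch of both, again using the built-in `AirPassengers` series:

```
library(forecast)

gglagplot(AirPassengers, lags = 9)  # scatterplots of the series vs. lags 1 through 9
ggAcf(AirPassengers)                # autocorrelation at each lag, with significance bounds
```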

## White Noise and the Ljung-Box Test

White noise is another name for a time series of iid data: purely random. Ideally your model residuals should look like white noise.

You can use the Ljung-Box test to check if a time series is white noise, here’s an example with 24 lags:

`Box.test(data, lag = 24, type = "Ljung-Box")`

A p-value > 0.05 suggests the data are not significantly different from white noise.

# Model Selection

The `forecast` package includes a few common models out of the box. Fit a model, then pass the fitted object to the `forecast()` function with the number of periods `h` to predict.

Example of the workflow:

```
train <- window(data, start = 1980)
fit <- naive(train)
checkresiduals(fit)
pred <- forecast(fit, h = 4)
accuracy(pred, data)
```

## Naive Models

Naive and seasonal naive models are useful benchmarks.

`naive()`

: Naive forecast: every forecast is the last observed value

`snaive()`

: Seasonal naive forecast: every forecast is the last observed value from the same season
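A minimal sketch of both benchmarks on the built-in `AirPassengers` series:

```
library(forecast)

fc1 <- naive(AirPassengers, h = 12)   # forecasts repeat the last observed value
fc2 <- snaive(AirPassengers, h = 12)  # forecasts repeat the same month from last year
autoplot(fc2)
```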

## Residuals

Residuals are the differences between the model's fitted values and the actual data. Residuals should look like white noise and should:

- Be uncorrelated
- Have mean zero

And ideally have:

- Constant variance
- A normal distribution

`checkresiduals()`

: Helper function to plot the residuals, plot the ACF and histogram, and do a Ljung-Box test on the residuals

## Evaluating Model Accuracy

Train/test split with the window function:

`window(data, start, end)`

: Slice the `ts` data

Use `accuracy()` on the forecast and test set:

`accuracy(pred, testset)`

: Provides accuracy measures like MAE, MSE, MAPE, RMSE, etc.

Backtesting with one-step-ahead forecasts, aka "time series cross-validation", can be done with the helper function `tsCV()`.

`tsCV()`

: Returns forecast errors given a `forecastfunction` that returns a `forecast` object and a number of steps ahead `h`. At `h = 1` the forecast errors will just be the model residuals.

Here's an example using the `naive()` model, forecasting one period ahead:

`tsCV(data, forecastfunction = naive, h = 1)`

# Many Models

## Exponential Models

`ses()`

: Simple exponential smoothing; applies a smoothing parameter alpha to previous data

`holt()`

: Holt's linear trend, SES + trend parameter. Use `damped = TRUE` for a damped trend

`hw()`

: Holt-Winters method, incorporates linear trend and seasonality. Set `seasonal = "additive"` for the additive version or `"multiplicative"` for the multiplicative version
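A quick sketch of all three on the built-in `AirPassengers` series (the horizon `h = 12` is arbitrary):

```
library(forecast)

fc_ses  <- ses(AirPassengers, h = 12)                  # level only
fc_holt <- holt(AirPassengers, h = 12, damped = TRUE)  # level + damped trend
fc_hw   <- hw(AirPassengers, h = 12, seasonal = "multiplicative")  # + seasonality
autoplot(fc_hw)
```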

### ETS Models

The `forecast` package includes the function `ets()` for fitting exponential smoothing models. `ets()` estimates parameters using the likelihood of the data arising from the model, and selects the best model using the corrected AIC (AICc):

- Error = \(\{A, M\}\)
- Trend = \(\{N, A, Ad\}\)
- Seasonal = \(\{N, A, M\}\)
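A minimal sketch, letting `ets()` pick the model form on the built-in `AirPassengers` series:

```
library(forecast)

fit <- ets(AirPassengers)  # selects error/trend/seasonal components by AICc
summary(fit)               # reports the chosen ETS(.,.,.) form and parameters
forecast(fit, h = 12)
```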

## Transformations

You may need to transform non-stationary data to improve your model's predictions. To deal with non-constant variance, you can use a **Box-Cox** transformation.

`BoxCox()`

: Box-Cox uses a `lambda` parameter between -1 and 1 to stabilize the variance. A `lambda` of 0 performs a natural log, 1/3 does a cube root, and so on, while 1 does nothing and -1 performs an inverse transformation.
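A quick sketch; `BoxCox.lambda()` in `forecast` can pick `lambda` automatically:

```
library(forecast)

lam <- BoxCox.lambda(AirPassengers)               # automatic lambda estimate
transformed <- BoxCox(AirPassengers, lambda = lam)
original <- InvBoxCox(transformed, lambda = lam)  # reverses the transformation
```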

**Differencing** is another transformation that uses differences between observations to model changes rather than the observations themselves.
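A minimal sketch; `ndiffs()` and `nsdiffs()` in `forecast` estimate how many differences are needed:

```
library(forecast)

ndiffs(AirPassengers)                 # suggested number of lag-1 differences
nsdiffs(AirPassengers)                # suggested number of seasonal differences
d1  <- diff(AirPassengers)            # lag-1 differences
d12 <- diff(AirPassengers, lag = 12)  # seasonal (lag-12) differences
```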

## ARIMA

**Parameters**: \((p,d,q)(P,D,Q)m\)

| Parameter | Description |
|---|---|
| p | # of autoregressive lags |
| d | # of lag-1 differences |
| q | # of moving average lags |
| P | # of seasonal AR lags |
| D | # of seasonal differences |
| Q | # of seasonal MA lags |
| m | # of observations per year |

`Arima()`

: Implementation of the ARIMA model; set `include.constant = TRUE` to include drift, aka the constant

`auto.arima()`

: Automatic implementation of the ARIMA function in `forecast`. Estimates parameters using maximum likelihood and does a stepwise search over a subset of all possible models. Can take a `lambda` argument to fit the model to transformed data; the forecasts will be back-transformed onto the original scale. Set `stepwise = FALSE` to consider more models at the expense of more time.
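A quick sketch on the built-in `AirPassengers` series; `lambda = 0` fits the model to log-transformed data and back-transforms the forecasts:

```
library(forecast)

fit <- auto.arima(AirPassengers, lambda = 0)  # stepwise search over ARIMA models
pred <- forecast(fit, h = 12)                 # back-transformed to the original scale
autoplot(pred)
```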

## Dynamic Regression

Regression model with non-seasonal ARIMA errors, i.e. we allow the error term \(e_t\) to be an ARIMA process rather than white noise.

Usage example:

```
fit <- auto.arima(data, xreg = xreg_data)
pred <- forecast(fit, xreg = newxreg_data)
```

## Dynamic Harmonic Regression

Dynamic regression with `K` Fourier terms to model seasonality. With higher `K` the model becomes more flexible.

Pro: Allows for any length of seasonality, but assumes the seasonal pattern is unchanging. `Arima()` and `auto.arima()` may run out of memory at large seasonal periods (i.e. >200).

```
# Example with K = 1, predicting 4 periods into the future
fit <- auto.arima(data, xreg = fourier(data, K = 1),
                  seasonal = FALSE, lambda = 0)
pred <- forecast(fit, xreg = fourier(data, K = 1, h = 4))
```

## TBATS

Automated model that combines exponential smoothing, Box-Cox transformations, and Fourier terms. Pro: automated, allows for complex seasonality that changes over time. Con: slow.

- T: Trigonometric terms for seasonality
- B: Box-Cox transformations for heterogeneity
- A: ARMA errors for short-term dynamics
- T: Trend (possibly damped)
- S: Seasonal (including multiple and non-integer periods)
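A minimal sketch on the built-in `AirPassengers` series:

```
library(forecast)

fit <- tbats(AirPassengers)   # may take a while; the model search is slow
pred <- forecast(fit, h = 12)
autoplot(pred)
```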