- Nelson Tang
Getting started with the forecast package for time series data in R, as quickly as possible and with minimal explanation.
Coerce your data to a ts object:

```r
library(tidyverse)
library(forecast)
myts <- ts(df, start = c(1981, 1), frequency = 12)
```
Exploring and Plotting
autoplot(): Useful function to plot data and forecasts
ggseasonplot(): Create a seasonal plot
ggsubseriesplot(): Create mini plots for each season and show seasonal means
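A minimal sketch of these plotting functions, using the built-in AirPassengers monthly series (the dataset choice is illustrative, not from the original):

```r
library(forecast)
library(ggplot2)

# AirPassengers is a monthly ts object shipped with base R
autoplot(AirPassengers)         # plot the raw series
ggseasonplot(AirPassengers)     # one line per year, plotted by month
ggsubseriesplot(AirPassengers)  # mini plot per month, with seasonal means
```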
Lags and ACF
gglagplot(): Plot the time series against lags of itself
ggAcf(): Plot the autocorrelation function (ACF)
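For example, on the same illustrative monthly series:

```r
library(forecast)

gglagplot(AirPassengers)  # series vs. its own lags; a tight diagonal means high autocorrelation
ggAcf(AirPassengers)      # autocorrelation at each lag, with significance bounds
```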
White Noise and the Ljung-Box Test
White Noise is another name for a time series of iid data. Purely random. Ideally your model residuals should look like white noise.
You can use the Ljung-Box test to check if a time series is white noise, here's an example with 24 lags:
```r
Box.test(data, lag = 24, type = "Ljung-Box")
```

A p-value > 0.05 suggests the data are not significantly different from white noise.
The forecast package includes a few common models out of the box. Fit the model to create a forecast object, then call the forecast() function on the object with a number of periods h to predict.
Example of the workflow:
```r
train <- window(data, start = 1980)
fit <- naive(train)
checkresiduals(fit)
pred <- forecast(fit, h = 4)
accuracy(pred, data)
```
Useful to benchmark against naive and seasonal naive models.
Residuals are the differences between the model's fitted values and the actual data. Residuals should look like white noise and should:

- Be uncorrelated
- Have mean zero

And ideally have:

- Constant variance
- A normal distribution
checkresiduals(): helper function to plot the residuals, plot the ACF and histogram, and do a Ljung-Box test on the residuals.
Evaluating Model Accuracy
Train/test split with the window() function:

window(data, start, end): slices the time series between start and end

Then call accuracy() on the model and the test set:

accuracy(model, testset): provides accuracy measures like MAE, MSE, MAPE, RMSE, etc.
Backtesting with one-step-ahead forecasts, aka "time series cross validation", can be done with a helper function:

tsCV(): returns forecast errors, given a forecastfunction that returns a forecast object and a number of steps ahead h. With h = 1, the forecast errors will just be the model residuals.

Here's an example using the naive() model, forecasting one period ahead:

```r
tsCV(data, forecastfunction = naive, h = 1)
```
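The errors returned by tsCV() can then be summarized into a single accuracy measure. A sketch, again using a built-in series for illustration:

```r
library(forecast)

# One-step-ahead cross-validated errors for the naive model
e <- tsCV(AirPassengers, forecastfunction = naive, h = 1)

# Summarize as RMSE; na.rm drops positions with no forecast/actual pair
sqrt(mean(e^2, na.rm = TRUE))
```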
ses(): Simple Exponential Smoothing; applies a smoothing parameter alpha to exponentially weight previous observations
holt(): Holt's linear trend, SES + trend parameter. Use damped = TRUE for a damped trend
hw(): Holt-Winters method, incorporates linear trend and seasonality. Set seasonal = "additive" for the additive version or seasonal = "multiplicative" for the multiplicative version
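A quick sketch of all three, fit to an illustrative monthly series (the horizon and option choices are assumptions, not tuned):

```r
library(forecast)

fit_ses  <- ses(AirPassengers, h = 12)                  # level only
fit_holt <- holt(AirPassengers, h = 12, damped = TRUE)  # level + damped trend
fit_hw   <- hw(AirPassengers, h = 12,
               seasonal = "multiplicative")             # level + trend + seasonality

autoplot(fit_hw)  # point forecasts with prediction intervals
```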
The forecast package includes a function ets() for your exponential smoothing models. ets() estimates parameters using the likelihood of the data arising from the model, and selects the best model using corrected AIC (AICc). Models are labeled by three components:

- Error = {A, M}: additive or multiplicative
- Trend = {N, A, Ad}: none, additive, or additive damped
- Seasonal = {N, A, M}: none, additive, or multiplicative
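In the simplest case you let ets() pick the model, e.g. on an illustrative series:

```r
library(forecast)

fit <- ets(AirPassengers)  # model selected automatically by AICc
summary(fit)               # prints the chosen ETS(error, trend, seasonal) form
forecast(fit, h = 12)      # forecast 12 periods ahead
```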
You may need to transform non-stationary data to improve your model's predictions. To deal with non-constant variance, you can use a Box-Cox transformation.

BoxCox(): uses a lambda parameter between -1 and 1 to stabilize the variance. A lambda of 0 performs a natural log, 1/3 performs a cube root, and so on, while 1 does nothing and -1 performs an inverse transformation.
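A sketch of the transformation workflow (the series is illustrative); lambda can be estimated from the data rather than chosen by hand:

```r
library(forecast)

lambda <- BoxCox.lambda(AirPassengers)          # estimate lambda from the data
transformed <- BoxCox(AirPassengers, lambda)    # variance-stabilized series
autoplot(transformed)

# Most forecast model functions also accept lambda directly and
# back-transform the forecasts onto the original scale
fit <- ets(AirPassengers, lambda = lambda)
```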
Differencing is another transformation that uses differences between observations to model changes rather than the observations themselves.
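For instance (illustrative series; the helper functions suggest how many differences are needed):

```r
library(forecast)

ndiffs(AirPassengers)            # suggested number of lag-1 differences
nsdiffs(AirPassengers)           # suggested number of seasonal differences
autoplot(diff(AirPassengers))    # lag-1 differenced series
diff(AirPassengers, lag = 12)    # seasonal difference for monthly data
```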
| Parameter | Meaning |
| --- | --- |
| p | # of autoregression lags |
| d | # of lag-1 differences |
| q | # of moving average lags |
| P | # of seasonal AR lags |
| D | # of seasonal differences |
| Q | # of seasonal MA lags |
| m | # of observations per year |
Arima(): Implementation of the ARIMA model; set include.constant = TRUE to include drift, aka the constant
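A minimal sketch of fitting a specified-order model (the orders here are illustrative, not tuned to the data):

```r
library(forecast)

# ARIMA(1,1,1) with drift; with d = 1, include.constant = TRUE adds a drift term
fit <- Arima(AirPassengers, order = c(1, 1, 1), include.constant = TRUE)
forecast(fit, h = 12)
```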
auto.arima(): Automatic implementation of the ARIMA function in forecast. Estimates parameters using maximum likelihood and does a stepwise search over a subset of all possible models. Can take a lambda argument to fit the model to transformed data; the forecasts will be back-transformed onto the original scale. Set stepwise = FALSE to consider more models at the expense of more time.
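Putting those options together on an illustrative series (lambda = 0 applies a log transform; stepwise = FALSE widens the search):

```r
library(forecast)

fit <- auto.arima(AirPassengers, lambda = 0, stepwise = FALSE)
summary(fit)           # selected orders and coefficients
forecast(fit, h = 12)  # forecasts back-transformed to the original scale
```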
Regression model with non-seasonal ARIMA errors, i.e. we allow e_t to be an ARIMA process rather than white noise.
```r
fit <- auto.arima(data, xreg = xreg_data)
pred <- forecast(fit, xreg = newxreg_data)
```
Dynamic Harmonic Regression
Dynamic regression with K Fourier terms to model seasonality. With higher K the model becomes more flexible.
Pro: Allows for any length seasonality, but assumes seasonal pattern is unchanging.
auto.arima() may run out of memory at large seasonal periods (i.e. >200).
```r
# Example with K = 1, predicting 4 periods into the future
fit <- auto.arima(data, xreg = fourier(data, K = 1),
                  seasonal = FALSE, lambda = 0)
pred <- forecast(fit, xreg = fourier(data, K = 1, h = 4))
```
Automated model that combines exponential smoothing, Box-Cox transformations, and Fourier terms. Pro: Automated, allows for complex seasonality that changes over time. Cons: Slow.
- T: Trigonometric terms for seasonality
- B: Box-Cox transformations for heterogeneity
- A: ARMA errors for short term dynamics
- T: Trend (possibly damped)
- S: Seasonal (including multiple and non-integer periods)
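Since the model is automated, usage is a one-liner (series choice again illustrative):

```r
library(forecast)

fit <- tbats(AirPassengers)       # automated selection of TBATS components
autoplot(forecast(fit, h = 12))   # forecasts with prediction intervals
```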