Differences and Lags in Time Series

Time series analysis often involves calculating differences and lags to understand the dynamics of the data. In R, we can use the diff.xts and lag.xts functions from the xts package for these purposes. Both functions count observations rather than calendar time, so it’s important to understand how their results depend on the frequency and completeness of the data.

library(xts)
library(here)

euro <- readRDS(file = here("databases/euro.rds"))

Differences and Lags with diff.xts and lag.xts

Calculating Differences

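The diff.xts function calculates the difference between each observation and the observation a given number of periods earlier (one period by default). For example, the 2-period difference, which subtracts the value two observations earlier from the current value, is calculated as follows: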

# Name the argument: diff.xts takes both `lag` and `differences`
euro$d2xus <- diff.xts(euro$XUS, lag = 2)
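
To make the arithmetic concrete, here is a minimal sketch on a hypothetical five-day toy series (the dates and values are invented for illustration and are not from the euro dataset). It checks that the 2-period difference equals the series minus its own 2-period lag:

```r
library(xts)

# Hypothetical toy series: five consecutive days, values chosen for easy arithmetic
dates <- as.Date("2023-01-02") + 0:4
x <- xts(c(10, 12, 15, 19, 24), order.by = dates)

# 2-period difference: current value minus the value two observations earlier
d2 <- diff.xts(x, lag = 2)

# The same quantity written out explicitly with lag.xts
d2_manual <- x - lag.xts(x, k = 2)

print(merge(d2, d2_manual))
```

The first two rows are NA because there is no observation two periods before them.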

Calculating Lags

The lag.xts function shifts the values of the time series by a specified number of periods, so that each date is paired with the value observed that many periods earlier. For example, the 2-period lag is calculated as follows:

euro$l2xus <- lag.xts(euro$XUS, 2)
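
The direction of the shift is easy to get wrong, so here is a small sketch on a hypothetical toy series (invented values, not the euro data). With a positive k, lag.xts places the earlier value on the later date; note that this sign convention differs from the lag methods for ts and zoo objects:

```r
library(xts)

# Hypothetical toy series on five consecutive days
dates <- as.Date("2023-01-02") + 0:4
x <- xts(c(10, 12, 15, 19, 24), order.by = dates)

# Positive k: each date receives the value from k observations earlier
l2 <- lag.xts(x, k = 2)

# Side-by-side view of the original and lagged values
print(merge(x, l2))
```

The value 10, observed on the first date, now appears on the third date, and the first two lagged values are NA.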

Problems with Daily Data

With daily data, a common issue arises from the way lag.xts counts periods. The function calculates the lag based on the number of observations, not calendar days. When days are missing from the data (e.g., weekends and holidays), a 2-period lag can therefore reach back more than two calendar days.

Let’s illustrate this with an example:

# View data around the end of the year
euro["2022-12-27/", c("XUS", "l2xus")]
              XUS  l2xus
2022-12-27 1.0638 1.0614
2022-12-28 1.0608 1.0635
2022-12-29 1.0661 1.0638
2022-12-30 1.0702 1.0608
2023-01-02 1.0662 1.0661
2023-01-03 1.0546 1.0702
2023-01-04 1.0599 1.0662
2023-01-05 1.0520 1.0546
2023-01-06 1.0644 1.0599

In this case, the lagged value for January 2 (1.0661) is taken from December 29, two observations earlier, and not from December 31, two calendar days earlier, which is not in the series.
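
The size of the mismatch can be measured directly. The following sketch uses a hypothetical four-observation series spanning the same weekend (the values 1 to 4 are placeholders) and computes how many calendar days separate each date from the date that supplied its lagged value:

```r
library(xts)

# Hypothetical business-day series: Thu 29 Dec, Fri 30 Dec 2022, Mon 2 Jan, Tue 3 Jan 2023
dates <- as.Date(c("2022-12-29", "2022-12-30", "2023-01-02", "2023-01-03"))
x <- xts(c(1, 2, 3, 4), order.by = dates)

# Lag by two observations
l2 <- lag.xts(x, k = 2)

# Date that supplied each lagged value: the index shifted by two observations
source_date <- c(rep(as.Date(NA), 2), head(index(x), -2))

# Calendar distance between each date and the source of its lagged value
gap_days <- as.numeric(index(x) - source_date)
print(data.frame(date = index(x), source_date, gap_days))
```

Across the weekend the "2-period" lag spans four calendar days, not two.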

Fixing the Problem

To fix this problem, we need to ensure that the time series accounts for all calendar days, even if some days are missing from the original data. Here’s how we can do that:

Step 1: Create a Sequence of All Dates

First, we create a sequence of all dates from the beginning to the end of the dataset. We also re-read the euro dataset so that we start from the original data, without the d2xus and l2xus columns added above:

euro <- readRDS(file = here("databases/euro.rds"))
new_dates <- seq(from = start(euro), 
                 to = end(euro), 
                 by = "day")

Step 2: Merge the Sequence with the Original Data

Next, we merge this sequence with our euro dataset to include all calendar days. Days without an observation are filled with NA:

euro <- merge(euro, new_dates)

euro["2022-12-27/", c("XUS")]
              XUS
2022-12-27 1.0638
2022-12-28 1.0608
2022-12-29 1.0661
2022-12-30 1.0702
2022-12-31     NA
2023-01-01     NA
2023-01-02 1.0662
2023-01-03 1.0546
2023-01-04 1.0599
2023-01-05 1.0520
2023-01-06 1.0644
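
Steps 1 and 2 can be tried in isolation on a small series. This is a sketch under stated assumptions: it uses four of the XUS values shown above, and it wraps the date sequence in a zero-width xts object before merging, which is a common explicit idiom and not necessarily what the original code relies on. The names x, all_days and padded are illustrative:

```r
library(xts)

# Four XUS observations around the year end, taken from the output above
dates <- as.Date(c("2022-12-29", "2022-12-30", "2023-01-02", "2023-01-03"))
x <- xts(c(1.0661, 1.0702, 1.0662, 1.0546), order.by = dates)
colnames(x) <- "XUS"

# Step 1: every calendar day between the first and last observation
all_days <- seq(from = start(x), to = end(x), by = "day")

# Step 2: outer-merge with a zero-width xts so that missing days appear as NA
padded <- merge(x, xts(, order.by = all_days))
print(padded)
```

The merged series has six rows, with NA on December 31 and January 1.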

Step 3: Recalculate the Lag

Now, we recalculate the lag with the complete sequence of dates:

euro$l2xus <- lag.xts(euro$XUS, 2)

Step 4: Verify the Fix

Finally, we verify that the lag is now calculated correctly:

euro["2022-12-27/", c("XUS", "l2xus")]
              XUS  l2xus
2022-12-27 1.0638     NA
2022-12-28 1.0608 1.0635
2022-12-29 1.0661 1.0638
2022-12-30 1.0702 1.0608
2022-12-31     NA 1.0661
2023-01-01     NA 1.0702
2023-01-02 1.0662     NA
2023-01-03 1.0546     NA
2023-01-04 1.0599 1.0662
2023-01-05 1.0520 1.0546
2023-01-06 1.0644 1.0599

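By including all calendar days in the series, the lag is now calculated based on calendar days rather than observations. The lagged value for January 2 is NA because December 31 has no observation, while January 4 takes its value from January 2.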
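
If the padded weekend rows are not wanted afterwards, one possible follow-up is to compute the lag on the padded series and then subset back to the original observation dates. This is only a sketch of one option: it reuses six XUS values from the output above, and the object names (x, padded, trading) are illustrative.

```r
library(xts)

# Six XUS observations taken from the output above
dates <- as.Date(c("2022-12-28", "2022-12-29", "2022-12-30",
                   "2023-01-02", "2023-01-03", "2023-01-04"))
x <- xts(c(1.0608, 1.0661, 1.0702, 1.0662, 1.0546, 1.0599), order.by = dates)
colnames(x) <- "XUS"

# Pad to all calendar days, then lag by two rows (now two calendar days)
all_days <- seq(from = start(x), to = end(x), by = "day")
padded <- merge(x, xts(, order.by = all_days))
padded$l2xus <- lag.xts(padded$XUS, k = 2)

# Keep only the dates that were present in the original series
trading <- padded[index(x)]
print(trading)
```

Only dates with an observation exactly two calendar days earlier keep a non-missing lag; the others are NA, which makes the gaps explicit instead of silently borrowing a value from further back.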

This tutorial has shown how to handle the calculation of differences and lags in time series data, especially when dealing with daily data. By creating a complete sequence of dates and merging it with the original data, we ensure that functions like lag.xts calculate lags correctly based on calendar days.