Using DMXe in R

This tutorial will guide you through how to integrate DMXe capabilities within R, including adding data to a DMXe file, metadata management, reading data, and series operations. We’ll use IMF Datatools that we set up in the previous section.

Pre-requisites

In the previous section, we installed Python and IMF Datatools, and then loaded the reticulate package to help bridge the communication between R and Python. We also imported the sys module for IMF Datatools library. Please double check that these steps are complete before continuing with this section. They are mentioned below for ease of reference.

library(reticulate)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(purrr)  
sys <- import("sys")
imf_datatools  <-  import("imf_datatools")

We also need to set up our DMXe utility functions. We must import the DMXe writer utilities from IMF Datatools.

writer <- import("imf_datatools.dmxe_writer_utilities", convert = FALSE)

DMXe Operations

Getting Data for your DMXe file

There are multiple ways to add data to a DMXe file. We will go over how to manually do it via a data frame, or importing directly from another data source like Haver. Let’s first manually create a data frame and add a column for indexing for example quarterly GDP data.

# Manually created data frame with quarterly data for 2020-2023
df_manual <- data.frame(
  GDP = c(100, 101, 102, 103, 105, 106, 107, 108, 110, 111, 112, 113, 115, 116, 117, 118),
  dates = c("2020Q1", "2020Q2", "2020Q3", "2020Q4",
            "2021Q1", "2021Q2", "2021Q3", "2021Q4",
            "2022Q1", "2022Q2", "2022Q3", "2022Q4",
            "2023Q1", "2023Q2", "2023Q3", "2023Q4")
)

Let’s also download the data directly using IMF Datatools from Haver for USD GDP.

# Download Haver data for GDP for the USA
df_haver <- imf_datatools$get_haver_data('GDP@USECON')

We want to make sure all the data in our DMXe file is in the same format to be able to combine it. Let’s convert our Haver data to a data frame and ensure both of our series are in quarterly format. Let’s also filter our Haver data to represent the same quarters as our manual data.

# Convert Haver data to a data frame and ensure quarterly format
df_haver <- as.data.frame(df_haver)
df_haver$Date <- as.Date(rownames(df_haver))

# Format Haver dates as quarters
df_haver$Quarter <- paste0(format(df_haver$Date, "%Y"), 
                           "Q", 
                           ((as.numeric(format(df_haver$Date, "%m")) - 1) %/% 3) + 1)

# Filter Haver data for relevant quarters
df_haver_filtered <- subset(df_haver, Quarter %in% df_manual$dates)
df_haver_filtered <- df_haver_filtered[, c("Quarter", "GDP@USECON")]
colnames(df_haver_filtered) <- c("Date", "GDP_Haver")

Now that both series are in the same format, let’s combine the two data frames.

# Combine the data frames
df_combined <- merge(df_manual, df_haver_filtered, by.x = "dates", by.y = "Date", all = TRUE)
colnames(df_combined) <- c("Date", "GDP_Manual", "GDP_Haver")
print(df_combined)
     Date GDP_Manual GDP_Haver
1  2020Q1        100   19935.4
2  2020Q2        101   21684.6
3  2020Q3        102   22068.8
4  2020Q4        103   22656.8
5  2021Q1        105   23368.9
6  2021Q2        106   23922.0
7  2021Q3        107   24777.0
8  2021Q4        108   25215.5
9  2022Q1        110   25805.8
10 2022Q2        111   26272.0
11 2022Q3        112   26734.3
12 2022Q4        113   27164.4
13 2023Q1        115   27453.8
14 2023Q2        116   27967.7
15 2023Q3        117   28297.0
16 2023Q4        118   28624.1

Saving to DMXe

Now that we have our series formatted correctly, we can save to a new DMXe file.

When saving data in DMXe, it helps to specify the date format explicitly. Let’s ensure the dates are correctly formatted in YYYY-MM-DD before saving.

# Define output filename
outfilename <- 'test.dmxe'

# Ensure Date column is in the correct format for quarterly data
df_combined$Date <- as.Date(sapply(df_combined$Date, function(x) {
  as.Date(ifelse(grepl("Q1", x), paste0(substr(x, 1, 4), "-01-01"),
                 ifelse(grepl("Q2", x), paste0(substr(x, 1, 4), "-04-01"),
                        ifelse(grepl("Q3", x), paste0(substr(x, 1, 4), "-07-01"),
                               paste0(substr(x, 1, 4), "-10-01")))))
}))

Now save each series to the defined output DMXe file. Your console will return a 0 if the operation is successful.

# Save each series to the DMXe file, specifying the date column and frequency as quarterly
writer$save_dmxe_data(outfilename, df_combined[, c("Date", "GDP_Manual")], datecol = "Date", freq = "Q")
0
writer$save_dmxe_data(outfilename, df_combined[, c("Date", "GDP_Haver")], datecol = "Date", freq = "Q")
0

Updating metadata for a series

After saving data, you can add or update metadata to describe the series further. Here’s how you would update the descriptor metadata.

# Define metadata
meta <- list("Descriptor" = "GDP series for testing")

# Update metadata for the GDP_Manual series
writer$save_dmxe_metadata(outfilename, "GDP_Manual", meta)
0

Retrieving Data from DMXe File

To verify the saved data, retrieve it by specifying the series name from the DMXe file.

# Read data back from DMXe file
df_retrieved_manual <- writer$read_dmxe_data(outfilename, "GDP_Manual")
df_retrieved_haver <- writer$read_dmxe_data(outfilename, "GDP_Haver")
print(df_retrieved_manual)
            GDP_Manual
dates                 
2020-01-01       100.0
2020-04-01       101.0
2020-07-01       102.0
2020-10-01       103.0
2021-01-01       105.0
2021-04-01       106.0
2021-07-01       107.0
2021-10-01       108.0
2022-01-01       110.0
2022-04-01       111.0
2022-07-01       112.0
2022-10-01       113.0
2023-01-01       115.0
2023-04-01       116.0
2023-07-01       117.0
2023-10-01       118.0
print(df_retrieved_haver)
            GDP_Haver
dates                
2020-01-01    19935.4
2020-04-01    21684.6
2020-07-01    22068.8
2020-10-01    22656.8
2021-01-01    23368.9
2021-04-01    23922.0
2021-07-01    24777.0
2021-10-01    25215.5
2022-01-01    25805.8
2022-04-01    26272.0
2022-07-01    26734.3
2022-10-01    27164.4
2023-01-01    27453.8
2023-04-01    27967.7
2023-07-01    28297.0
2023-10-01    28624.1

To pull multiple series at once, you can define multiple series as a vector and then retrieve them all in a loop.

Perform Calculations on DMXe Data

We can also use DMXe’s built-in calculation functions, such as tsPCH, to calculate the year-over-year percentage change of the data. Let’s do that for our data.

# Calculate Year-over-Year Percentage Change for the GDP_Haver series
df_yoy <- writer$read_dmxe_data(outfilename, "calc(tsPCH(GDP_Haver, 4, 1))")
print(df_yoy)
            calc(tsPCH(GDP_Haver, 4, 1))
dates                                   
2021-01-01                     17.223131
2021-04-01                     10.317921
2021-07-01                     12.271623
2021-10-01                     11.293298
2022-01-01                     10.427962
2022-04-01                      9.823593
2022-07-01                      7.899665
2022-10-01                      7.728976
2023-01-01                      6.386161
2023-04-01                      6.454400
2023-07-01                      5.845300
2023-10-01                      5.373577

You are now equipped with the basics to working with DMXe in R: adding and saving data with consistent formatting, updating metadata, reading and manipulating data, and performing calculations.