This tutorial walks through using DMXe from R: adding data to a DMXe file, managing metadata, reading data back, and performing series operations. We’ll use the IMF Datatools library that we set up in the previous section.
Pre-requisites
In the previous section, we installed Python and IMF Datatools, then loaded the reticulate package to bridge communication between R and Python. We also imported the sys module for the IMF Datatools library. Please double-check that these steps are complete before continuing with this section. The package-loading calls are repeated below for ease of reference.
library(reticulate)
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
There are multiple ways to add data to a DMXe file. We will cover two: building a data frame manually, and importing directly from another data source such as Haver. Let’s first create a data frame by hand, with one column of quarterly GDP values and one column of quarter labels to index them.
# Manually created data frame with quarterly data for 2020-2023
df_manual <- data.frame(
  GDP = c(100, 101, 102, 103, 105, 106, 107, 108,
          110, 111, 112, 113, 115, 116, 117, 118),
  dates = c("2020Q1", "2020Q2", "2020Q3", "2020Q4",
            "2021Q1", "2021Q2", "2021Q3", "2021Q4",
            "2022Q1", "2022Q2", "2022Q3", "2022Q4",
            "2023Q1", "2023Q2", "2023Q3", "2023Q4")
)
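Typing sixteen quarter labels by hand is easy to get wrong. As a minimal sketch, the same labels can be generated with base R. The helper name `make_quarter_labels` is hypothetical and introduced here only for illustration; it is not part of IMF Datatools.

```r
# Hypothetical helper (not part of IMF Datatools): build "YYYYQn" labels
# for every quarter between two years, inclusive.
make_quarter_labels <- function(start_year, end_year) {
  years <- rep(start_year:end_year, each = 4)
  quarters <- rep(1:4, times = end_year - start_year + 1)
  paste0(years, "Q", quarters)
}

quarter_labels <- make_quarter_labels(2020, 2023)
length(quarter_labels)   # one label per quarter
head(quarter_labels, 4)  # labels for the first year
```

The resulting vector could be passed as the `dates` column of `data.frame()` in place of the hand-typed labels.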
Let’s also download U.S. GDP data directly from Haver using IMF Datatools.
# Download Haver data for GDP for the USA
df_haver <- imf_datatools$get_haver_data('GDP@USECON')
We want to make sure all the data in our DMXe file is in the same format to be able to combine it. Let’s convert our Haver data to a data frame and ensure both of our series are in quarterly format. Let’s also filter our Haver data to represent the same quarters as our manual data.
# Convert Haver data to a data frame and ensure quarterly format
df_haver <- as.data.frame(df_haver)
df_haver$Date <- as.Date(rownames(df_haver))

# Format Haver dates as quarters
df_haver$Quarter <- paste0(format(df_haver$Date, "%Y"), "Q", ((as.numeric(format(df_haver$Date, "%m")) - 1) %/% 3) + 1)

# Filter Haver data for relevant quarters
df_haver_filtered <- subset(df_haver, Quarter %in% df_manual$dates)
df_haver_filtered <- df_haver_filtered[, c("Quarter", "GDP@USECON")]
colnames(df_haver_filtered) <- c("Date", "GDP_Haver")
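The quarter label above relies on integer division: subtracting 1 from the month and dividing by 3 maps months 1-3 to 0, 4-6 to 1, and so on, and adding 1 gives the quarter number. A small self-contained sketch of the same arithmetic on sample dates follows; `to_quarter_label` is a hypothetical helper name used only for this illustration.

```r
# Hypothetical helper: map a Date vector to "YYYYQn" quarter labels.
# Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4.
to_quarter_label <- function(dates) {
  month <- as.numeric(format(dates, "%m"))
  quarter <- (month - 1) %/% 3 + 1
  paste0(format(dates, "%Y"), "Q", quarter)
}

sample_dates <- as.Date(c("2020-01-01", "2020-06-30", "2021-09-15", "2023-12-31"))
quarter_labels_from_dates <- to_quarter_label(sample_dates)
print(quarter_labels_from_dates)
```

Trying the conversion on a few known dates like this is a quick way to confirm the labels match the format used in the manual data frame before filtering.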
Now that both series are in the same format, let’s combine the two data frames.
# Combine the data frames
df_combined <- merge(df_manual, df_haver_filtered, by.x = "dates", by.y = "Date", all = TRUE)
colnames(df_combined) <- c("Date", "GDP_Manual", "GDP_Haver")
print(df_combined)
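Because the merge uses `all = TRUE`, any quarter present in only one of the two sources is kept, with `NA` in the other column. The sketch below illustrates this on two small made-up data frames (the values are invented for the example and are not real GDP figures).

```r
# Two small illustrative data frames with partially overlapping quarters.
left <- data.frame(dates = c("2020Q1", "2020Q2", "2020Q3"),
                   GDP = c(100, 101, 102))
right <- data.frame(Date = c("2020Q2", "2020Q3", "2020Q4"),
                    GDP_Haver = c(201, 202, 203))

# all = TRUE keeps unmatched rows from both sides (a full outer join).
joined <- merge(left, right, by.x = "dates", by.y = "Date", all = TRUE)
colnames(joined) <- c("Date", "GDP_Manual", "GDP_Haver")
print(joined)

# Count quarters that are missing from one source or the other.
n_incomplete <- sum(!complete.cases(joined))
print(n_incomplete)
```

Checking for incomplete rows in the combined data frame before saving can reveal quarters that one of the sources does not cover.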
Now that we have our series formatted correctly, we can save to a new DMXe file.
When saving data in DMXe, it helps to specify the date format explicitly. Let’s ensure the dates are correctly formatted in YYYY-MM-DD before saving.
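# Define output filename
outfilename <- 'test.dmxe'

# Ensure Date column is in the correct format for quarterly data.
# The inner function returns a "YYYY-MM-DD" string and as.Date() is applied once
# at the end, because sapply() drops the Date class and older R versions cannot
# convert the resulting numbers back without an origin.
df_combined$Date <- as.Date(sapply(df_combined$Date, function(x) {
  ifelse(grepl("Q1", x), paste0(substr(x, 1, 4), "-01-01"),
  ifelse(grepl("Q2", x), paste0(substr(x, 1, 4), "-04-01"),
  ifelse(grepl("Q3", x), paste0(substr(x, 1, 4), "-07-01"),
                         paste0(substr(x, 1, 4), "-10-01"))))
}, USE.NAMES = FALSE))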
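The conversion maps each quarter label to the first day of that quarter. As an alternative sketch, the same mapping can be computed arithmetically instead of with nested `ifelse` calls; `quarter_to_date` is a hypothetical helper name, and it assumes every label has the exact form "YYYYQn".

```r
# Hypothetical helper: convert "YYYYQn" labels to the first day of the quarter.
# Assumes each label is exactly four year digits, "Q", then a digit from 1 to 4.
quarter_to_date <- function(labels) {
  year <- substr(labels, 1, 4)
  quarter <- as.integer(substr(labels, 6, 6))
  first_month <- (quarter - 1L) * 3L + 1L  # 1, 4, 7, 10
  as.Date(sprintf("%s-%02d-01", year, first_month))
}

quarter_dates <- quarter_to_date(c("2020Q1", "2020Q2", "2021Q3", "2023Q4"))
print(quarter_dates)
```

This version is vectorized, so it converts the whole column in one call and returns a proper Date vector.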
Now save each series to the defined output DMXe file. Your console will return a 0 if the operation is successful.
# Save each series to the DMXe file, specifying the date column and frequency as quarterly
writer$save_dmxe_data(outfilename, df_combined[, c("Date", "GDP_Manual")], datecol = "Date", freq = "Q")
# GDP_Haver is read back later in this tutorial, so it is saved here too (assuming the same call pattern applies)
writer$save_dmxe_data(outfilename, df_combined[, c("Date", "GDP_Haver")], datecol = "Date", freq = "Q")
After saving data, you can add or update metadata to describe the series further. Here’s how you would update the descriptor metadata.
# Define metadata
meta <- list("Descriptor" = "GDP series for testing")

# Update metadata for the GDP_Manual series
writer$save_dmxe_metadata(outfilename, "GDP_Manual", meta)
0
Retrieving Data from DMXe File
To verify the saved data, retrieve it by specifying the series name from the DMXe file.
# Read data back from DMXe file
df_retrieved_manual <- writer$read_dmxe_data(outfilename, "GDP_Manual")
df_retrieved_haver <- writer$read_dmxe_data(outfilename, "GDP_Haver")
print(df_retrieved_manual)
To pull multiple series at once, you can define multiple series as a vector and then retrieve them all in a loop.
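A minimal sketch of that loop is shown here. The helper `read_series` and the stand-in `fake_reader` are hypothetical names introduced so the example runs without IMF Datatools; in a real session the reader passed in would presumably be `writer$read_dmxe_data`, as used above.

```r
# Hypothetical helper: read several series with any reader function that
# takes (filename, series_name), collecting the results in a named list.
read_series <- function(read_fn, filename, series_names) {
  results <- vector("list", length(series_names))
  names(results) <- series_names
  for (name in series_names) {
    results[[name]] <- read_fn(filename, name)
  }
  results
}

# Stand-in reader so the sketch runs without IMF Datatools installed.
fake_reader <- function(filename, series_name) {
  data.frame(series = series_name, value = 1:2)
}

series_to_read <- c("GDP_Manual", "GDP_Haver")
all_series <- read_series(fake_reader, "test.dmxe", series_to_read)
print(names(all_series))

# In the tutorial's session the call would presumably be:
# all_series <- read_series(writer$read_dmxe_data, outfilename, series_to_read)
```

Returning a named list keeps each series accessible by name, for example `all_series$GDP_Haver`.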
Perform Calculations on DMXe Data
We can also use DMXe’s built-in calculation functions, such as tsPCH, to calculate the year-over-year percentage change of the data. Let’s do that for our data.
# Calculate Year-over-Year Percentage Change for the GDP_Haver series
df_yoy <- writer$read_dmxe_data(outfilename, "calc(tsPCH(GDP_Haver, 4, 1))")
print(df_yoy)
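To sanity-check the result, the year-over-year change can also be computed by hand in base R. The sketch below assumes the calculation is a simple percent change against the value four quarters earlier; the exact semantics of the `tsPCH` arguments are not spelled out here, so treat this as a cross-check under that assumption rather than a reimplementation. `pct_change` is a hypothetical helper name.

```r
# Hypothetical helper: percent change relative to the value `lag` periods earlier.
# The first `lag` results are NA because no earlier value exists for them.
pct_change <- function(x, lag = 4) {
  n <- length(x)
  if (n <= lag) return(rep(NA_real_, n))
  previous <- c(rep(NA_real_, lag), x[seq_len(n - lag)])
  100 * (x / previous - 1)
}

# First eight quarters of the manual GDP series (2020Q1 to 2021Q4).
gdp <- c(100, 101, 102, 103, 105, 106, 107, 108)
yoy <- pct_change(gdp, lag = 4)
print(round(yoy, 2))
```

For the manual series, 2021Q1 (105) compared with 2020Q1 (100) gives a 5 percent increase, which is the first non-missing value.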
You are now equipped with the basics of working with DMXe in R: adding and saving data with consistent formatting, updating metadata, reading and manipulating data, and performing calculations.