Accessing iData

This section will show you how you can access iData from R using SDMX and our iData helpers.

SDMX

Organizations such as the OECD, ILO, EUROSTAT, and the African Development Bank (AfDB) utilize SDMX (Statistical Data and Metadata Exchange) to disseminate their data. The IMF’s iData platform has also adopted this international standard, which enables seamless data communication by providing a common format.

SDMX standardizes and streamlines the exchange of complex datasets, such as GDP figures or employment statistics, along with detailed metadata describing definitions, sources, and time periods. By adopting a structured, machine-readable format like XML or JSON, SDMX facilitates automated processes and large-scale data exchanges.

This standardization improves data consistency, enhances usability, and reduces compatibility issues, making SDMX a cornerstone for international collaboration, particularly in environments requiring accurate and comprehensive data sharing.

Using bookr for an imfData wrapper

The rsdmx package allows users to retrieve data in R from any organization using the SDMX standard. However, its broad functionality makes the interface unnecessarily complex for accessing only IMF data and can be challenging for users new to SDMX.

To simplify this, we created custom wrapper functions. They are available in the internal bookr library and can alternatively be loaded from the imf_data_utils.R file. These functions streamline the process, enabling efficient retrieval and analysis of both public and authenticated IMF datasets.

The following sections will guide you through using these tools, so you can focus on data analysis without dealing with the complexities of SDMX queries.

Setting Up Your Environment

You first need to install the rsdmx package. You cannot install it the usual way with install.packages(): to access IMF data you need the latest version from GitHub. Installing from GitHub requires the devtools package, so load it first (and install it if you do not have it yet).

#install.packages("devtools")
library(devtools)

You can then install the rsdmx package from GitHub. Note the force = TRUE argument, which forces a reinstall even if a version of the package is already present and so helps prevent installation problems.

install_github("opensdmx/rsdmx", force = TRUE)

Next, load our internal bookr package along with the other packages needed, and source the imf_data_utils script that we created to read iData:

library(rsdmx)
library(AzureAuth)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(here)
here() starts at C:/IMF-R-Book
library(bookr)
Bookr package has been loaded.
source(here("utils/theme_and_colors_IMF.R"))  # Custom IMF themes for charts
source(here("utils/imf_data_utils.R")) # Custom functions for iData

Accessing public WEO data

As a first example, we will retrieve unemployment rates and real GDP for the USA and the Netherlands from the WEO database:

weo_data <- imfdata_by_countries_and_series(
  department = "RES",
  dataset = "WEO",
  countries = c("USA", "NLD"),
  series = c("LUR", "NGDP_R"),
  frequency = "A"
)
str(weo_data)

Accessing public WEO Data with labels

We will demonstrate how to retrieve unemployment rates (LUR) for the USA and the Netherlands from the WEO (World Economic Outlook) database. Using our wrapper function, the data will include descriptive labels for dimensions like countries and series.

# Retrieve labeled WEO data for unemployment rates (LUR)
weo_unemployment_data <- imfdata_by_countries_and_series(
  department = "RES",          # RES: Research Department
  dataset = "WEO",             # WEO: World Economic Outlook database
  countries = c("USA", "NLD"), # USA and Netherlands
  series = c("LUR"),           # Unemployment rates
  frequency = "A",             # Annual data
  needs_labels = TRUE          # Include descriptive labels
)

# Preview the last few rows of the data
tail(weo_unemployment_data)

You will now notice that this output contains the multilingual labels for the country names and descriptive labels for the indicators.

URL Structure Breakdown

In both examples, the rsdmx library calls a URL. Let’s explain how it is structured:

https://api.imf.org/external/sdmx/2.1/data/IMF.RES,WEO/USA+NLD.LUR+NGDP_R.A/all/

  1. Base URL: https://api.imf.org/external/sdmx/2.1/data/

    • This is the base API endpoint provided by the IMF for retrieving SDMX (Statistical Data and Metadata Exchange) data.
  2. Dataset Identifier: IMF.RES,WEO

    • IMF.RES: the agency, here the IMF's Research Department (RES).
    • WEO: the dataset, here the World Economic Outlook (WEO) database.
  3. Dimension Filters (key): USA+NLD.LUR+NGDP_R.A

    • This specifies the filters applied to the data request. Dimensions are separated by a dot, and multiple values within a dimension by a plus sign:
      • USA+NLD: Limits the query to the United States (USA) and the Netherlands (NLD).
      • LUR+NGDP_R: Limits the query to the unemployment rate (LUR) and real GDP (NGDP_R) series.
      • A: Annual data
  4. Provider Reference: /all/

    • This last part of the path is the providerref, which filters data based on the source organization. The value all means that data from all providers is returned. For example, EUROSTAT might maintain a single dataflow with contributions from multiple providers, such as Norway’s central bank or statistical office.
    • This parameter is not relevant to IMF data, as the IMF does not use its dataflows for collection.
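The parts described above can be assembled with plain string operations. The following is a minimal sketch, not the code used inside rsdmx or bookr: build_imf_sdmx_url is a hypothetical helper name, and the "IMF." prefix for the department is an assumption inferred from the example URL. The function only builds the string and does not send a request.

```r
# Hypothetical helper: assemble an IMF SDMX 2.1 data URL from its parts.
# Assumption: the agency is always "IMF." followed by the department code.
build_imf_sdmx_url <- function(department, dataset, key, provider = "all") {
  base_url <- "https://api.imf.org/external/sdmx/2.1/data/"
  flow_ref <- paste0("IMF.", department, ",", dataset)  # e.g. "IMF.RES,WEO"
  paste0(base_url, flow_ref, "/", key, "/", provider, "/")
}

url <- build_imf_sdmx_url("RES", "WEO", "USA+NLD.LUR+NGDP_R.A")
print(url)
```

With these inputs, the result is the example URL shown at the start of this breakdown.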

In the output of str(weo_data), the first ten columns come from the database and the last two are added by our wrapper. As you can see, TIME_PERIOD is a string, which we have converted to a date. Similarly, OBS_VALUE is a string, which we have converted to a number in the value column.
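This kind of conversion can be sketched in base R. The example below is an illustration under stated assumptions, not the wrapper's actual code: the rows are made-up placeholder values, the column name date is hypothetical (only value is named in the text), and mapping an annual period to January 1 of that year is one possible convention.

```r
# Made-up placeholder rows mimicking the string columns returned by the API.
raw <- data.frame(
  TIME_PERIOD = c("2022", "2023"),
  OBS_VALUE   = c("1.5", "2.5"),
  stringsAsFactors = FALSE
)

converted <- raw
# Assumption: an annual period such as "2022" maps to the first day of that year.
converted$date <- as.Date(paste0(raw$TIME_PERIOD, "-01-01"))
# Observation values arrive as text and are converted to numbers.
converted$value <- as.numeric(raw$OBS_VALUE)

str(converted)
```

Quarterly or monthly periods would need their own parsing rule, which this sketch does not cover.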

Getting data sets using the key

Let’s go back to the URL. Note that it contains the key USA+NLD.LUR+NGDP_R.A. We can also fetch the data with this key using the imfdata_by_key function:

weo_data <- imfdata_by_key(
  department = "RES",
  dataset = "WEO",
  key = "USA+NLD.LUR+NGDP_R.A"
)

We don’t need to build the key by hand. We can construct it from a list.

countries <- c("USA", "NLD")
vars <- c("LUR", "NGDP_R")
freq <- "A"
key <- list(countries, vars, freq)

weo_data <- imfdata_by_key(
  department = "RES",
  dataset = "WEO",
  key = key
)
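To make the relation between the list and the key string concrete, here is a minimal sketch of how such a list could be collapsed into a key. build_sdmx_key is a hypothetical helper written for illustration; we assume, but have not confirmed, that imfdata_by_key does something similar internally.

```r
# Hypothetical helper: collapse a list of dimension values into an SDMX key.
# Values within one dimension are joined with "+", dimensions with ".".
build_sdmx_key <- function(parts) {
  per_dimension <- vapply(
    parts,
    function(x) paste(x, collapse = "+"),
    character(1)
  )
  paste(per_dimension, collapse = ".")
}

countries <- c("USA", "NLD")
vars <- c("LUR", "NGDP_R")
freq <- "A"

key_string <- build_sdmx_key(list(countries, vars, freq))
print(key_string)
```

For these inputs the result is USA+NLD.LUR+NGDP_R.A, the same key that appears in the URL.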

For most datasets, the key consists of three parts: the countries, the variables and the frequency. For those datasets, it is easiest to use the imfdata_by_countries_and_series function.

There are a few datasets that have more dimensions. For example, the CPI database has five dimensions.

How to find the key for your data

The best way to find the number of dimensions and their possible values is the iData add-in in Excel. There you can see that the CPI dataset has five dimensions: countries, index type, expenditure category, type of transformation and frequency.

Once you have downloaded the data in Excel you can see how the key is constructed in the column SERIES_CODE:

You can use this information to build the key. If you want all possible values for a dimension, set that dimension to NULL.

# CPI with some dimensions omitted
countries = c("BEL", "USA")
index = NULL
category = c("CP04", "CP02")
transform = NULL
frequency = c("A")

key = list(countries, index, category, transform, frequency)


CPI_data <- imfdata_by_key(department = "STA",
                        dataset = "CPI",
                        key = key)

str(CPI_data)
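In SDMX, a dimension left empty in the key acts as a wildcard, so we assume the NULL entries above end up as empty positions between the dots, giving a key such as BEL+USA..CP04+CP02..A. The sketch below splits a key string back into its dimensions, which is a quick way to check that a key has the expected number of positions. split_sdmx_key is a hypothetical helper, not part of bookr or rsdmx.

```r
# Hypothetical helper: split an SDMX key string into its dimensions.
split_sdmx_key <- function(key_string) {
  # Append a "." so that strsplit() keeps a trailing empty position.
  positions <- strsplit(paste0(key_string, "."), ".", fixed = TRUE)[[1]]
  # Within a position, values are separated by "+".
  # An empty position becomes character(0), meaning "all values".
  lapply(positions, function(p) {
    if (nzchar(p)) strsplit(p, "+", fixed = TRUE)[[1]] else character(0)
  })
}

# Assumed string form of the CPI key built above (NULL -> empty position).
cpi_parts <- split_sdmx_key("BEL+USA..CP04+CP02..A")

length(cpi_parts)  # number of dimensions in the key
str(cpi_parts)
```

If the number of positions does not match the number of dimensions of the dataset, the key is malformed and the request is unlikely to return what you expect.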

Alternatively, instead of downloading the dataset and examining it in Excel, you can programmatically retrieve the available values for specific facets like countries or series. This is especially useful for building dynamic queries.

For example, we can retrieve all available countries for the WEO dataset programmatically:

# Retrieve all available countries in the WEO dataset
countries = NULL  # Use NULL to fetch all available countries
series = c("LUR", "NGDP_R")  # Specify a series for filtering
frequency = c("A")           # Specify frequency (Annual)

# Construct the key
key = list(countries, series, frequency)

# Fetch data with the constructed key
available_countries <- imfdata_by_key(department = "RES", dataset = "WEO", key = key)

# Extract unique country codes
unique_countries <- unique(available_countries$COUNTRY)

# Print the unique country codes
print(unique_countries)

This will output a list of all countries available in the WEO dataset. Once you retrieve the codes, you can use them to construct queries tailored to your analysis. For example, you can filter specific countries or include all countries in your dataset.

Restricted datasets

If you want to access a restricted dataset, you need to add the needs_auth = TRUE argument to the function. You will be asked to log in (or, if you have done so in the past hour, the function will get a token from the cache).

weo_data <- imfdata_by_countries_and_series(
  department = "RES",
  dataset = "WEO_LIVE",
  countries = c("USA", "NLD"),
  series = c("LUR","NGDP_R"),
  frequency = "A",
  needs_auth = TRUE
)

Available datasets

The easiest way to find the available datasets is to look at the IMF Data add-in in Excel, which lists the datasets you can access. Note that if you are logged in, you will see more datasets than if you are not logged in.

For each dataset, you can also see the dimensions. To find the precise code, it is best to select your variable in Excel and then look at the generated keys.

For example, when I select the monthly forecast for the one-year-ahead unemployment rate, I get the following:

Let’s retrieve this in R. We take all countries and use the code:

countries <- NULL  # NULL: all countries

code <- "UPRATE.1YF_M.MEAN._Z.M"
key <- list(countries, code)

cf_data <- imfdata_by_key(
  department = "CSF",
  dataset = "CF",
  needs_auth = TRUE,
  key = key
)
str(cf_data)

Errors

If you get an error it is likely that you have either misspecified the dataset or have an incorrect key. For example, if you use dataset WEO2 instead of WEO you will get this error:

If columns like TIME_PERIOD or OBS_VALUE are missing from your retrieved dataset, you have likely misspecified the key.
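A quick check for these columns can be scripted. The sketch below uses a hypothetical helper, missing_sdmx_columns, and a made-up data frame standing in for an incomplete result; it is not part of bookr or rsdmx.

```r
# Hypothetical helper: report which expected SDMX columns are absent.
missing_sdmx_columns <- function(data,
                                 expected = c("TIME_PERIOD", "OBS_VALUE")) {
  setdiff(expected, names(data))
}

# Made-up result that lacks OBS_VALUE, standing in for a failed query.
incomplete <- data.frame(COUNTRY = "USA", TIME_PERIOD = "2023")

absent <- missing_sdmx_columns(incomplete)
if (length(absent) > 0) {
  message(
    "Missing columns: ", paste(absent, collapse = ", "),
    ". Check the dataset name and the key."
  )
}
```

Running such a check right after each retrieval surfaces a bad key early, before later steps fail with less obvious errors.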

Utilizing rsdmx, you can efficiently access and analyze SDMX data, including IMF data.