#install.packages("devtools")
library(devtools)
Accessing iData
This section shows how to access iData from R using SDMX and our iData helper functions.
SDMX
Organizations such as the OECD, ILO, EUROSTAT, and the African Development Bank (AfDB) utilize SDMX (Statistical Data and Metadata Exchange) to disseminate their data. The IMF’s iData platform has also adopted this international standard, which enables seamless data communication by providing a common format.
SDMX standardizes and streamlines the exchange of complex datasets, such as GDP figures or employment statistics, along with detailed metadata describing definitions, sources, and time periods. By adopting a structured, machine-readable format like XML or JSON, SDMX facilitates automated processes and large-scale data exchanges.
This standardization improves data consistency, enhances usability, and reduces compatibility issues, making SDMX a cornerstone for international collaboration, particularly in environments requiring accurate and comprehensive data sharing.
Using bookr for an imfData wrapper
The rsdmx package allows users to retrieve data in R from any organization using the SDMX standard. However, its broad functionality makes the interface unnecessarily complex for accessing only IMF data and can be challenging for users new to SDMX.
To simplify this, we created custom wrapper functions. They are available in the internal bookr library and can alternatively be loaded from the imf_data_utils.R file. These functions streamline the process, enabling efficient retrieval and analysis of both public and authenticated IMF datasets.
The following sections will guide you through using these tools, so you can focus on data analysis without dealing with the complexities of SDMX queries.
Setting Up Your Environment
You first need to install the rsdmx package. You cannot install it the usual way: to access iData you need the latest version from GitHub. Before you can install it, you first need to load (or install) the devtools package.
You can then install the rsdmx package from GitHub. Please note the force = TRUE argument, which helps prevent installation problems.
install_github("opensdmx/rsdmx", force = TRUE)
Next, you can load our internal bookr package, along with the other packages needed, and the imf_data_utils script that we created to read iData:
library(rsdmx)
library(AzureAuth)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(lubridate)
library(here)
here() starts at C:/IMF-R-Book
library(bookr)
Bookr package has been loaded.
source(here("utils/theme_and_colors_IMF.R")) # Custom IMF themes for charts
source(here("utils/imf_data_utils.R")) # Custom functions for iData
Accessing public WEO data
As a first example, we will retrieve unemployment rates (LUR) and real GDP (NGDP_R) for the USA and the Netherlands from the WEO database:
weo_data <- imfdata_by_countries_and_series(
  department = "RES",
  dataset = "WEO",
  countries = c("USA", "NLD"),
  series = c("LUR", "NGDP_R"),
  frequency = "A"
)
str(weo_data)
Accessing public WEO Data with labels
We will demonstrate how to retrieve unemployment rates (LUR) for the USA and the Netherlands from the WEO (World Economic Outlook) database. Using our wrapper function, the data will include descriptive labels for dimensions like countries and series.
# Retrieve labeled WEO data for unemployment rates (LUR)
weo_unemployment_data <- imfdata_by_countries_and_series(
  department = "RES", # RES: Research Department
dataset = "WEO", # WEO: World Economic Outlook database
countries = c("USA", "NLD"), # USA and Netherlands
series = c("LUR"), # Unemployment rates
frequency = "A", # Annual data
needs_labels = TRUE # Include descriptive labels
)
# Preview the last few rows of the data
tail(weo_unemployment_data)
You will now notice that this output contains the multilingual labels for the country names and descriptive labels for the indicators.
URL Structure Breakdown
In both examples, the rsdmx library calls a URL. It is structured as follows:
https://api.imf.org/external/sdmx/2.1/data/IMF.RES,WEO/USA+NLD.LUR+NGDP_R.A/all/
Base URL:
https://api.imf.org/external/sdmx/2.1/data/
- This is the base API endpoint provided by the IMF for retrieving SDMX (Statistical Data and Metadata Exchange) data.
Dataset Identifier:
IMF.RES,WEO
- IMF.RES: the RES department
- WEO: the WEO database
Dimension Filters (key):
USA+NLD.LUR+NGDP_R.A
- This specifies the filters applied to the data request:
  - USA+NLD: Limits the query to the United States (USA) and the Netherlands (NLD).
  - LUR+NGDP_R: Limits the query to the unemployment rate (LUR) and real GDP (NGDP_R) series.
  - A: Annual data
Wildcard for Observations:
/all/
- This part specifies that all available observations (time series data) should be retrieved.
- The all here refers to the providerRef, which filters data based on the source organization. For example, EUROSTAT might maintain a single dataflow with contributions from multiple providers, such as Norway’s central bank or statistical office.
- This parameter is not relevant to IMF data, as the IMF does not use its dataflows for collection.
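The breakdown above can be sketched as a small function that assembles the URL from its parts. This is only an illustration: build_imf_sdmx_url is a hypothetical helper written for this example, not a function from rsdmx or bookr, and the wrapper functions may build their URLs differently.

```r
# Hypothetical helper (not part of rsdmx or bookr): assemble an SDMX 2.1
# data URL from the components described above.
build_imf_sdmx_url <- function(department, dataset, key, provider = "all") {
  base_url <- "https://api.imf.org/external/sdmx/2.1/data/"
  # Dataset identifier, e.g. "IMF.RES,WEO"
  flow <- paste0("IMF.", department, ",", dataset)
  # Base URL + dataset identifier + key + provider reference
  paste0(base_url, flow, "/", key, "/", provider, "/")
}

url <- build_imf_sdmx_url("RES", "WEO", "USA+NLD.LUR+NGDP_R.A")
print(url)
```

With the arguments shown, the result should match the example URL at the start of this breakdown.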
In the retrieved data, the first ten columns come from the database; the last two are columns we have added. As you can see, TIME_PERIOD is a string, which we have converted to a date. Similarly, OBS_VALUE is a string, which we have converted to a number in value.
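To make the two conversions concrete, here is a minimal sketch on a toy data frame. The values are made up, the column name date is an assumption (only value is named in the text), and mapping an annual period to 1 January of that year is one possible convention, not necessarily the one our helpers use.

```r
# Toy data frame with made-up values, mimicking the two string columns
raw <- data.frame(
  TIME_PERIOD = c("2020", "2021", "2022"),
  OBS_VALUE   = c("1.5", "2.0", "2.5"),
  stringsAsFactors = FALSE
)

# Annual periods: treat each year as 1 January of that year (an assumption)
raw$date <- as.Date(paste0(raw$TIME_PERIOD, "-01-01"))

# Observation values arrive as text; convert them to numbers
raw$value <- as.numeric(raw$OBS_VALUE)

str(raw)
```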
Getting data sets using the key
Let’s go back to the URL. Note that it contains the key USA+NLD.LUR+NGDP_R.A. We can also fetch the data with this key using the imfdata_by_key function:
weo_data <- imfdata_by_key(
  department = "RES",
  dataset = "WEO",
  key = "USA+NLD.LUR+NGDP_R.A"
)
We don’t need to build the key by hand. We can construct it from a list.
countries = c("USA", "NLD")
vars = c("LUR", "NGDP_R")
freq = "A"
key = list(countries, vars, freq)

weo_data <- imfdata_by_key(
  department = "RES",
  dataset = "WEO",
  key = key
)
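To see how a list relates to the key string, the sketch below collapses one into the other: values within a dimension are joined with "+", dimensions are joined with ".", and a NULL dimension becomes an empty position, which SDMX treats as "all values". collapse_key is a hypothetical helper written for this illustration; how imfdata_by_key handles the list internally may differ.

```r
# Hypothetical helper (not part of bookr): turn a list of dimensions
# into an SDMX key string.
collapse_key <- function(dimensions) {
  parts <- vapply(
    dimensions,
    function(values) {
      # NULL means "no filter on this dimension": leave the position empty
      if (is.null(values)) "" else paste(values, collapse = "+")
    },
    character(1)
  )
  paste(parts, collapse = ".")
}

key_list <- list(c("USA", "NLD"), c("LUR", "NGDP_R"), "A")
key_string <- collapse_key(key_list)
print(key_string)
```

For the list above, the result is the same key that appears in the URL.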
For most datasets, the key consists of three parts: the countries, the variables and the frequency. For those datasets, it is easiest to use the imfdata_by_countries_and_series function.
A few datasets have more dimensions. For example, the CPI database has five dimensions.
How to find the key for your data
It is best to find the number of dimensions and their possible values using the iData add-in in Excel. There you can see that the CPI dataset has five dimensions: countries, index type, expenditure category, type of transformation and frequency.
Once you have downloaded the data in Excel you can see how the key is constructed in the column SERIES_CODE:
You can use this information to build the key. To get all possible values for a dimension, set it to NULL:
# CPI with some dimensions omitted
countries = c("BEL", "USA")
index = NULL
category = c("CP04", "CP02")
transform = NULL
frequency = c("A")

key = list(countries, index, category, transform, frequency)

CPI_data <- imfdata_by_key(department = "STA",
                           dataset = "CPI",
                           key = key)
str(CPI_data)
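Because the position of each entry in the list determines which dimension it filters, it can help to check that the list still has one entry per dimension. The sketch below uses only base R. The value 5 is the number of CPI dimensions given above, and the variable names are illustrative. It also shows a common pitfall: assigning NULL with [[ ]] removes the entry instead of setting it to NULL.

```r
expected_dims <- 5  # CPI: countries, index, category, transformation, frequency

# list() keeps NULL entries, so every dimension keeps its position
key <- list(c("BEL", "USA"), NULL, c("CP04", "CP02"), NULL, "A")
stopifnot(length(key) == expected_dims)

# Which dimensions are left unfiltered?
wildcards <- vapply(key, is.null, logical(1))
print(which(wildcards))

# Pitfall: this removes the second entry and shifts the others left
broken <- key
broken[[2]] <- NULL
print(length(broken))

# To set an entry to NULL after the fact, assign a list containing NULL
fixed <- list(c("BEL", "USA"), "SOME_INDEX", c("CP04", "CP02"), NULL, "A")
fixed[2] <- list(NULL)  # "SOME_INDEX" was a placeholder, not a real code
print(length(fixed))
```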
Alternatively, instead of downloading the dataset and examining it in Excel, you can programmatically retrieve the available values for specific facets like countries or series. This is especially useful for building dynamic queries.
For example, we can retrieve all available countries for the WEO dataset programmatically:
# Retrieve all available countries in the WEO dataset
countries = NULL # Use NULL to fetch all available countries
series = c("LUR", "NGDP_R") # Specify a series for filtering
frequency = c("A") # Specify frequency (Annual)

# Construct the key
key = list(countries, series, frequency)

# Fetch data with the constructed key
available_countries <- imfdata_by_key(department = "RES", dataset = "WEO", key = key)

# Extract unique country codes
unique_countries <- unique(available_countries$COUNTRY)
# Print the unique country codes
print(unique_countries)
This will output a list of all countries available in the WEO dataset. Once you retrieve the codes, you can use them to construct queries tailored to your analysis. For example, you can filter specific countries or include all countries in your dataset.
Restricted datasets
If you want to access a restricted dataset, you need to add the needs_auth = TRUE argument to the function. You will be asked to log in (or, if you have done that in the past hour, a token will be taken from the cache).
weo_data <- imfdata_by_countries_and_series(
  department = "RES",
  dataset = "WEO_LIVE",
  countries = c("USA", "NLD"),
  series = c("LUR", "NGDP_R"),
  frequency = "A",
  needs_auth = TRUE
)
Available datasets
The easiest way to find the available datasets is to look at the IMF Data add-in in Excel, which lists them. Note that you will see more datasets when you are logged in than when you are not.
For each dataset, you can also see the dimensions. To find the precise code, it is best to select your variable in Excel and then look at the generated keys.
For example, when I selected the monthly forecast for the one-year ahead unemployment rate I get the following:
Let’s retrieve this in R. We take all countries and use the code:
countries = NULL
code = "UPRATE.1YF_M.MEAN._Z.M"
key = list(countries, code)

cf_data <- imfdata_by_key(department = "CSF", dataset = "CF", needs_auth = TRUE,
                          key = key)
str(cf_data)
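The code string used here already contains several dimensions separated by dots. A short base R sketch makes this visible by splitting it into its parts and then showing the full key that results if the NULL country dimension is rendered as an empty first position. That rendering is the SDMX convention for "all values"; whether imfdata_by_key produces exactly this string is an assumption.

```r
code <- "UPRATE.1YF_M.MEAN._Z.M"

# Split on literal dots (fixed = TRUE avoids regex interpretation of ".")
code_parts <- strsplit(code, ".", fixed = TRUE)[[1]]
print(code_parts)

# An empty first position stands for "all countries"
full_key <- paste(c("", code_parts), collapse = ".")
print(full_key)
```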
Errors
If you get an error, it is likely that you have either misspecified the dataset or used an incorrect key. For example, if you use dataset WEO2 instead of WEO you will get this error:
If columns like TIME_PERIOD or OBS_VALUE are missing from your retrieved dataset, you have likely misspecified the key.
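Both failure modes can be checked defensively. The sketch below is a minimal illustration using base R only: check_sdmx_result is a hypothetical helper (not part of bookr or rsdmx) that stops with a readable message when expected columns are missing, and tryCatch captures the error instead of halting the script. The toy data frame stands in for a result retrieved with a wrong key.

```r
# Hypothetical helper: verify that the expected columns are present
check_sdmx_result <- function(data, required = c("TIME_PERIOD", "OBS_VALUE")) {
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "),
         ". Check the key and the dataset name.", call. = FALSE)
  }
  invisible(data)
}

# Toy result that lacks OBS_VALUE (made-up data)
incomplete <- data.frame(TIME_PERIOD = "2020", stringsAsFactors = FALSE)

# Capture the error message instead of stopping the script
message_text <- tryCatch(
  {
    check_sdmx_result(incomplete)
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(message_text)
```

The same tryCatch pattern could be wrapped around a call to imfdata_by_key to catch a misspecified dataset name.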
Using rsdmx, you can efficiently access and analyze SDMX data, including IMF data.