Utilizing CEIC Data in R

In this tutorial, we will explore how to integrate CEIC data into R, allowing us to access and analyze a wide range of economic indicators. CEIC’s vast repository of global economic data can be a valuable tool for economists and research analysts, making it essential to understand how to retrieve and work with this data effectively.

Prerequisites

Install and Load Packages

To get started, you’ll need to install the CEIC package from the CEIC repository and load the necessary libraries.

# Install the 'ceic' package from the specified repository. 
install.packages("ceic", repos = "https://downloads.ceicdata.com/R/", type = "source")
# Load the necessary libraries
library(zoo)

Attaching package: 'zoo'
The following objects are masked from 'package:base':

    as.Date, as.Date.numeric
library(xts)
library(ceic)
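
If one of the library() calls fails, it can help to check which of the three packages are actually installed. The following is a minimal sketch using base R only; the package names are taken from the library() calls above.

```r
# Packages this tutorial loads with library()
needed <- c("zoo", "xts", "ceic")

# requireNamespace() returns FALSE (instead of raising an error)
# when a package is not installed
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are installed.")
}
```

zoo and xts are available from CRAN and can be installed with install.packages(c("zoo", "xts")); ceic comes from the CEIC repository shown above.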

Authentication

You must log in to access the CEIC database. The ceic.login() function initiates the login process. Replace the email and password below with your own CEIC account credentials.

ceic.login("youremail@imf.org", password = "Your password")

If you encounter an “error in file”, you might need to manually create a folder for storing CEIC data. You can do this with the command below, and then run the login command again.

dir.create("~/R", recursive = TRUE, showWarnings = TRUE)
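
To keep the password out of your scripts, the login call can read the credentials from environment variables instead. This is a minimal sketch: the variable names CEIC_EMAIL and CEIC_PASSWORD and the helper get_ceic_credentials() are hypothetical choices for illustration, not part of the ceic package.

```r
# Hypothetical helper: reads credentials from environment variables
# (CEIC_EMAIL and CEIC_PASSWORD are illustrative names, e.g. set in ~/.Renviron)
get_ceic_credentials <- function() {
  email <- Sys.getenv("CEIC_EMAIL")
  password <- Sys.getenv("CEIC_PASSWORD")
  if (!nzchar(email) || !nzchar(password)) {
    stop("Set CEIC_EMAIL and CEIC_PASSWORD before logging in.")
  }
  list(email = email, password = password)
}

# With the variables set and the ceic package loaded, the login would be:
# creds <- get_ceic_credentials()
# ceic.login(creds$email, password = creds$password)
```

The login lines are commented out so that the sketch runs without a CEIC account.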

Data Exploration using the CEIC Website

Finding series codes

The CEIC website offers a user-friendly interface for exploring series codes. First, log in at https://www.ceicdata.com/en.

Then, browse and search by keyword in the top search bar. For example, let’s look for the U.S. unemployment rate. Multiple available series will appear as options.

In this example, the first series that appears is the one we are looking for. Clicking on the series opens a more detailed menu that shows the ID, the SR Code, and a chart, data, and statistics. This is helpful if you are only looking for one series.

If you are looking for multiple series, you can drag them into the ‘My Insights’ panel. For example, let’s use four different U.S. unemployment series. Your insights panel should look something like this.

If you click on the default view, ‘View 1’, you can easily visualize the data as a graph or a table. You will also notice a button on the top left called ‘Series’.

Clicking the ‘Series’ button in your view shows the key information for your series and, most importantly, the series IDs.

Retrieving the series code in R

Now that we have easily identified the series codes we are looking for, we can retrieve them in R using their ID. Let’s first start by retrieving the general seasonally adjusted unemployment series for the U.S.

# Retrieve time series using its ID
sa_unemp_us <- ceic.series(40902301)

We can optionally view the structure of the series, giving a more detailed view of the timepoints and metadata using the command ‘View’.

View(sa_unemp_us)

The retrieved data is structured as a list with two key components:

  1. Metadata: Contains information like the name, frequency, unit, and source of the data.

  2. Timepoints: Stores the actual time series data.
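
To make this layout concrete, here is a minimal sketch that builds a mock object with the same two components and shows how to reach into each. The object, its values, and its field names are made up for illustration and are not real CEIC output; it assumes the timepoints are a zoo series.

```r
library(zoo)

# Mock object mimicking the described layout; NOT real CEIC output,
# and the values are made up for illustration
mock_series <- list(
  metadata = list(name = "Example Rate", frequency = "Monthly", unit = "%"),
  timepoints = zoo(
    c(4.1, 4.0, 4.2),
    order.by = as.Date(c("2020-01-01", "2020-02-01", "2020-03-01"))
  )
)

names(mock_series)                # top-level components
mock_series$metadata$name         # a single metadata field
index(mock_series$timepoints)     # the dates of the observations
coredata(mock_series$timepoints)  # the observed values
```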

We can also quickly plot the data to visualize what it looks like using this command.

# Plot all available data
plot(sa_unemp_us$timepoints, main = sa_unemp_us$metadata$name)

Downloading Multiple Time Series

To download the other unemployment series we identified, we define a vector of series IDs and retrieve them in a single call. Here, we retrieve the timepoints and metadata separately.

# Define a vector of series IDs
unempl_ids <- c('40952001', '40953401', '40954701')

# Retrieve data for all series
multi_unempl_data <- ceic.timepoints(unempl_ids)

# Retrieve metadata for all series
multi_unempl_metadata <- ceic.metadata(unempl_ids)

Merging and Structuring the Data

Once we have retrieved multiple time series data from CEIC (in this case, multiple U.S. unemployment-related series), we often want to combine them into a single dataset for comparison or further analysis. This is particularly useful if you want to compare trends across different variables over the same time period.

When you retrieve multiple time series from CEIC, they are stored as individual objects within a list. To work with these multiple series as a single dataset, we need to merge them. The do.call("merge", ...) function in R is a convenient way to combine the individual zoo time series into a single object.

  • The do.call() function calls a function (in this case, merge) once, passing the elements of a list as its arguments, so do.call("merge", list(a, b)) is equivalent to merge(a, b).

  • merge() aligns the time series based on their dates. If there are missing values for any series on a particular date, the result will include NA for those points.
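
The two points above can be tried out on toy data without a CEIC login. This is a minimal sketch with made-up values; the series names a and b are placeholders.

```r
library(zoo)

# Two toy monthly series with different start dates (made-up values)
a <- zoo(c(1, 2, 3), as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")))
b <- zoo(c(10, 20), as.Date(c("2020-02-01", "2020-03-01")))
series_list <- list(a = a, b = b)

# do.call() passes the list elements as arguments,
# so this is the same as merge(a = a, b = b)
merged_toy <- do.call("merge", series_list)

# The first row has NA in column b, because b has no January observation
merged_toy
```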

Here is the code that merges the unemployment time series data, also giving you a preview of the new aligned series at the end.

# Merge the time series data 
merged_data <- do.call("merge", multi_unempl_data)  
# Set column names based on series names from metadata 
colnames(merged_data) <- multi_unempl_metadata$name 
# View the tail end of the data
tail(merged_data)
           Unemployment Rate Unemployment Rate: Male Unemployment Rate: Female
2024-12-01               3.8                     4.0                       3.6
2025-01-01               4.4                     4.7                       4.0
2025-02-01               4.5                     4.8                       4.1
2025-03-01               4.2                     4.5                       3.9
2025-04-01               3.9                     4.2                       3.5
2025-05-01               4.0                     4.1                       3.9

Together, the merging and renaming steps ensure three main things:

  • Time Alignment: time series are aligned along the same timeline. For example, if one series has data starting in 2000, and another starts in 2005, the merged dataset will have NA values for the earlier years for the second series, ensuring consistency in the time index.

  • Column Naming: After merging, each column in the resulting dataset corresponds to one of the time series. The column names are derived from the metadata of the series, allowing you to easily identify which series each column represents.

  • Handling Missing Data: If one or more series have missing data for a certain time period, the merged dataset will include NA in those places.
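
If the NA values are not wanted, zoo offers a couple of ways to restrict the result to dates that all series share. The sketch below uses toy series with made-up values; which option is appropriate depends on the analysis.

```r
library(zoo)

# Toy series: b starts one month later than a (made-up values)
a <- zoo(c(1, 2, 3), as.Date(c("2020-01-01", "2020-02-01", "2020-03-01")))
b <- zoo(c(10, 20), as.Date(c("2020-02-01", "2020-03-01")))

# Default (all = TRUE): keep every date, filling gaps with NA
outer_join <- merge(a, b)

# all = FALSE: keep only dates present in both series
inner_join <- merge(a, b, all = FALSE)

# Alternatively, drop rows containing any NA after an outer merge
complete <- na.omit(outer_join)
```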

Now, you have a single time series object containing multiple variables which can be used for further analysis.

Visualizing the Merged Data

You can easily visualize the merged dataset by plotting the series together and comparing their trends over time. For example, let’s plot the three retrieved series in separate panels stacked vertically, covering the period from 2000 to the most recently available data.

# Define the range of interest as Date objects
start_date <- as.Date("2000-01-01")
end_date <- Sys.Date()

# Subset the merged data based on the date range
subset_data <- merged_data[index(merged_data) >= start_date & index(merged_data) <= end_date]

# Set up the plotting area to have 3 rows and 1 column (stacked plots)
par(mfrow = c(3, 1))

# Plot each series separately, stacking them vertically
plot.zoo(subset_data[, 1], 
         main = colnames(subset_data)[1], 
         xlab = "Date", 
         ylab = "Value", 
         col = "blue")

plot.zoo(subset_data[, 2], 
         main = colnames(subset_data)[2], 
         xlab = "Date", 
         ylab = "Value", 
         col = "green")

plot.zoo(subset_data[, 3], 
         main = colnames(subset_data)[3], 
         xlab = "Date", 
         ylab = "Value", 
         col = "red")

This approach combines CEIC’s data retrieval with R’s flexible time series handling, making it easy to manipulate, merge, and visualize economic data.
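
As an alternative to logical indexing and three separate plot calls, zoo’s window() function and the plot.type argument of plot.zoo() can achieve a similar result in fewer lines. This is a minimal sketch on toy data (made-up values, not CEIC output); with the real data, merged_data would take the place of toy.

```r
library(zoo)

# Toy two-column monthly series (made-up values, not CEIC data)
dates <- seq(as.Date("1999-01-01"), by = "month", length.out = 36)
toy <- zoo(
  cbind(rate_a = seq(4, 7.5, length.out = 36),
        rate_b = seq(5, 3.25, length.out = 36)),
  order.by = dates
)

# window() keeps the observations between start and end (inclusive)
toy_subset <- window(toy, start = as.Date("2000-01-01"), end = Sys.Date())

# One panel per column, stacked vertically, drawn with a single call
plot(toy_subset, plot.type = "multiple", xlab = "Date",
     col = c("blue", "green"), main = "Toy series")
```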

Data Exploration Directly in R

We first explored how to search for data using the CEIC website. Now, we will explore how to find data from CEIC directly in R.

Exploring Classifications

Classifications group data into meaningful categories, such as regions, sectors, or types of economic indicators. This structure allows for efficient filtering and searching through CEIC’s large datasets. For example, classifications may help you locate all datasets related to balance of payments, banking statistics, interest rates and more.

We can explore available classifications in CEIC directly in R using the following command:

# Explore CEIC classifications
ceic.classifications()
                                    name     id
1                      National Accounts 200016
2                             Production 200017
3  Sales, Orders, Inventory and Shipment 200018
4                           Construction 200019
5             Properties and Real Estate 200020
6          Government and Public Finance 200021
7                            Demographic 200022
8                          Labour Market 200023
9    Domestic Trade and Household Survey 200024
10                         Foreign Trade 200025
11                    Balance of Payment 200026
12                   Inflation and Price 200027
13                              Monetary 200028
14                    Banking Statistics 200029
15                      Foreign Exchange 200030
16                         Interest Rate 200031
17                            Investment 200032
18                       Commodity Price 200033
19          Business and Economic Survey 200034
20                               Tourism 200035
21                             Transport 200036
22      Technology and telecommunication 200037
23                      Financial Market 200038

By understanding the classifications, you can focus on the relevant data categories, improving the efficiency of your analysis.
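
To look up a classification ID by name rather than by scanning the listing, you can filter the result like any data frame. The sketch below rebuilds three rows of the listing above by hand so that it runs without a login; it assumes that ceic.classifications() returns a data frame with columns called name and id, as the printed output suggests.

```r
# A few rows copied from the listing above, rebuilt as a data frame;
# assumes the real result has columns named `name` and `id`
classifications <- data.frame(
  name = c("Labour Market", "Inflation and Price", "Interest Rate"),
  id = c(200023, 200027, 200031)
)

# Case-insensitive match on the classification name
labour <- classifications[
  grepl("labour", classifications$name, ignore.case = TRUE),
]
labour$id
```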

Searching for Data Series in R

CEIC offers many datasets, and you can search for specific data series using keywords, regions, and other filters. Here, we will search for the “Industrial Production Index” in G7 countries, focusing only on monthly data.

# Search for the Industrial Production Index in G7 countries
search_result <- ceic.search(
  keyword = "Industrial Production Index",  # keyword to search for
  region = ceic.regions("G7"),              # restrict to the G7 region
  subscribed_only = TRUE,                   # only series in your subscription
  frequency = "M",                          # monthly data only
  source = ceic.sources("CEIC Data")        # series whose source is CEIC Data
)
Results found: 16
Batch size: 16
# Display the top search results
head(search_result)
         id                                                         name unit
1 414412237             Industrial Production Index: YoY: Monthly: Japan    %
2 414441527     Industrial Production Index: YoY: Monthly: United States    %
3 211484702 Industrial Production Index: YoY: Monthly: sa: United States    %
4 249416501         Industrial Production Index: YoY: Monthly: sa: Japan    %
5 414412187           Industrial Production Index: YoY: Monthly: Germany    %
6 414412177            Industrial Production Index: YoY: Monthly: France    %
        country province frequency status    source  startDate    endDate
1         Japan     <NA>   Monthly Active CEIC Data 1954-01-01 2025-04-01
2 United States     <NA>   Monthly Active CEIC Data 1920-01-01 2025-04-01
3 United States     <NA>   Monthly Active CEIC Data 1920-01-01 2025-04-01
4         Japan     <NA>   Monthly Active CEIC Data 1954-01-01 2025-04-01
5       Germany     <NA>   Monthly Active CEIC Data 1959-01-01 2025-04-01
6        France     <NA>   Monthly Active CEIC Data 1991-01-01 2025-04-01
  multiplierCode            lastUpdateTime keySeries periodEnd classification
1             NA 2025-06-13T04:42:43+00:00     false        31           <NA>
2             NA 2025-05-15T13:24:13+00:00     false        31           <NA>
3             NA 2025-05-15T13:24:13+00:00     false        31           <NA>
4             NA 2025-06-13T04:45:21+00:00     false        31           <NA>
5             NA 2025-06-06T06:33:16+00:00     false        31           <NA>
6             NA 2025-06-06T07:09:35+00:00     false        31           <NA>
  indicator remarks replacements           mnemonic isForecast isRevised
1      <NA>                 <NA>                         false      <NA>
2      <NA>                 <NA>                         false      <NA>
3      <NA>                 <NA> US.IPI.VO.SA-YoY-M      false      <NA>
4      <NA>                 <NA> JP.IPI.VO.SA-YoY-M      false      <NA>
5      <NA>                 <NA>                         false      <NA>
6      <NA>                 <NA>                         false      <NA>
  hasSchedule seriesTag  subscribed
1       false                  true
2       false                  true
3       false  UBHAAAAAA       true
4       false JBAAAAAAAA       true
5       false                  true
6       false                  true

Looking at the top portion of the search result might not provide a clear enough picture of the data structure you are looking at. Let’s dive into the details.

Data Structure

When you retrieve data from CEIC using ceic.search(), the result is structured as a data frame. Each row represents a data series, and columns provide metadata, including the name, region, frequency, source, and unique ID for that series. If we use the View function, we can optionally view the entire data frame in a new window. For example:

# Optionally view the data frame
View(search_result)

You’ll see a table with columns like:

  • ID: The unique identifier for each series.

  • Name: A brief description of the series (e.g., “Industrial Production Index”).

  • Country: The geographical area to which the data applies (e.g., “United States”).

  • Frequency: The data frequency (e.g., “Monthly”).

  • Start Date: The date on which the data series starts.

  • End Date: The date on which the data series ends.

  • Last updated time: The time at which the series was last updated.

This structure helps you understand what kind of data you’re working with and provides critical metadata for further analysis.
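
Because the search result is a data frame, it can be narrowed down with ordinary subsetting before any data is retrieved. The sketch below uses a hand-built stand-in containing three rows and three columns copied from the head(search_result) output shown earlier; it assumes that “sa” in a series name marks a seasonally adjusted series and that IDs can be handled as character strings.

```r
# Stand-in for the real search result: three rows copied from the
# head(search_result) output shown earlier, reduced to a few columns
search_stub <- data.frame(
  id = c("414412237", "211484702", "249416501"),
  name = c(
    "Industrial Production Index: YoY: Monthly: Japan",
    "Industrial Production Index: YoY: Monthly: sa: United States",
    "Industrial Production Index: YoY: Monthly: sa: Japan"
  ),
  country = c("Japan", "United States", "Japan"),
  stringsAsFactors = FALSE
)

# Keep series for Japan whose name contains ": sa:"
keep <- grepl(": sa:", search_stub$name, fixed = TRUE) &
  search_stub$country == "Japan"
selected_ids <- search_stub$id[keep]
selected_ids

# With a live session, the IDs could then be passed on, e.g.:
# ipi_data <- ceic.timepoints(selected_ids)
```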

To retrieve the data itself, pass the series IDs from the search result to the retrieval functions we used earlier. Searching in R is simply an alternative to exploring the data on the website interface.

By following this tutorial, you’ve learned how to access, explore, and retrieve data from CEIC.