Creating an R Chart Without Source Data Using AI

In this tutorial, we will learn how to create a chart in R when the original data is not available. We will use an example chart from the most recent Uruguay Staff Report and leverage AI tools to guide us in estimating the data and building the chart.

Getting Started

At times, we come across charts we want to replicate but are unable to locate the source data. Let’s use the chart below as an example.

This chart represents credit to the private sector in selected Latin American countries as a percentage of GDP.

Asking AI for Assistance

When the source data is missing, we can ask an AI tool such as ChatGPT to estimate the values for us and to help write the R code needed to create a similar visualization.

Estimating the Data using AI

First, paste the chart you are trying to replicate into an AI tool such as ChatGPT, and ask it to estimate the data points and to write the code that replicates the chart. Your conversation would look something like this.

The AI tool will respond with something like this:

Here is the R code provided by ChatGPT.

# Load necessary libraries
library(ggplot2)
library(dplyr)

# Create the data frame with estimated values
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)

# Define the LA7 average
la7_average <- 50

# Create the bar chart
ggplot(data, aes(x = Country, y = Credit_to_Private_Sector)) +
  geom_bar(stat = "identity", fill = ifelse(data$Country == "URY", "red", "steelblue"), width = 0.6) +
  geom_hline(yintercept = la7_average, linetype = "dashed", color = "black", linewidth = 1.2) +
  annotate("text", x = 7, y = la7_average + 2, label = "LA7 average", color = "black", hjust = 1) +
  labs(
    title = "Credit to Private Sector: 2023 1/",
    subtitle = "(Percent of GDP)",
    x = NULL,
    y = "Percent",
    caption = "1/ Latest observation available for PER is 2022.\nSources: IMF staff calculations."
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(size = 10, hjust = 0),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10)
  )

You will notice that the result is close to the original chart, but not exactly the same.
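
One detail in the code above is worth a second look: the bar colours are passed as a plain vector outside aes(), so the highlight only lands on Uruguay as long as that vector lines up row by row with the plotted data. A minimal sketch of an alternative, assuming we are free to add a helper column (the name Highlight is hypothetical and not part of the ChatGPT output), maps the colour through aes() and scale_fill_manual() so that it stays tied to the country:

```r
library(ggplot2)

# Same estimated values as in the tutorial
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)

# Hypothetical helper column: flags the country we want to highlight
data$Highlight <- ifelse(data$Country == "URY", "Uruguay", "Other")

# Map the fill colour through aes() so it follows the country,
# whatever order the rows or bars end up in
p <- ggplot(data, aes(x = Country, y = Credit_to_Private_Sector, fill = Highlight)) +
  geom_col(width = 0.6) +
  scale_fill_manual(values = c(Uruguay = "red", Other = "steelblue"), guide = "none")

# Run print(p) to draw the chart
```

With this approach, sorting or filtering the data frame later should not move the highlight to a different bar.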

Refining your Chart using AI

Working with AI tools is iterative, so it typically takes a few rounds before you get the exact chart you want. For example, we can now ask the AI to align the chart more closely with the original by sorting the columns from largest to smallest. You can ask the AI something similar to the below.

# Load necessary libraries
library(ggplot2)
library(dplyr)

# Create the data frame with estimated values
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)

# Define the LA7 average
la7_average <- 50

# Reorder the Country factor based on Credit_to_Private_Sector values (from largest to smallest)
data$Country <- factor(data$Country, levels = data$Country[order(-data$Credit_to_Private_Sector)])

# Create the bar chart
ggplot(data, aes(x = Country, y = Credit_to_Private_Sector)) +
  geom_bar(stat = "identity", fill = ifelse(data$Country == "URY", "red", "steelblue"), width = 0.6) +
  geom_hline(yintercept = la7_average, linetype = "dashed", color = "black", linewidth = 1.2) +
  annotate("text", x = 7, y = la7_average + 2, label = "LA7 average", color = "black", hjust = 1) +
  labs(
    title = "Credit to Private Sector: 2023 1/",
    subtitle = "(Percent of GDP)",
    x = NULL,
    y = "Percent",
    caption = "1/ Latest observation available for PER is 2022.\nSources: IMF staff calculations."
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(size = 10, hjust = 0),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10)
  )

This code is almost perfect, especially considering we didn’t even ask it to use our IMF formatting template!
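
As a side note, the sorting step can also be written with base R's reorder(), which orders the factor levels by a second variable. The sketch below uses the same estimated values; it is an alternative spelling of the same idea, not what ChatGPT returned:

```r
# Same estimated values as in the tutorial
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)

# reorder() sorts the levels in increasing order of the second argument,
# so negating the values gives a largest-to-smallest ordering
data$Country <- reorder(data$Country, -data$Credit_to_Private_Sector)

levels(data$Country)
# "CHL" "BRA" "PRY" "PER" "COL" "URY" "MEX"
```

The same expression can also be placed directly inside aes(), for example aes(x = reorder(Country, -Credit_to_Private_Sector)), which leaves the data frame itself untouched.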

For our finishing touches, let’s ask ChatGPT to move the LA7 average label into the legend and to apply the IMF format so that the chart looks like the original. Your request can look something like the below.

# Load necessary libraries
library(ggplot2)
library(dplyr)
library(readxl)
library(here)

# Load IMF theme utilities
source(here("utils/theme_and_colors_IMF.R"))
source(here("utils/Add_text_to_figure_panel.R"))

# Create the data frame with estimated values
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)

# Define the LA7 average
la7_average <- 50

# Reorder the Country factor based on Credit_to_Private_Sector values (from largest to smallest)
data$Country <- factor(data$Country, levels = data$Country[order(-data$Credit_to_Private_Sector)])

# Create a dummy data frame for the LA7 average line legend
dummy_line <- data.frame(Country = c(NA), Credit_to_Private_Sector = c(NA))

# Create the bar chart using IMF-style formatting
ggplot(data, aes(x = Country, y = Credit_to_Private_Sector)) +
  geom_bar(stat = "identity", fill = ifelse(data$Country == "URY", "#900000", "#4B82AD"), 
           color = "grey70", linewidth = 0.4) +  # Softer grey borders with thinner lines (linewidth replaces the deprecated size)
  geom_hline(aes(yintercept = la7_average, linetype = "LA7 average"), color = "black", linewidth = 1.2) +
  scale_linetype_manual(name = "", values = "dashed") +  # Add dashed line in the legend
  labs(
    title = "Credit to Private Sector: 2023 1/",  # Custom title
    subtitle = "(Percent of GDP)",  # Custom subtitle
    x = "",  # Removing X axis label
    y = "",  # Removing Y axis label
    caption = "1/ Latest observation available for PER is 2022.\nSources: IMF staff calculations."  # IMF-style caption
  ) +
  theme_imf() +  # Apply IMF theme
  theme(
    plot.title = element_text(hjust = 0, size = 16, face = "bold", color = "#4B82AD"),  # Left-align title with corrected IMF blue
    plot.subtitle = element_text(hjust = 0, size = 12, color = "#4B82AD"),  # Left-align subtitle with IMF blue
    plot.caption = element_text(size = 8, hjust = 0, color = "black"),  # Smaller caption font size
    legend.position = c(0.75, 0.95),  # Position the legend inside the plot area
    legend.direction = "horizontal",  # Arrange the legend horizontally
    legend.background = element_rect(fill = "transparent"),  # Transparent background for the legend
    legend.text = element_text(size = 9, color = "black"),  # Slightly larger legend text
    panel.grid.major = element_blank(),  # Remove grid lines
    panel.grid.minor = element_blank(),  # Remove grid lines
    axis.text.x = element_text(size = 10, color = "black"),  # Set x-axis text size and color
    axis.text.y = element_text(size = 10, color = "black"),  # Set y-axis text size and color
    axis.line = element_line(linewidth = 0.3, color = "grey80"),  # Softer grey axis lines
    plot.margin = margin(t = 10, b = 10, l = 10, r = 10)  # Adjust margins to match IMF style
  ) +
  guides(linetype = guide_legend(override.aes = list(color = "black")))  # Ensure the dashed line appears black in the legend

Of course, this method is not 100% accurate, since we are asking the AI to estimate the data points from the image of the chart, but it is a good alternative for recreating a chart when the source data is not available.
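
One quick way to gauge the estimates is to check whether they are at least consistent with each other. The sketch below compares the simple mean of the seven estimated values with the LA7 average read off the dashed line. It assumes the LA7 average covers exactly these seven countries and is unweighted, and we do not know either from the chart alone, so treat agreement as a plausibility check and not as confirmation.

```r
# Estimated values from the tutorial (read off the chart, not source data)
estimates <- c(CHL = 80, BRA = 70, PRY = 60, PER = 50, COL = 45, URY = 25, MEX = 20)

# LA7 average as estimated from the dashed line
la7_average <- 50

# Simple (unweighted) mean of the estimated bars
simple_mean <- mean(estimates)

# Gap between the two; a large gap would suggest a misread bar,
# or an average that is weighted or based on different countries
gap <- simple_mean - la7_average

simple_mean  # 50
gap          # 0
```

Here the two numbers agree, so the estimated bars and the estimated average line are at least internally consistent.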