In this tutorial, we will learn how to create a chart in R when the original data is not available. We will use an example chart from the most recent Uruguay Staff Report and leverage AI tools to guide us in estimating the data and building the chart.
Getting Started
At times, we come across charts we want to replicate but are unable to locate the source data. Let’s use the chart below as an example.
This chart shows credit to the private sector in selected Latin American countries as a percentage of GDP.
Asking AI for assistance
When the source data is missing, we can ask AI tools such as ChatGPT to estimate the values and to help write the R code needed to create a similar visualization.
Estimating the Data using AI
First, paste the chart you are trying to replicate into an AI tool such as ChatGPT. Ask it to estimate the data points and to write the code that replicates the chart. Your conversation would look something like this.
Your AI tool will respond with something like this:
# Load the plotting library
library(ggplot2)
# Create the data frame with estimated values
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)
# Define the LA7 average
la7_average <- 50
# Create the bar chart
ggplot(data, aes(x = Country, y = Credit_to_Private_Sector)) +
  geom_bar(stat = "identity",
           fill = ifelse(data$Country == "URY", "red", "steelblue"),
           width = 0.6) +
  geom_hline(yintercept = la7_average, linetype = "dashed",
             color = "black", linewidth = 1.2) +
  annotate("text", x = 7, y = la7_average + 2, label = "LA7 average",
           color = "black", hjust = 1) +
  labs(
    title = "Credit to Private Sector: 2023 1/",
    subtitle = "(Percent of GDP)",
    x = NULL,
    y = "Percent",
    caption = "1/ Latest observation available for PER is 2022.\nSources: IMF staff calculations."
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(size = 10, hjust = 0),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10)
  )
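You will notice that the result is close to the original, but not quite the same. The most visible difference is that the bars appear in alphabetical order. This happens because ggplot2 sorts a character variable on a discrete axis alphabetically by default.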
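In the generated code, the LA7 average is typed in by hand as 50. One quick sanity check is to recompute that average from the estimated values. The sketch below assumes the LA7 average is a simple, unweighted mean of the seven countries shown. The Staff Report may define it differently (for example, with GDP weights), so treat a mismatch as a reason to revisit the estimates or the definition rather than as proof of an error.

```r
# AI-estimated values read off the original chart (approximations, not source data)
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)

# Assumption: LA7 average = unweighted mean of the seven countries above
la7_average <- mean(data$Credit_to_Private_Sector)

# Compare with the 50 hard-coded in the AI-generated code
print(la7_average)
# [1] 50
```

Here the recomputed mean matches the hard-coded value, so the estimates are at least internally consistent.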
Refining your Chart using AI
Working with AI tools is an iterative process, so it usually takes a few rounds before you get exactly the chart you want. Next, we ask the AI to bring the chart closer to the original, i.e., to sort the columns from largest to smallest. You can ask AI something similar to the below.
# Load necessary libraries
library(ggplot2)
library(dplyr)
# Create the data frame with estimated values
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)
# Define the LA7 average
la7_average <- 50
# Reorder the Country factor based on Credit_to_Private_Sector values (from largest to smallest)
data$Country <- factor(data$Country,
                       levels = data$Country[order(-data$Credit_to_Private_Sector)])
# Create the bar chart
ggplot(data, aes(x = Country, y = Credit_to_Private_Sector)) +
  geom_bar(stat = "identity",
           fill = ifelse(data$Country == "URY", "red", "steelblue"),
           width = 0.6) +
  geom_hline(yintercept = la7_average, linetype = "dashed",
             color = "black", linewidth = 1.2) +
  annotate("text", x = 7, y = la7_average + 2, label = "LA7 average",
           color = "black", hjust = 1) +
  labs(
    title = "Credit to Private Sector: 2023 1/",
    subtitle = "(Percent of GDP)",
    x = NULL,
    y = "Percent",
    caption = "1/ Latest observation available for PER is 2022.\nSources: IMF staff calculations."
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, size = 12),
    plot.caption = element_text(size = 10, hjust = 0),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.text.x = element_text(size = 10),
    axis.text.y = element_text(size = 10)
  )
This code is almost perfect, especially considering we didn’t even ask it to use our IMF formatting template!
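The factor() call in the code above fixes the bar order by listing the levels explicitly. As an alternative (our suggestion, not what the AI produced), base R's reorder() can derive the level order from the values. Mapping the highlight inside aes() with scale_fill_manual() then ties the colors to the data rather than to a separately computed vector. A minimal sketch using the same estimated values, with illustrative colors:

```r
library(ggplot2)

data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)

# Default: a character column becomes a factor with alphabetical levels,
# which is why the first chart was not sorted by value
print(levels(factor(data$Country)))
# "BRA" "CHL" "COL" "MEX" "PER" "PRY" "URY"

# reorder() orders the levels by the negated values, i.e. largest first
data$Country <- reorder(data$Country, -data$Credit_to_Private_Sector)
print(levels(data$Country))
# "CHL" "BRA" "PRY" "PER" "COL" "URY" "MEX"

# Map the highlight inside aes(); the manual scale picks the two colors
p <- ggplot(data, aes(x = Country, y = Credit_to_Private_Sector,
                      fill = Country == "URY")) +
  geom_col(width = 0.6) +  # geom_col() is shorthand for geom_bar(stat = "identity")
  scale_fill_manual(values = c(`TRUE` = "red", `FALSE` = "steelblue"),
                    guide = "none") +
  theme_minimal()
```

Both approaches give the same bar order. The reorder() version keeps working if the estimated values change, because the order is recomputed from the data.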
For the finishing touches, let's ask ChatGPT to move the LA7 average label into a legend and to apply the IMF format so the chart looks like the original. Your request can look something like the one below.
# Load IMF theme utilities (project-specific helper scripts located via here())
library(here)
source(here("utils/theme_and_colors_IMF.R"))
source(here("utils/Add_text_to_figure_panel.R"))
# Create the data frame with estimated values
data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)
# Define the LA7 average
la7_average <- 50
# Reorder the Country factor based on Credit_to_Private_Sector values (from largest to smallest)
data$Country <- factor(data$Country,
                       levels = data$Country[order(-data$Credit_to_Private_Sector)])
# Create the bar chart using IMF-style formatting
ggplot(data, aes(x = Country, y = Credit_to_Private_Sector)) +
  geom_bar(stat = "identity",
           fill = ifelse(data$Country == "URY", "#900000", "#4B82AD"),
           color = "grey70", linewidth = 0.4) +  # Softer grey borders with thinner lines
  # Mapping linetype inside aes() creates the "LA7 average" legend entry
  geom_hline(aes(yintercept = la7_average, linetype = "LA7 average"),
             color = "black", linewidth = 1.2) +
  scale_linetype_manual(name = "", values = "dashed") +  # Add dashed line in the legend
  labs(
    title = "Credit to Private Sector: 2023 1/",  # Custom title
    subtitle = "(Percent of GDP)",                # Custom subtitle
    x = "",                                       # Remove x-axis label
    y = "",                                       # Remove y-axis label
    caption = "1/ Latest observation available for PER is 2022.\nSources: IMF staff calculations."  # IMF-style caption
  ) +
  theme_imf() +  # Apply IMF theme (from the sourced utility scripts)
  theme(
    plot.title = element_text(hjust = 0, size = 16, face = "bold", color = "#4B82AD"),  # Left-align title in IMF blue
    plot.subtitle = element_text(hjust = 0, size = 12, color = "#4B82AD"),  # Left-align subtitle in IMF blue
    plot.caption = element_text(size = 8, hjust = 0, color = "black"),  # Smaller caption font size
    legend.position = c(0.75, 0.95),  # Position the legend inside the plot area
    legend.direction = "horizontal",  # Arrange the legend horizontally
    legend.background = element_rect(fill = "transparent"),  # Transparent background for the legend
    legend.text = element_text(size = 9, color = "black"),  # Legend text size and color
    panel.grid.major = element_blank(),  # Remove grid lines
    panel.grid.minor = element_blank(),  # Remove grid lines
    axis.text.x = element_text(size = 10, color = "black"),  # Set x-axis text size and color
    axis.text.y = element_text(size = 10, color = "black"),  # Set y-axis text size and color
    axis.line = element_line(linewidth = 0.3, color = "grey80"),  # Softer grey axis lines
    plot.margin = margin(t = 10, b = 10, l = 10, r = 10)  # Adjust margins to match IMF style
  ) +
  guides(linetype = guide_legend(override.aes = list(color = "black")))  # Ensure the dashed line appears black in the legend
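The legend entry in the final chart comes from mapping linetype to a constant label inside aes() and then setting the line style with scale_linetype_manual(). Below is a stripped-down sketch of just that technique, without the IMF helper files; the colors and legend position are illustrative. One difference from the code above is a choice made only for this sketch: the reference line gets its own one-row data frame, so ggplot2 draws it once instead of once per row of the plot data.

```r
library(ggplot2)

data <- data.frame(
  Country = c("CHL", "BRA", "PRY", "PER", "COL", "URY", "MEX"),
  Credit_to_Private_Sector = c(80, 70, 60, 50, 45, 25, 20)
)

# One-row data frame for the reference line; its label becomes the legend key
ref_line <- data.frame(yintercept = 50, label = "LA7 average")

p <- ggplot(data, aes(x = Country, y = Credit_to_Private_Sector)) +
  geom_col(fill = "#4B82AD", width = 0.6) +
  # Mapping linetype inside aes() is what makes ggplot2 add a legend entry
  geom_hline(data = ref_line,
             aes(yintercept = yintercept, linetype = label),
             color = "black", linewidth = 1.2) +
  # Give the "LA7 average" key a dashed line and drop the legend title
  scale_linetype_manual(name = NULL, values = c("LA7 average" = "dashed")) +
  theme_minimal() +
  theme(legend.position = "top")
```

Setting linetype outside aes(), as in the first version of the code, styles the line but produces no legend. That is why the earlier chart needed annotate() to label it.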
Of course, this method is not fully accurate, since the AI is only estimating the data points from an image of the chart. Still, it is a practical alternative when you want to recreate a chart and the source data is not available.