Explaining and modifying code

Introduction

In this section we will show you how AI tools, such as ChatGPT, can help you understand, adapt, and extend code for data visualizations. We will work with World Economic Outlook (WEO) data, starting from a panel chart of unemployment rates and extending it to plot GDP growth against employment growth with regression lines and R-squared values.

Understanding Existing Code

Suppose a colleague sends you the following code, which draws a panel chart of unemployment rates by country. Let’s run it to see what it currently produces.

# Load required libraries
library(tidyverse)
library(here)
library(ggplot2)
library(haven)

# Load custom IMF themes and colors

source(here("utils/theme_and_colors_IMF.R"))
source(here("utils/Add_text_to_figure_panel.R"))

# Import the WEO data
weo <- read_dta(here::here("databases/WEOApr2023Pub.dta"))

# Select countries based on the data
selected_countries <- c("Euro area", "Japan", "United Kingdom", "Canada", 
                        "Australia", "New Zealand", "Switzerland", 
                        "Sweden", "Norway")

# Filter and preprocess the data
weo_filtered <- weo %>%
  filter(
    country %in% selected_countries, 
    !is.na(year) & year >= 1980 & year <= 2010,
    !is.na(lur)
  ) %>%
  select(country, year, lur)

# Ensure the filtered data is not empty
if (nrow(weo_filtered) == 0) {
  stop("No data available for the specified countries and year range.")
}

# Create the panel chart
fig_unemployment_panel <- ggplot(weo_filtered, aes(x = year, y = lur)) +
  geom_line(linewidth = 1, color = blue) +  # Set line color to IMF blue explicitly
  facet_wrap(~ country, scales = "free") +  # Free scales per panel; x-axis limits are fixed further down
  labs(
    title = "Unemployment Rate by Country (1980–2010)",
    subtitle = "(In Percent)",
    x = "Year",
    y = "Unemployment Rate",
    caption = "Sources: IMF WEO April 2023 database."
  ) +
  theme_imf_panel() +  # Apply IMF panel theme globally
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0, color = blue),  # Main title in IMF theme blue
    plot.subtitle = element_text(size = 12, hjust = 0, color = blue),             # Subtitle in IMF theme blue
    strip.text = element_text(size = 12, face = "bold", hjust = 0, color = blue), # Subplot titles in IMF theme blue
    axis.text.x = element_text(angle = 90, hjust = 0.5),                           # Rotate x-axis labels to vertical
    axis.ticks.length = unit(0.25, "cm"),                                         # Adjust tick size for clarity
    panel.spacing = unit(1, "lines"),                                             # Add spacing between panels
    strip.background = element_blank(),                                           # Remove background for facet labels
    panel.border = element_blank(),                                               # Remove borders around subplots
    legend.position = "none"                                                      # Remove legend
  ) +
  scale_x_continuous(
    breaks = seq(1980, 2010, 5),  # Ensure regular 5-year breaks for all x-axes
    expand = c(0.02, 0.02)        # Slight expansion to avoid crowding at edges
  ) +
  coord_cartesian(xlim = c(1980, 2010))  # Explicitly set consistent x-axis limits
fig_unemployment_panel

Using AI to summarize and explain code

You may not have much time to read the entire code, and this is where AI tools come in handy. Copy and paste the code into the AI tool and add a short prompt, for example asking it to summarize what the code does and what data it uses.

The response gives you a general understanding of what the code does and where it gets its data from.
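
It is worth cross-checking an AI-generated summary against the objects the code actually creates. The sketch below is only an illustration: toy_weo is a made-up stand-in with the same column names as weo_filtered (the values are invented, not WEO data), and it confirms the country list, year range, and number of observations that a summary would claim.

```r
# Toy stand-in for weo_filtered; the values are invented for illustration only
toy_weo <- data.frame(
  country = rep(c("Japan", "Canada"), each = 3),
  year    = rep(c(1980, 1995, 2010), times = 2),
  lur     = c(2.0, 3.2, 5.1, 7.5, 9.5, 8.1)
)

# Which countries and which years does the data actually cover?
countries_covered <- sort(unique(toy_weo$country))
year_range <- range(toy_weo$year)

# How many observations feed each panel?
obs_per_country <- table(toy_weo$country)

print(countries_covered)
print(year_range)
print(obs_per_country)
```

Running the same three checks on the real weo_filtered object would tell you whether the summary's description of the sample matches the data.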

Using Existing Code

Now that we understand that the code draws on the full April 2023 WEO dataset, we can adapt it to create our own charts. All we need to do is ask AI to help us.

Initiating an AI prompt

Let’s ask AI to look at the code and create a similar panel chart for us, but let’s make it a little more complicated. Using the same dataset, we want GDP growth on the x-axis, employment growth on the y-axis, a regression line, and the R-squared value in each panel. Your prompt should list these requirements alongside the pasted code.

AI will come back with similar code for your new chart.

When you run it, though, you may find that it does not work as intended or returns errors.

Debugging with AI

To debug effectively with AI, you need to think critically and check whether you gave the system all the necessary information.

You might realize, for example, that we didn’t specify the names of the series, only that they were GDP growth and employment growth. You now follow up with AI and give it the exact series names (here, ngdppch for GDP growth and le_pch for employment growth).
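
Before following up, it helps to look up the exact series names yourself. Data imported with read_dta() keeps each Stata variable label in the column's "label" attribute, so the labels can be searched by keyword. The sketch below is an assumption-laden illustration: toy_weo and its label texts are invented stand-ins, not the real WEO labels.

```r
# Toy stand-in for the imported WEO data; values and labels are invented
toy_weo <- data.frame(
  country = c("Japan", "Canada"),
  year    = c(2000, 2000),
  ngdppch = c(2.8, 5.2),
  le_pch  = c(-0.2, 2.5)
)
attr(toy_weo$ngdppch, "label") <- "GDP growth (hypothetical label)"
attr(toy_weo$le_pch,  "label") <- "Employment growth (hypothetical label)"

# Collect each column's "label" attribute (NA when a column has none)
var_labels <- vapply(
  toy_weo,
  function(col) {
    lab <- attr(col, "label")
    if (is.null(lab)) NA_character_ else lab
  },
  character(1)
)

# Search the labels for a keyword to find candidate series names
growth_vars <- names(var_labels)[grepl("growth", var_labels, ignore.case = TRUE)]
print(growth_vars)

# Check that the names we plan to give the AI really exist in the data
needed <- c("ngdppch", "le_pch")
missing_vars <- setdiff(needed, names(toy_weo))
print(missing_vars)
```

An empty missing_vars means every name in the prompt matches a column, which removes one common source of errors in AI-generated code.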

This code now runs and produces exactly what we wanted! Even with just one iteration, AI was able to help create a fully new panel with different variables, regression lines, and R-squared values.

# Load required libraries
library(tidyverse)
library(here)
library(ggplot2)
library(haven)

# Import the WEO data
weo <- read_dta(here::here("databases/WEOApr2023Pub.dta"))

# Define the selected countries
selected_countries <- c("Euro area", "Japan", "United Kingdom", "Canada", 
                        "Australia", "New Zealand", "Switzerland", 
                        "Sweden", "Norway")

# Filter and preprocess the data
weo_filtered <- weo %>%
  filter(
    country %in% selected_countries, 
    !is.na(year) & year >= 1990 & year <= 2024,  # Filter for years 1990–2024
    !is.na(ngdppch),  # Filter non-missing GDP growth
    !is.na(le_pch)    # Filter non-missing employment growth
  ) %>%
  select(country, year, ngdppch, le_pch)

# Ensure the filtered data is not empty
if (nrow(weo_filtered) == 0) {
  stop("No data available for the specified countries and year range.")
}

# Add regression stats to each panel
weo_with_regression <- weo_filtered %>%
  group_by(country) %>%
  summarize(
    regression = list(lm(le_pch ~ ngdppch, data = pick(everything()))),
    r_squared = summary(lm(le_pch ~ ngdppch, data = pick(everything())))$r.squared,
    x_position = quantile(ngdppch, 0.8, na.rm = TRUE),  # Dynamically adjust x placement
    y_position = quantile(le_pch, 0.9, na.rm = TRUE)   # Dynamically adjust y placement
  ) %>%
  ungroup()

# Merge R² data back into the main data for annotation
weo_annotated <- weo_filtered %>%
  left_join(weo_with_regression %>% select(country, r_squared, x_position, y_position), by = "country")

# Create the panel chart with regression lines and R²
panel_chart <- ggplot(weo_annotated, aes(x = ngdppch, y = le_pch)) +
  geom_point(aes(color = country), alpha = 0.6) +  # Scatter plot
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dotted", linewidth = 1) +  # Regression line
  facet_wrap(~ country, scales = "free") +  # One panel per country
  labs(
    title = "Relationship Between GDP Growth and Employment Growth",
    subtitle = "1990–2024",
    x = "GDP Growth (%)",
    y = "Employment Growth (%)",
    caption = "Sources: IMF WEO April 2023 database."
  ) +
  theme_minimal() +
  theme(
    strip.text = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(angle = 0, vjust = 0.5),
    legend.position = "none"
  ) +
  geom_text(
    data = weo_with_regression, 
    aes(
      x = x_position,  # Dynamically place R² slightly left of the 80th percentile
      y = y_position,  # Place R² slightly below the 90th percentile
      label = paste0("R² = ", round(r_squared, 2))
    ),
    inherit.aes = FALSE, hjust = 1, vjust = 1, size = 3, color = "black"
  )
panel_chart
`geom_smooth()` using formula = 'y ~ x'
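
AI-generated statistics deserve an independent check before they go on a chart. The sketch below, which uses invented numbers rather than WEO data, computes R-squared per group in base R and compares it with the squared correlation, which should be identical for a regression with a single explanatory variable. The data frame toy and its groups "A" and "B" are hypothetical.

```r
# Toy data: two "countries" with invented growth figures
toy <- data.frame(
  country = rep(c("A", "B"), each = 5),
  ngdppch = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  le_pch  = c(0.4, 1.1, 1.4, 2.1, 2.5, 1.0, 0.2, 1.4, 0.6, 1.3)
)

# Fit one regression per country and pull out R-squared
r2_by_country <- sapply(
  split(toy, toy$country),
  function(d) summary(lm(le_pch ~ ngdppch, data = d))$r.squared
)

# For a one-regressor model, R-squared equals the squared correlation
r2_check <- sapply(
  split(toy, toy$country),
  function(d) cor(d$ngdppch, d$le_pch)^2
)

print(round(r2_by_country, 2))
print(all.equal(r2_by_country, r2_check))
```

If the two calculations disagree on the real data, the grouping or the variables passed to the regression are the first things to inspect.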

Now that we have the panel chart we want, let’s apply the IMF format.

# Load required libraries
library(tidyverse)
library(here)
library(ggplot2)
library(haven)

# Load custom IMF themes and colors
source(here("utils/theme_and_colors_IMF.R"))
source(here("utils/Add_text_to_figure_panel.R"))

# Import the WEO data
weo <- read_dta(here::here("databases/WEOApr2023Pub.dta"))

# Define the selected countries
selected_countries <- c("Euro area", "Japan", "United Kingdom", "Canada", 
                        "Australia", "New Zealand", "Switzerland", 
                        "Sweden", "Norway")

# Filter and preprocess the data
weo_filtered <- weo %>%
  filter(
    country %in% selected_countries, 
    !is.na(year) & year >= 1990 & year <= 2024,  # Filter for years 1990–2024
    !is.na(ngdppch),  # Filter non-missing GDP growth
    !is.na(le_pch)    # Filter non-missing employment growth
  ) %>%
  select(country, year, ngdppch, le_pch)

# Ensure the filtered data is not empty
if (nrow(weo_filtered) == 0) {
  stop("No data available for the specified countries and year range.")
}

# Add regression stats to each panel
weo_with_regression <- weo_filtered %>%
  group_by(country) %>%
  summarize(
    regression = list(lm(le_pch ~ ngdppch, data = pick(everything()))),
    r_squared = summary(lm(le_pch ~ ngdppch, data = pick(everything())))$r.squared,
    coef_intercept = coef(lm(le_pch ~ ngdppch, data = pick(everything())))[1],
    coef_slope = coef(lm(le_pch ~ ngdppch, data = pick(everything())))[2],
    x_position = min(ngdppch, na.rm = TRUE),  # Place at the minimum x value
    y_position = max(le_pch, na.rm = TRUE)   # Place at the maximum y value
  ) %>%
  ungroup()

# Merge R² and regression coefficients back into the main data for annotation
weo_annotated <- weo_filtered %>%
  left_join(weo_with_regression %>% select(country, r_squared, coef_intercept, coef_slope, x_position, y_position), by = "country")

# Create the panel chart with regression lines, R², and equation
fig_gdp_vs_employment <- ggplot(weo_annotated, aes(x = ngdppch, y = le_pch)) +
  geom_point(shape = 21, fill = blue, color = blue, alpha = 0.6, size = 2.5) +  # Semi-transparent scatter points in IMF blue
  geom_smooth(method = "lm", se = FALSE, color = blue, linetype = "dashed", linewidth = 1) +  # Dashed regression line in IMF blue
  facet_wrap(~ country, scales = "free") +  # One panel per country
  labs(
    title = "Relationship Between GDP Growth and Employment Growth",
    subtitle = "(1990–2024, In Percent)",
    x = "GDP Growth (%)",
    y = "Employment Growth (%)",
    caption = "Sources: IMF WEO April 2023 database."
  ) +
  theme_imf_panel() +  # Use IMF panel theme for consistent style
  theme(
    plot.caption = element_text(size = 8, hjust = 0),
    plot.title = element_text(size = 16, face = "bold", hjust = 0, color = blue),  # Title in IMF blue
    plot.subtitle = element_text(size = 12, hjust = 0, color = blue),             # Subtitle in IMF blue
    strip.text = element_text(size = 12, face = "bold", hjust = 0, color = blue), # Subplot titles in IMF blue
    axis.text.x = element_text(angle = 0, hjust = 0.5),                           # Horizontal x-axis labels
    strip.background = element_blank(),                                           # Remove background for facet labels
    panel.spacing = unit(1, "lines"),                                             # Add spacing between panels
    panel.border = element_blank(),                                               # Remove borders around subplots
    legend.position = "none"                                                      # Remove legend
  ) +
  # Add regression equation and R² as text annotations
  geom_text(
    data = weo_with_regression, 
    aes(
      x = x_position,  # Place at the minimum x value (left)
      y = y_position,  # Place at the maximum y value (top)
      label = paste0(
        "y = ", round(coef_slope, 2), "x ",
        ifelse(coef_intercept < 0, "- ", "+ "), abs(round(coef_intercept, 2)), "\n",  # Avoid printing "+ -0.5"
        "R² = ", round(r_squared, 2)
      )
    ),
    inherit.aes = FALSE, hjust = -0.1, vjust = 1.1, size = 3.5, color = "black"
  )
fig_gdp_vs_employment
`geom_smooth()` using formula = 'y ~ x'

You have successfully used AI to take existing R code and a dataset and create your own panel chart.