In this section we will show you how AI tools, such as ChatGPT, can assist in automating and extending the creation of data visualizations. We will use AI to analyze WEO data by creating a panel chart of unemployment rates and extending it to plot GDP growth against employment growth with regression analysis.
Understanding Existing Code
Suppose you were sent the following code, which charts a panel of unemployment data by country. Let’s run it to see what it currently produces.
# Load required libraries
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
here() starts at C:/IMF-R-Book
library(ggplot2)
library(haven)

# Load custom IMF themes and colors
source(here("utils/theme_and_colors_IMF.R"))
source(here("utils/Add_text_to_figure_panel.R"))
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
# Import the WEO data
weo <- read_dta(here::here("databases/WEOApr2023Pub.dta"))

# Select countries based on the data
selected_countries <- c(
  "Euro area", "Japan", "United Kingdom", "Canada", "Australia",
  "New Zealand", "Switzerland", "Sweden", "Norway"
)

# Filter and preprocess the data
weo_filtered <- weo %>%
  filter(
    country %in% selected_countries,
    !is.na(year) & year >= 1980 & year <= 2010,
    !is.na(lur)
  ) %>%
  select(country, year, lur)

# Ensure the filtered data is not empty
if (nrow(weo_filtered) == 0) {
  stop("No data available for the specified countries and year range.")
}

# Create the panel chart
fig_unemployment_panel <- ggplot(weo_filtered, aes(x = year, y = lur)) +
  geom_line(linewidth = 1, color = blue) + # Set line color to IMF blue explicitly (linewidth replaces the deprecated size)
  facet_wrap(~ country, scales = "free") + # Free scales so each panel gets its own y-range
  labs(
    title = "Unemployment Rate by Country (1980–2010)",
    subtitle = "(In Percent)",
    x = "Year",
    y = "Unemployment Rate",
    caption = "Sources: IMF WEO April 2023 database."
  ) +
  theme_imf_panel() + # Apply IMF panel theme globally
  theme(
    plot.title = element_text(size = 16, face = "bold", hjust = 0, color = blue), # Main title in IMF theme blue
    plot.subtitle = element_text(size = 12, hjust = 0, color = blue), # Subtitle in IMF theme blue
    strip.text = element_text(size = 12, face = "bold", hjust = 0, color = blue), # Subplot titles in IMF theme blue
    axis.text.x = element_text(angle = 90, hjust = 0.5), # Rotate x-axis labels by 90 degrees
    axis.ticks.length = unit(0.25, "cm"), # Adjust tick size for clarity
    panel.spacing = unit(1, "lines"), # Add spacing between panels
    strip.background = element_blank(), # Remove background for facet labels
    panel.border = element_blank(), # Remove borders around subplots
    legend.position = "none" # Remove legend
  ) +
  scale_x_continuous(
    breaks = seq(1980, 2010, 5), # Ensure regular 5-year breaks for all x-axes
    expand = c(0.02, 0.02) # Slight expansion to avoid crowding at edges
  ) +
  coord_cartesian(xlim = c(1980, 2010)) # Explicitly set consistent x-axis limits
fig_unemployment_panel
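The scales argument of facet_wrap() controls which axes each panel may rescale: "free" frees both axes, while "free_y" frees only the y-axis. The sketch below illustrates the difference on made-up data; the country names and values are purely illustrative and are not WEO figures.

```r
library(ggplot2)

# Made-up unemployment rates for two hypothetical countries (not WEO data)
toy <- data.frame(
  country = rep(c("A", "B"), each = 4),
  year    = rep(c(1980, 1990, 2000, 2010), times = 2),
  lur     = c(2.0, 2.5, 4.8, 5.1, 7.5, 10.2, 6.8, 8.0)
)

# scales = "free_y": panels share the x-axis range but each gets its own y-axis range
p_free_y <- ggplot(toy, aes(x = year, y = lur)) +
  geom_line(linewidth = 1) +
  facet_wrap(~ country, scales = "free_y")

# scales = "free": both axes are rescaled independently in each panel
p_free <- ggplot(toy, aes(x = year, y = lur)) +
  geom_line(linewidth = 1) +
  facet_wrap(~ country, scales = "free")

p_free_y
```

In the chart above, the x-range is also pinned with coord_cartesian(), so in practice only the y-axis should differ between panels.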
Using AI to summarize and explain code
You may not have much time to read the entire code, and this is where AI tools come in handy. You can ask AI to summarize what the code does and what data it uses. Just copy and paste the code, and use a prompt like the one below.
This gives you a general understanding of what this code does and where it is getting its data from.
Using Existing Code
Now that we understand that the code draws on the full April 2023 WEO dataset, we can adapt it to create our own charts. All we need to do is ask AI to help us.
Initiating an AI prompt
Let’s ask AI to look at the code and create a similar panel chart for us, but let’s make it a little more complicated. Let’s use this dataset to plot GDP growth on the X-axis and employment growth on the Y-axis, add a regression line, and add the R-squared value to each panel. Your prompt to AI would look something like the one below:
AI will come back with similar code for your new chart.
When you run it, though, you may notice that it doesn’t work well or returns errors.
Debugging with AI
To debug effectively with AI, think critically about whether you gave the system all the necessary information.
You might realize, for example, that we didn’t specify the names of the series, only that we wanted GDP growth and employment growth. You can now follow up with AI and specify the series names.
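Before writing the follow-up prompt, it can help to confirm which series names actually exist in the data. The sketch below uses a small made-up data frame as a stand-in for the WEO data (weo_toy and its values are hypothetical) and checks for the two series the new chart needs, ngdppch and le_pch.

```r
# Stand-in for the WEO data frame; the real one would come from read_dta()
weo_toy <- data.frame(
  country = c("Japan", "Canada"),
  year    = c(2000, 2000),
  ngdppch = c(2.8, 5.2), # made-up values
  lur     = c(4.7, 6.8)  # made-up values
)

# Series the new chart needs; le_pch is deliberately absent from the toy data
required_series <- c("ngdppch", "le_pch")

# setdiff() returns the required names that are not columns of the data
missing_series <- setdiff(required_series, names(weo_toy))

if (length(missing_series) > 0) {
  message("Missing series: ", paste(missing_series, collapse = ", "))
}
```

Data read with read_dta() usually also carry a description of each variable in the column's "label" attribute, for example attr(weo$ngdppch, "label"), which is a quick way to find the right name to give the AI.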
The revised code from AI, shown below, now runs and produces exactly what we wanted! Even with just one iteration, AI was able to help create a fully new panel with different variables, regression lines, and R-squared values.
# Load required libraries
library(tidyverse)
library(here)
library(ggplot2)
library(haven)

# Import the WEO data
weo <- read_dta(here::here("databases/WEOApr2023Pub.dta"))

# Define the selected countries
selected_countries <- c(
  "Euro area", "Japan", "United Kingdom", "Canada", "Australia",
  "New Zealand", "Switzerland", "Sweden", "Norway"
)

# Filter and preprocess the data
weo_filtered <- weo %>%
  filter(
    country %in% selected_countries,
    !is.na(year) & year >= 1990 & year <= 2024, # Filter for years 1990–2024
    !is.na(ngdppch), # Filter non-missing GDP growth
    !is.na(le_pch) # Filter non-missing employment growth
  ) %>%
  select(country, year, ngdppch, le_pch)

# Ensure the filtered data is not empty
if (nrow(weo_filtered) == 0) {
  stop("No data available for the specified countries and year range.")
}

# Add regression stats to each panel
# pick(everything()) replaces cur_data(), which is deprecated since dplyr 1.1.0
weo_with_regression <- weo_filtered %>%
  group_by(country) %>%
  summarize(
    regression = list(lm(le_pch ~ ngdppch, data = pick(everything()))),
    r_squared = summary(lm(le_pch ~ ngdppch, data = pick(everything())))$r.squared,
    x_position = quantile(ngdppch, 0.8, na.rm = TRUE), # Dynamically adjust x placement
    y_position = quantile(le_pch, 0.9, na.rm = TRUE) # Dynamically adjust y placement
  ) %>%
  ungroup()

# Merge R² data back into the main data for annotation
weo_annotated <- weo_filtered %>%
  left_join(
    weo_with_regression %>% select(country, r_squared, x_position, y_position),
    by = "country"
  )

# Create the panel chart with regression lines and R²
panel_chart <- ggplot(weo_annotated, aes(x = ngdppch, y = le_pch)) +
  geom_point(aes(color = country), alpha = 0.6) + # Scatter plot
  geom_smooth(method = "lm", se = FALSE, color = "black", linetype = "dotted", linewidth = 1) + # Regression line
  facet_wrap(~ country, scales = "free") + # One panel per country
  labs(
    title = "Relationship Between GDP Growth and Employment Growth",
    subtitle = "1990–2024",
    x = "GDP Growth (%)",
    y = "Employment Growth (%)",
    caption = "Sources: IMF WEO April 2023 database."
  ) +
  theme_minimal() +
  theme(
    strip.text = element_text(size = 12, face = "bold"),
    axis.text.x = element_text(angle = 0, vjust = 0.5),
    legend.position = "none"
  ) +
  geom_text(
    data = weo_with_regression,
    aes(
      x = x_position, # Right edge of the label sits at the 80th percentile of x
      y = y_position, # Top edge of the label sits at the 90th percentile of y
      label = paste0("R² = ", round(r_squared, 2))
    ),
    inherit.aes = FALSE, hjust = 1, vjust = 1, size = 3, color = "black"
  )
panel_chart
`geom_smooth()` using formula = 'y ~ x'
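The generated code above refits the same model more than once per country inside summarize(). As an alternative sketch (not the approach the AI returned), the regression can be fitted once per country and every statistic read off that single fit. The example uses base R and made-up numbers; fit_one and toy are hypothetical names introduced for illustration.

```r
# Made-up GDP growth (x) and employment growth (y) for two hypothetical countries
toy <- data.frame(
  country = rep(c("A", "B"), each = 5),
  ngdppch = c(1, 2, 3, 4, 5, 1, 2, 3, 4, 5),
  le_pch  = c(0.4, 1.1, 1.4, 2.1, 2.5,
              1.0, 0.2, 1.4, 0.6, 1.8)
)

# Fit one model for a single country's rows and extract all statistics from it
fit_one <- function(d) {
  fit <- lm(le_pch ~ ngdppch, data = d)
  data.frame(
    country   = d$country[1],
    intercept = unname(coef(fit)[1]),
    slope     = unname(coef(fit)[2]),
    r_squared = summary(fit)$r.squared
  )
}

# split() makes one data frame per country; rbind stacks the one-row results
stats_by_country <- do.call(rbind, lapply(split(toy, toy$country), fit_one))
stats_by_country
```

The resulting table has one row per country and could be passed to geom_text() in the same way as weo_with_regression.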
Now that we have the panel chart we want, let’s put it in IMF format!
# Load required libraries
library(tidyverse)
library(here)
library(ggplot2)
library(haven)

# Load custom IMF themes and colors
source(here("utils/theme_and_colors_IMF.R"))
source(here("utils/Add_text_to_figure_panel.R"))

# Import the WEO data
weo <- read_dta(here::here("databases/WEOApr2023Pub.dta"))

# Define the selected countries
selected_countries <- c(
  "Euro area", "Japan", "United Kingdom", "Canada", "Australia",
  "New Zealand", "Switzerland", "Sweden", "Norway"
)

# Filter and preprocess the data
weo_filtered <- weo %>%
  filter(
    country %in% selected_countries,
    !is.na(year) & year >= 1990 & year <= 2024, # Filter for years 1990–2024
    !is.na(ngdppch), # Filter non-missing GDP growth
    !is.na(le_pch) # Filter non-missing employment growth
  ) %>%
  select(country, year, ngdppch, le_pch)

# Ensure the filtered data is not empty
if (nrow(weo_filtered) == 0) {
  stop("No data available for the specified countries and year range.")
}

# Add regression stats to each panel
# pick(everything()) replaces cur_data(), which is deprecated since dplyr 1.1.0
weo_with_regression <- weo_filtered %>%
  group_by(country) %>%
  summarize(
    regression = list(lm(le_pch ~ ngdppch, data = pick(everything()))),
    r_squared = summary(lm(le_pch ~ ngdppch, data = pick(everything())))$r.squared,
    coef_intercept = coef(lm(le_pch ~ ngdppch, data = pick(everything())))[1],
    coef_slope = coef(lm(le_pch ~ ngdppch, data = pick(everything())))[2],
    x_position = min(ngdppch, na.rm = TRUE), # Place at the minimum x value
    y_position = max(le_pch, na.rm = TRUE) # Place at the maximum y value
  ) %>%
  ungroup()

# Merge R² and regression coefficients back into the main data for annotation
weo_annotated <- weo_filtered %>%
  left_join(
    weo_with_regression %>%
      select(country, r_squared, coef_intercept, coef_slope, x_position, y_position),
    by = "country"
  )

# Create the panel chart with regression lines, R², and equation
fig_gdp_vs_employment <- ggplot(weo_annotated, aes(x = ngdppch, y = le_pch)) +
  geom_point(shape = 21, fill = blue, color = blue, alpha = 0.6, size = 2.5) + # Smaller scatter plot points in IMF blue
  geom_smooth(method = "lm", se = FALSE, color = blue, linetype = "dashed", linewidth = 1) + # Dashed regression line in IMF blue
  facet_wrap(~ country, scales = "free") + # One panel per country
  labs(
    title = "Relationship Between GDP Growth and Employment Growth",
    subtitle = "(1990–2024, In Percent)",
    x = "GDP Growth (%)",
    y = "Employment Growth (%)",
    caption = "Sources: IMF WEO April 2023 database."
  ) +
  theme_imf_panel() + # Use IMF panel theme for consistent style
  theme(
    plot.caption = element_text(size = 8, hjust = 0),
    plot.title = element_text(size = 16, face = "bold", hjust = 0, color = blue), # Title in IMF blue
    plot.subtitle = element_text(size = 12, hjust = 0, color = blue), # Subtitle in IMF blue
    strip.text = element_text(size = 12, face = "bold", hjust = 0, color = blue), # Subplot titles in IMF blue
    axis.text.x = element_text(angle = 0, hjust = 0.5), # Horizontal x-axis labels
    strip.background = element_blank(), # Remove background for facet labels
    panel.spacing = unit(1, "lines"), # Add spacing between panels
    panel.border = element_blank(), # Remove aggressive black borders
    legend.position = "none" # Remove legend
  ) +
  # Add regression equation and R² as text annotations
  geom_text(
    data = weo_with_regression,
    aes(
      x = x_position, # Place at the minimum x value (left)
      y = y_position, # Place at the maximum y value (top)
      label = paste0(
        "y = ", round(coef_slope, 2), "x + ", round(coef_intercept, 2), "\n",
        "R² = ", round(r_squared, 2)
      )
    ),
    inherit.aes = FALSE, hjust = -0.1, vjust = 1.1, size = 3.5, color = "black"
  )
fig_gdp_vs_employment
`geom_smooth()` using formula = 'y ~ x'
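One detail worth checking in AI-generated annotation code is how the equation label handles a negative intercept: pasting "x + " in front of a negative number prints something like "x + -0.06". The helper below is a hypothetical sketch (format_eq_label is not part of the original code) that chooses the sign explicitly.

```r
# Hypothetical helper: builds the "y = ax + b" and R² annotation text
format_eq_label <- function(slope, intercept, r_squared, digits = 2) {
  b <- round(intercept, digits)
  # Choose the operator from the rounded intercept, then print its absolute value
  sign_chr <- ifelse(b < 0, " - ", " + ")
  paste0(
    "y = ", round(slope, digits), "x", sign_chr, abs(b), "\n",
    "R² = ", round(r_squared, digits)
  )
}

# Example with made-up coefficients
cat(format_eq_label(0.52, -0.06, 0.97))
```

Because ifelse() and paste0() are vectorized, the helper could be called inside aes(label = ...) with the coef_slope, coef_intercept, and r_squared columns.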
You have successfully used AI to take existing R code and a dataset and create your own chart panel.