Scatterplots with labels and regression lines

Introduction

In this tutorial, we create a scatterplot of GDP per capita in 1994 against average annual GDP per capita growth from 1994 to 2019 for countries in the Americas and Europe. We use the ggplot2 package in R, which produces more polished and complex visualizations with less code than base R.

Set up Libraries

First, load the libraries for data reading (haven), country code conversion (countrycode), data manipulation (dplyr), project-relative file paths (here), and plotting (ggplot2, ggrepel).

library(ggplot2)
library(haven)
library(countrycode)
library(dplyr)
library(here)
library(ggrepel)
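If any of these packages is not yet installed, `library()` stops with an error. The following is a minimal sketch that installs only the missing CRAN packages. The mirror URL is an assumption; point `repos` at your organization's repository if it has one.

```r
# Packages used in this tutorial (all available on CRAN)
pkgs <- c("ggplot2", "haven", "countrycode", "dplyr", "here", "ggrepel")

# requireNamespace() returns FALSE, rather than erroring, when a package is absent
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only what is missing; an explicit repos avoids the mirror prompt in Rscript
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}
```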

Custom Theme and Colors

We have a custom script with predefined colors and themes for our plots:

source(here("utils/theme_and_colors_IMF.R"))

Note: `red` in the script is an object that holds the hex code #C00050. It is not R's built-in color name "red" (#FF0000). When the plotting code below writes `red` without quotes, it uses the script's shade.
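To see the difference, compare the RGB values of R's built-in `"red"` with #C00050. The line `red <- "#C00050"` is an assumption about how the script stores the color, based only on the note above. The actual script may define it differently.

```r
# Hypothetical: mirrors how the utils script is assumed to define its red
red <- "#C00050"

# col2rgb() converts color names and hex strings into a 3-row RGB matrix
builtin_red <- col2rgb("red")  # R's named color
imf_red     <- col2rgb(red)    # the object `red`, not the string "red"

# builtin: 255, 0, 0   script: 192, 0, 80
print(cbind(builtin = builtin_red[, 1], script = imf_red[, 1]))
```

So `c("Americas" = red)` picks up #C00050 only because the script's `red` object is in scope. Quoting it as `"red"` would give pure red instead.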

Importing the Data

We computed this dataset in the Base R section. Rather than repeat that calculation here, we import the saved result.

growth_data <- readRDS(here("databases/convergence.rds"))

head(growth_data)
# A tibble: 6 × 5
  iso3c continent ypop_1994 ypop_2019  growth
  <chr> <chr>         <dbl>     <dbl>   <dbl>
1 ABW   Americas      29.3       28.9 -0.0616
2 AIA   Americas      11.7       15.0  1.00  
3 ALB   Europe         3.81      12.9  4.89  
4 ARG   Americas      17.8       21.8  0.803 
5 ATG   Americas      12.1       15.6  1.02  
6 AUT   Europe        37.8       53.2  1.37  
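The tutorial does not restate how `growth` was computed. The printed rows are consistent with it being the average annual log growth rate in percent, 100 * ln(ypop_2019 / ypop_1994) / 25. This is an inference from the six rows above, so check the Base R section for the exact definition. The following is a quick check using the rounded values printed by `head()`.

```r
# Values copied from the head(growth_data) output above (rounded as printed)
rows <- data.frame(
  iso3c     = c("ABW", "AIA", "ALB", "ARG", "ATG", "AUT"),
  ypop_1994 = c(29.3, 11.7, 3.81, 17.8, 12.1, 37.8),
  ypop_2019 = c(28.9, 15.0, 12.9, 21.8, 15.6, 53.2),
  growth    = c(-0.0616, 1.00, 4.89, 0.803, 1.02, 1.37)
)

years <- 2019 - 1994  # 25-year window

# Candidate 1: average annual log growth, in percent
rows$log_growth <- 100 * log(rows$ypop_2019 / rows$ypop_1994) / years
# Candidate 2: compound annual growth rate, in percent
rows$cagr <- 100 * ((rows$ypop_2019 / rows$ypop_1994)^(1 / years) - 1)

# Largest gap between each candidate and the reported growth column
max(abs(rows$log_growth - rows$growth))  # about 0.01
max(abs(rows$cagr - rows$growth))        # about 0.11 (driven by ALB)
```

The log-growth formula matches every row to within rounding error. The compound rate drifts for fast growers such as Albania.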

Labeling with ggrepel

The ggrepel package extends ggplot2 with text labels that avoid overlapping. It repels labels away from each other and from the data points. This is especially useful in plots with many data points, where labels would otherwise clutter the visualization.

# The ggrepel package provides geom_text_repel, which repels overlapping text labels
library(ggrepel)
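To see what the package changes, compare plain `geom_text()` with `geom_text_repel()` on a small dataset. This sketch uses `mtcars`, which ships with R, purely for illustration. The `max.overlaps` and `seed` values are arbitrary choices.

```r
library(ggplot2)
library(ggrepel)

# mtcars ships with R; use the car names as labels
cars <- mtcars
cars$model <- rownames(mtcars)

# Plain geom_text(): labels sit centered on the points and can overlap
p_plain <- ggplot(cars, aes(x = wt, y = mpg, label = model)) +
  geom_point() +
  geom_text(size = 3)

# geom_text_repel(): labels are pushed apart and linked to points by segments
# (seed makes the label layout reproducible between renders)
p_repel <- ggplot(cars, aes(x = wt, y = mpg, label = model)) +
  geom_point() +
  geom_text_repel(size = 3, max.overlaps = 20, seed = 1)

print(p_plain)
print(p_repel)
```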

Creating the Scatterplot

We now draw the scatterplot, with a separate linear regression line for each continent and country codes as point labels.

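ggplot(data = growth_data, aes(x = ypop_1994, y = growth, color = continent)) +
  # One OLS line per continent (grouping comes from color = continent)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_point() +
  scale_x_log10() +
  theme_imf() +
  xlab("GDP per Capita in 1994 (log scale)") +
  ylab("Average Annual Growth, 1994-2019 (%)") +
  ggtitle("GDP per Capita in 1994 and Average Annual GDP per Capita Growth, \n1994-2019") +
  # red and blue are the hex-code objects defined in the utils script
  scale_colour_manual(values = c("Americas" = red, "Europe" = blue),
                      breaks = c("Americas", "Europe")) +
  # Country codes as labels, pushed apart so they do not overlap;
  # the label color is inherited from color = continent
  geom_text_repel(
    aes(label = iso3c),
    size = 3.5,
    box.padding = unit(0.35, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "grey50",
    show.legend = FALSE
  ) +
  # Legend inside the panel, top right. In ggplot2 >= 3.5.0 the preferred form is
  # legend.position = "inside", legend.position.inside = c(0.85, 0.85)
  theme(legend.title = element_blank(),
        legend.position = c(0.85, 0.85))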
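The chart fits one regression per continent because `color = continent` is set in the top-level `aes()`. Also, `scale_x_log10()` transforms the data before statistics are computed, so each line regresses `growth` on `log10(ypop_1994)`, not on the raw level. The following is a minimal sketch of the same per-continent fits. It runs on the six rows printed by `head()` for illustration only. `fit_by_continent` is a hypothetical helper, and passing it the full `growth_data` should reproduce the coefficients behind the chart's lines.

```r
# Hypothetical helper: per-continent OLS of growth on log10 initial income,
# matching what geom_smooth(method = "lm") draws on a log10 x axis
fit_by_continent <- function(df) {
  lapply(split(df, df$continent), function(d) {
    coef(lm(growth ~ log10(ypop_1994), data = d))
  })
}

# Illustrative subset copied from the head(growth_data) output
rows <- data.frame(
  continent = c("Americas", "Americas", "Europe", "Americas", "Americas", "Europe"),
  ypop_1994 = c(29.3, 11.7, 3.81, 17.8, 12.1, 37.8),
  growth    = c(-0.0616, 1.00, 4.89, 0.803, 1.02, 1.37)
)

coefs <- fit_by_continent(rows)
coefs  # named list: intercept and slope for each continent
```

With only two European rows, the Europe fit here passes exactly through both points. The real estimates need the full dataset.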

The same chart can also be built with the internal bookr package, which removes the need to source the utils script. Load the bookr library and add scale_imf_colors() to the plot so that its colors follow the IMF theme.

# Load the internal bookr package
library(bookr)

ggplot(data = growth_data, aes(x = ypop_1994, y = growth, color = continent)) +
  # One OLS line per continent (grouping comes from color = continent)
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_point() +
  scale_x_log10() +
  xlab("GDP per Capita in 1994 (log scale)") +
  ylab("Average Annual Growth, 1994-2019 (%)") +
  ggtitle("GDP per Capita 1994 & Average per Capita Growth \n1994-2019") +
  scale_imf_colors() + # Apply the IMF theme colors from bookr
  geom_text_repel(
    aes(label = iso3c),
    size = 3.5,
    box.padding = unit(0.35, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "grey50",
    show.legend = FALSE
  ) +
  # In ggplot2 >= 3.5.0, prefer legend.position = "inside" with legend.position.inside
  theme(legend.title = element_blank(),
        legend.position = c(0.85, 0.85))