library(ggplot2)
library(haven)
library(countrycode)
library(dplyr)
library(here)
library(ggrepel)
Scatterplots with labels and regression lines
Introduction
In this tutorial, we create a scatterplot showing the relationship between GDP per capita in 1994 and the average annual growth of GDP per capita from 1994 to 2019 for countries in the Americas and Europe. We use the ggplot2 package in R, which produces more polished and more complex visualizations with less code than base R.
Set up Libraries
First, we load the libraries needed for data manipulation (dplyr), country code conversion (countrycode), data reading (haven), file paths (here), and plotting (ggplot2, ggrepel).
Custom Theme and Colors
We have a custom script with predefined colors and themes for our plots:
source(here("utils/theme_and_colors_IMF.R"))
Note: The color red defined in the script is not R's default red. The script sets red to #C00050, whereas R's built-in "red" is #FF0000.
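To make the difference concrete, base R's col2rgb() can convert both colors to their RGB components. This is only a small sketch: the name imf_red is a hypothetical stand-in for the red object that the sourced script creates, using the hex value quoted in the note.

```r
# Hypothetical stand-in for the `red` object created by the sourced script;
# the hex value is the one quoted in the note above.
imf_red <- "#C00050"

# col2rgb() returns a matrix with one column per color and rows for the
# red, green and blue components (each on a 0-255 scale).
rgb_values <- col2rgb(c(default_red = "red", imf_red = imf_red))
print(rgb_values)
```

Because the plotting code passes red as an unquoted object rather than the string "red", the scale picks up the script's value and not R's built-in color.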
Importing the Data
We computed this dataset in the Base R section. We won't repeat that calculation here; we simply import the result.
# Read the saved dataset and preview its first rows
growth_data <- readRDS(here("databases/convergence.rds"))
head(growth_data)
# A tibble: 6 × 5
  iso3c continent ypop_1994 ypop_2019  growth
  <chr> <chr>         <dbl>     <dbl>   <dbl>
1 ABW   Americas      29.3       28.9 -0.0616
2 AIA   Americas      11.7       15.0  1.00
3 ALB   Europe         3.81      12.9  4.89
4 ARG   Americas      17.8       21.8  0.803
5 ATG   Americas      12.1       15.6  1.02
6 AUT   Europe        37.8       53.2  1.37
Labeling with ggrepel
The ggrepel package extends ggplot2 with text labels that avoid overlapping: it repels labels away from each other and from the data points. This is especially useful in plots with many data points, where labels would otherwise clutter the visualization.
# The ggrepel package provides geom_text_repel, which repels overlapping text labels
library(ggrepel)
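As a quick illustration of the repelling behavior, the sketch below labels a handful of points from R's built-in mtcars dataset, once with plain geom_text() and once with geom_text_repel(). The dataset and the object names (cars, p_text, p_repel) are chosen only for illustration and are not part of the tutorial data.

```r
library(ggplot2)
library(ggrepel)

# Built-in dataset used purely for illustration; not part of the tutorial data
cars <- head(mtcars, 10)
cars$model <- rownames(cars)

p <- ggplot(cars, aes(x = wt, y = mpg, label = model)) +
  geom_point()

# Plain labels are drawn exactly at the point coordinates and may overlap
p_text <- p + geom_text(size = 3)

# Repelled labels are moved apart from each other and away from the points
p_repel <- p + geom_text_repel(size = 3)

print(p_repel)
```

Printing p_text and p_repel side by side should show how the repelled version keeps nearby labels readable.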
Creating the Scatterplot
We now draw the scatterplot with a separate regression line for each continent and add country codes as labels.
ggplot(data = growth_data, aes(x = ypop_1994, y = growth, color = continent)) +
  geom_smooth(method = "lm", se = FALSE) +  # one linear fit per continent, no confidence band
  geom_point() +
  scale_x_log10() +
  theme_imf() +
  xlab("GDP per Capita in 1994 (log scale)") +
  ylab("Percentage Growth (1994-2019)") +
  ggtitle("GDP per Capita in 1994 and Average Annual GDP per Capita Growth, \n1994-2019") +
  scale_colour_manual(
    values = c("Americas" = red, "Europe" = blue),
    breaks = c("Americas", "Europe")
  ) +
  geom_text_repel(
    aes(label = iso3c, color = continent),
    size = 3.5,
    box.padding = unit(0.35, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "grey50",
    show.legend = FALSE
  ) +
  theme(
    legend.title = element_blank(),
    # ggplot2 >= 3.5.0 deprecates a numeric legend.position in favor of
    # legend.position = "inside" with legend.position.inside = c(0.85, 0.85)
    legend.position = c(0.85, 0.85)
  )
The same chart can be produced with the internal bookr package, without sourcing the utils script. Load the bookr library and add scale_imf_colors() so that the chart uses the IMF theme colors.
# Load the internal bookr package
library(bookr)

ggplot(data = growth_data, aes(x = ypop_1994, y = growth, color = continent)) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_point() +
  scale_x_log10() +
  xlab("GDP per Capita in 1994 (log scale)") +
  ylab("Percentage Growth (1994-2019)") +
  ggtitle("GDP per Capita 1994 & Average per Capita Growth \n1994-2019") +
  scale_imf_colors() +  # ensures the plot uses IMF theme colors
  geom_text_repel(
    aes(label = iso3c, color = continent),
    size = 3.5,
    box.padding = unit(0.35, "lines"),
    point.padding = unit(0.5, "lines"),
    segment.color = "grey50",
    show.legend = FALSE
  ) +
  theme(
    legend.title = element_blank(),
    # ggplot2 >= 3.5.0 deprecates a numeric legend.position in favor of
    # legend.position = "inside" with legend.position.inside = c(0.85, 0.85)
    legend.position = c(0.85, 0.85)
  )
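The regression lines in these plots come from geom_smooth(method = "lm"). Because scale_x_log10() transforms the x values before the fit is computed, each line corresponds to a linear model of growth on log10(ypop_1994), estimated separately for each continent. The sketch below reproduces that idea with lm() on the six rows shown by head(growth_data) earlier, retyped so the example is self-contained. With so few rows (only two for Europe) the estimates are purely illustrative; in practice you would pass the full growth_data. The names sample_rows and fits are hypothetical.

```r
# The six rows printed by head(growth_data), retyped for a self-contained sketch
sample_rows <- data.frame(
  iso3c     = c("ABW", "AIA", "ALB", "ARG", "ATG", "AUT"),
  continent = c("Americas", "Americas", "Europe", "Americas", "Americas", "Europe"),
  ypop_1994 = c(29.3, 11.7, 3.81, 17.8, 12.1, 37.8),
  growth    = c(-0.0616, 1.00, 4.89, 0.803, 1.02, 1.37)
)

# Fit one line per continent on the log10 scale, mirroring what
# geom_smooth(method = "lm") does after scale_x_log10()
fits <- lapply(
  split(sample_rows, sample_rows$continent),
  function(d) coef(lm(growth ~ log10(ypop_1994), data = d))
)

# Each element holds the intercept and the slope for one continent
print(fits)
```

A negative slope means that countries starting from a lower GDP per capita in 1994 grew faster on average, which is the pattern the scatterplot is designed to reveal.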