library(wbstats)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggrepel)
library(here)
source(here("utils/theme_and_colors_IMF.R"))
Self-employment and GDP per worker
Introduction
This tutorial explores the link between self-employment and labor productivity, using GDP per worker data from the World Bank. We’ll process the data, build some charts, and discuss what we find.
Load Necessary Libraries
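First, we load the required libraries: wbstats for accessing World Bank data, dplyr and tidyr for data manipulation, ggplot2 and ggrepel for data visualization, and here for building file paths. We also source a project file, utils/theme_and_colors_IMF.R, which provides the custom theme functions and color definitions used in the plot.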
Load and Process Data
Self-Employment Data
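We start by loading the self-employment data (indicator SL.EMP.SELF.ZS) with the wb_data function from the wbstats package. We keep the years 1990 to 2022, rename the columns for clarity, and restrict the sample to five regional and income aggregates (EAS, LCN, SSF, HIC, CEB) plus the USA.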
self_employment <- wb_data(indicator = "SL.EMP.SELF.ZS", country = "all")
self_employment_data <- self_employment %>%
  filter(date %in% 1990:2022) %>%
  rename(year = date, self_employment = SL.EMP.SELF.ZS) %>%
  filter(iso3c %in% c("EAS", "LCN", "SSF", "HIC", "CEB", "USA")) %>%
  select(iso3c, year, self_employment)
self_employment_data$year <- as.numeric(self_employment_data$year)
head(self_employment_data)
# A tibble: 6 × 3
  iso3c  year self_employment
  <chr> <dbl>           <dbl>
1 CEB    2022            17.0
2 CEB    2021            17.4
3 CEB    2020            18.1
4 CEB    2019            17.9
5 CEB    2018            18.3
6 CEB    2017            18.6
GDP Per Worker Data
Next, we load the GDP per worker (labor productivity) data, indicator SL.GDP.PCAP.EM.KD, filter it for the same years and regions, and rename the columns.
gdp_per_worker <- wb_data(indicator = "SL.GDP.PCAP.EM.KD", country = "all")
gdp_data <- gdp_per_worker %>%
  filter(date %in% 1990:2022) %>%
  rename(gdp_per_worker = SL.GDP.PCAP.EM.KD, year = date) %>%
  filter(iso3c %in% c("EAS", "LCN", "SSF", "HIC", "CEB", "USA")) %>%
  select(iso3c, year, gdp_per_worker)
gdp_data$year <- as.numeric(gdp_data$year)
head(gdp_data)
# A tibble: 6 × 3
  iso3c  year gdp_per_worker
  <chr> <dbl>          <dbl>
1 CEB    1990            NA 
2 CEB    1991         38474.
3 CEB    1992         36816.
4 CEB    1993         37593.
5 CEB    1994         38969.
6 CEB    1995         41129.
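The two loading steps above repeat the same filter, rename, and select pattern. One way to avoid the duplication is a small helper, sketched below. The function name tidy_indicator, its arguments, and the toy input are hypothetical and not part of the original code; the sketch only assumes that the raw table has iso3c and date columns plus one column named after the indicator code, as in the output shown above.

```r
library(dplyr)

# Hypothetical helper: tidy one indicator table as returned by wb_data().
# `code` is the indicator column name, `new_name` the name to give it.
tidy_indicator <- function(raw, code, new_name,
                           regions = c("EAS", "LCN", "SSF", "HIC", "CEB", "USA"),
                           years = 1990:2022) {
  raw %>%
    filter(date %in% years, iso3c %in% regions) %>%
    rename(year = date) %>%
    rename(all_of(setNames(code, new_name))) %>%  # named vector: new = "old"
    select(iso3c, year, all_of(new_name)) %>%
    mutate(year = as.numeric(year))
}

# Toy input with made-up values, standing in for a wb_data() result
raw_toy <- data.frame(
  iso3c = c("CEB", "USA", "FRA", "CEB"),
  date = c(1991, 1991, 1991, 1985),
  SL.EMP.SELF.ZS = c(23.1, 8.0, 10.0, 30.0)
)

toy_tidy <- tidy_indicator(raw_toy, "SL.EMP.SELF.ZS", "self_employment")
print(toy_tidy)
```

With a helper like this, the self-employment and GDP per worker tables would each be one call on the corresponding wb_data() result, and the region list and year range would be defined in a single place.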
Merge and Process Data
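We merge the self-employment and GDP per worker data into a single data frame and rescale GDP per worker to thousands. Note that merge() performs an inner join by default, so only region-year pairs present in both tables are kept.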
merged_data <- merge(gdp_data, self_employment_data, by = c("iso3c", "year"))
merged_data$gdp_per_worker <- merged_data$gdp_per_worker / 1000
selected_data <- merged_data %>% select(iso3c, year, gdp_per_worker, self_employment)
head(selected_data)
  iso3c year gdp_per_worker self_employment
1   CEB 1990             NA              NA
2   CEB 1991       38.47363        23.11941
3   CEB 1992       36.81588        24.48385
4   CEB 1993       37.59328        24.90722
5   CEB 1994       38.96940        25.02134
6   CEB 1995       41.12927        24.88043
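To make the join behavior of merge() concrete, here is a minimal base R sketch on made-up numbers (gdp_toy and se_toy are hypothetical tables, not the World Bank data). By default, rows without a match on both keys are dropped; passing all = TRUE keeps them and fills the gaps with NA.

```r
# Toy tables with made-up values
gdp_toy <- data.frame(
  iso3c = c("CEB", "CEB", "USA"),
  year = c(1991, 1992, 1991),
  gdp_per_worker = c(38000, 37000, 92000)
)
se_toy <- data.frame(
  iso3c = c("CEB", "USA"),
  year = c(1991, 1991),
  self_employment = c(23.1, 9.0)
)

# Default: inner join, CEB 1992 is dropped because se_toy has no such row
inner <- merge(gdp_toy, se_toy, by = c("iso3c", "year"))

# all = TRUE: full join, CEB 1992 is kept with NA for self_employment
full <- merge(gdp_toy, se_toy, by = c("iso3c", "year"), all = TRUE)

print(inner)
print(full)
```

Comparing nrow() before and after a merge is a quick way to check whether any region-year pairs were silently dropped.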
Adjust GDP Values Relative to the USA
We pivot the GDP data to a wide format, adjust the values relative to the USA, and clean up the column names.
adjust_values <- function(df) {
  df %>%
    rowwise() %>%
    mutate(across(.cols = -c(year, USA), .fns = ~ . / USA,
                  .names = "relative_{.col}")) %>%
    select(year, starts_with("relative_"))
}
The adjust_values function expresses GDP per worker relative to the USA. It goes row by row (one row per year in the wide table), divides each region's GDP per worker by the USA's value for that year, and stores the result in new columns prefixed with "relative_". It then keeps only the year and those relative columns.
In other words, each value becomes a ratio to the US level in the same year, which makes productivity levels directly comparable across regions.
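Because division in R is already vectorized, the same ratios can be computed without rowwise(). The sketch below shows this on made-up numbers; adjust_values_vectorized and toy_wide are hypothetical names used only for illustration and are not part of the original code.

```r
library(dplyr)

# Hypothetical variant of adjust_values() without rowwise():
# each column is divided element-wise by the USA column.
adjust_values_vectorized <- function(df) {
  df %>%
    mutate(across(.cols = -c(year, USA), .fns = ~ .x / USA,
                  .names = "relative_{.col}")) %>%
    select(year, starts_with("relative_"))
}

# Toy wide table with made-up values (one row per year)
toy_wide <- tibble(
  year = c(1991, 1992),
  CEB = c(40, 45),
  SSF = c(10, 9),
  USA = c(100, 90)
)

toy_rel <- adjust_values_vectorized(toy_wide)
print(toy_rel)
```

Unlike the original function, this variant returns an ordinary tibble rather than a rowwise one, so later steps do not carry the row-wise grouping along.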
Applying the Function
gdp_wide <- selected_data %>%
  select(iso3c, year, gdp_per_worker) %>%
  pivot_wider(names_from = iso3c, values_from = gdp_per_worker)
adjusted_gdp <- adjust_values(gdp_wide)
clean_gdp <- adjusted_gdp %>%
  rename_with(.fn = ~ gsub("relative_", "", .x), .cols = starts_with("relative_"))
head(clean_gdp)
# A tibble: 6 × 5
# Rowwise:
   year    CEB    EAS    LCN     SSF
  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
1  1990 NA     NA     NA     NA
2  1991  0.417  0.129  0.370  0.107
3  1992  0.389  0.130  0.360  0.101
4  1993  0.393  0.135  0.362  0.0962
5  1994  0.402  0.139  0.368  0.0927
6  1995  0.422  0.146  0.364  0.0922
Pivot Data Back to Long Format
We pivot the adjusted GDP data back to a long format and merge it with the self-employment data.
gdp_long <- clean_gdp %>%
  pivot_longer(cols = -year, names_to = "iso3c")
merged_long <- merge(gdp_long, selected_data %>% select(iso3c, year, self_employment),
                     by = c("year", "iso3c"))
merged_long <- merged_long %>% rename(gdp_per_worker = value)
head(merged_long)
  year iso3c gdp_per_worker self_employment
1 1990   CEB             NA              NA
2 1990   EAS             NA              NA
3 1990   LCN             NA              NA
4 1990   SSF             NA              NA
5 1991   CEB      0.4171650        23.11941
6 1991   EAS      0.1287891        55.96672
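The wide-then-long round trip can also be avoided by computing the ratio within each year directly in long format. The following is a minimal sketch on made-up numbers, assuming there is exactly one USA row per year; toy_long and relative_long are hypothetical names and this is not the approach used in the original code.

```r
library(dplyr)

# Toy long table with made-up values
toy_long <- tibble(
  iso3c = c("CEB", "SSF", "USA", "CEB", "SSF", "USA"),
  year = c(1991, 1991, 1991, 1992, 1992, 1992),
  gdp_per_worker = c(40, 10, 100, 45, 9, 90)
)

relative_long <- toy_long %>%
  group_by(year) %>%
  # Divide every region by the USA value of the same year
  # (assumes exactly one USA row in each year group)
  mutate(relative_gdp = gdp_per_worker / gdp_per_worker[iso3c == "USA"]) %>%
  ungroup() %>%
  filter(iso3c != "USA")

print(relative_long)
```

This keeps self-employment and other columns alongside the ratio, so no second merge is needed, at the cost of relying on the one-USA-row-per-year assumption.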
Filter Data for Specific Regions and Years
We keep observations from 1997 onward. The regions were already selected when the data were loaded, so only the year filter is needed here.
filtered_data <- merged_long %>%
  filter(year > 1996)
head(filtered_data)
  year iso3c gdp_per_worker self_employment
1 1997   CEB     0.42338281        25.07956
2 1997   EAS     0.15162024        52.51994
3 1997   LCN     0.36549073        38.55800
4 1997   SSF     0.09203405        80.98884
5 1998   CEB     0.42131169        25.27910
6 1998   EAS     0.14610805        52.30169
Define Region Names and Custom Colors
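We define long names for the region codes and custom colors for the plot. The unquoted blue, red, and green are not strings; they are presumably color objects defined in the sourced utils/theme_and_colors_IMF.R file, so that file must be loaded before this step runs.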
iso3c_long_names <- c(CEB = "Central Europe and Baltics",
                      EAS = "East Asia and Pacific",
                      LCN = "Latin America and Caribbean",
                      SSF = "Sub-Saharan Africa",
                      HIC = "High income countries")
custom_colors <- c(CEB = "grey", EAS = blue, LCN = red, SSF = green, HIC = "black")
Create Plot
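We create a scatter plot of the self-employment share (x-axis) against GDP per worker relative to the USA (y-axis, log scale), colored by region. To show the direction of change over time, we label the points for 1997, 2010, and 2022.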
label_data <- filtered_data %>%
  filter(year %in% c(1997, 2010, 2022))
plot <- ggplot(filtered_data, aes(x = self_employment, y = gdp_per_worker, col = iso3c)) +
  geom_point() +
  scale_y_log10() +
  theme_imf() +
  xlab("Share of self-employment in total employment") +
  geom_text_repel(data = label_data,
                  aes(x = self_employment, y = gdp_per_worker,
                      label = year, col = iso3c),
                  size = 3.5, show.legend = FALSE,
                  box.padding = unit(0.15, "lines"),
                  point.padding = unit(0.5, "lines"),
                  segment.color = 'grey50') +
  ylab("Ratio of labor productivity to that of US") +
  ggtitle("Self-employment and labor productivity, 1997-2022") +
  theme_imf_panel() +
  scale_color_manual(values = custom_colors, labels = iso3c_long_names) +
  theme(legend.position = c(0.7, 0.8)) +
  theme(legend.title = element_blank())
print(plot)
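The plot above depends on the project's theme file. For readers without it, the sketch below reproduces the core idea with standard ggplot2 pieces only: named vectors map region codes to colors and legend labels, and the y-axis is log-scaled. The toy values are rounded from the 1997 rows shown earlier; the hex colors, theme_minimal(), and the legend position are stand-in assumptions, not the original styling.

```r
library(ggplot2)

# Values rounded from the 1997 rows printed earlier in the tutorial
toy <- data.frame(
  iso3c = c("CEB", "EAS", "LCN", "SSF"),
  self_employment = c(25.1, 52.5, 38.6, 81.0),
  gdp_per_worker = c(0.423, 0.152, 0.365, 0.092)
)

# Named vectors: names must match the iso3c codes in the data
region_labels <- c(CEB = "Central Europe and Baltics",
                   EAS = "East Asia and Pacific",
                   LCN = "Latin America and Caribbean",
                   SSF = "Sub-Saharan Africa")
# Stand-in colors (assumption; the original colors come from the theme file)
region_colors <- c(CEB = "grey50", EAS = "#1f77b4",
                   LCN = "#d62728", SSF = "#2ca02c")

p <- ggplot(toy, aes(x = self_employment, y = gdp_per_worker, colour = iso3c)) +
  geom_point(size = 3) +
  scale_y_log10() +
  scale_colour_manual(values = region_colors, labels = region_labels) +
  labs(x = "Share of self-employment in total employment",
       y = "Ratio of labor productivity to that of US",
       colour = NULL) +
  theme_minimal() +
  theme(legend.position = "bottom")

print(p)
```

Because colors and labels are matched by name, a region code that appears in the data but not in the named vectors would be drawn without an assigned color, so it is worth checking that the names cover every code being plotted.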
Save Plot
Finally, we save the plot as a high-resolution PNG file.
ggsave(plot, filename = here("figures/fund-7--self-empl-lab-prod-reg-90-22.png"),
       dpi = 600, width = 8.5 * 1.2, height = 5.5 * 1.2)
Conclusion
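In this tutorial, we examined how self-employment relates to labor productivity. The tables and the scatter plot point to a negative association across regions. In 1997, for example, Sub-Saharan Africa combined a self-employment share of about 81 percent with GDP per worker at roughly 9 percent of the US level, while Central Europe and the Baltics had a share of about 25 percent and GDP per worker at roughly 42 percent of the US level.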