Self-employment and GDP per worker

Introduction

This tutorial explores the link between self-employment and labor productivity, measured as GDP per worker, using World Bank data for several regional aggregates. We process the data, build a chart, and discuss what it shows.

Load Necessary Libraries

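First, we load the required libraries: wbstats for accessing World Bank data, dplyr and tidyr for data manipulation, ggplot2 and ggrepel for visualization, and here for building file paths relative to the project root. We also source a project script, utils/theme_and_colors_IMF.R, which is expected to supply the theme functions and color objects used in the plot later on.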

library(wbstats)
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggrepel)
library(here)
source(here("utils/theme_and_colors_IMF.R"))

Load and Process Data

Self-Employment Data

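We start by downloading the self-employment indicator (SL.EMP.SELF.ZS, self-employed workers as a percentage of total employment) with the wb_data function from the wbstats package. We then keep the years 1990 to 2022, rename the columns for clarity, and restrict the data to a handful of regional aggregates plus the USA, which serves as the benchmark later on.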

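# Download the indicator for all countries and aggregates
self_employment <- wb_data(indicator = "SL.EMP.SELF.ZS", country = "all")

self_employment_data <- self_employment %>%
  filter(date %in% 1990:2022) %>%
  rename(year = date, self_employment = SL.EMP.SELF.ZS) %>%
  # Keep the regional aggregates of interest plus the USA benchmark
  filter(iso3c %in% c("EAS", "LCN", "SSF", "HIC", "CEB", "USA")) %>%
  select(iso3c, year, self_employment)

# Make sure year is numeric so it matches the GDP data when merging
self_employment_data$year <- as.numeric(self_employment_data$year)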

head(self_employment_data)
# A tibble: 6 × 3
  iso3c  year self_employment
  <chr> <dbl>           <dbl>
1 CEB    2022            17.0
2 CEB    2021            17.4
3 CEB    2020            18.1
4 CEB    2019            17.9
5 CEB    2018            18.3
6 CEB    2017            18.6
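
Because country = "all" returns individual countries together with aggregates, it can be worth confirming that every requested code actually survives the filters. The following is a minimal sketch on a small made-up data frame; the values and the names toy_wb, wanted, and missing_codes are illustrative and are not World Bank output.

```r
library(dplyr)

# Made-up rows standing in for a wb_data() result; values are illustrative only
toy_wb <- data.frame(
  iso3c = c("CEB", "CEB", "USA", "USA", "FRA"),
  date = c(1989, 1990, 1990, 2022, 2000),
  SL.EMP.SELF.ZS = c(20.1, 19.8, 8.9, 6.2, 12.0)
)

# Codes we would like to keep (hypothetical subset)
wanted <- c("CEB", "HIC", "USA")

toy_clean <- toy_wb %>%
  filter(date %in% 1990:2022, iso3c %in% wanted) %>%
  rename(year = date, self_employment = SL.EMP.SELF.ZS) %>%
  select(iso3c, year, self_employment)

# Codes that were requested but are absent after filtering
missing_codes <- setdiff(wanted, unique(toy_clean$iso3c))
missing_codes
```

An empty result means every requested code is present; any code listed here will silently drop out of later steps.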

GDP Per Worker Data

Next, we load GDP per person employed (SL.GDP.PCAP.EM.KD), our measure of labor productivity, and apply the same year and region filters and the same column renaming.

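# Download the indicator for all countries and aggregates
gdp_per_worker <- wb_data(indicator = "SL.GDP.PCAP.EM.KD", country = "all")

gdp_data <- gdp_per_worker %>%
  filter(date %in% 1990:2022) %>%
  rename(gdp_per_worker = SL.GDP.PCAP.EM.KD, year = date) %>%
  # Same regions as in the self-employment data
  filter(iso3c %in% c("EAS", "LCN", "SSF", "HIC", "CEB", "USA")) %>%
  select(iso3c, year, gdp_per_worker)

# Make sure year is numeric so it matches the self-employment data when merging
gdp_data$year <- as.numeric(gdp_data$year)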

head(gdp_data)
# A tibble: 6 × 3
  iso3c  year gdp_per_worker
  <chr> <dbl>          <dbl>
1 CEB    1990            NA 
2 CEB    1991         38474.
3 CEB    1992         36816.
4 CEB    1993         37593.
5 CEB    1994         38969.
6 CEB    1995         41129.

Merge and Process Data

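We merge the self-employment and GDP per worker data into a single data frame, matching on region code and year, and rescale GDP per worker to thousands. With its default arguments, merge keeps only the region-year pairs that appear in both inputs.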

merged_data <- merge(gdp_data, self_employment_data, by = c("iso3c", "year"))
merged_data$gdp_per_worker <- merged_data$gdp_per_worker / 1000

selected_data <- merged_data %>% select(iso3c, year, gdp_per_worker, self_employment)

head(selected_data)
  iso3c year gdp_per_worker self_employment
1   CEB 1990             NA              NA
2   CEB 1991       38.47363        23.11941
3   CEB 1992       36.81588        24.48385
4   CEB 1993       37.59328        24.90722
5   CEB 1994       38.96940        25.02134
6   CEB 1995       41.12927        24.88043
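
To make the join behavior concrete, here is a small sketch on made-up data (the gdp_toy and se_toy frames and their values are illustrative). It compares the default merge with the dplyr joins; a left join is shown only as an alternative for cases where unmatched GDP rows should be kept.

```r
library(dplyr)

# Made-up inputs; each has one key that the other lacks
gdp_toy <- data.frame(
  iso3c = c("CEB", "CEB", "SSF"),
  year = c(1991, 1992, 1991),
  gdp_per_worker = c(38000, 37000, 9000)
)
se_toy <- data.frame(
  iso3c = c("CEB", "SSF", "SSF"),
  year = c(1991, 1991, 1992),
  self_employment = c(23, 81, 80)
)

# merge() defaults to an inner join: only keys found in both inputs survive
inner_base <- merge(gdp_toy, se_toy, by = c("iso3c", "year"))

# The dplyr equivalent of the same inner join
inner_dplyr <- inner_join(gdp_toy, se_toy, by = c("iso3c", "year"))

# A left join keeps every GDP row and fills unmatched self-employment with NA
left_dplyr <- left_join(gdp_toy, se_toy, by = c("iso3c", "year"))

nrow(inner_base)
nrow(left_dplyr)
```

Comparing the row counts before and after a merge is a quick way to spot region-year pairs that were dropped.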

Adjust GDP Values Relative to the USA

We pivot the GDP data to a wide format, adjust the values relative to the USA, and clean up the column names.

adjust_values <- function(df) {
  df %>%
    rowwise() %>%
    mutate(across(.cols = -c(year, USA), .fns = ~ . / USA, 
                  .names = "relative_{.col}")) %>%
    select(year, starts_with("relative_"))
}

The adjust_values function expresses each region’s GDP per worker as a ratio to the US value for the same year. It expects the wide data frame, with one row per year and one column per region code. Because of rowwise(), the division is evaluated one row at a time, that is, year by year: every column except year and USA is divided by the USA column, and the results are stored in new columns prefixed with ‘relative_’. The function then keeps only year and those relative columns, so the USA column itself is dropped.

The resulting ratios make productivity levels directly comparable across regions: a value of 0.4 means that output per worker is 40 percent of the US level in that year.
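
Since division in R is already vectorized over columns, the same ratios can be computed without rowwise(). The sketch below checks this on a made-up wide data frame; adjust_values_vectorized and gdp_wide_toy are hypothetical names introduced only for this comparison, and the numbers are illustrative.

```r
library(dplyr)

# Made-up wide data: one row per year, one column per region code
gdp_wide_toy <- data.frame(
  year = c(1991, 1992),
  CEB = c(40, 45),
  SSF = c(10, 9),
  USA = c(100, 90)
)

# The row-by-row version from the tutorial
adjust_values <- function(df) {
  df %>%
    rowwise() %>%
    mutate(across(.cols = -c(year, USA), .fns = ~ . / USA,
                  .names = "relative_{.col}")) %>%
    select(year, starts_with("relative_"))
}

# Column-wise variant: no rowwise(), same arithmetic
adjust_values_vectorized <- function(df) {
  df %>%
    mutate(across(.cols = -c(year, USA), .fns = ~ .x / USA,
                  .names = "relative_{.col}")) %>%
    select(year, starts_with("relative_"))
}

# ungroup() removes the rowwise grouping so the two results are comparable
rowwise_result <- adjust_values(gdp_wide_toy) %>% ungroup()
vectorized_result <- adjust_values_vectorized(gdp_wide_toy)

all.equal(rowwise_result$relative_CEB, vectorized_result$relative_CEB)
```

The column-wise variant also returns an ordinary data frame rather than a rowwise one, which avoids carrying the row grouping into later steps.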

Applying the Function

gdp_wide <- selected_data %>%
  select(iso3c, year, gdp_per_worker) %>%
  pivot_wider(names_from = iso3c, values_from = gdp_per_worker)

adjusted_gdp <- adjust_values(gdp_wide)

clean_gdp <- adjusted_gdp %>%
  rename_with(.fn = ~ gsub("relative_", "", .x), .cols = starts_with("relative_"))

head(clean_gdp)
# A tibble: 6 × 5
# Rowwise: 
   year    CEB    EAS    LCN     SSF
  <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
1  1990 NA     NA     NA     NA     
2  1991  0.417  0.129  0.370  0.107 
3  1992  0.389  0.130  0.360  0.101 
4  1993  0.393  0.135  0.362  0.0962
5  1994  0.402  0.139  0.368  0.0927
6  1995  0.422  0.146  0.364  0.0922

Pivot Data Back to Long Format

We pivot the adjusted GDP data back to a long format and merge it with the self-employment data.

gdp_long <- clean_gdp %>%
  pivot_longer(cols = -year, names_to = "iso3c", values_to = "gdp_per_worker")

merged_long <- merge(gdp_long,
                     selected_data %>% select(iso3c, year, self_employment),
                     by = c("year", "iso3c"))

head(merged_long)
  year iso3c gdp_per_worker self_employment
1 1990   CEB             NA              NA
2 1990   EAS             NA              NA
3 1990   LCN             NA              NA
4 1990   SSF             NA              NA
5 1991   CEB      0.4171650        23.11941
6 1991   EAS      0.1287891        55.96672
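
The wide-then-long round trip is one way to obtain these ratios. As an alternative sketch, the same result can be computed directly in long format by grouping on year. This assumes every year has exactly one USA row; the long_toy data frame and its values are made up for illustration.

```r
library(dplyr)

# Made-up long data: one row per region and year
long_toy <- data.frame(
  iso3c = c("CEB", "SSF", "USA", "CEB", "SSF", "USA"),
  year = c(1991, 1991, 1991, 1992, 1992, 1992),
  gdp_per_worker = c(40, 10, 100, 45, 9, 90)
)

relative_long <- long_toy %>%
  group_by(year) %>%
  # Divide every region's value by the USA value of the same year
  mutate(gdp_per_worker = gdp_per_worker / gdp_per_worker[iso3c == "USA"]) %>%
  ungroup() %>%
  # Drop the benchmark itself, whose ratio is always 1
  filter(iso3c != "USA")

relative_long
```

This avoids the two pivots, at the cost of failing when the USA value is absent for some year.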

Filter Data for Specific Years

We keep observations from 1997 onward, which also drops the 1990 rows that contain missing values.

filtered_data <- merged_long %>% 
  filter(year > 1996)

head(filtered_data)
  year iso3c gdp_per_worker self_employment
1 1997   CEB     0.42338281        25.07956
2 1997   EAS     0.15162024        52.51994
3 1997   LCN     0.36549073        38.55800
4 1997   SSF     0.09203405        80.98884
5 1998   CEB     0.42131169        25.27910
6 1998   EAS     0.14610805        52.30169

Define Region Names and Custom Colors

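We define long names for the region codes and custom colors for the plot. The unquoted names blue, red, and green are not strings; they refer to color objects that are presumably defined in the sourced theme_and_colors_IMF.R script.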

iso3c_long_names <- c(CEB = "Central Europe and Baltics", 
                      EAS = "East Asia and Pacific", 
                      LCN = "Latin America and Caribbean", 
                      SSF = "Sub-Saharan Africa", 
                      HIC = "High income countries")
custom_colors <- c(CEB = "grey", EAS = blue, LCN = red, SSF = green, HIC = "black")
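
If the theme script is not available, the color vector can still be built by defining the three objects by hand. The sketch below does this with placeholder hex values; these values are assumptions chosen for illustration and are not the palette from theme_and_colors_IMF.R.

```r
# Placeholder colors; these hex values are assumptions, not the IMF palette
blue <- "#1F77B4"
red <- "#D62728"
green <- "#2CA02C"

# Named vectors: the names are the region codes used in the data
iso3c_long_names <- c(CEB = "Central Europe and Baltics",
                      EAS = "East Asia and Pacific",
                      LCN = "Latin America and Caribbean",
                      SSF = "Sub-Saharan Africa",
                      HIC = "High income countries")
custom_colors <- c(CEB = "grey", EAS = blue, LCN = red, SSF = green, HIC = "black")

# Look up the color and long name for one region code
custom_colors[["SSF"]]
iso3c_long_names[["SSF"]]

# col2rgb() raises an error if any entry is not a valid color
rgb_values <- grDevices::col2rgb(custom_colors)
```

Naming both vectors by region code ties each color and label to its region explicitly, rather than relying on the order of the entries.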

Create Plot

We create a scatter plot of relative GDP per worker (on a log scale) against the self-employment share, colored by region, and label the observations for 1997, 2010, and 2022 with their year.

label_data <- filtered_data %>%
  filter(year %in% c(1997, 2010, 2022))

plot <- ggplot(filtered_data, aes(x = self_employment, y = gdp_per_worker, col = iso3c)) +
  geom_point() +
  # Year labels; x, y, and color are inherited from the main aes()
  geom_text_repel(data = label_data, aes(label = year), size = 3.5,
                  show.legend = FALSE, box.padding = unit(0.15, "lines"),
                  point.padding = unit(0.5, "lines"),
                  segment.color = "grey50") +
  # Log scale on the productivity ratio
  scale_y_log10() +
  scale_color_manual(values = custom_colors, labels = iso3c_long_names) +
  xlab("Share of self-employment in total employment") +
  ylab("Ratio of labor productivity to that of US") +
  ggtitle("Self-employment and labor productivity, 1997-2022") +
  theme_imf() +
  theme_imf_panel() +
  # Place the legend inside the panel and drop its title
  theme(legend.position = c(0.7, 0.8),
        legend.title = element_blank())

print(plot)
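
The theme_imf and theme_imf_panel functions come from the project script, so the plot above cannot be reproduced without it. As a rough stand-in, the sketch below builds the same kind of chart with the built-in theme_minimal and a small made-up data frame; plot_toy, toy_colors, toy_labels, and all values in them are illustrative only.

```r
library(ggplot2)
library(ggrepel)

# Made-up data for two regions and three labeled years
plot_toy <- data.frame(
  iso3c = rep(c("CEB", "SSF"), each = 3),
  year = rep(c(1997, 2010, 2022), times = 2),
  self_employment = c(25, 21, 17, 81, 78, 75),
  gdp_per_worker = c(0.42, 0.50, 0.55, 0.09, 0.08, 0.08)
)

toy_colors <- c(CEB = "grey40", SSF = "darkgreen")
toy_labels <- c(CEB = "Central Europe and Baltics", SSF = "Sub-Saharan Africa")

p <- ggplot(plot_toy, aes(x = self_employment, y = gdp_per_worker, colour = iso3c)) +
  geom_point() +
  # Repelled year labels inherit x, y, and colour from the main aes()
  geom_text_repel(aes(label = year), size = 3.5, show.legend = FALSE) +
  scale_y_log10() +
  scale_colour_manual(values = toy_colors, labels = toy_labels) +
  labs(x = "Share of self-employment in total employment",
       y = "Ratio of labor productivity to that of US",
       colour = NULL) +
  theme_minimal()
```

Printing p draws the chart; swapping theme_minimal for the project themes should give the styling used in the tutorial.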

Save Plot

Finally, we save the plot as a high-resolution PNG file (600 dpi, 10.2 by 6.6 inches).

ggsave(filename = here("figures/fund-7--self-empl-lab-prod-reg-90-22.png"),
       plot = plot, dpi = 600, width = 8.5 * 1.2, height = 5.5 * 1.2)

Conclusion

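In this tutorial we examined how self-employment relates to labor productivity across regions. In the data shown above, regions with higher self-employment shares tend to have lower GDP per worker relative to the US: in 1997, Sub-Saharan Africa had the highest self-employment share (about 81 percent) and the lowest relative productivity (about 0.09), while Central Europe and the Baltics had the lowest share (about 25 percent) and the highest relative productivity (about 0.42). The scatter plot lets us follow how each region moves along these two dimensions between 1997 and 2022.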