Basic charts in Base R
In R, there are multiple systems for creating graphs, each with its own strengths. Base R provides a straightforward way to create a variety of plots without additional plotting packages, and mastering its plotting functions gives you a foundation for effective data visualization. In this section, we will discuss how to make some basic charts in base R. Please load the following libraries:
library(haven)
library(dplyr)
library(here)
We will use some WEO data for the US that we have stored in an rds file, a file format used to store a single R object (here a data frame) on disk. We read it with the readRDS command:
weo_usa <- readRDS(here("databases/weo_usa.rds"))
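As a minimal sketch of how the rds format works, the following round trip saves a small data frame with saveRDS and reads it back with readRDS. The temporary file path and the object names (`toy`, `toy_back`) are made up for illustration and are not part of the WEO data.

```r
# Hypothetical example data, not taken from the WEO file
toy <- data.frame(year = c(2020, 2021), value = c(1.5, 2.5))

# Write the object to a temporary .rds file (the path is illustrative)
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)

# readRDS returns the stored object; we choose its name when assigning it
toy_back <- readRDS(path)
all.equal(toy, toy_back)
```

Unlike save and load, an rds file holds exactly one object and does not remember its name, which is why the result of readRDS has to be assigned.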
We look at the last part of the data frame with the tail command. The file contains real GDP growth (ngdp_rpch), the unemployment rate (lur) and CPI inflation (pcpi_pch).
tail(weo_usa)
# A tibble: 6 × 4
   year ngdp_rpch   lur pcpi_pch
  <dbl>     <dbl> <dbl>    <dbl>
1  2023      1.58  3.83     4.52
2  2024      1.06  4.92     2.32
3  2025      1.76  4.80     2.10
4  2026      2.05  4.30     2.00
5  2027      2.12  4.10     2.01
6  2028      2.12  4.06     2.09
Line Charts
We will start with a line chart of US unemployment from the year 1980.
# Filter the data to include only years >= 1980
weo_usa <- subset(weo_usa, year >= 1980)
# Create a line chart
plot(weo_usa$year, weo_usa$lur, type = "l", main = "US: Unemployment", xlab = "", ylab = "Percent", col = "red")
Note that the y-axis tick labels are drawn parallel to the axis, which makes them harder to read. We can make them horizontal by adding las=1:
# Create a line chart with horizontal axis labels
plot(weo_usa$year, weo_usa$lur, type = "l", main = "US: Unemployment",
xlab = "", ylab = "Percent", col = "red", las = 1)
If we want to add a second line, we use the lines command:
# Create a line chart
plot(weo_usa$year, weo_usa$lur, type = "l", main = "US: Unemployment", xlab = "",
ylab = "Percent", col = "red", las = 1)
# Add inflation as a second line on the existing chart
lines(weo_usa$year, weo_usa$pcpi_pch, col = "blue")
The range of the y-axis is too small: plot chose it from the unemployment series alone, so any part of the inflation line outside that range is cut off. One solution is to set the limits by hand, using the ylim argument:
# Create a line chart with hand-picked y-axis limits
plot(weo_usa$year, weo_usa$lur, type = "l", main = "US: Unemployment", xlab = "",
ylab = "Percent", col = "red", las = 1, ylim = c(0, 13))
lines(weo_usa$year, weo_usa$pcpi_pch, col = "blue")
A better solution is to have R find the appropriate limits by using the range function:
# Create a line chart whose y-axis covers both series
plot(weo_usa$year, weo_usa$lur, type = "l", main = "US: Unemployment", xlab = "",
ylab = "Percent", col = "red", las = 1, ylim = range(weo_usa$lur, weo_usa$pcpi_pch))
lines(weo_usa$year, weo_usa$pcpi_pch, col = "blue")
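To see what range passes to ylim, here is a small sketch that uses only the six rows from the tail output shown earlier. The vector names mirror the column names; `y_limits` is a name introduced for illustration.

```r
# Values copied from the tail(weo_usa) output shown earlier (2023-2028)
lur      <- c(3.83, 4.92, 4.80, 4.30, 4.10, 4.06)
pcpi_pch <- c(4.52, 2.32, 2.10, 2.00, 2.01, 2.09)

# range() pools all of its arguments and returns c(min, max)
y_limits <- range(lur, pcpi_pch)
y_limits  # 2.00 4.92

# With missing values, range() returns NA for both limits unless
# na.rm = TRUE is set, and plot() needs finite ylim values
range(c(lur, NA), pcpi_pch, na.rm = TRUE)
```

If a series may contain missing values, adding na.rm = TRUE to the range call inside plot is a cheap safeguard.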
Let’s add a legend. The bty parameter controls whether a box is drawn around the legend; bty = "n" suppresses it.
# Create a line chart
plot(weo_usa$year, weo_usa$lur, type = "l", main = "US: Unemployment", xlab = "",
ylab = "Percent", col = "red", las = 1, ylim = range(weo_usa$lur, weo_usa$pcpi_pch))
lines(weo_usa$year, weo_usa$pcpi_pch, col = "blue")
# Add a legend without a surrounding box
legend(x = "topleft", bty = "n",
legend = c("Unemployment", "Inflation"),
fill = c("red", "blue"))
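Because the chart consists of lines, the legend keys can also be drawn as line segments rather than filled boxes. The sketch below shows one way to do this, passing col and lty to legend instead of fill. It uses only the six rows from the tail output shown earlier, so the chart itself is purely illustrative, and the name `weo_tail` is introduced here.

```r
# Values copied from the tail(weo_usa) output shown earlier (2023-2028)
weo_tail <- data.frame(
  year     = 2023:2028,
  lur      = c(3.83, 4.92, 4.80, 4.30, 4.10, 4.06),
  pcpi_pch = c(4.52, 2.32, 2.10, 2.00, 2.01, 2.09)
)

plot(weo_tail$year, weo_tail$lur, type = "l", col = "red", las = 1,
     main = "US: Unemployment and inflation", xlab = "", ylab = "Percent",
     ylim = range(weo_tail$lur, weo_tail$pcpi_pch))
lines(weo_tail$year, weo_tail$pcpi_pch, col = "blue")

# lty = 1 draws a solid line segment as the key for each entry
legend("topright", bty = "n",
       legend = c("Unemployment", "Inflation"),
       col = c("red", "blue"), lty = 1)
```

Whichever key style is used, the order of the colors must match the order of the labels in the legend argument.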
Scatter Plots
Scatter plots are made the same way as line charts, but instead of a date variable we have another variable on the x-axis. For example, let’s make a scatter plot of unemployment versus inflation:
# Create a scatter plot
plot(weo_usa$lur,weo_usa$pcpi_pch, main = "US: Unemployment rate and inflation",
xlab = "Unemployment rate", ylab = "Inflation", pch = 16, col = "navy",las=1)
The link is not very strong. We can add a regression line:
# Create a scatter plot
plot(weo_usa$lur, weo_usa$pcpi_pch, main = "US: Unemployment rate and inflation", xlab = "Unemployment rate", ylab = "Inflation", pch = 16, col = "navy", las = 1)
# Regress inflation on unemployment and add the fitted line
eq <- lm(weo_usa$pcpi_pch ~ weo_usa$lur)
abline(eq)
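The same regression can also be written with a data argument, which keeps the formula short and gives the coefficients readable names. This is a sketch on the six rows from the tail output shown earlier, so the fitted numbers say nothing about the full sample. The names `weo_tail` and `fit` are introduced here for illustration.

```r
# Values copied from the tail(weo_usa) output shown earlier (2023-2028)
weo_tail <- data.frame(
  year     = 2023:2028,
  lur      = c(3.83, 4.92, 4.80, 4.30, 4.10, 4.06),
  pcpi_pch = c(4.52, 2.32, 2.10, 2.00, 2.01, 2.09)
)

# Regress inflation on the unemployment rate
fit <- lm(pcpi_pch ~ lur, data = weo_tail)
coef(fit)  # named vector with elements "(Intercept)" and "lur"

plot(weo_tail$lur, weo_tail$pcpi_pch, pch = 16, col = "navy", las = 1,
     xlab = "Unemployment rate", ylab = "Inflation")
# abline() takes the intercept and slope from the fitted model
abline(fit)
```

Calling summary(fit) would additionally report standard errors and the R-squared, which is one way to quantify how strong the link is.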
Bar Charts
Bar charts are a bit tricky. You need to specify two arguments: height and names.arg.
From the documentation: height is either a vector or matrix of values describing the bars which make up the plot. If height is a vector, the plot consists of a sequence of rectangular bars with heights given by the values in the vector. If height is a matrix and beside is FALSE then each bar of the plot corresponds to a column of height, with the values in the column giving the heights of stacked sub-bars making up the bar. If height is a matrix and beside is TRUE, then the values in each column are juxtaposed rather than stacked.
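To make this description concrete, here is a small sketch with a 2 x 3 matrix. The matrix `m` and its values are invented for illustration. barplot returns the bar midpoints, and their shape reflects the two layouts: one position per column when the bars are stacked, one per cell when beside = TRUE.

```r
# Hypothetical 2 x 3 matrix: rows are series, columns are bars
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))

# beside = FALSE (the default): each column becomes one stacked bar
mid_stacked <- barplot(m, beside = FALSE)
length(mid_stacked)  # 3

# beside = TRUE: the values in each column are drawn side by side
mid_beside <- barplot(m, beside = TRUE)
dim(mid_beside)  # 2 3
```

The column names of the matrix are used as bar labels when names.arg is not given.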
Bar Chart of one variable
Let’s first make a bar chart of one variable. In this case height should be a vector:
# Use the US unemployment rate as the bar heights
# Create a bar chart
barplot(height = as.vector(weo_usa$lur), names.arg = weo_usa$year,
main = "US: Unemployment rate", ylab = "Percent", col = "lightblue", las = 1)
box()
box() adds a box around the plot area.
Bar chart of two variables
We now make a bar chart of two variables. We will make a chart of the “misery rate”, the sum of the unemployment rate and inflation.
We select the two variables and convert them into a matrix, using the as.matrix() function. Note that in the barplot function we need to transpose this matrix, using t(), because barplot draws one bar per column and our matrix has one row per year.
# Create a matrix from the selected columns of the data frame
df_matrix <- as.matrix(weo_usa[, c("lur", "pcpi_pch")])
# Create a stacked bar chart; t() puts the years in columns
barplot(height = t(df_matrix), names.arg = weo_usa$year,
main = "US: Misery rate", ylab = "Percent", col = c("red", "lightblue"), las = 1)
# Add the legend at the top right of the chart
legend("topright", legend = c("Unemployment", "Inflation"), fill = c("red", "lightblue"), bty = "n")
box()
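Since the bars are stacked, the total height of each bar is the sum of the two rates for that year. The sketch below checks the matrix shapes and computes that sum on the six rows from the tail output shown earlier. The names `weo_tail`, `m` and `misery` are introduced here for illustration.

```r
# Values copied from the tail(weo_usa) output shown earlier (2023-2028)
weo_tail <- data.frame(
  year     = 2023:2028,
  lur      = c(3.83, 4.92, 4.80, 4.30, 4.10, 4.06),
  pcpi_pch = c(4.52, 2.32, 2.10, 2.00, 2.01, 2.09)
)

m <- as.matrix(weo_tail[, c("lur", "pcpi_pch")])
dim(m)     # 6 2: one row per year
dim(t(m))  # 2 6: one column per year, as barplot expects

# Total bar height per year = unemployment + inflation
misery <- rowSums(m)
names(misery) <- weo_tail$year
misery
```

Without the transpose, barplot would instead draw two bars, one per variable, each stacking the values of all years.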