Reading Excel Data in R

Introduction

Reading Excel files into R is a common task for data analysis. In this tutorial, we will explore how to read Excel data into R using various libraries. We will use selected Penn World Table data between 1994 and 2019 as an example.

Prerequisites

Before we begin, make sure you have the necessary packages installed and loaded. The main packages used for reading Excel files in R are readxl, openxlsx, and XLConnect. Each has its strengths and specific use cases.

Loading the Packages

Load the following packages:

library(readxl)
library(openxlsx)
library(here)

If you have not have done so, please install them first.

The Data

We have an Excel file with selected Penn World Table data between 1994 and 2019:

The data are in the file “databases/penn_excel_sample.xlsx”, in sheet “Sheet1”.

Reading Excel Data with Different Libraries

Using readxl

The readxl package is simple and easy to use for reading Excel files. It supports both .xls and .xlsx formats.

# Read the data from the specified sheet using readxl
pwt_data_readxl <- read_excel(here( "databases/penn_excel_sample.xlsx"),
                              sheet = "Sheet 1")

# Display the data
head(pwt_data_readxl)
# A tibble: 6 × 10
   year rgdpna_BRA rgdpna_MEX rgdpna_USA emp_BRA emp_MEX emp_USA pop_BRA pop_MEX
  <dbl>      <dbl>      <dbl>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
1  1994   1742187.   1387994    11150584    62.5    31.7    126.    159.    90.2
2  1995   1815770.   1300672    11449898    62.9    32.0    127.    162.    91.7
3  1996   1854818.   1388770    11881846    61.8    33.1    129.    165.    93.1
4  1997   1917423    1483857    12410257    62.8    33.9    132.    167.    94.6
5  1998   1918101.   1560482.   12966412    62.6    35.2    135.    170.    96.1
6  1999   1922974.   1603451    13582736    65.9    36.6    137.    172.    97.5
# ℹ 1 more variable: pop_USA <dbl>

Using openxlsx

The openxlsx package provides more advanced options for reading and writing Excel files. It is particularly useful if you need to style your Excel output.

# Read the data from the specified sheet using openxlsx
pwt_data_openxlsx <- read.xlsx(here( "databases/penn_excel_sample.xlsx"), 
                               sheet = "Sheet 1")

# Display the data
head(pwt_data_openxlsx)
  year rgdpna_BRA rgdpna_MEX rgdpna_USA  emp_BRA  emp_MEX  emp_USA  pop_BRA
1 1994    1742187    1387994   11150584 62.45842 31.70289 125.6900 159.4327
2 1995    1815770    1300672   11449898 62.93206 32.00495 127.4627 162.0199
3 1996    1854818    1388770   11881846 61.78389 33.09135 129.3577 164.6147
4 1997    1917423    1483857   12410257 62.79587 33.92621 132.3626 167.2090
5 1998    1918101    1560482   12966412 62.61137 35.24316 134.5009 169.7852
6 1999    1922974    1603451   13582736 65.88960 36.63657 136.7565 172.3187
   pop_MEX  pop_USA
1 90.15640 262.2412
2 91.66329 265.1637
3 93.14704 268.3350
4 94.61100 271.7136
5 96.05632 275.1753
6 97.48483 278.5482

Comparison of Libraries

  • readxl: This package is best for straightforward reading of Excel files. It is lightweight and fast but does not support writing to Excel files.

  • openxlsx: This package is more versatile, supporting both reading and writing Excel files. It also allows for advanced styling options.

Conclusion

In this tutorial, we have covered how to read Excel data into R using the readxl, openxlsx, and XLConnect packages. Each package has its strengths and can be chosen based on your specific needs for reading and manipulating Excel files in R.