Accessing IMF PortWatch Data using an API

Learn how to query large datasets from the IMF’s PortWatch platform using R and the ArcGIS REST API. This tutorial was developed with members of the IMF’s PortWatch team, Mario Saraiva and Alessandra Sozzi.

Setup

Start by loading the required packages:

library(httr)
library(jsonlite)
library(dplyr)
library(glue)
library(data.table)

Define Available Datasets

PortWatch datasets are hosted on ArcGIS Online, and each dataset is published under a unique service name, so the first step is to identify the services to query. In this tutorial we use two of them: Daily_Ports_Data (daily activity at individual ports) and Daily_Chokepoints_Data (daily transits through key maritime passages).

# Datasets
chokepoints.url <- "Daily_Chokepoints_Data"
ports.url <- "Daily_Ports_Data"

Build Dynamic API URLs

Each dataset has a unique name that is inserted into a standard REST API URL structure. Instead of hardcoding URLs, we define a flexible function that takes a dataset name and returns the corresponding API endpoint.

# Function to compose dataset URL
get_api_url <- function(dataset) {
  base <- glue("https://services9.arcgis.com/weJ1QsnbMYJlCHdG/ArcGIS/rest/services/{dataset}/FeatureServer/0/query")
  return(base)
}

For example, the query URL for the Daily Ports Data is built as follows:

get_api_url("Daily_Ports_Data")
https://services9.arcgis.com/weJ1QsnbMYJlCHdG/ArcGIS/rest/services/Daily_Ports_Data/FeatureServer/0/query

ArcGIS feature services can contain one or more layers or tables, each identified by a numeric index in the URL, and queries are sent to a specific layer rather than to the service as a whole. Both PortWatch services used in this tutorial store their records in layer 0, which is why get_api_url() appends /FeatureServer/0/query to the service name.

For the ports dataset, we convert the service name into its full query endpoint once and store it back in ports.url, so it can be passed directly to query_portwatch().

# Convert the ports dataset name into its query endpoint
# (same result as pasting the base URL, service name and /FeatureServer/0/query)
ports.url <- get_api_url(ports.url)
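Since the layer index is just another path segment, it could also be made a parameter. The following is a minimal sketch of such a generalization; get_layer_url(), portwatch_root, and the layer and operation arguments are our own hypothetical names, not part of PortWatch or ArcGIS, and indices other than 0 only make sense if a service actually has those layers. Passing operation = NULL returns the bare layer URL, which is the form used for the field-metadata request later in this tutorial.

```r
# Root of the PortWatch ArcGIS REST services (same as in get_api_url())
portwatch_root <- "https://services9.arcgis.com/weJ1QsnbMYJlCHdG/ArcGIS/rest/services"

# Hypothetical helper: URL for a given layer of a feature service,
# optionally followed by an operation such as "query"
get_layer_url <- function(dataset, layer = 0, operation = "query") {
  layer_url <- paste(portwatch_root, dataset, "FeatureServer", layer, sep = "/")
  if (is.null(operation)) {
    layer_url                               # e.g. for metadata with ?f=json
  } else {
    paste(layer_url, operation, sep = "/")  # e.g. the query endpoint
  }
}

get_layer_url("Daily_Ports_Data")                    # .../FeatureServer/0/query
get_layer_url("Daily_Ports_Data", operation = NULL)  # .../FeatureServer/0
```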

Define API Query Helpers

To retrieve data from the ArcGIS API, we make repeated GET requests with specific query parameters (such as filters and output fields). Rather than repeating the same request logic each time, we define a wrapper around httr::GET() that sends the request and parses the JSON response into an R list.

This function simplifies the rest of our workflow by abstracting away the raw HTTP logic:

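# Function to send one API request and parse the JSON response
get_api_data <- function(url, params) {
  response <- GET(url, query = params)
  stop_for_status(response)  # fail on HTTP errors (4xx/5xx)
  parsed <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
  # ArcGIS can report query errors inside an HTTP 200 response
  if (!is.null(parsed$error)) stop("ArcGIS error: ", parsed$error$message)
  return(parsed)
}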

This helper lets us pass any combination of where, outFields, and other query parameters to the API and returns a parsed result ready for processing.
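To see exactly what a params list turns into, it can help to assemble the request URL by hand. The sketch below uses base R's utils::URLencode(); build_query_url() is a hypothetical debugging helper, and httr::GET() performs its own, similar encoding, so the two strings may differ in minor details.

```r
# Hypothetical debugging helper: assemble the GET URL for a params list
build_query_url <- function(url, params) {
  pairs <- vapply(names(params), function(name) {
    # reserved = TRUE also encodes characters such as = ' * and spaces
    value <- utils::URLencode(as.character(params[[name]]), reserved = TRUE)
    paste0(utils::URLencode(name, reserved = TRUE), "=", value)
  }, character(1))
  paste0(url, "?", paste(pairs, collapse = "&"))
}

chokepoints_query <- "https://services9.arcgis.com/weJ1QsnbMYJlCHdG/ArcGIS/rest/services/Daily_Chokepoints_Data/FeatureServer/0/query"
build_query_url(chokepoints_query,
                list(where = "portid='chokepoint1'", outFields = "*", f = "json"))
```

In the printed URL, the where clause appears as portid%3D%27chokepoint1%27, because = and ' are percent-encoded.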

Creating query_portwatch()

Because PortWatch datasets can contain millions of records, and an ArcGIS Feature Service returns only a limited number of records per request, we build a core function for large-scale extraction: query_portwatch(). It works in three main steps:

  1. It first queries the API to get the total number of available records.

  2. Then, it loops through the data in batches (up to 5,000 records at a time), appending each batch to a unified result.

  3. Lastly, it returns a cleaned data.table with parsed date values.
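Step 3 depends on how dates come back from the service: in ArcGIS JSON responses, date fields are normally encoded as milliseconds since 1970-01-01 UTC, whereas R's POSIXct counts seconds. A minimal sketch of the conversion, using two illustrative timestamps rather than real API output:

```r
# Two example timestamps in epoch milliseconds (illustrative values)
epoch_ms <- c(1546300800000, 1546387200000)

# Divide by 1000 to get seconds, then interpret them as UTC
dates <- as.POSIXct(epoch_ms / 1000, origin = "1970-01-01", tz = "UTC")
format(dates, "%Y-%m-%d")
# "2019-01-01" "2019-01-02"
```

Forgetting the division by 1000 would place these dates tens of thousands of years in the future.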

# Function to query PortWatch data in batches
query_portwatch <- function(url, where = "1=1", maxRecordCountFactor = 5, outFields = "*") {
  
  # Step 1: Get total record count
  params_initial <- list(where = where, returnCountOnly = TRUE, f = "json")
  total_records <- GET(url, query = params_initial) %>%
    content("parsed") %>%
    .$count
  
  print(paste0("Begin extraction of ", total_records, " records..."))
  
  # Prepare to store results. maxRecordCountFactor asks the server for up to
  # that multiple of its maxRecordCount per request.
  all_results <- list()
  params_query <- list(where = where, outFields = outFields, f = "json",
                       maxRecordCountFactor = maxRecordCountFactor)
  offset <- 0
  
  # Step 2: Fetch batches until all records have been retrieved
  while (offset < total_records) {
    print(paste0("Extracting batch from record ", offset, "..."))
    params_query$resultOffset <- offset
    
    result <- get_api_data(url, params_query)
    # length() of the features data frame counts its columns (here only
    # "attributes"), not records; it is 0 only when the batch is empty
    print('Length of result$features:')
    print(length(result$features))
    
    if (length(result$features) == 0) break
    
    df_batch <- as.data.frame(result$features$attributes)
    all_results[[length(all_results) + 1]] <- df_batch
    
    # Advance by the rows actually returned, so no records are skipped
    # if the server sends fewer rows than expected
    offset <- offset + nrow(df_batch)
    Sys.sleep(1)  # pause between requests
  }
  
  final_df <- rbindlist(all_results, fill = TRUE)
  
  # Step 3: Parse dates (numeric ArcGIS dates are epoch milliseconds)
  if ("date" %in% colnames(final_df)) {
    if (is.numeric(final_df$date)) {
      final_df$date <- as.POSIXct(final_df$date / 1000, origin = "1970-01-01", tz = "UTC")
    } else {
      final_df$date <- as.POSIXct(final_df$date, tz = "UTC")
    }
    final_df <- final_df %>% arrange(date)
  }
  
  return(final_df)
}

You can now use this function to extract data from any PortWatch-compatible ArcGIS service with just a few lines of code.
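A pagination loop is only safe if each new offset starts right after the last record received. The self-contained sketch below isolates that logic from the network: fake_records, page_limit, fake_fetch() and paginate() are all hypothetical names chosen for illustration, with fake_fetch() standing in for a server that returns at most page_limit rows per request. paginate() advances the offset by the number of rows a request actually returned, so it still collects every record when the server sends fewer rows than requested.

```r
# Illustrative stand-in for a server: 12 records, at most 5 per response
fake_records <- data.frame(ObjectId = 1:12, value = (1:12) * 10)
page_limit <- 5

# Hypothetical fetcher: return up to page_limit records after `offset`
fake_fetch <- function(offset) {
  remaining <- fake_records[seq_len(nrow(fake_records)) > offset, , drop = FALSE]
  head(remaining, page_limit)
}

# Offset-based pagination that advances by the rows actually received
paginate <- function(fetch, total_records) {
  pages <- list()
  offset <- 0
  while (offset < total_records) {
    batch <- fetch(offset)
    if (nrow(batch) == 0) break            # nothing left on the server
    pages[[length(pages) + 1]] <- batch
    offset <- offset + nrow(batch)
  }
  do.call(rbind, pages)                    # NULL if no pages were fetched
}

result <- paginate(fake_fetch, total_records = 12)
nrow(result)   # 12 rows, fetched in three requests of 5, 5 and 2
```

With a fixed step larger than page_limit (say 10), the same loop would request offsets 0 and 10 and silently miss records 6 to 10.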

Query Examples

We will now show you how to use query_portwatch() with real-world filters to retrieve data from the PortWatch API. Each example corresponds to a typical use case: chokepoint activity, individual port traffic, or country-level trade data.

Query a Chokepoint (e.g., Suez Canal)

The Daily_Chokepoints_Data dataset covers maritime chokepoints such as the Suez Canal. Each chokepoint is identified by a portid value such as 'chokepoint1' or 'chokepoint2'. The query below filters on 'chokepoint1', the Suez Canal, and returns its daily vessel transit counts and capacity by vessel type.

# 1. Query Suez Canal chokepoint
ck1 <- query_portwatch(
  url = get_api_url(chokepoints.url),
  where = "portid='chokepoint1'"
)
[1] "Begin extraction of 2617 records..."
[1] "Extracting batch from record 0..."
[1] "Length of result$features:"
[1] 1
head(ck1)
         date  year month   day      portid   portname n_container n_dry_bulk
       <POSc> <int> <int> <int>      <char>     <char>       <int>      <int>
1: 2019-01-01  2019     1     1 chokepoint1 Suez Canal          15         15
2: 2019-01-02  2019     1     2 chokepoint1 Suez Canal          24          4
3: 2019-01-03  2019     1     3 chokepoint1 Suez Canal          13         14
4: 2019-01-04  2019     1     4 chokepoint1 Suez Canal          17         11
5: 2019-01-05  2019     1     5 chokepoint1 Suez Canal          20          9
6: 2019-01-06  2019     1     6 chokepoint1 Suez Canal          13         12
   n_general_cargo n_roro n_tanker n_cargo n_total capacity_container
             <int>  <int>    <int>   <int>   <int>              <int>
1:               7      4       18      41      59            1018466
2:               6      6       10      40      50            1534924
3:               9      2       23      38      61             555097
4:               2      2       14      32      46            1044184
5:               1      1       13      31      44            1218925
6:               6      0       18      31      49             832088
   capacity_dry_bulk capacity_general_cargo capacity_roro capacity_tanker
               <int>                  <int>         <int>           <int>
1:            618654                 111545         21900         1279775
2:            227315                   8536         37934          518306
3:            977689                  39398         10829          804257
4:            430139                   3673          3430          759384
5:            777548                   4587          5567          628481
6:            664305                  27559             0          314955
   capacity_cargo capacity ObjectId
            <int>    <int>    <int>
1:        1770567  3050342        1
2:        1808711  2327017        2
3:        1583014  2387272        3
4:        1481428  2240812        4
5:        2006627  2635108        5
6:        1523954  1838909        6
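ArcGIS where clauses accept standard SQL operators such as IN, so several chokepoints can be requested in a single call. The helper below, where_in(), is a hypothetical convenience function (not part of PortWatch or httr) and assumes the values contain no single quotes:

```r
# Hypothetical helper: build an SQL IN condition from a vector of values
where_in <- function(field, values) {
  quoted <- paste0("'", values, "'")
  paste0(field, " IN (", paste(quoted, collapse = ", "), ")")
}

where_in("portid", c("chokepoint1", "chokepoint2"))
# "portid IN ('chokepoint1', 'chokepoint2')"

# With the functions defined earlier, this would be passed as, e.g.:
# query_portwatch(url = get_api_url(chokepoints.url),
#                 where = where_in("portid", c("chokepoint1", "chokepoint2")))
```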

Query a Specific Port (e.g., Rotterdam)

You can target a specific commercial port using the Daily_Ports_Data dataset. Each port has a unique portid identifier; Rotterdam's is 'port1114'. The query below returns a daily time series of port calls, imports, and exports for Rotterdam.

# 2. Query Rotterdam port (port1114)
port1114 <- query_portwatch(
  url = ports.url,
  where = "portid='port1114'"
)
[1] "Begin extraction of 2615 records..."
[1] "Extracting batch from record 0..."
[1] "Length of result$features:"
[1] 1
head(port1114)
         date  year month   day   portid  portname         country   ISO3
       <POSc> <int> <int> <int>   <char>    <char>          <char> <char>
1: 2019-01-01  2019     1     1 port1114 Rotterdam The Netherlands    NLD
2: 2019-01-02  2019     1     2 port1114 Rotterdam The Netherlands    NLD
3: 2019-01-03  2019     1     3 port1114 Rotterdam The Netherlands    NLD
4: 2019-01-04  2019     1     4 port1114 Rotterdam The Netherlands    NLD
5: 2019-01-05  2019     1     5 port1114 Rotterdam The Netherlands    NLD
6: 2019-01-06  2019     1     6 port1114 Rotterdam The Netherlands    NLD
   portcalls_container portcalls_dry_bulk portcalls_general_cargo
                 <int>              <int>                   <int>
1:                  19                  1                      15
2:                  12                  0                      22
3:                  27                  1                      13
4:                  20                  2                      14
5:                  18                  4                       7
6:                  18                  3                       9
   portcalls_roro portcalls_tanker portcalls_cargo portcalls import_container
            <int>            <int>           <int>     <int>            <int>
1:              1               45              36        81           221013
2:              1               42              35        77           294931
3:              8               49              49        98           259714
4:              7               46              43        89           161523
5:              4               59              33        92           135492
6:              4               44              34        78            58813
   import_dry_bulk import_general_cargo import_roro import_tanker import_cargo
             <int>                <int>       <int>         <int>        <int>
1:               0                11581           0        446862       232594
2:               0                20636         403        362018       315971
3:           29763                 3242        2212        646548       294931
4:           84292                 5530        2097        267817       253443
5:          258645                 4619        4074        984345       402832
6:           55217                 5985           0        355189       120016
    import export_container export_dry_bulk export_general_cargo export_roro
     <int>            <int>           <int>                <int>       <int>
1:  679456           122748               0                 9269        2058
2:  677990            46391               0                 9354           0
3:  941480           159137               0                 7866        5521
4:  521260           142940               0                 5948        1194
5: 1387177           179678               0                 1076        3582
6:  475206           157174            1247                 3199        2171
   export_tanker export_cargo export ObjectId
           <int>        <int>  <int>    <int>
1:         57778       134077 191855   352826
2:         57689        55745 113434   352827
3:         92597       172525 265122   352828
4:        125718       150083 275801   352829
5:        158452       184337 342789   352830
6:        148448       163791 312240   352831

View the full list of portids here.

Query All U.S. Ports (with selected fields)

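You can extract trade data for all ports in a country by filtering on its ISO3 country code (e.g., "USA" for the United States). To keep the results compact, this example requests only four fields: date, portcalls, import, and export. The query returns one row per port per day; because portid is not among the requested fields, rows cannot be attributed to individual ports, but they can be summed by date to obtain national daily totals. Add portid to outFields if you need to tell ports apart.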

# 3. Query all ports in the USA with select fields
us_ports <- query_portwatch(
  url = ports.url,
  outFields = "date,portcalls,import,export",
  where = "ISO3='USA'"
)
head(us_ports)
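Since portid is not among the requested fields, us_ports holds one unlabeled row per port per day. To get national daily totals, the rows can be summed within each date. A minimal sketch with base R's aggregate(), run on a small made-up data frame (us_ports_example) rather than real PortWatch output; with real data you would pass us_ports instead:

```r
# Made-up stand-in for us_ports: two ports reporting on two days
us_ports_example <- data.frame(
  date      = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02")),
  portcalls = c(10, 4, 12, 3),
  import    = c(5000, 1200, 6400, 800),
  export    = c(3000, 900, 2500, 1100)
)

# Sum port calls, imports and exports across ports for each date
national_daily <- aggregate(
  cbind(portcalls, import, export) ~ date,
  data = us_ports_example,
  FUN = sum
)
national_daily  # one row per date: portcalls 14 and 15, imports 6200 and 7200
```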

Creating your own filters with field names

To customize your queries, it’s helpful to inspect the available field names for the dataset. This ensures your where clauses match real column names.

# Inspect available fields to construct your own filters
meta_url <- "https://services9.arcgis.com/weJ1QsnbMYJlCHdG/ArcGIS/rest/services/Daily_Ports_Data/FeatureServer/0?f=json"
meta_resp <- GET(meta_url)
fields <- fromJSON(content(meta_resp, as = "text", encoding = "UTF-8"))

# List available field names
field_names <- fields$fields$name
print(field_names)
 [1] "date"                    "year"                   
 [3] "month"                   "day"                    
 [5] "portid"                  "portname"               
 [7] "country"                 "ISO3"                   
 [9] "portcalls_container"     "portcalls_dry_bulk"     
[11] "portcalls_general_cargo" "portcalls_roro"         
[13] "portcalls_tanker"        "portcalls_cargo"        
[15] "portcalls"               "import_container"       
[17] "import_dry_bulk"         "import_general_cargo"   
[19] "import_roro"             "import_tanker"          
[21] "import_cargo"            "import"                 
[23] "export_container"        "export_dry_bulk"        
[25] "export_general_cargo"    "export_roro"            
[27] "export_tanker"           "export_cargo"           
[29] "export"                  "ObjectId"               
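The field list can also be used to catch typos in outFields before sending a request. check_out_fields() below is a hypothetical helper; it checks names with an exact, case-sensitive match, which may be stricter than the server itself, and it uses a subset of the field names printed above:

```r
# A subset of the field names printed above for Daily_Ports_Data
field_names <- c("date", "year", "month", "day", "portid", "portname",
                 "country", "ISO3", "portcalls", "import", "export", "ObjectId")

# Hypothetical helper: stop early if outFields names an unknown field
check_out_fields <- function(out_fields, field_names) {
  if (identical(out_fields, "*")) return(invisible(TRUE))  # "*" = all fields
  requested <- trimws(strsplit(out_fields, ",", fixed = TRUE)[[1]])
  unknown <- requested[!requested %in% field_names]
  if (length(unknown) > 0) {
    stop("Unknown field(s) in outFields: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

check_out_fields("date,portcalls,import,export", field_names)  # passes silently
# check_out_fields("date,imports", field_names)  # would stop: "imports" is not a field
```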

Here are a few filter examples you can plug into your query_portwatch() call:

where = "country = 'China'"           # Filter by country name
where = "portid = 'port1207'"         # Filter by specific port
where = "ISO3 = 'BRA'"                # Filter by ISO3 country code
where = "portcalls > 10"              # Filter by numeric threshold
where = "year = 2024 AND ISO3 = 'IND'" # Combine conditions
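When filters are built from variables rather than typed by hand, string values need quoting. The helpers below are a hypothetical sketch: sql_quote() wraps a value in single quotes and doubles any embedded quote, following standard SQL escaping (which we assume the service accepts), and where_and() parenthesizes each condition before joining them with AND:

```r
# Hypothetical helper: quote a string value for a where clause
sql_quote <- function(x) {
  paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
}

# Hypothetical helper: combine conditions, each wrapped in parentheses
where_and <- function(...) {
  paste0("(", c(...), ")", collapse = " AND ")
}

where_and(paste0("country = ", sql_quote("China")), "portcalls > 10")
# "(country = 'China') AND (portcalls > 10)"

sql_quote("d'Ivoire")
# "'d''Ivoire'"
```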

You’re now ready to create your own custom queries in PortWatch.