This tutorial will teach you how to query large datasets from the IMF’s PortWatch platform using R and the ArcGIS REST API. It was designed alongside members of the IMF’s PortWatch team, Mario Saraiva and Alessandra Sozzi.
Setup
Start by loading the required packages:
library(httr)
library(jsonlite)
library(dplyr)
library(glue)
library(data.table)
Define Available Datasets
PortWatch datasets are hosted on ArcGIS Online, and each one is identified by a unique service name, so the first step is to decide which datasets to query. In our case, we will use PortWatch’s Daily_Trade_Data (ports) and Daily_Chokepoints_Data (key maritime passages).
The service name is inserted into a standard REST API URL structure. Instead of hardcoding URLs, we define a flexible function that takes a dataset name and returns the corresponding API endpoint.
# Function to compose dataset URL
get_api_url <- function(dataset) {
  base <- glue("https://services9.arcgis.com/weJ1QsnbMYJlCHdG/arcgis/rest/services/{dataset}/FeatureServer/0/query")
  return(base)
}
For example, to retrieve the URL for the Daily Trade Data, store the service name in a variable (here ports.url, which we reuse in the queries below) and pass it to the function:
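# Compose the endpoint for the Daily Trade Data
ports.url <- "Daily_Trade_Data"
get_api_url(ports.url)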
To retrieve data from the ArcGIS API, we make repeated GET requests with specific query parameters (like filters and output fields). Instead of repeating the same request logic multiple times, we define a wrapper function around httr::GET() that sends the request and parses the JSON response into an R object.
This function simplifies the rest of our workflow by abstracting away the raw HTTP logic:
# Function to make an API request and parse the JSON response
# (query_portwatch() below handles incrementing resultOffset)
get_api_data <- function(url, params) {
  response <- GET(url, query = params)
  response1 <- fromJSON(rawToChar(response$content))
  return(response1)
}
This helper allows us to pass any combination of where, outFields, and other parameters to the API and get back a result ready for processing.
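As a quick illustration (a minimal sketch, reusing ports.url from above), the same helper can fetch a record count by setting the ArcGIS returnCountOnly parameter:
# Example: ask the API for the total record count only
params <- list(where = "1=1", returnCountOnly = TRUE, f = "json")
get_api_data(get_api_url(ports.url), params)$count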
Creating query_portwatch()
Given the millions of records in the ArcGIS Feature Service, we build a core function, query_portwatch(), that handles large-scale data extraction from PortWatch in three main steps:
It first queries the API to get the total number of available records.
Then, it loops through the data in batches (up to 5,000 records at a time), appending each batch to a unified result.
Lastly, it returns a cleaned data.table with parsed date values.
# Function to query PortWatch data
query_portwatch <- function(url, where = "1=1", maxRecordCountFactor = 5, outFields = "*") {
  batch_size <- maxRecordCountFactor * 1000

  # Step 1: Get total record count
  params_initial <- list(where = where, returnCountOnly = TRUE, f = "json")
  total_records <- GET(url, query = params_initial) %>%
    content("parsed") %>%
    .$count
  print(paste0("Begin extraction of ", total_records, " records..."))

  # Prepare to store results
  all_results <- list()
  params_query <- list(
    where = where,
    outFields = outFields,
    f = "json",
    maxRecordCountFactor = maxRecordCountFactor
  )

  # Step 2: Batch fetch
  for (offset in seq(0, total_records, by = batch_size)) {
    print(paste0("Extracting batch from record ", offset, "..."))
    params_query$resultOffset <- offset
    result <- get_api_data(url, params_query)
    print('Length of result$features:')
    print(length(result$features))  # features parses to a data.frame, so this prints its column count
    if (length(result$features) > 0) {
      df_batch <- as.data.frame(result$features$attributes)
      all_results[[length(all_results) + 1]] <- df_batch
    } else {
      break
    }
    Sys.sleep(1)
  }

  # Step 3: Combine batches and parse date values
  final_df <- rbindlist(all_results, fill = TRUE)
  if ("date" %in% colnames(final_df)) {
    final_df$date <- as.POSIXct(final_df$date / 1000, origin = "1970-01-01", tz = "UTC")
    final_df <- final_df %>% arrange(date)
  }
  return(final_df)
}
You can now use this function to extract data from any PortWatch-compatible ArcGIS service with just a few lines of code.
Query Examples
We will now show you how to use query_portwatch() with real-world filters to retrieve data from the PortWatch API. Each example corresponds to a typical use case: chokepoint activity, individual port traffic, or country-level trade data.
Query a Chokepoint (e.g., Suez Canal)
The Daily_Chokepoints_Data dataset includes maritime chokepoints such as the Suez Canal. You can query by portid, using values like 'chokepoint1', 'chokepoint2', etc. This will return daily vessel activity and trade volume passing through the Suez Canal.
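A minimal sketch of such a query follows; the chokepoint id below is a placeholder, so check the dataset’s attribute table for the id that actually corresponds to the Suez Canal.
# 1. Query a chokepoint (replace 'chokepoint1' with the Suez Canal's id)
suez <- query_portwatch(
  url = get_api_url("Daily_Chokepoints_Data"),
  where = "portid='chokepoint1'"
)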
Query a Specific Port (e.g., Rotterdam)
You can target a specific commercial port using the Daily_Trade_Data dataset. Each port has a unique portid identifier (in this case, 'port1114' for Rotterdam). This will return time series data for vessel calls, imports, and exports for Rotterdam.
# 2. Query Rotterdam port (port1114)
port1114 <- query_portwatch(
  url = get_api_url(ports.url),
  where = "portid='port1114'"
)
[1] "Begin extraction of 2349 records..."
[1] "Extracting batch from record 0..."
[1] "Length of result$features:"
[1] 1
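As a quick sanity check (with the default outFields = "*", all columns are returned), you can inspect the first rows of the result:
# Peek at the Rotterdam series
head(port1114)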
Query All Ports in a Country (e.g., USA)
You can extract trade data for all ports in a specific country using the ISO3 country code (e.g., "USA" for the United States). To keep results focused, this example limits the output to four key fields. This will return daily totals of port calls, imports, and exports across U.S. ports.
# 3. Query all ports in the USA with select fields
us_ports <- query_portwatch(
  url = get_api_url(ports.url),
  outFields = "date,portcalls,import,export",
  where = "ISO3='USA'"
)
[1] "Begin extraction of 258390 records..."
[1] "Extracting batch from record 0..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 5000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 10000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 15000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 20000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 25000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 30000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 35000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 40000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 45000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 50000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 55000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 60000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 65000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 70000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 75000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 80000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 85000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 90000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 95000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 1e+05..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 105000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 110000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 115000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 120000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 125000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 130000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 135000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 140000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 145000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 150000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 155000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 160000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 165000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 170000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 175000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 180000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 185000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 190000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 195000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 2e+05..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 205000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 210000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 215000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 220000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 225000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 230000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 235000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 240000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 245000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 250000..."
[1] "Length of result$features:"
[1] 1
[1] "Extracting batch from record 255000..."
[1] "Length of result$features:"
[1] 1
Constructing Your Own Queries
To customize your queries, it’s helpful to inspect the available field names for the dataset. This ensures your where clauses match real column names.
# Inspect available fields to construct your own filters
meta_url <- "https://services9.arcgis.com/weJ1QsnbMYJlCHdG/arcgis/rest/services/Daily_Trade_Data/FeatureServer/0?f=json"
meta_resp <- GET(meta_url)
fields <- fromJSON(content(meta_resp, as = "text", encoding = "UTF-8"))

# List available field names
field_names <- fields$fields$name
print(field_names)
Here are a few filter examples you can plug into your query_portwatch() call:
where ="country = 'China'"# Filter by country namewhere ="portid = 'port1207'"# Filter by specific portwhere ="ISO3 = 'BRA'"# Filter by ISO3 country codewhere ="portcalls > 10"# Filter by numeric thresholdwhere ="year = 2024 AND ISO3 = 'IND'"# Combine conditions
You are now equipped to create your own custom queries in PortWatch.