nycOpenData: A unified R interface to NYC Open Data APIs

Guest post by Christian Martinez, developer of the nycOpenData package in R.
I am pleased to announce the release of nycOpenData, an R package providing convenient, tidy access to dozens of datasets from the New York City Open Data platform.
The package is designed as part of an open-science and reproducible-research effort, with the goal of lowering the friction between public data and statistical analysis—especially for teaching, exploratory research, and applied civic work.
Why nycOpenData?
NYC Open Data hosts hundreds of datasets covering topics such as public safety, housing, transportation, education, health, and city services. While these datasets are publicly accessible through the Socrata API, working with them directly often requires:
- knowing dataset identifiers,
- manually constructing API queries,
- handling pagination, timeouts, and rate limits,
- and performing repetitive data-cleaning steps.
These barriers can slow down exploratory analysis and make public data less accessible to students, researchers, and practitioners who primarily work in R.
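To make those steps concrete, here is a minimal sketch of the manual route in base R. It only builds the request URL and does not contact the API. The dataset identifier `erm2-nwe9` and the endpoint pattern are assumptions that should be checked against the portal; the `$where`, `$order`, `$limit`, and `$offset` parameters are standard Socrata query parameters.

```r
# Build a Socrata (SoQL) request URL by hand, without any wrapper package.
# Assumption: "erm2-nwe9" identifies the 311 Service Requests dataset on
# data.cityofnewyork.us; verify it on the portal before relying on it.
base_url <- "https://data.cityofnewyork.us/resource/erm2-nwe9.json"

# SoQL parameters: filter, sort order, page size, and page offset.
params <- c(
  "$where"  = "borough = 'BROOKLYN'",
  "$order"  = "created_date DESC",
  "$limit"  = "1000",
  "$offset" = "0"
)

# Percent-encode each value so spaces and quotes survive in the query string.
encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
query_url <- paste0(
  base_url, "?",
  paste0(names(params), "=", encoded, collapse = "&")
)

print(query_url)
```

Fetching more than one page would mean repeating the request with a growing `$offset`, which is the kind of bookkeeping a wrapper function can hide.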
nycOpenData was built to remove these obstacles by providing a consistent, user-friendly interface that returns clean tibbles ready for analysis—without requiring users to interact directly with the API.
What does the package do?
The package provides a growing collection of wrapper functions, each corresponding to a specific NYC Open Data dataset or dataset family. All functions follow a shared design pattern and support:
- row limits,
- optional filtering via named lists,
- sorting,
- and graceful handling of API errors and timeouts.
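As an illustration of the named-list filtering idea, the sketch below shows one way such a list could be turned into a Socrata `$where` clause. The helper `filters_to_where()` is hypothetical and is not taken from the package source; it handles only single-value equality filters.

```r
# Hypothetical helper (not part of nycOpenData): turn a named list of
# single-value equality filters into a SoQL $where clause.
filters_to_where <- function(filters) {
  if (length(filters) == 0) return("")
  stopifnot(!is.null(names(filters)), all(nzchar(names(filters))))
  clauses <- vapply(names(filters), function(col) {
    # Double any single quote so a value such as "O'Brien" stays quoted
    # correctly (assumed SoQL escaping rule).
    value <- gsub("'", "''", as.character(filters[[col]]), fixed = TRUE)
    sprintf("%s = '%s'", col, value)
  }, character(1))
  paste(clauses, collapse = " AND ")
}

where <- filters_to_where(list(agency = "NYPD", borough = "BROOKLYN"))
print(where)
```

Joining the conditions with `AND` means every filter must match, which is the behaviour one would expect when several columns are named in the same list.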
Examples of currently supported domains include:
- 311 service requests
- Transportation and for-hire vehicles
- Motor vehicle collisions
- Department of Buildings permits and complaints
- Education and school reporting
- Juvenile justice and public safety
- Street trees and environmental data
- Permitted events (historical)
A typical call looks like this:
library(nycOpenData)

nyc_311(
  limit = 1000,
  filters = list(borough = "BROOKLYN")
)
## # A tibble: 1,000 × 40
##    unique_key created_date          agency agency_name complaint_type descriptor
##    <chr>      <chr>                 <chr>  <chr>       <chr>          <chr>
##  1 67613985   2026-01-26T02:06:05.… NYPD   New York C… Noise - Resid… Banging/P…
##  2 67609553   2026-01-26T02:02:09.… NYPD   New York C… Noise - Resid… Banging/P…
##  3 67610990   2026-01-26T01:58:58.… NYPD   New York C… Illegal Parki… Blocked H…
##  4 67615428   2026-01-26T01:56:49.… NYPD   New York C… Noise - Resid… Banging/P…
##  5 67609568   2026-01-26T01:48:16.… NYPD   New York C… Noise - Resid… Loud Musi…
##  6 67612476   2026-01-26T01:47:10.… NYPD   New York C… Noise - Resid… Loud Musi…
##  7 67614152   2026-01-26T01:46:26.… DSNY   Department… Snow or Ice    Snow Trac…
##  8 67614054   2026-01-26T01:44:50.… DSNY   Department… Dirty Conditi… Trash
##  9 67606570   2026-01-26T01:41:32.… NYPD   New York C… Noise - Resid… Banging/P…
## 10 67610091   2026-01-26T01:35:51.… NYPD   New York C… Noise - Vehic… Car/Truck…
## # ℹ 990 more rows
## # ℹ 34 more variables: location_type <chr>, incident_zip <chr>,
## #   incident_address <chr>, street_name <chr>, cross_street_1 <chr>,
## #   cross_street_2 <chr>, intersection_street_1 <chr>,
## #   intersection_street_2 <chr>, address_type <chr>, city <chr>,
## #   landmark <chr>, status <chr>, community_board <chr>,
## #   council_district <chr>, police_precinct <chr>, bbl <chr>, borough <chr>, …
The result is a tidy tibble of the 1,000 most recent 311 requests from Brooklyn, immediately compatible with the tidyverse ecosystem for visualization, modeling, and reporting.
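In the printed output above every column has type `<chr>`, including `created_date`, so date arithmetic needs a conversion step first. The sketch below uses a small stand-in data frame with invented values rather than a live API call, and it assumes the timestamps are ISO 8601 local times with no time zone suffix.

```r
# Stand-in for a few rows of nyc_311() output; the values are invented and
# only mimic the character columns shown above.
requests <- data.frame(
  unique_key   = c("1", "2", "3"),
  created_date = c(
    "2026-01-01T08:15:30.000",
    "2026-01-01T09:00:00.000",
    "2026-01-02T23:59:59.000"
  ),
  stringsAsFactors = FALSE
)

# Assumption: timestamps are local New York times without a zone suffix,
# so the time zone is supplied explicitly when parsing.
requests$created_at <- as.POSIXct(
  requests$created_date,
  format = "%Y-%m-%dT%H:%M:%OS",
  tz = "America/New_York"
)

# Count requests per calendar day.
requests$created_day <- as.Date(requests$created_at, tz = "America/New_York")
print(table(requests$created_day))
```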
Mini analysis
One of the function's strengths is that it can filter on several columns at once. Let's put this together and pull the 1,000 most recent 311 requests in Brooklyn that were handled by the New York Police Department (NYPD).
# Creating the dataset
brooklyn_nypd <- nyc_311(limit = 1000, filters = list(agency = "NYPD", borough = "BROOKLYN"))

# Calling head of our new dataset
head(brooklyn_nypd)
## # A tibble: 6 × 39
##   unique_key created_date           agency agency_name complaint_type descriptor
##   <chr>      <chr>                  <chr>  <chr>       <chr>          <chr>
## 1 67613985   2026-01-26T02:06:05.0… NYPD   New York C… Noise - Resid… Banging/P…
## 2 67609553   2026-01-26T02:02:09.0… NYPD   New York C… Noise - Resid… Banging/P…
## 3 67610990   2026-01-26T01:58:58.0… NYPD   New York C… Illegal Parki… Blocked H…
## 4 67615428   2026-01-26T01:56:49.0… NYPD   New York C… Noise - Resid… Banging/P…
## 5 67609568   2026-01-26T01:48:16.0… NYPD   New York C… Noise - Resid… Loud Musi…
## 6 67612476   2026-01-26T01:47:10.0… NYPD   New York C… Noise - Resid… Loud Musi…
## # ℹ 33 more variables: location_type <chr>, incident_zip <chr>,
## #   incident_address <chr>, street_name <chr>, cross_street_1 <chr>,
## #   cross_street_2 <chr>, intersection_street_1 <chr>,
## #   intersection_street_2 <chr>, address_type <chr>, city <chr>,
## #   landmark <chr>, status <chr>, community_board <chr>,
## #   council_district <chr>, police_precinct <chr>, bbl <chr>, borough <chr>,
## #   x_coordinate_state_plane <chr>, y_coordinate_state_plane <chr>, …

# Quick check to make sure our filtering worked
nrow(brooklyn_nypd)
## [1] 1000
unique(brooklyn_nypd$agency)
## [1] "NYPD"
unique(brooklyn_nypd$borough)
## [1] "BROOKLYN"
The check confirms that the dataset holds 1,000 requests, all handled by the NYPD and all in Brooklyn.
With the data in R, let's find out what Brooklyn residents are reporting to the NYPD.
To do this, we will draw a bar chart of the complaint types.
# Visualizing the distribution, ordered by frequency
library(ggplot2)

ggplot(brooklyn_nypd, aes(y = reorder(complaint_type, complaint_type, length))) +
  geom_bar(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Most Recent NYPD 311 Complaints (Brooklyn)",
    subtitle = "Top 1,000 service requests",
    x = "Number of Complaints",
    y = "Type of Complaint"
  )
Figure 1: Bar chart showing the frequency of NYPD-related 311 complaint types in Brooklyn from the 1,000 most recent service requests.
The chart shows not only which complaint types appear, but also how often each one occurs among these 1,000 requests.
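The numbers behind a bar chart like this can also be read off as a frequency table. The sketch below uses a short stand-in vector with placeholder labels in place of `brooklyn_nypd$complaint_type`, so it runs without an API call.

```r
# Stand-in for brooklyn_nypd$complaint_type; the labels are placeholders.
complaint_type <- c("Type A", "Type B", "Type A", "Type C", "Type A", "Type B")

# Tabulate and sort so the most frequent complaint type comes first.
counts <- sort(table(complaint_type), decreasing = TRUE)
print(counts)

# Share of all requests, as a percentage rounded to one decimal place.
shares <- round(100 * prop.table(counts), 1)
print(shares)
```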
Designed for reproducible workflows
A core design principle of nycOpenData is reproducibility. Rather than downloading static CSV files that can change over time or be accidentally modified, analyses can explicitly document:
- which dataset was used,
- how many rows were requested,
- which filters were applied,
- and when the data were accessed.
This makes the package particularly useful for:
- reproducible research projects,
- classroom assignments,
- data journalism,
- and exploratory civic analysis.
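One way to document the dataset, row limit, filters, and access time is to store them next to the data itself. The sketch below is only an illustration: `fetch_with_provenance()` and `fake_fetch()` are hypothetical names, and the stand-in fetch function exists so the example runs offline.

```r
# Hypothetical helper (not part of nycOpenData): call a fetch function and
# attach a record of how and when the data were requested.
fetch_with_provenance <- function(fetch_fn, dataset, limit, filters = list()) {
  data <- fetch_fn(limit = limit, filters = filters)
  attr(data, "provenance") <- list(
    dataset     = dataset,
    limit       = limit,
    filters     = filters,
    accessed_at = format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  )
  data
}

# Offline stand-in for a real fetch function such as nyc_311().
fake_fetch <- function(limit, filters) {
  data.frame(id = seq_len(limit), borough = filters$borough)
}

result <- fetch_with_provenance(
  fake_fetch,
  dataset = "311 service requests",
  limit   = 3,
  filters = list(borough = "BROOKLYN")
)
str(attr(result, "provenance"))
```

Because `nyc_311()` takes `limit` and `filters` arguments, it could be passed as `fetch_fn` in place of the stand-in.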
The package is also designed to be API-polite, with configurable timeouts and safeguards that help prevent common failure modes when querying large public datasets.
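For readers who want an extra safeguard in their own scripts, a generic retry wrapper with growing pauses is one common pattern. This is a sketch under stated assumptions, not a description of how the package handles failures internally; `with_retry()` and `flaky()` are hypothetical names.

```r
# Hypothetical helper (not part of nycOpenData): retry a failing call with
# exponentially growing pauses between attempts.
with_retry <- function(fn, max_attempts = 3, base_delay = 1) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(fn(), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    if (attempt < max_attempts) {
      # Wait base_delay, then 2 * base_delay, then 4 * base_delay, ...
      Sys.sleep(base_delay * 2^(attempt - 1))
    }
  }
  stop("All ", max_attempts, " attempts failed: ", conditionMessage(result))
}

# Offline demonstration: a function that fails twice, then succeeds.
calls <- 0
flaky <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("simulated timeout")
  "ok"
}

outcome <- with_retry(flaky, max_attempts = 5, base_delay = 0.01)
print(outcome)
```

With the real package, the wrapped call could be something like `with_retry(function() nyc_311(limit = 1000))`, keeping pauses long enough to stay polite to the API.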
Who is it for?
nycOpenData is intended for a broad audience, including:
- students learning statistics or data science using real-world data,
- instructors teaching reproducible research or applied data analysis,
- researchers conducting exploratory or descriptive analyses,
- data journalists and civic technologists,
- and anyone interested in working with NYC public data in R.
The goal is not to abstract away the data itself, but to make access predictable, transparent, and easy to integrate into standard R workflows.
Availability
The package is available on CRAN and can be installed using:
install.packages("nycOpenData")
Development continues on GitHub, where new datasets and improvements are added regularly.
Acknowledgements
This package was developed alongside teaching and applied research projects in reproducible data science, with inspiration from open-source contributors across the R community and the NYC Open Data program.
Useful links
- CRAN package: https://CRAN.R-project.org/package=nycOpenData
- pkgdown site: https://martinezc1.github.io/nycOpenData/
- GitHub repository: https://github.com/martinezc1/nycOpenData
- NYC Open Data portal: https://opendata.cityofnewyork.us/
As always, feedback, bug reports, and dataset requests are very welcome.
Thanks for reading!