NYED Data Explorer Shows 15 Years of Charter School Success

[This article was first published on R on Redwall Analytics, and kindly contributed to R-bloggers].
NYED Data Explorer filtered for “All Students” ELA Aggregated Annual Test Scores


Three years ago, in the course of building personal projects in R using public data from Connecticut, I wrote How Does Stamford Charter School for Excellence do it?. “Stamford Excellence” was achieving remarkable results in a district with proficiency below 50%. Then and now, the school receives little recognition, and its effort to open a second Connecticut location in Norwalk has been stymied as of this writing. My interest in these exceptional schools, and possibly the blog post, led to a new position as Director of Data & Research at another high-performing Bronx charter school network, South Bronx Classical Charter Schools. Naturally, when I discovered 15 years of NYED assessment data, I immediately wanted to clean the data and free it for others to explore in a Shiny app. The opportunity to also feature Classical’s stand-out performance didn’t hurt my motivation, although this post and the app were built in my spare time and do not represent the opinions of South Bronx Classical Charter Schools. Unlike many past Redwall posts, this one includes little code; it primarily shows how to use the app, which was built with {golem} and {shinyWidgets}.

Classical Charter Schools and the Charter Debate

Classical Charter Schools operates four schools demonstrating outstanding academic performance in an under-served community where, based on the results shown below, there is not much evidence of educational accountability. With approximately 90% economically disadvantaged students, it achieves close to 90% proficiency in both ELA and Math, on par with the wealthiest NYS districts, while spending much less per pupil than DOE schools. This is accomplished with a variety of strategies, but most important to me, a deep commitment to collecting and using data to make decisions. This post shows the result of extensive cleaning of over 3 million rows of school performance data from the NYED Data site. Despite five years of daily experience manipulating data with code, untangling this data took much more effort than I care to admit. After the recent broad drop in proficiency related to COVID, and the renewed public debate about lifting the cap on new charter schools, it seems there has never been a better time to make clean, accessible school performance data available. This post launches a “minimum viable” version of a Shiny app called the NYED Data Explorer (embedded in this blog post below), which I hope may put to rest any questions over whether more charters should be allowed.

Thoughts on NYED and Open Public Data

As with every public data site I have worked with, the NYED’s disclosure strategy sometimes feels intended to make it hard to assemble a clean longitudinal data set. Disclosures occur annually, often in the inconvenient form of an MS Access database, while other years shift to csv, xlsx, or tab-separated formats. Some years’ Access databases include enrollment tables; other years leave them out. In some years, only the current year is disclosed, while others also include the prior year. I extracted and separated these painfully, year by year, from the command line using mdbtools. When the number of scholars in a group is below a threshold, fields are suppressed, which is common practice, but the NYED data denotes these cases with varying notations in different years. Sometimes assessment data is disclosed in tables by grade; in other years, all grades are stacked together in one table. Disclosed subgroups are usually male/female and ethnicity, but several years include ethnic group by gender. Important fields are added and then disappear: 2022, for example, dropped “Mean Scale Score”, which in most past years showed the average score of a cohort at a school. As a result, I will concentrate on the “Pass Rate” for now, which is the number of students scoring at Level 3 or 4 divided by total test takers. The cutoffs for levels change every year, as does the test difficulty, so “pass rates” are also not an objective measure over time.
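To make the two cleaning problems above concrete, here is a minimal sketch in Python (not the actual pipeline, which is in R): mapping the varying suppression notations to a missing value, then computing the pass rate from level counts. The column names and the set of suppression markers ("s", "-", "#") are illustrative assumptions, not the real NYED schema.

```python
def to_count(value):
    """Convert a reported count to an int, mapping suppression markers to None.

    The marker set here is a stand-in for the varying notations NYED
    uses across years when a subgroup falls below the disclosure threshold.
    """
    suppressed = {"s", "-", "#", ""}
    text = str(value).strip()
    if text.lower() in suppressed:
        return None
    return int(text)

def pass_rate(level_counts, total_tested):
    """Pass rate = students scoring at Level 3 or 4 divided by total test takers."""
    l3, l4 = level_counts.get(3), level_counts.get(4)
    if l3 is None or l4 is None or not total_tested:
        return None  # cannot compute when any input is suppressed or missing
    return (l3 + l4) / total_tested

# Hypothetical row: 120 tested, 48 at Level 3, 30 at Level 4
row = {"num_tested": "120", "level3": "48", "level4": "30"}
counts = {3: to_count(row["level3"]), 4: to_count(row["level4"])}
rate = pass_rate(counts, to_count(row["num_tested"]))
print(round(rate, 2))  # 0.65
```

Because a single suppressed level count makes the rate uncomputable, returning `None` rather than a partial ratio keeps suppressed subgroups out of any aggregates.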

Data Collection Considerations

Schools record and report student attributes, but in many cases these likely suffer from differences in interpretation, inaccuracies in collection, and timing differences. Tracking race and ethnicity accurately within a single institution with changing personnel is complicated enough, much less across 4,000+ schools, all with varying, often manual processes. A school may have dozens of data filings to make over the calendar year to the local DOE as well as to State systems. Even if a scholar is classified correctly, there will surely be many cases where totals are added up or transposed incorrectly into reporting systems. Enrollment data changes through the year, but is primarily recorded on “BEDS” Day. There are many cases, especially before 2012, where the number of test takers at the Spring exams exceeds the total enrollment reported for that grade at the same school. In fact, in the early years of this data, the aggregate number of test takers often exceeded the total enrollment in all schools in a subgroup. This may still be an error in my data cleaning, but it is unclear as of this writing. The intention of this app is to give the most accurate representation possible. Please understand that perfection is impossible given the nature of the data, although this is a lot better than anything else I have discovered up until now.
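The test-takers-exceed-enrollment anomaly lends itself to a simple automated check. The sketch below, in Python with invented field names (the real data uses the NYED schema), flags subgroup rows where Spring test takers exceed BEDS Day enrollment while skipping suppressed values:

```python
def flag_overcounts(rows):
    """Return rows where tested > enrolled, a sign of reporting or cleaning errors.

    Rows with a suppressed (None) count are skipped rather than flagged,
    since nothing can be concluded from them.
    """
    return [
        r for r in rows
        if r["tested"] is not None
        and r["enrolled"] is not None
        and r["tested"] > r["enrolled"]
    ]

# Hypothetical grade-level rows for three schools
rows = [
    {"school": "A", "grade": 4, "enrolled": 90, "tested": 88},
    {"school": "B", "grade": 4, "enrolled": 60, "tested": 71},    # suspicious
    {"school": "C", "grade": 4, "enrolled": 75, "tested": None},  # suppressed
]
print([r["school"] for r in flag_overcounts(rows)])  # ['B']
```

Running a check like this per year, subgroup, and grade would quantify how widespread the anomaly is, and help distinguish source-data errors from cleaning errors.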

Future additions

Making the data cleaning steps reproducible is one goal, but given the number of manual steps required to collect the raw data out of MS Access databases, this is not easy. Now that the data is mostly clean and the app is up and running, I will be adding further tables and graphics to aid explorations comparing the trajectories of individual schools. Approximately 80 charters have come and gone for poor performance; it would be interesting to see how many failing DOE schools have also shut down, and the relationship between closure and performance. The filters currently narrow down to counties, but zip codes might be even more revealing and would not be difficult to add. In his book “Charter Schools and Their Enemies”, Thomas Sowell showed the striking differentials in performance of students in charter and DOE schools sharing the exact same buildings, so a same-building comparison would not be too difficult to add either.


The goal of the NYED Data Explorer, as with many of my past Redwall Analytics projects, was to free valuable data from an inaccessible open repository, especially where the honesty surrounding the debate might be improved. My walkthrough is opinionated, but I think it is supported by the data, which is available for all to test and reproduce. Although I focused on NYC schools, the data includes all NYS public schools. The comments section of this blog is open, and I would welcome feedback on how to improve the app, different interpretations of the data, or ways to boost awareness of it.
