Use of Differential Privacy in the US Census–All for Nothing?

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers.]

The field of data privacy has long been of broad interest. In a medical database, for instance, how can administrators enable statistical analysis by medical researchers, while at the same time protecting the privacy of individual patients? Over the years, many methods have been proposed and used. I’ve done some work in the area myself.

But in 2006, an approach known as differential privacy (DP) was proposed by a group of prominent cryptography researchers. With its catchy name and theoretical underpinnings, DP immediately attracted lots of attention. Because it is more mathematical than many other statistical disclosure control methods, and thus good fodder for theoretical research, it quickly led to a flurry of research papers showing how to apply DP in various settings.

DP was also adopted by some firms in industry, notably Apple. But what really gave DP a boost was the decision by the US Census Bureau to use DP for its publicly available data, beginning with the most recent census, 2020. On the other hand, that decision also intensified the opposition to DP. I have my own concerns about the method.

The Bureau, though, had what it considered a compelling reason to abandon its existing privacy methods: Its extensive computer simulations showed that those methods were vulnerable to attack, in such a manner that an attacker could exactly reconstruct large portions of the “private” version of the census database. Such an outcome of course had to be avoided at all costs, and so DP was implemented.

But now… it turns out that the Bureau’s claim of reconstructibility was incorrect, according to a recent paper by Krishna Muralidhar, who writes,

“This study shows that there are a practically infinite number of possible reconstructions, and each reconstruction leads to assigning a different identity to the respondents in the reconstructed data. The results reported by the Census Bureau researchers are based on just one of these infinite possible reconstructions and is easily refuted by an alternate reconstruction.”

This is one of the most startling statements I’ve seen in my many years in academia. It would appear that the Bureau committed a “rush to judgment” on a massive scale, which is just mind-boggling. In addition (much less momentous, but still very concerning), it gave its imprimatur to methodology that many believe has serious flaws.
