Our Differential Privacy Mini-series

November 1, 2015
By

(This article was first published on Win-Vector Blog » R, and kindly contributed to R-bloggers)

We’ve just finished off a series of articles on some recent research results applying differential privacy to improve machine learning. Some of these results are pretty technical, so we thought it was worth working through concrete examples. And some of the original results are locked behind academic journal paywalls, so we’ve tried to touch on the highlights of the papers, and to play around with variations of our own.

Blurry snowflakes stock by cosmicgallifrey d3inho1

  • A Simpler Explanation of Differential Privacy: Quick explanation of epsilon-differential privacy, and an introduction to an algorithm for safely reusing holdout data, recently published in Science (Cynthia Dwork, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, Aaron Roth, “The reusable holdout: Preserving validity in adaptive data analysis”, Science, vol 349, no. 6248, pp. 636-638, August 2015).

    Note that Cynthia Dwork is one of the inventors of differential privacy, originally used in the analysis of sensitive information.

  • Using differential privacy to reuse training data: Specifically, how differential privacy helps you build efficient encodings of categorical variables with many levels from your training data without introducing undue bias into downstream modeling.
  • A simple differentially private procedure: The bootstrap as an alternative to Laplace noise to introduce differential privacy.

Our R code and experiments are available on Github here, so you can try some experiments and variations yourself.

Image Credit

To leave a comment for the author, please follow the link and comment on their blog: Win-Vector Blog » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)