What Data Science can learn from small-data Statistics

November 15, 2013

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

Last month I joined Gregory Piatetsky (KDnuggets editor) for a webinar presentation, Data Science: Not Just for Big Data, hosted by Kalido. In my portion of the presentation (you can see my slides below), I wanted to react to the Big Data focus that is so much a part of the Data Science movement today, and focus instead on issues that arise with data sets of any size, which statisticians have learned to handle from working with smaller data sets over the last 200 years. These include observational bias (an often-overlooked issue with Big Data), confounding and overfitting (either of which can mess up any model if care isn't taken), and moving the discussion away from predictions (means) and towards risk (variance).
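To make the overfitting point concrete, here is a minimal R sketch (my illustration, not material from the webinar, using a made-up simulated process): with only ten observations, a 9th-degree polynomial reproduces the training data almost exactly, yet predicts far worse on fresh data than a simple quadratic.

# Illustrative sketch of overfitting (not from the webinar).
# Simulate 10 noisy observations from a smooth underlying curve.
set.seed(42)
x <- runif(10)
y <- sin(2 * pi * x) + rnorm(10, sd = 0.3)
train <- data.frame(x = x, y = y)

# A modest model vs. one with as many parameters as data points.
fit_simple <- lm(y ~ poly(x, 2), data = train)
fit_flex   <- lm(y ~ poly(x, 9), data = train)  # interpolates the noise

# Fresh data from the same process.
test <- data.frame(x = runif(1000))
test$y <- sin(2 * pi * test$x) + rnorm(1000, sd = 0.3)

rmse <- function(fit, d) sqrt(mean((d$y - predict(fit, newdata = d))^2))
c(train_simple = rmse(fit_simple, train), train_flex = rmse(fit_flex, train))
c(test_simple  = rmse(fit_simple, test),  test_flex  = rmse(fit_flex, test))

Because the degree-9 fit has as many coefficients as observations, its training error is essentially zero, but its out-of-sample error is far larger than the quadratic's: the flexible model has memorized the noise rather than the signal.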

 

I still firmly believe that Big Data is important — there's so much we can do today that was never possible without the variety and volume of data sources we have now — but the data science community has much to learn from the realm of smaller data. Several examples come from the excellent Computerworld article, 12 predictive analytics screw-ups. You can watch the webinar replay below.

 

Kalido Webinars: Data Science: Not Just For Big Data

