Last month I joined Gregory Piatetsky (KDnuggets editor) for a webinar presentation, Data Science: Not Just for Big Data, hosted by Kalido. In my portion of the presentation (you can see my slides below), I wanted to react to the Big Data focus that is so much a part of the Data Science movement today, and to highlight the issues that arise with all data sets, which statisticians have learned about from working with smaller data sets over the last 200 years. These include observational bias (an often-overlooked issue with Big Data), confounding and overfitting (which can mess up any model if care isn't taken), and the need to move the discussion beyond predictions (means) and towards risk (variance).
I still firmly believe that Big Data is important: there's so much we can do today that was never possible without the variety and volume of data sources we have now. But the data science community has much to learn from the realm of smaller data. Several examples come from the excellent ComputerWorld article, 12 predictive analytics screw-ups. You can watch the webinar replay below.
Kalido Webinars: Data Science: Not Just For Big Data