What is the difference between statistics and data analysis?

Of course, to answer this we need to define those terms, and definitions of such things are hardly standard. But they are not particularly standard in other disciplines either. Can you define art? Music? How about mathematics?

Would you have defined mathematics as “including such topics as numbers, formulas and related structures, shapes and the spaces in which they are contained, and quantities and their changes,” as in Wikipedia? Is this all-encompassing?

Statistics and data analysis overlap. Both involve defining, exploring, cleaning, visualizing, and describing data. Students of data analysis study some traditional statistics, and statistics students nowadays study some data analysis.

The father-and-daughter Larose team has suggested a working distinction between inferential statistics and data mining (neither of which is identical to the terms in the opening question) as follows:

Inferential statistics involves having a prior hypothesis about a population and testing that hypothesis with a sample from that population. The test may result in statistical significance, even if there is no practical significance.
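
To make the gap between statistical and practical significance concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption for illustration: the simulated population (true mean 100.3, standard deviation 15), the hypothesized mean of 100, the sample size, and the helper name `z_test_summary`. It uses a large-sample z-test as an approximation, not any particular method from Larose and Larose.

```python
import math
import random


def z_test_summary(sample, hypothesized_mean):
    """Return (z, two-sided p-value, Cohen's d) for a one-sample z-test."""
    n = len(sample)
    mean = math.fsum(sample) / n
    # Sample standard deviation (n - 1 in the denominator).
    sd = math.sqrt(math.fsum((x - mean) ** 2 for x in sample) / (n - 1))
    # Large-sample z statistic: difference divided by the standard error.
    z = (mean - hypothesized_mean) / (sd / math.sqrt(n))
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # Cohen's d: the same difference expressed in standard deviations.
    cohens_d = (mean - hypothesized_mean) / sd
    return z, p_value, cohens_d


rng = random.Random(42)  # fixed seed so the run is repeatable
# Hypothetical population: true mean 100.3, standard deviation 15.
sample = [rng.gauss(100.3, 15) for _ in range(200_000)]

z, p_value, cohens_d = z_test_summary(sample, hypothesized_mean=100.0)
print(f"z = {z:.2f}, p = {p_value:.2g}, Cohen's d = {cohens_d:.3f}")
```

With a sample this large, the test should reject the hypothesized mean decisively, while the effect size stays a tiny fraction of a standard deviation: a difference that is detectable but may not matter in practice.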

Data mining does not begin with a prior hypothesis, but rather the analyst “freely trolls through the data for actionable results.” (Larose, p. 161)
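
By contrast, exploration without a prior hypothesis might look something like the sketch below, which scans every pair of columns and ranks them by the strength of their correlation. This is only one simple way to illustrate the idea. The column names (`visits`, `spend`, `age`, `tenure`), the simulated data, and the helpers `pearson_r` and `rank_pairs` are all hypothetical and do not come from Larose and Larose.

```python
import math
import random
from itertools import combinations


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = math.fsum(xs) / n, math.fsum(ys) / n
    sxy = math.fsum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mx) ** 2 for x in xs)
    syy = math.fsum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rank_pairs(columns):
    """Score every pair of columns by |r| and return them strongest first."""
    scored = [
        (abs(pearson_r(columns[a], columns[b])), a, b)
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, reverse=True)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500
# Hypothetical customer data: spend depends on visits, the rest is noise.
visits = [rng.gauss(10, 3) for _ in range(n)]
columns = {
    "visits": visits,
    "spend": [5 * v + rng.gauss(0, 5) for v in visits],
    "age": [rng.gauss(40, 10) for _ in range(n)],
    "tenure": [rng.gauss(24, 6) for _ in range(n)],
}

ranked = rank_pairs(columns)
for strength, a, b in ranked:
    print(f"{a:>7} vs {b:<7} |r| = {strength:.3f}")
```

Nothing here tests a hypothesis stated in advance; the ranking only points the analyst at where to look next.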

Larose, D.T. & Larose, C.D. (2015). Data mining and predictive analytics. Wiley. Hoboken, NJ.