Introduction to Statistical Methods in R

January 18, 2016

(This article was first published on r – Lunean, and kindly contributed to R-bloggers)

Data analyses are the product of many different tasks, and statistical methods are one key aspect of any data analysis. There is a common workflow in the related areas of informatics, data mining, data science, machine learning, and statistics. The workflow tasks include data preparation, the development of predictive mathematical models, and the interpretation and preparation of analysis results (including the development of visualizations to communicate findings).

The presentation covers the last two steps of this workflow. It provides reproducible code examples and walks through many common statistical methods used to explore data, including regression, clustering (e.g. K-means and hierarchical), and dimensionality reduction (e.g. principal component analysis (PCA)), with examples in R.
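
As a rough illustration of the kinds of methods listed above, the following minimal sketch fits each one with base R functions. The choice of the built-in `mtcars` and `iris` data sets, the variables used, and the number of clusters are assumptions made for this example; they are not taken from the presentation itself.

```r
# Built-in data sets only; no extra packages are needed.
data(mtcars)
data(iris)

# Linear regression: fuel economy as a function of vehicle weight.
fit <- lm(mpg ~ wt, data = mtcars)

# K-means clustering on the four numeric iris measurements.
# The data are standardized first so that no single variable dominates.
set.seed(42)                       # k-means starts from random centers
measurements <- scale(iris[, 1:4])
km <- kmeans(measurements, centers = 3, nstart = 25)

# Hierarchical clustering on the same data, cut into three groups.
hc <- hclust(dist(measurements), method = "complete")
hc_groups <- cutree(hc, k = 3)

# Principal component analysis (the data were centered and scaled above).
pca <- prcomp(measurements)

print(coef(fit))          # intercept and slope
print(table(km$cluster))  # cluster sizes from k-means
print(table(hc_groups))   # cluster sizes from hierarchical clustering
print(summary(pca))       # variance explained by each component
```

Three clusters are used here only because `iris` contains three species; with real data the number of clusters usually has to be chosen and then checked.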

Novice users are shown how to navigate the resulting R objects to extract specific elements of interest, such as correlation p-values and regression coefficients. The presentation also tries to tackle some of the key concerns about these introductory methods by providing guidance on interpreting analysis results: understanding the approximately 10 values returned in a simple linear regression; why missing values matter and how to deal with them through imputation in real-world problems; determining the quality of clustering results; and understanding the data transformations that take place in dimension reduction methods. Also provided is information about more sophisticated methodologies, such as the regularized regression methods LASSO, Ridge, and Elastic Net, and about packages that make these more advanced methods available in R, such as glmnet for regularized regression.
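
The points in the paragraph above can be sketched in a few lines of R. This is an illustrative example only: the data sets, the toy vector `x`, mean imputation as the imputation strategy, and the between-cluster sum-of-squares ratio as a quality measure are all assumptions chosen for brevity, and the presentation may use different data or techniques. The final step runs only if the glmnet package happens to be installed.

```r
data(mtcars)
data(iris)

# Correlation test: the result is a list, so elements are reached with `$`.
ct <- cor.test(mtcars$mpg, mtcars$wt)
print(names(ct))                 # all elements stored in the result
cor_p <- ct$p.value              # correlation p-value

# Simple linear regression: summary() also returns a list.
fit <- lm(mpg ~ wt, data = mtcars)
s <- summary(fit)
coef_table <- s$coefficients     # columns: Estimate, Std. Error, t value, Pr(>|t|)
slope <- coef_table["wt", "Estimate"]
slope_p <- coef_table["wt", "Pr(>|t|)"]
r_squared <- s$r.squared

# Missing values: replace each NA with the mean of the observed values.
x <- c(2.1, NA, 3.4, 5.0, NA)    # toy vector made up for this example
x_imputed <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Clustering quality: share of total variation that lies between clusters
# (closer to 1 means more compact, better separated clusters).
set.seed(42)
km <- kmeans(scale(iris[, 1:4]), centers = 3, nstart = 25)
quality <- km$betweenss / km$totss

# PCA transformation: the scores are the standardized data multiplied
# by the rotation (loadings) matrix.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
manual_scores <- scale(iris[, 1:4]) %*% pca$rotation
pca_check <- all.equal(manual_scores, pca$x, check.attributes = FALSE)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Regularized regression with glmnet, skipped if the package is missing.
# alpha = 1 is LASSO, alpha = 0 is Ridge, values in between give Elastic Net.
if (requireNamespace("glmnet", quietly = TRUE)) {
  x_mat <- as.matrix(mtcars[, c("wt", "hp", "disp", "drat")])
  cv_fit <- glmnet::cv.glmnet(x_mat, mtcars$mpg, alpha = 1)
  print(coef(cv_fit, s = "lambda.min"))
}
```

Calling `names()` or `str()` on a fitted object is usually the quickest way to discover which elements it contains before extracting any of them.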

Using these statistical methods for modeling can help users understand their data sets, and the methods can be coupled with other aspects of R and RStudio to develop interactive analyses using the Shiny R package.


The post Introduction to Statistical Methods in R appeared first on Lunean.
