
One of the biggest challenges educators face is how to teach statistical thinking in a way that is integrated with data and computing skills, so that our students can think fluidly with data. Contemporary data science requires a tight integration of knowledge from statistics, computer science, mathematics, and a domain of application. For example, how can one model high earnings as a function of other features that might be available for a customer? How do the results of a decision tree compare to those of a logistic regression model? How does one assess whether the underlying assumptions of a chosen model are appropriate? How are the results interpreted and communicated?

While there are many other useful textbooks and references out there (e.g., R for Data Science, Practical Data Science with R, Intro to Data Science with Python), we saw a need for a book that incorporates statistical and computational thinking to solve real-world problems with data. The result was Modern Data Science with R, a comprehensive data science textbook for undergraduates that features meaty, real-world case studies integrated with modern data science methods. (Figure 8.2 above was taken from a case study in the supervised learning chapter.)

Part I (introduction to data science) motivates the book and provides an introduction to data visualization, data wrangling, and ethics.  Part II (statistics and modeling) begins with fundamental concepts in statistics, supervised learning, unsupervised learning, and simulation.  Part III (topics in data science) reviews dynamic visualization, SQL, spatial data, text as data, network statistics, and moving towards big data.  A series of appendices cover the mdsr package, an introduction to R, algorithmic thinking, reproducible analysis, multiple regression, and database creation.

We believe that several features of the book are distinctive:

1. minimal prerequisites: while some background in statistics and computing is ideal, appendices provide an introduction to R, how to write a function, and key statistical topics such as multiple regression
2. ethical considerations are raised early, to motivate later examples
3. recent developments in the R ecosystem (e.g., RStudio and the tidyverse) are featured

Rather than focus exclusively on case studies or programming syntax, this book illustrates how statistical programming in R/RStudio can be leveraged to extract meaningful information from a variety of data in the service of addressing compelling statistical questions.

This book is intended to help readers with some background in statistics and modest prior experience with coding develop and practice the appropriate skills to tackle complex data science projects. We’ve used it to teach a variety of courses, including an introduction to data science, a sophomore-level data science course, and one component of a senior capstone class.
We’ve made three chapters freely available for download: data wrangling I, data ethics, and an introduction to multiple regression. An instructor’s solution manual is available, and we’re working to create a series of lab activities (e.g., text as data). (The code to generate the above figure can be found in the supervised learning materials at http://mdsr-book.github.io/instructor.html.)