Sorry about the noisy post title, but it happens to be the name of the book I have been working on over the past year, which has just been published by Packt.
I do not think that reading this ~400-page book will turn everyone into a true master of R and data analysis, but I believe it can get you on the way. I wrote this book with a relatively large target audience in mind: readers with some prior R experience (such as an introductory university course or MOOC covering how to install R, load CSV files or generate a histogram), but without the time or need to walk through a complete series of books on the statistical background, algorithms and domain-specific knowledge of handling different data types.
So this is not a reference book, and it does not include a single formal mathematical formula. Instead, it provides a practical introduction, many references and hands-on examples on the following topics:
- Reading data from larger text files and databases in an optimal way
- Loading data from the Web via parsing HTML, XML, JSON and interacting with APIs
- Filtering, summarizing and restructuring data
- Building and interpreting generalized linear models
- Traditional multivariate statistical methods for dimension reduction and latent variables
- Classification and clustering, including supervised and unsupervised statistical and machine learning methods
- Handling outliers and missing values
- Processing unstructured text data
- A bit of social network analysis
- Smoothing, seasonal decomposition and modeling time-series
- Visualizing spatial data
And there is a free chapter (available from Packt) on “Analyzing the R community”, which combines quite a few techniques described in the above-mentioned chapters into an actual use case, including some reproducible examples from my past research on this topic:
- The number of R Foundation members and R conference attendees (previously presented at the useR! 2014 and 2015 conferences, along with an interactive webapp on R activity around the world)
- The number of packages per R package maintainer
- The volume and timeline of messages and posters on the [R-help] mailing list
- Estimating the number of R users around the world
- The number of R users on Facebook and Twitter
Besides this free chapter, Packt is offering a 50% discount on the e-book format of this book for two weeks, which you can activate via the RXI37LH discount code until October 30, 2015 (Friday). Another promo code for a 20% discount on printed copies is also being generated, and should be available early next week. For more details, revisit this page later and look for new comments, or follow me on Twitter:
After ~1001 sleepless nights, my #rstats book on #datascience is published w/ a free chapter https://t.co/7lS4pgN06k pic.twitter.com/MnM24P67dE
— Gergely Daróczi (@daroczig) October 1, 2015
Some quick statistics on the book:
- 14 chapters
- 396 pages
- 95 packages loaded
- hflights and data.table used in 7, ggplot2 in 5, dplyr and plyr in 4, microbenchmark and MASS used in 3 chapters
- 5 reviewers
- more than 20 people contributing
- 2,711 lines of the code bundle on GitHub
- 581 days between signing the author contract and the actual publication date
- around 320 e-mails sent and received with the ISBN on the subject line
- 10,000 kilometers between the places where I wrote the first and the last chapters
- and I forgot to use time tracking software after logging 174.73 hours spent on the book
And most importantly: I’d love to hear, and am looking forward to, any kind of private or public feedback on this book!