Blog Archives

Technical Foundations of Informatics: A modern introduction to R

May 3, 2017

Informatics (or Information Science) is the practice of creating, storing, finding, manipulating and sharing information. These are all tasks that the R language was designed for, and so Technical Foundations of Informatics, the online course guide for the University of Washington course of the same name, also provides an excellent resource for learning those skills using R. The course...


The Datasaurus Dozen

May 2, 2017

There's a reason why data scientists spend so much time exploring data using graphics. Relying only on data summaries like means, variances, and correlations can be dangerous, because wildly different data sets can give similar results. This is a principle that has been demonstrated in statistics classes for decades with Anscombe's Quartet: four scatterplots which despite being qualitatively different...


Using Microsoft R with Alteryx

May 1, 2017

Alteryx Designer, the self-service analytics workflow tool, recently added integration with Microsoft R. This allows you to train models provided by Microsoft R, and create predictions from them, without needing to write R code — you simply drag-and-drop to create a workflow. In a recent post at the Microsoft R blog, Bharath Sankaranarayan walks through the process of building...


Make pleasingly parallel R code with rxExecBy

April 28, 2017

Some things are easy to convert from a long-running sequential process to a system where each part runs at the same time, thus reducing the required time overall. We often call these "embarrassingly parallel" problems, but given how easy it is to reduce the time it takes to execute them by converting them into a parallel process, "pleasingly parallel"...


Where Europe lives, in 14 lines of R Code

April 27, 2017

Via Max Galka, always a great source of interesting data visualizations, we have this lovely visualization of population density in Europe in 2011, created by Henrik Lindberg. Impressively, the chart was created with just 14 lines of R code. (To recreate it yourself, download the GEOSTAT-grid-POP-1K-2011-V2-0-1.zip file from Eurostat, and move the two .csv files inside in range of...


dv01 uses R to bring greater transparency to the consumer lending market

April 26, 2017

The founder of the NYC-based startup dv01 watched the 2008 financial crisis and was inspired to bring greater transparency to institutional investors in the consumer lending market. Despite being an open-source shop, they switched their data services to Microsoft SQL Server to provide better performance (reducing latency for queries from tens of seconds to under two seconds). They also...


Using checkpoint with knitr and RStudio

April 25, 2017

The knitr package by Yihui Xie is a wonderful tool for reproducible data science. I especially like using it with R Markdown documents, where with some simple markup in an easy-to-read document I can easily combine R code and narrative text to generate an attractive document with words, tables and pictures in HTML, PDF or Word format. Say, something...


R 3.4.0 now available

April 24, 2017

R 3.4.0, the latest release of the R programming language (codename: "You Stupid Darkness"), is now available. This is the annual major update to the R language engine, and provides improved performance for R programs. The source code was released by the R Core Team on Friday and binaries for Windows, Mac and Linux are available for download now...


Reproducible Data Science with R

April 21, 2017

Yesterday, I had the honour of presenting at The Data Science Conference in Chicago. My topic was Reproducible Data Science with R, and while the specific practices in the talk are aimed at R users, my intent was to make a general argument for doing data science within a reproducible workflow. Whatever your tools, a reproducible process: Saves time,...


SQL Server 2017 to add Python support

April 20, 2017

One of the major announcements from yesterday's Data Amp event was that SQL Server 2017 will add Python as a supported language. Just as with the continued R support, SQL Server 2017 will allow you to process data in the database using any Python function or package without needing to export the data from the database, and use SQL...

