Reproducible Environments

April 21, 2019 | 0 Comments

Great data science work should be reproducible. The ability to repeat experiments is part of the foundation for all science, and reproducible work is also critical for business applications. Team collaboration, project validation, and sustainable products presuppose the ability to reproduce work over time. In my opinion, mastering just a ...
Setting up RStudio Server on a Cloud for Collaboration and Reproducibility

April 16, 2019 | 0 Comments

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When setting up R and RStudio Server on a cloud Linux instance, some thought should be given to implementing a workflow that facilitates collaboration and ensures R project reproducibility. There are many possible workflows to accomplish ...
On Meeting Data Journalists

April 7, 2019 | 0 Comments

“I’d rather do data than date”. I overheard this while eavesdropping on a conversation among three female data journalists while waiting for an elevator at the IRE-CAR (Investigative Reporters and Editors - Computer-Assisted Reporting) conference last month. I would like to think the remark was overloaded with hyperbole, but ...
How to share R visualizations in Microsoft PowerPoint

April 3, 2019 | 0 Comments

Hadrien Dykiel is an RStudio Customer Success Engineer. Microsoft PowerPoint is often the de facto choice for creating presentation slides, especially at larger companies. In many organizations, it comes pre-installed on workstations and pretty much everybody knows how to use it. This can make it an effective medium for sharing ...
RInside Help in Testing

March 31, 2019 | 0 Comments

A problem that arises when building R interfaces to C/C++ libraries involves testing: how to go about replicating the existing C/C++ tests in R without undue effort. If the C/C++ tests are simple and small enough, they can be manually translated. However, when there are many tests, and ...

February 2019: “Top 40” New CRAN Packages

March 25, 2019 | 0 Comments

One hundred and fifty-one new packages arrived at CRAN in February. Here are my “Top 40” picks organized into eight categories: Bioinformatics, Data, Machine Learning, Medicine, Statistics, Time Series, Utilities and Visualization. Bioinformatics Cascade v1.7: Implements a modeling tool allowing gene selection, reverse engineering, and prediction in cascade networks. See Jung ...
How to Avoid Publishing Credentials in Your Code

March 20, 2019 | 0 Comments

Roland Stevenson is a data scientist and consultant who may be reached on LinkedIn. When accessing an API or database in R, it is often necessary to provide credentials such as a login name and password. You may find yourself being prompted with something like this: When writing an R ...
Parsnipping Fama French

March 13, 2019 | 0 Comments

Today, we will continue our exploration of developments in the world of tidy models, and we will stick with our usual Fama French modeling flow to do so. For new readers who want to get familiar with Fama French before diving into this post, see here where we covered importing and ...
Paid in Books: An Interview with Christian Westergaard

March 6, 2019 | 0 Comments

R is greatly benefiting from new users coming from disciplines that traditionally did not provoke much serious computation. Journalists and humanist scholars, for example, are embracing R. But, does the avenue from the Humanities go both ways? In a recent conversation with Christian Westergaard, proprietor of Sophia Rare Books in ...
Graph analysis using the tidyverse

March 5, 2019 | 0 Comments

It is because I am not a graph analysis expert that I thought it important to write this article. For someone who thinks in terms of single rectangular data sets, it is a bit of a mental leap to understand how to apply tidy principles to a more robust object, ...
Some R Packages for ROC Curves

February 28, 2019 | 0 Comments

In a recent post, I presented some of the theory underlying ROC curves, and outlined the history leading up to their present popularity for characterizing the performance of machine learning models. In this post, I describe how to search CRAN for packages to plot ROC curves, and highlight six useful ...
January 2019: “Top 40” New CRAN Packages

February 24, 2019 | 0 Comments

One hundred and fifty-three new packages made it to CRAN in January. Here are my “Top 40” picks in eight categories: Computational Methods, Data, Machine Learning, Medicine, Science, Statistics, Utilities, and Visualization. Computational Methods cPCG v1.0: Provides a function to solve systems of linear equations using a (preconditioned) conjugate gradient algorithm. ...
A Few New R Books

February 19, 2019 | 0 Comments

Greg Wilson is a data scientist and professional educator at RStudio. As a newcomer to R who prefers to read paper rather than pixels, I've been working my way through a more-or-less random selection of relevant books over the past few months. Some have discussed topics that I'm ...

A Look Back on 2018: Part 2

February 11, 2019 | 0 Comments

Welcome to the second installment of Reproducible Finance 2019! In the previous post, we looked back on the daily returns for several market sectors in 2018. Today, we’ll continue that theme and look at some summary statistics for 2018, and then extend out to previous years and different ways of visualizing our ...
R for Quantitative Health Sciences: An Interview with Jarrod Dalton

February 5, 2019 | 0 Comments

This interview came about through researching R-based medical applications in preparation for the upcoming R/Medicine conference. When we discovered the impressive number of Shiny-based Risk Calculators developed by the Cleveland Clinic and implemented in public-facing sites, we wanted to learn more about the influence of R Language in the ...

December 2018: “Top 40” New CRAN Packages

January 29, 2019 | 0 Comments

By my count, 157 new packages stuck to CRAN in December. Below are my “Top 40” picks in ten categories: Computational Methods, Data, Finance, Machine Learning, Medicine, Science, Statistics, Time Series, Utilities and Visualization. This is the first time I have used the Medicine category. I am pleased that a few packages ...
Onboard and Offboard Data Manipulation in Flexdashboard

January 22, 2019 | 0 Comments

Harrison Schramm is a Professional Statistician and Non-Resident Senior Fellow at the Center for Strategic and Budgetary Assessments. The Shiny set of tools, and, by extension, Flexdashboard, give professional analysts tools to rapidly put interactive versions of their work in the hands of clients. Frequently, an end user will interact ...

ROC Curves

January 16, 2019 | 0 Comments

I have been thinking about writing a short post on R resources for working with receiver operating characteristic (ROC) curves, but first I thought it would be nice to review the basics. In contrast to the usual (usual for data scientists anyway) machine learning point of view, I’ll frame the topic closer ...
A Look Back on 2018: Part 1

January 9, 2019 | 0 Comments

Welcome to Reproducible Finance 2019! It’s a new year, a new beginning, the Earth has completed one more trip around the sun, and that means it’s time to look back on the previous January to December cycle. Today and next time, we’ll explore the returns and volatilities of ...
