Blog Archives

Getting {sparklyr}, {h2o}, {rsparkling} to work together and some fun with bash

This is going to be the type of blog post that would perhaps be better as a gist, but it is easier for me to use my blog as my own personal collection of gists. Plus, someone else might find this useful, so here it is! In this blog post I am going to show a little trick to...

Read more »

Importing 30GB of data in R with sparklyr

February 15, 2018

Disclaimer: the first part of this blog post draws heavily from Working with CSVs on the Command Line, a beautiful resource that lists very nice tips and tricks for working with CSV files before loading them into R or any other statistical software. I highly recommend it! Also, if you find this interesting, read also...

Read more »

Predicting job search by training a random forest on an unbalanced dataset

February 10, 2018

In this blog post, I am going to train a random forest on census data from the US to predict the probability that someone is looking for a job. To this end, I downloaded the US 1990 census data from the UCI Machine Learning Repository. Having a background in economics, I am always quite interested in such datasets. I...

Read more »

Mapping a list of functions to a list of datasets with a list of columns as arguments

This week I had the opportunity to teach R at my workplace, again. This course was the “advanced R” course, and unlike the one I taught at the end of last year, I had one more day (so 3 days in total) where I could show my colleagues the joys of th...

Read more »

It’s lists all the way down, part 2: We need to go deeper

Shortly after my previous blog post, I saw this tweet on my timeline: "The purrr resolution for 2018 - learn at least one purrr function per week - is officially launched with encouragement and inspiration from @statwonk and @hadleywickham. We start with modify_depth: https://t.co/dCMnSHP7Pl. Please join to learn and share. #rstats" - Isabella R. Ghement (@IsabellaGhement) January 3,...

Read more »

It’s lists all the way down

Today, I had the opportunity to help someone over at the R for Data Science Slack group (read more about this group here) and I thought that the question asked could make for an interesting blog post, so here it is! Disclaimer: the way I’m doing things here is totally not optimal, but I want to illustrate how to map...

Read more »

Building formulae

December 26, 2017

This Stack Overflow question made me think about how to build formulae. You might want to programmatically build linear model formulae and then map these models on data. For example, suppose the following (output suppressed): data(mtcars) lm(mpg ~ hp, data = mtcars) lm(mpg ~ I(hp^2), data = mtcars) lm(mpg ~ I(hp^3), data = mtcars) lm(mpg ~ I(hp^4), data = mtcars) lm(mpg ~ I(hp^5), data = mtcars) lm(mpg ~ I(hp^6), data...

Read more »

Teaching the tidyverse to beginners

December 16, 2017

At the end of October I tweeted this: "will teach #rstats soon again but this time following @drob's suggestion of the tidyverse first as laid out here: https://t.co/js8SsUs8Nv" - Bruno Rodrigues (@brodriguesco) October 24, 2017, and it generated some discussion. Some people believe that this is the right approach, and some others think that one should first present base R and then show how...

Read more »

Functional peace of mind

November 13, 2017

I think what I enjoy the most about functional programming is the peace of mind that comes with it. With functional programming, there’s a lot of stuff you don’t need to think about. You can write functions that are general enough so that they solve a variety of problems. For example, imagine for a second that R does not...

Read more »

Easy peasy STATA-like marginal effects with R

Model interpretation is essential in the social sciences. If one wants to know the effect of variable x on the dependent variable y, marginal effects are an easy way to get the answer. STATA includes a margins command that has been ported to R by Thomas J. Leeper of the London School of Economics and Political Science. You can...

Read more »
