Minimal examples help
[This article was first published on mages' blog, and kindly contributed to R-bloggers].
The other day I got stuck working with a huge data set using data.table in R. It took me a little while to realise that I had to produce a minimal reproducible example to actually understand why I got stuck in the first place. I know, this is the mantra I should follow before I reach out to R-help, Stack Overflow or indeed the package authors. Of course, more often than not, by following this advice, the problem becomes clear and with it the solution obvious. OK, here is the problem. Well, it is easy to write down now, after I have understood it.
Suppose I have some data that describes my sales targets by product and quarter:
library(data.table)
Plan <- data.table(
  Product=c(rep("Apple",3),rep("Kiwi",3),rep("Coconut",3)),
  Quarter=rep(c(1,2,3), 3),
  Target=1:9)
Plan
##    Product Quarter Target
## 1:   Apple       1      1
## 2:   Apple       2      2
## 3:   Apple       3      3
## 4:    Kiwi       1      4
## 5:    Kiwi       2      5
## 6:    Kiwi       3      6
## 7: Coconut       1      7
## 8: Coconut       2      8
## 9: Coconut       3      9
Further, I have some actual data, which is also broken down by region, but has no data for coconut:
Actual <- data.table(
  Region=rep(c("North", "South"), each=4),
  Product=rep(c("Apple", "Kiwi"), times=4),
  Quarter=rep(c(1,1,2,2), 2),
  Sales=1:8)
Actual
##    Region Product Quarter Sales
## 1:  North   Apple       1     1
## 2:  North    Kiwi       1     2
## 3:  North   Apple       2     3
## 4:  North    Kiwi       2     4
## 5:  South   Apple       1     5
## 6:  South    Kiwi       1     6
## 7:  South   Apple       2     7
## 8:  South    Kiwi       2     8
What I would like to do is to join both data sets, so that I can compare my sales figures with my targets. In particular, I would also like to see my targets for future quarters. However, I would like to filter out the target data for those products that are not available in a region, coconut in my example.
First I have to set keys on the columns of both data sets that I would like to join by:
setkey(Actual, Product, Quarter)
setkey(Plan, Product, Quarter)
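As a quick sanity check, here is a minimal sketch of what setting the keys does, using the same toy data as above. It only relies on `setkey()`, `key()` and `haskey()` from data.table; note that `setkey()` sorts the table by reference, so the row order changes.

```r
library(data.table)

# Same toy data as in the post
Plan <- data.table(
  Product = c(rep("Apple", 3), rep("Kiwi", 3), rep("Coconut", 3)),
  Quarter = rep(c(1, 2, 3), 3),
  Target  = 1:9)
Actual <- data.table(
  Region  = rep(c("North", "South"), each = 4),
  Product = rep(c("Apple", "Kiwi"), times = 4),
  Quarter = rep(c(1, 1, 2, 2), 2),
  Sales   = 1:8)

# setkey() modifies its argument in place and sorts it by the key columns
setkey(Actual, Product, Quarter)
setkey(Plan, Product, Quarter)

# Inspect the keys that the joins will use
haskey(Actual)
key(Actual)
key(Plan)

# Plan is now ordered by Product, so Coconut comes before Kiwi
unique(Plan$Product)
```

Both tables share the key columns Product and Quarter, which is what the keyed joins further down depend on.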
Because I also want to see future targets, I am not using Plan[Actual]. Instead I join the Plan data for each region; but then I also get the target data for coconut:
Actual[, .SD[Plan], by=list(Region)]
##     Region Product Quarter Sales Target
##  1:  North   Apple       1     1      1
##  2:  North   Apple       2     3      2
##  3:  North   Apple       3    NA      3
##  4:  North Coconut       1    NA      7
##  5:  North Coconut       2    NA      8
##  6:  North Coconut       3    NA      9
##  7:  North    Kiwi       1     2      4
##  8:  North    Kiwi       2     4      5
##  9:  North    Kiwi       3    NA      6
## 10:  South   Apple       1     5      1
## 11:  South   Apple       2     7      2
## 12:  South   Apple       3    NA      3
## 13:  South Coconut       1    NA      7
## 14:  South Coconut       2    NA      8
## 15:  South Coconut       3    NA      9
## 16:  South    Kiwi       1     6      4
## 17:  South    Kiwi       2     8      5
## 18:  South    Kiwi       3    NA      6
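As an aside, a small sketch of why the plain keyed join Plan[Actual] is not enough here: it looks up a target for every row of Actual, so quarters without any sales never show up. The name `PlanForActual` is only used for this illustration.

```r
library(data.table)

# Same toy data and keys as in the post
Plan <- data.table(
  Product = c(rep("Apple", 3), rep("Kiwi", 3), rep("Coconut", 3)),
  Quarter = rep(c(1, 2, 3), 3),
  Target  = 1:9)
Actual <- data.table(
  Region  = rep(c("North", "South"), each = 4),
  Product = rep(c("Apple", "Kiwi"), times = 4),
  Quarter = rep(c(1, 1, 2, 2), 2),
  Sales   = 1:8)
setkey(Actual, Product, Quarter)
setkey(Plan, Product, Quarter)

# Keyed join: one result row per row of Actual, with the matching Target
PlanForActual <- Plan[Actual]

nrow(PlanForActual)            # same number of rows as Actual
unique(PlanForActual$Quarter)  # only quarters that already have sales
unique(PlanForActual$Product)  # only products that already have sales
```

So the targets for quarter 3 are lost this way, whereas the per-region join keeps the future quarters at the price of also bringing in the coconut rows.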
OK, that means I have to restrict the planning data to the products that appear in the actual data of each region:
Actual[, .SD[
  Plan[
    Product %in% unique(.SD[, Product])
  ]
], by=list(Region)]
##     Region Product Quarter Sales Target
##  1:  North   Apple       1     1      1
##  2:  North   Apple       2     3      2
##  3:  North   Apple       3    NA      3
##  4:  North    Kiwi       1     2      4
##  5:  North    Kiwi       2     4      5
##  6:  North    Kiwi       3    NA      6
##  7:  South   Apple       1     5      1
##  8:  South   Apple       2     7      2
##  9:  South   Apple       3    NA      3
## 10:  South    Kiwi       1     6      4
## 11:  South    Kiwi       2     8      5
## 12:  South    Kiwi       3    NA      6
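As a cross-check, here is a sketch of a different route to the same twelve rows, using two `merge()` calls instead of the nested `.SD` join. This is not what I used above, just one possible alternative; the intermediate names `Avail`, `Grid` and `Result` are made up for this illustration, and the tables are deliberately left unkeyed.

```r
library(data.table)

# Same toy data as in the post (no keys needed for merge with explicit 'by')
Plan <- data.table(
  Product = c(rep("Apple", 3), rep("Kiwi", 3), rep("Coconut", 3)),
  Quarter = rep(c(1, 2, 3), 3),
  Target  = 1:9)
Actual <- data.table(
  Region  = rep(c("North", "South"), each = 4),
  Product = rep(c("Apple", "Kiwi"), times = 4),
  Quarter = rep(c(1, 1, 2, 2), 2),
  Sales   = 1:8)

# Step 1: which products are actually sold in which region?
Avail <- unique(Actual[, list(Region, Product)])

# Step 2: attach every planned quarter to those region/product pairs;
# products without sales in a region (coconut) drop out here
Grid <- merge(Avail, Plan, by = "Product", allow.cartesian = TRUE)

# Step 3: left join the sales, so future quarters stay in with Sales = NA
Result <- merge(Grid, Actual,
                by = c("Region", "Product", "Quarter"), all.x = TRUE)
Result
```

The idea is to make the "available in a region" rule explicit as its own small table, which may be easier to inspect on a large data set than a filter buried inside the join.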
That's it. Now I can get back to my original huge and complex data set and move on.
Please let me know if there is a better way of achieving the above.
Session Info
R version 3.1.2 Patched (2015-01-20 r67564)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] data.table_1.9.4

loaded via a namespace (and not attached):
[1] chron_2.3-45   plyr_1.8.1     Rcpp_0.11.4    reshape2_1.4.1 stringr_0.6.2