Blog Archives

pandas “transform” using the tidyverse

April 11, 2017
By
pandas “transform” using the tidyverse

Chris Moffit has a nice blog on how to use the transform function in pandas. He provides some (fake) data on sales and asks the question of what fraction of each order is from each SKU. Being a R nut and a tidyverse fan, I thought to compare and contrast the code for the pandas

Read more »

Changing names in the tidyverse: An example for many regressions

March 9, 2017
By
Changing names in the tidyverse: An example for many regressions

A collaborator posed an interesting R question to me today. She wanted to do several regressions using different outcomes, with models being computed on different strata defined by a combination of experimental design variables. She then just wanted to extract the p-values for the slopes for each of the models, and then filter the strata

Read more »

Copying tables from R to Outlook

February 28, 2017
By
Copying tables from R to Outlook

I work in an ecosystem that uses Outlook for e-mail. When I have to communicate results with collaborators one of the most frequent tasks I face is to take a tabular output in R (either a summary table or some sort of tabular output) and send it to collaborators in Outlook. One method is certainly

Read more »

A (much belated) update to plotting Kaplan-Meier curves in the tidyverse

February 28, 2017
By
A (much belated) update to plotting Kaplan-Meier curves in the tidyverse

One of the most popular posts on this blog has been my attempt to create Kaplan-Meier plots with an aligned table of persons-at-risk below it under the ggplot paradigm. That post was last updated 3 years ago. In the interim, Chris Dardis has built upon these attempts to create a much more stable and feature-rich

Read more »

A quick exploration of the ReporteRs package

October 28, 2016
By
A quick exploration of the ReporteRs package

The package ReporteRs has been getting some play on the interwebs this week, though it’s actually been around for a while. The nice thing about this package is that it allows writing Word and PowerPoint documents in an OS-independent fashion unlike some earlier packages. It also allows the editing of documents by using bookmarks within

Read more »

Annotated Facets with ggplot2

October 20, 2016
By
Annotated Facets with ggplot2

I was recently asked to do a panel of grouped boxplots of a continuous variable, with each panel representing a categorical grouping variable. This seems easy enough with ggplot2 and the facet_wrap function, but then my collaborator wanted p-values on the graphs! This post is my approach to the problem. First of all, one caveat. I’m a

Read more »

Creating new data with max values for each subject

December 1, 2014
By
Creating new data with max values for each subject

We have a data set dat with multiple observations per subject. We want to create a subset of this data such that each subject (with ID giving the unique identifier for the subject) contributes the observation where the variable X takes it’s maximum value for that subject. An R solution Using the excellent R package

Read more »

“LaF”-ing about fixed width formats

November 10, 2014
By
“LaF”-ing about fixed width formats

If you have ever worked with US government data or other large datasets, it is likely you have faced fixed-width format data. This format has no delimiters in it; the data look like strings of characters. A separate format file defines which columns of data represent which variables. It seems as if the format is

Read more »

Practical Data Science Cookbook

November 10, 2014
By
Practical Data Science Cookbook

Practical Data Science Cookbook My friends Sean Murphy, Ben Bengfort, Tony Ojeda and I recently published a book, Practical Data Science Cookbook. All of us are heavily involved in developing the data community in the Washington DC metro area, serving on the Board of Directors of Data Community DC. Sean and Ben co-organize the meetup

Read more »

The need for documenting functions

May 22, 2014
By
The need for documenting functions

My current work usually requires me to work on a project until we can submit a research paper, and then move on to a new project. However, 3-6 months down the road, when the reviews for the paper return, it is quite common to have to do some new analyses or re-analyses of the data.

Read more »

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)