Blog Archives

Timing Working With a Row or a Column from a data.frame

May 15, 2019

In this note we share a quick study timing how long it takes to perform some simple data manipulation tasks with R data.frames. We are interested in the time needed to select a column, alter a column, or select a row. Knowing what is fast and what is slow is critical in planning code, so …

What is “Tidy Data”?

May 11, 2019

I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term “tidy data.” For example, in an eponymous paper, he wrote: In tidy data: Each variable forms a column. Each observation forms a row. Each type of observational unit forms a table. …

Could not Resist

April 29, 2019

Also, Practical Data Science with R, 2nd Edition; Zumel, Mount; Manning 2019 is now content complete! It is deep into editing and soon into production!

Data Layout Exercises

April 27, 2019

John Mount, Nina Zumel; Win-Vector LLC, 2019-04-27

In this note we will use five real-life examples to demonstrate data layout transforms using the cdata R package. The examples for this note are all demo-examples from tidyr/demo/, and are mostly based on questions posted to StackOverflow. They represent a good cross-section of data layout problems, …

Practical Data Science with R Book Update (April 2019)

April 22, 2019

I thought I would give a personal update on our book: Practical Data Science with R 2nd edition; Zumel, Mount; Manning 2019. The second edition should be fully available this fall! Nina and I have finished up through chapter 10 (of 12), and Manning has released previews of up through chapter 7 (with more to …

Controlling Data Layout With cdata

April 16, 2019

Here is an example of how easy it is to use cdata to re-layout your data. Tim Morris recently tweeted the following problem (corrected): “Please will you take pity on me #rstats folks? I only want to reshape two variables x & y from wide to long!” Starting with: d xa xb ya yb 1 1 …

Piping is Method Chaining

April 14, 2019

What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from, the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object-oriented languages this sort of notation for function application has been called …

R Photo

April 10, 2019

A good friend is now a professor at the University of Auckland and knew to photograph and send us this. Thanks!!!

Practical Data Science with R Book Update

April 8, 2019

A good friend shared with us a great picture of Practical Data Science with R, 1st Edition hanging out in Cambridge at the MIT Press Bookstore. This is as good an excuse as any to share a book update. Nina Zumel and I (John Mount) are busy revising chapters 10 and 11 of Practical Data …

Not Always C++’s Fault

April 6, 2019

From the recent developer.r-project.org “Staged Install” article: Incidentally, there were just two distinct (very long) lists of methods in the warnings across all installed packages in my run, but repeated for many packages. It turned out that they were lists of exported methods from dplyr and rlang packages. These two packages take very long to …
