79th #TokyoR Meetup: {tidyr} 1.0.0, RAW image processing, and more!

[This article was first published on R by R(yo), and kindly contributed to R-bloggers].

As the monsoon rains fall, another TokyoR User Meetup! On June 29th, useRs from all over Tokyo flocked to Hanzomon, Tokyo for another jam-packed session of #rstats hosted by Infocom.

In line with my previous round up posts:

I will be going over around half of all the talks. Hopefully, my efforts will help spread the vast knowledge of Japanese R users to the wider R community. Throughout I will also post helpful blog posts and links from other sources if you are interested in learning more about the topic of a certain talk. You can follow TokyoR by searching for the #TokyoR hashtag on Twitter.

Anyways…

Let’s get started!

BeginneR Session

As with every TokyoR meetup, we began with a set of talks aimed at beginners:

Main Talks

yutannihilation: tidyr 1.0.0

@yutannihilation, co-author of ggplot2, gave a talk about the new tidyr 1.0.0 version, with a special focus on the pivot_longer() and pivot_wider() functions, which are set to supersede (but not remove) gather() and spread() respectively.

1. pivot_longer(data, cols, ...): Make datasets longer (more rows, fewer columns)

  • You can use the various select() helper functions to select columns
    • ex. starts_with("col"), contains("july"), matches(".t.")
  • For multiple columns, you need to use c()
  • Other options such as col1:col5 and -col2 still exist as well
  • names_to = argument lets you set the name of the new column(s) that will be created by the function
  • values_to = argument lets you set the name of the new column(s) that will contain the “data value”
    • because these new columns do not yet exist in the data frame, their names have to be given as quoted strings, ""

The above are pretty similar to what we had in gather() but some new arguments I found interesting were:

  • names_prefix = argument lets you strip a common prefix from the start of each column name before it becomes a value
    • Ex. If the values in the new column would look like “day1”, “day2”, “day3”, etc., setting names_prefix = 'day' removes “day” from each of them.
  • names_ptype = argument lets you set the type (class) of the specified column(s)
    • Ex. After taking out “day” from the values in the above example, the remaining numbers in the column are still of class “character”. Set names_ptype = list(day = integer()) to specify that the column should be of class integer.

  • names_pattern = and names_sep = work similarly: you specify a regex (names_pattern) or a separator (names_sep) on which to split the original column names into the several names_to columns you want to create.
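As a rough sketch of the pivot_longer() arguments described above, the example below reshapes two small made-up data frames (the data and column names are invented for illustration, not taken from the talk). One caveat: with more recent releases of tidyr, names_ptype, as far as I can tell, no longer converts character values to integers on its own, so this sketch assumes tidyr 1.1.0 or later and uses names_transform for the conversion instead.

```r
library(tidyr)

# Made-up data: one row per subject, one column per day
wide <- data.frame(
  id   = c("a", "b"),
  day1 = c(10, 20),
  day2 = c(11, 21),
  day3 = c(12, 22)
)

long <- pivot_longer(
  wide,
  cols = starts_with("day"),                 # select() helper picks day1..day3
  names_to = "day",                          # new column holding the old column names
  names_prefix = "day",                      # strip "day" so only "1", "2", "3" remain
  names_transform = list(day = as.integer),  # assumption: tidyr >= 1.1.0
  values_to = "value"                        # new column holding the cell values
)

# Made-up data whose column names encode two things: a measure and a year
measures <- data.frame(
  id          = c("a", "b"),
  height_2018 = c(150, 160),
  height_2019 = c(152, 161)
)

split_names <- pivot_longer(
  measures,
  cols = -id,                        # everything except id
  names_to = c("measure", "year"),   # two new columns from each old name
  names_sep = "_"                    # split the old names on the underscore
)
```

Here long ends up with one row per subject and day, and split_names has separate measure and year columns alongside the default value column.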

2. pivot_wider(data, ...): Make datasets wider (fewer rows, more columns)

  • names_from = and values_from = mirror names_to = and values_to = in pivot_longer(), but work in the opposite direction: you specify which column supplies the new column names and which column supplies the values for those new columns.
  • values_fill =: use this argument to specify what each value should be when missing.

  • names_sep = and names_prefix =: Much like their counterparts in pivot_longer() except they create new column names using “sep” value to separate names and add a prefix, respectively
  • When there are multiple values for a certain row/group, you can now wrap these values up into a list-column. A definite upgrade from spread()/gather(), where this was not possible.

An alternative to the above is to use the values_fn = argument to specify a function that summarizes a set of values in a column (such as taking the mean() of the three values). This may be the better strategy if you do not want list-columns everywhere in your data frame.
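The pivot_wider() behaviour described above can be sketched on invented data as follows. The data frames and the column names id, day, key, and value are placeholders. values_fill and values_fn are written as named lists because that is the form the tidyr 1.0.0 documentation describes; newer versions, as far as I know, also accept a bare value or function.

```r
library(tidyr)

# Made-up long data: subject "b" has no measurement for day 2
long <- data.frame(
  id    = c("a", "a", "b"),
  day   = c(1, 2, 1),
  value = c(10, 11, 20)
)

wide <- pivot_wider(
  long,
  names_from = day,               # column whose values become new column names
  values_from = value,            # column whose values fill the new columns
  names_prefix = "day",           # new columns are called day1, day2
  values_fill = list(value = 0)   # fill the missing b/day2 cell with 0
)

# Made-up data with several values for the same id/key combination
dups <- data.frame(
  id    = c("a", "a", "a", "b"),
  key   = c("x", "x", "x", "x"),
  value = c(1, 2, 3, 4)
)

# Option 1: keep every value by wrapping them up in a list-column
as_list <- pivot_wider(
  dups,
  names_from = key,
  values_from = value,
  values_fn = list(value = list)
)

# Option 2: summarise the duplicates into a single number per cell
averaged <- pivot_wider(
  dups,
  names_from = key,
  values_from = value,
  values_fn = list(value = mean)
)
```

In as_list the x column holds all three values for subject "a" in one cell, while averaged keeps a plain numeric column with their mean.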

Some discussion followed on the #TokyoR hashtag, as some questioned whether the complicated gamut of arguments to the new pivot_*() functions departs from the tidyverse style of simple verbs describing the user’s action. Others commented that the very messy nature of real world data necessitates the extra layer of complexity in the spread()/gather()/pivot_*() functions.

There had been speculation that tidyr 1.0.0 would be released by the useR! Conference in early July; however, late July or even early August now seems more likely. As this was only a 20 minute talk, @yutannihilation wasn’t able to cover every change, such as the details regarding specs, but I’ve listed resources for further reading below. Do keep in mind that the vignettes below are still in development and that explanations and examples may change.

Additional resources:

LTs

flaty13: RMarkdown Template for Analytics Teams!

@flaty13, who has previously presented at both Tokyo.R and Japan.R, gave a presentation on creating RMarkdown templates at his workplace. The usual workflow at his company is that those who use R knit an RMD into a .html file and then upload it to their web server to share. However, every team member has a different level of expertise with RMarkdown, and as such it can be difficult for others to understand the structure and flow of another person’s report. To remedy this, @flaty13 decided to create an RMarkdown template for everybody to use!

One of the main differences that needed to be standardized was how people loaded packages as well as read in data so after a “Summary” section at the top the next two sections were:

  • “Preparation” section: Loads all the libraries, configures RMD options, and defines any custom functions
  • “Data Load & Check” section: Reads in all the data, and does the processing and checking as well. This section makes it clear where the data is coming from.

He also took advantage of the “Table of Contents”, “Code folding”, and “Tabset” options to create a report template that is easy to navigate and reduces the amount of clutter. These options were all called in via the template’s YAML header:
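A minimal sketch of what such a template might look like, written as a small R script that generates the skeleton file. The section names come from the talk; the title, the file name, the sub-headings, and the exact option values are assumptions rather than @flaty13's actual template. Note that in RMarkdown, tabsets are usually switched on with a {.tabset} attribute on a heading rather than in the YAML header itself.

```r
# Hypothetical template skeleton: section names follow the talk,
# the title, sub-headings, and option values are assumptions
template <- c(
  "---",
  'title: "Analysis report"',
  "output:",
  "  html_document:",
  "    toc: true",            # table of contents
  "    code_folding: hide",   # code is folded away until the reader expands it
  "---",
  "",
  "# Summary",
  "",
  "# Preparation",
  "",
  "# Data Load & Check {.tabset}",   # sub-sections below render as tabs
  "",
  "## Load",
  "",
  "## Check"
)

# Write the skeleton to a temporary location
template_path <- file.path(tempdir(), "report_template.Rmd")
writeLines(template, template_path)
```

Each team member would then start a new report from a copy of this file and fill in the code chunks under each heading.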

As a result of creating a standard template, @flaty13 found that it became a lot easier to understand other team members’ reports and it also became easier for new employees to get integrated into the RMarkdown reports workflow.

Additional resources:

igjit: Creating images from RAW data using R!

@igjit has come to present at Tokyo.R yet again, cooking up another interesting way to use R, this time for processing/editing images from RAW data! RAW is a certain file format that captures all of the image data recorded by a camera’s sensor when you take a photo. As no data is compressed in this format (compared to say, a JPEG) you are able to get very high quality images. One advantage of RAW is that it is able to record more levels of brightness (“bits”) which means you can make more adjustments in terms of exposure, contrast, and brightness when producing your image.

To do this @igjit primarily used the {imager}, {reticulate}, and {tidyverse} packages, while using the Python library rawpy to actually load the RAW data into R.

After loading in the data you can use functions from {imager} to edit and process the image in different ways such as demosaic-ing and white balance editing.
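To give a feel for the two steps mentioned above, here is a deliberately simplified base R sketch. It is not @igjit's actual code, which loads real RAW files through rawpy and works with {imager}. It assumes a tiny synthetic sensor readout in an RGGB Bayer layout, collapses each 2x2 block into one RGB pixel (a crude half-resolution demosaic), and then applies made-up white balance gains.

```r
# Synthetic 4x4 sensor readout, assumed RGGB layout:
# odd rows alternate R, G; even rows alternate G, B
raw <- matrix(
  c(100, 200, 100, 200,
    200,  50, 200,  50,
    100, 200, 100, 200,
    200,  50, 200,  50),
  nrow = 4, byrow = TRUE
)

odd_rows  <- seq(1, nrow(raw), by = 2)
even_rows <- seq(2, nrow(raw), by = 2)
odd_cols  <- seq(1, ncol(raw), by = 2)
even_cols <- seq(2, ncol(raw), by = 2)

# Crude demosaic: one RGB pixel per 2x2 block, averaging the two green sites
r <- raw[odd_rows, odd_cols]
g <- (raw[odd_rows, even_cols] + raw[even_rows, odd_cols]) / 2
b <- raw[even_rows, even_cols]
rgb_img <- array(c(r, g, b), dim = c(nrow(r), ncol(r), 3))

# White balance: scale each channel by a gain (values are made up)
gains <- c(2, 1, 4)
balanced <- sweep(rgb_img, 3, gains, "*")
```

With these gains every channel of the synthetic image lands on the same level, so the result is a neutral grey, which is what a white balance correction aims for on a grey subject. The whole thing is plain vector arithmetic, which fits the point about R's vector operations being handy for this kind of work.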

@igjit has created a bookdown of the how-to and various tutorials of manipulating RAW images with R which you can find here!

In conclusion, @igjit talked about how R isn’t just a tool for statistical work and it can be used for other tasks too! Also, from pursuing this project he realized that using R for RAW image processing can be quite an advantage due to R’s non-standard evaluation capabilities, the %>%, and the ability to use vector operations on objects.

soriente: R Coding Styles

At her first Tokyo.R, @soriente talked about different R coding styles. Coming from a PHP background, she wondered whether there were any “official” or established guidelines for writing R code. The resource she came across was Google’s Style Guide.

One of the things she noted was that in R it doesn’t seem to matter whether you wrap characters in single-quotes 'blah' or double-quotes "blahblah", which was odd for her as a PHP person, as in that language there is a significant difference. Another style guide many R users may be familiar with is the tidyverse style guide.
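As a quick check of the quoting point above: in R, single and double quotes produce identical strings, and neither form interpolates variables, so the choice mainly matters when the text itself contains quote characters. The variable names below are just placeholders.

```r
single_quoted <- 'blah'
double_quoted <- "blah"

# Both literals create exactly the same character vector
same <- identical(single_quoted, double_quoted)

# Using one kind of quote outside lets you use the other inside without escaping
quoted  <- 'She said "hello"'
escaped <- "She said \"hello\""
same_text <- identical(quoted, escaped)
```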

Below are some other resources for coding styles:

Other Talks

Food, Drinks, and Conclusion

This month’s food & drinks was an Italian-themed dinner with an assortment of pasta and pizza available. With a loud rendition of “kampai!” (cheers!) R users from all over Tokyo began to talk about their successes and struggles with R. A fun tradition at TokyoR is a Rock-Paper-Scissors tournament with the prize being free data science books:

Unfortunately, there was an incident at the Rock-Paper-Scissors Tournament when some “booing” was heard as the Python book was announced as one of the prizes. Although seemingly made in a joking manner, it was still disrespectful not only to other Pythonistas at the meetup but also to the person who took the time and effort to donate the book for this event. What was good to see was that the organizers and others came out to admonish this kind of behavior. In the time that I’ve been going to Tokyo.R, the meetup has always been very welcoming of people from all backgrounds including Python, Excel (many R users in Japan have been trying to move their companies away from Excel, with mixed success…), etc. We have also had presentations featuring Python alongside R via reticulate in the past, including a presentation in this session!

Despite this blip, it was another educational and fun-filled TokyoR session where I again lost in the Rock-Paper-Scissors Tournament in the first round… repeatedly. Someday I’ll win a book…someday! TokyoR happens almost monthly and it’s a great way to mingle with Japanese R users as it’s the largest regular meetup here in Japan. Talks in English are also welcome so if you’re ever in Tokyo come join us!
