This past weekend was the 9th JapanR Conference hosted at LINE
Corporation in Tokyo, Japan!
I’ve been back in Japan for nearly a year now and have been going to
nearly every TokyoR meetup, the main R user meetup here. It’s been a
great way to learn about R and the wide variety of ways Japanese
practitioners and academics use it. Besides the near-monthly TokyoR
meetup, there are smaller gatherings spread throughout Japan, such as
TsukubaR, but the meeting that draws the biggest crowd is the JapanR
Conference, held every December since 2010. Of course, there are
outliers, such as the special TokyoR session this past July when Joe
Rickert and Hadley Wickham visited Japan!
- Want to learn more about TokyoR and when the next meeting is?
Check this link! (Next meeting is
January 19th, 2019)
- Want to learn more about JapanR? Check this link!
This time around I took notes on the presentations and wrote up a little
round-up blog post. As much as I would like to write about every single
presentation, there were a number of topics I wouldn’t have been able to
explain well even if the presentation had been given in English! You can
watch most of the presentations on the JapanR YouTube channel. Although
the talks are in Japanese, you may still find something useful in the
slides… or you can read on as I summarize 9 of the 22 presentations
that I found interesting!
NOTE: Some people presented using only their Twitter/online name; it’s a
cultural thing relating to privacy that I’ve noticed here.
NOTE 2: Several presentations’ slides haven’t been uploaded yet, but I
will add more screenshots as they become available, so please check back
in the coming days!
Creating your own RMarkdown template! – Kazuhiro Maeda
Kazuhiro Maeda is well known in the Japanese R community, mainly through
his online avatar (an elephant plushie) and his love for R Markdown. For
this conference he presented on creating your own customized R Markdown
template. Maeda-san noted that knowledge of CSS, […] and explained how
the render() function works to create a document of your choice.
Within this explanation, he highlighted how the output of a render()
call depends on templates and options set through Pandoc; therefore, it
is important to create a template with options that Pandoc can use. As
making a template from scratch is extremely difficult, Maeda-san
recommended finding an existing template and playing around with it to
get used to the process.
One example template Maeda-san worked on makes an image pop out when you
click on it in your R Markdown document. To do this you need to […]
Pandoc template that calls on this library at the appropriate time (when
you click on an image). Following a very thorough and technical
explanation, he showed us the fruits of his labor in a live demo that
you can see here, where he knits the R Markdown document and clicks on a
plot image, et voilà! It pops up!
Since I wasn’t able to translate the template creation process well
enough (live-translating technical stuff is hard!), I will leave some
good English-language links on creating your own R Markdown template
below:
- A chapter from Yihui Xie’s R Markdown: The Definitive Guide
- A list of R Markdown template packages from Jianghao Wang
- An example: the R Markdown templates used by the Monash University
Department of Econometrics and Business Statistics
- A short tutorial by Chester Ismay
Easy and modern data analysis with “R AnalyticFlow”! – Ryota Suzuki
Ryota Suzuki, CEO of ef-prime and author of the pvclust package, gave a
talk on R AnalyticFlow, free software built by his company that brings
the R environment for statistical computing into a GUI. R AnalyticFlow
is written in Java, runs on Windows, Mac, and Linux, and is available in
English, Japanese, Chinese, and many more languages.
R AnalyticFlow lets you represent your data analysis workflow as nodes
and edges in a descriptive flow chart. In previous versions of the GUI,
the goal was to use base R functions as much as possible, but more
recently R data analyses, including predictive modeling, have come to
rely heavily on external packages such as […], xgboost, etc. So the new
direction Suzuki-san wants to take is to integrate these packages into
R AnalyticFlow and to support users who want to install their own
packages for use in the GUI. Lastly, in a live demonstration, he showed
us a development version of the GUI as he made some simple […] with just
a few point-and-click actions. Because of this new direction, Suzuki-san
is looking for Java developers to help build the new versions of the
GUI. If you know your way around Java and want to help, let him know!
I have completely understood Shiny! – Med_KU
In a very lively and fun presentation, @Med_KU took us on a
comprehensive tour of Shiny apps. First, he talked about how to create a
Shiny app via RStudio, working with app.R (or ui.R and server.R) files,
and publishing through RStudio Connect. Afterwards, he went through many
examples with googleVis, showing off the interactive/reactive
capabilities that Shiny apps are known for. Personally, I’m more of a
ggiraph fan (I use it at work for flexdashboards and Shiny apps), but
@Med_KU’s presentation has gotten me interested in trying googleVis out
as well!
I recommend watching the video of the presentation, as @Med_KU goes
through a lot of different examples!
DID Analysis with R! – Yuki Yagi
University student Yuki Yagi presented on DID (difference-in-differences)
analysis and how he used it in one of his research papers. For those
unfamiliar, DID is a statistical technique that estimates the
differential effect of a treatment (a training program, medication
intake, etc.) on a treatment group vs. a control group. A quick overview
of DID can be found here.
The main question that Yagi-san investigated in his research paper was:
“What would be the impact on the number of patents produced if research
subsidies were given to companies that were already highly skilled and
had a track record of producing many patents?”
DID is easy to understand with the diagram from his slides. On the
left-hand side is the outcome variable (in this case, the number of
patents) measured before the treatment, while on the right is the number
of patents measured after the treatment (research subsidies) was given
to the treatment group (the blue dot is the control group and the red
dot is the treatment group).
The model for the research question used dummy variables for B
(subsidy), C (post-treatment), and D (subsidy + post-treatment).
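For reference, the standard two-group, two-period DID regression with these dummies can be written as follows. This is a sketch under my own assumptions, not Yagi-san’s exact specification: Y is the patent count for company i in period t, D is coded as the interaction B × C, and any extra control variables he may have used are omitted.

$$
\begin{aligned}
Y_{it} &= \beta_0 + \beta_1 B_i + \beta_2 C_t + \beta_3 D_{it} + \varepsilon_{it}, \qquad D_{it} = B_i C_t \\
\beta_3 &= \big(\mathbb{E}[Y \mid B{=}1, C{=}1] - \mathbb{E}[Y \mid B{=}1, C{=}0]\big) - \big(\mathbb{E}[Y \mid B{=}0, C{=}1] - \mathbb{E}[Y \mid B{=}0, C{=}0]\big)
\end{aligned}
$$

Plugging the four combinations of B and C into the first line gives the second: β3 is the before/after change in the treatment group minus the before/after change in the control group, i.e. the “difference in differences” between the red and blue dots described above.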
Rugby Analytics with R! – Koichi Kinoshita
Koichi Kinoshita, a rugby performance analyst for the HITO-Communications
Sunwolves and Northland Rugby Union (in New Zealand), gave a
presentation on how he applied his nascent R skills to his favorite
sport. After briefly explaining the state of sports analytics in rugby
and his resolve to improve his data analytics skills in R, he showed us
a number of plots from data he gathered from the Japanese national rugby
league.
Throughout the presentation, Kinoshita-san tried to answer several
questions, such as “Is tackle success percentage related to conceded
tries?”, “Does a higher number of tackles help stop line-breaks?”, and
“Can you stop line-breaks if you have a higher tackle success
percentage?”, explaining his results in thorough detail with plots as
visual aids. Ultimately, his data showed that Japanese rugby teams with
a tackle success rate over 86% were able to limit line-breaks to 10 or
fewer and were very likely to win matches, while teams with a tackle
success rate below 78% mostly wound up losing.
Armed with this knowledge, Kinoshita-san investigated further and found
that, across the entire season, around half the teams made roughly 100
to 150 tackles in any single game. Assuming an 80% tackle success rate,
a team with 100 tackles will have 20 missed tackles, while a team with
150 tackles will have 30. So the big question was: “How much will this
difference of 10 missed tackles cost a team?”
Kinoshita-san then regressed line-breaks on missed tackles and found
that, on average, a team concedes 10 line-breaks for every 30 missed
tackles. Combined with his finding that a team conceding 10 or more
line-breaks is ~70% likely to lose a game, this suggests that a higher
number of attempted tackles isn’t necessarily a good thing; what matters
is preventing line-breaks with successful tackles.
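The arithmetic behind that question can be written out as below. The 80% success rate is the assumption from the talk; the last line is only my rough back-of-envelope reading, which additionally assumes that line-breaks scale in proportion to missed tackles at the reported 10-per-30 rate (a fitted regression with a nonzero intercept would not strictly guarantee this).

$$
\begin{aligned}
\text{missed tackles} &= \text{tackles} \times (1 - \text{success rate}) \\
100 \times (1 - 0.8) &= 20, \qquad 150 \times (1 - 0.8) = 30 \\
\Delta\,\text{line-breaks} &\approx (30 - 20) \times \tfrac{10}{30} \approx 3.3
\end{aligned}
$$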
In his conclusion, Kinoshita-san made a really good point: it’s not
enough to look at pure success/failure percentages, and he brought up
pass completion rate in soccer as an example. I agree, since in soccer
you could naively assume a team is “dominant” or “good” if it has a
really high pass completion rate, but if most of those successful passes
came from the defenders and goalkeeper passing among themselves, you
can’t really say that is a good thing. In soccer (and in other sports)
it’s important to dig a little deeper; for example, it might be more
insightful to look at a team’s pass completion rate in the opposition’s
third of the pitch!
Using linear regression to find a new home in Tokyo! – Kaori Sawamura
This presentation by Kaori Sawamura showed off a fun real-life case
study using R. One of Sawamura-san’s co-workers wanted to move to Tokyo
and become a “city boy”, so she set out to use some of her newly learned
R skills to try to find a dream home for him!
Here are the 3 basic requirements that Sawamura-san was given:
- Somewhere with a gym close by (preferably Tokyo Metropolitan Gymnasium)
- Somewhere close to Sendagaya Station
- Somewhere with a monthly rent below 200,000 Yen
After filtering out houses with a monthly rent above 200,000 Yen,
Sawamura-san fitted a multiple linear regression model to the remaining
listings. After looking at the model’s diagnostic plots, she removed a
few outliers that she confirmed were mainly due to incorrect data on the
housing website, and was able to cut her list down to around 70 houses!
Then, using leaflet, Sawamura-san mapped all the potential houses that
fit her criteria and labelled each one with details about the house.
After doing the analysis, she showed it to her co-worker and got him to
take a look around. Unfortunately, the information provided by the
website didn’t account for things such as construction work, the
cleanliness and safety of the neighborhood, or the added “bonus” of
having to live with the landlord. So this case study was also a great
reminder of the need to do some field work in addition to analytics!
Using external C/C++ libraries with R! – Wataru Iwasaki
Wataru Iwasaki loves using C++ and R and has given talks on the subject
before; this time was no different, as he talked about incorporating and
using C/C++ libraries with R. First, he introduced a number of great
online resources for developing Rcpp packages, including material from
his own website as well as the free “Rcpp for Everyone” (English
version) written by fellow Japanese R user Masaki Tsuda.
Next, he talked about a couple of basic steps needed to incorporate
C/C++ libraries as well as the advantages and disadvantages of the
various styles of doing so.
In Iwasaki-san’s concluding remarks he called on the community to share
their knowledge of handling external C++ libraries via social media or
through blog posts.
Build an R compiler…with R! – igjit
For this presentation
@igjit told us about his attempt at creating an
R compiler written in R! You can see the fruits of his labor in the
nrc package that he created here.
Note: currently it only works on Linux so you should use something
like Docker if you want to try it out on other OSs.
In the examples he showed, you can even assign and use variables!
Unfortunately, I don’t have much experience with compilers, so it was
tough for me to understand the technical details, but it was another
pretty cool example of how you can use R for just about anything!
Tennis Player Ratings with R! – flaty13
In the second use of R in sports analysis, @flaty13 took a look at
tennis players’ Elo ratings. After a brief introduction to tennis, Elo
ratings, and his views on the difficulty of rating/ranking tennis
players, he used a data set from Kaggle and the […] package to conjure
up some visualizations of tennis player rankings over time.
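Since Elo ratings came up, here is the textbook Elo update for reference. This is the generic chess-style formula (a base-10 logistic curve with a 400-point scale), not necessarily the variant @flaty13 used; tennis Elo implementations often tweak the K-factor, so the K = 32 in the worked example is purely illustrative.

$$
E_A = \frac{1}{1 + 10^{(R_B - R_A)/400}}, \qquad R_A' = R_A + K\,(S_A - E_A)
$$

Here $R_A$ and $R_B$ are the two players’ ratings before the match, $E_A$ is player A’s expected score, and $S_A$ is 1 if A wins and 0 if A loses. For two equally rated players $E_A = 0.5$, so with $K = 32$ the winner gains $32 \times (1 - 0.5) = 16$ points and the loser drops by 16.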
As a matter of course, he also looked at Kei Nishikori’s performance and
highlighted his rapid rise in the rankings during 2014, when he became
the first Asian man to reach a Grand Slam singles final.
Overall, it was nice to see how sports analytics has grown in Japan
these past few years: besides this tennis presentation and the rugby
presentation earlier, there have been presentations on soccer and
baseball analytics at past TokyoR meetups.
Following both the main talks and the lightning talks (LTs), sushi and
drinks were served at the after-party as R users from all over Japan
shared their stories of success and struggle alike!
The main organizer, Atsushi Hayakawa, mentioned that he eventually wants
JapanR to grow even bigger in the coming years and to have every
participant give an LT! Whether that was a joke or is actually feasible,
it would be cool if we set the Guinness World Record for Most
Presentations at an R Conference!
As you saw, the quality of presentations at JapanR was very high.
Unfortunately, most of the content was only in Japanese, which I thought
was a shame. That’s why I decided to write this: to share the knowledge
of Japanese R users with those around the world! This is my first time
writing up one of these, and I hope to contribute more and improve in
the years to come!
If you’re ever in Japan, come join us for some R&R … and R!