80th #TokyoR Meetup Roundup: Econometrics vs. ML, Python with R, & Translating tidyverse.org into Japanese!

August 1, 2019

(This article was first published on R by R(yo), and kindly contributed to R-bloggers)

In the middle of a typhoon, another TokyoR Meetup! … Well, not really:
it turned out to be a false alarm, and the weather was a wonderful 30
degrees Celsius with 800% humidity, as usual in Tokyo. My gripes with
the weather aside, this month’s meetup was held at Cresco, an IT
management strategy company, in their headquarters in Shinagawa, Tokyo.

In line with my previous roundup posts, I will be going over around
half of all the talks. Hopefully, my efforts will help spread the vast
knowledge of Japanese R users to the wider R community. Throughout, I
will also point to helpful blog posts and links from other sources in
case you are interested in learning more about the topic of a
particular talk. You can follow Tokyo.R by searching for the #TokyoR
hashtag on Twitter.

Anyways…

Let’s get started!

BeginneR Session

As with every TokyoR meetup, we began
with a set of beginner-focused talks.

Main Talks

ill_identified: Econometrics vs. Machine Learning

@ill_identified gave a talk on a general industry topic rather than on
R specifically: the differences between econometrics and machine
learning/A.I. To start off, he listed some good resources (in English)
to read from the economics/econometrics side of the discussion.

Looking at the methods that both fields use, such as multiple linear
regression, logistic regression (GLMs), Markov chain Monte Carlo (MCMC),
non-parametric regression, etc., it seems as though they might be the
same thing… but an important distinction can be made:

  • Economics/Econometrics == Causal Inference
  • Machine Learning == Prediction

Frank Harrell, in Road Map for Choosing Between Statistical Modeling and
Machine Learning, as well as Miguel Hernán, in Data science is science’s
second chance to get causal inference right: A classification of data
science tasks, discuss this at length if you want to take a look.

@ill_identified then took us through a couple of examples of causal
inference from a variety of economics research, focusing on the spread
and popularity of Randomized Controlled Trials (RCTs) and
Difference-in-Differences (DID) in the field.

There was a talk on DID at last year’s Japan.R Conference that you can
find here!

From the side of machine learning, Athey (2018) discussed how, up to
the present, economists have tried to fit their models to the entirety
of the data available to them, leading to potential problems of
over-fitting. To counteract this problem, Athey notes that the field can
learn from machine learning by using cross-validation techniques and
penalized models.
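
To make the idea a bit more concrete, here is a minimal sketch of
choosing a penalty by cross-validation. It is not taken from the talk or
from Athey’s paper: the toy data, the penalty grid, and all function
names are made up for illustration, and the model is a deliberately tiny
one-feature ridge regression with no intercept so that it runs with the
Python standard library alone.

```python
import random

def ridge_slope(xs, ys, penalty):
    """Closed-form slope of a one-feature, no-intercept ridge regression."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)  # a larger penalty shrinks the slope toward 0

def k_fold_indices(n, k):
    """Split indices 0..n-1 into k non-overlapping folds."""
    return [list(range(i, n, k)) for i in range(k)]

def cv_error(xs, ys, penalty, k=5):
    """Mean squared prediction error over the held-out folds."""
    total, count = 0.0, 0
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        slope = ridge_slope(train_x, train_y, penalty)
        for i in fold:
            total += (ys[i] - slope * xs[i]) ** 2
            count += 1
    return total / count

# Toy data: y = 2x + noise (hypothetical numbers, fixed seed)
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(40)]
ys = [2 * x + rng.gauss(0, 0.5) for x in xs]

penalties = [0.0, 0.1, 1.0, 10.0]
errors = {p: cv_error(xs, ys, p) for p in penalties}
best_penalty = min(errors, key=errors.get)
print(errors)
print("penalty with the lowest held-out error:", best_penalty)
```

The point is only that the model is judged on data it was not fit to,
rather than on the entirety of the data at once.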

For causal inference from the machine learning side of the discussion,
@ill_identified talked about Judea Pearl’s answer on Quora to the
question, “What are the differences between econometrics, statistics,
and machine learning?”.

In the answer, Judea Pearl makes the distinction between standard ML and
advanced ML, namely that the former (while specifically including deep
learning and neural networks in this category) “fits a function to a
stream of data and plays the same role as statistical analysis, taking
us from samples to properties of distribution functions”, while the
latter “goes beyond distributions onto the process that generates the
data, and so, allows us to manage policy interventions and
counter-factual reasoning”. He then points to two of his works, 7 Tools
of Causal Inference with Reflections on Machine Learning and The Book
of Why (written with Dana MacKenzie), for further reading. It is in the
former work that Judea Pearl talks about the three levels of the causal
hierarchy:

  • Level 1: Association
  • Level 2: Intervention
  • Level 3: Counterfactuals

There has been a lot of debate in the past centered on the “Potential
Outcomes” theory posited by Neyman & Rubin versus Pearl’s “Causal
Graphs/SEM” approach, and Andrew Gelman has also talked about the issue
here (2009) and here (2011). Very recently, Guido Imbens submitted an
article, Potential Outcome and Directed Acyclic Graph Approaches to
Causality: Relevance for Empirical Practice in Economics, that discusses
this at length and is probably worth checking out as well!

From the Rubin side of the debate, Guido Imbens, Susan Athey, and Victor
Chernozhukov stand out as the primary researchers.

Athey:

Chernozhukov:

In the final section, @ill_identified went over a few new methods in
A.I./ML that provide some evidence that A.I./ML does indeed have
similarities with econometrics, mainly through the application of the
structural estimation approach to modeling. A good overview is
Artificial Intelligence as Structural Estimation: Economic
Interpretations of Deep Blue, Bonanza, and AlphaGo by Mitsuru Igami,
while you can read about the specific AIs discussed in more detail
below.

kilometer00: R interface to Python

TokyoR organizer and frequent BeginneR session speaker @kilometer00
talked about using Python with R.

To familiarize the audience with Python he went over quite a number of
slides showing the similarities and differences in syntax between the
two languages.


Next, @kilometer00 talked about the {reticulate} package, which allows
you to call Python from R and can translate between R and Python
objects (such as R and Pandas data frames or R matrices and NumPy
arrays). When using {reticulate}, he stressed the importance of keeping
Python in an isolated and independent “sandboxed” virtual environment
for security and reproducibility. To do this @kilometer00 likes to use
Pipenv.


Once you’re done with all the set-up, you can install {reticulate} from
CRAN and attach your Python virtualenv with reticulate::use_python()
and then you can finally start doing stuff! But be wary of type errors
when you’re coding:
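
The slide with the original example is not reproduced here, so the
following is only a sketch of one common way such errors come up, not
necessarily the case shown in the talk. {reticulate} converts a plain R
number such as 2 to a Python float, while 2L becomes a Python int, and
many Python operations insist on an int. The function names below are
hypothetical; the snippet shows the Python side of the problem.

```python
def first_n(items, n):
    """Return the first n items; slicing requires n to be an integer."""
    return items[:n]

values = ["a", "b", "c", "d"]

# An integer count (what R's 2L arrives as) works as expected.
print(first_n(values, 2))

# A float count (what a plain R numeric such as 2 arrives as) fails.
try:
    first_n(values, 2.0)
except TypeError as err:
    print("TypeError:", err)

def first_n_safe(items, n):
    """Cast defensively so that whole-number floats are accepted too."""
    return items[:int(n)]

print(first_n_safe(values, 2.0))
```

On the R side, the usual remedy is to pass 2L or wrap the value in
as.integer() before handing it to Python.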

You can also use Python in an R Markdown document by setting a code
chunk’s engine to python. With a recent development in R Markdown you
can now also share objects between the two languages by putting a prefix
in front of the object name: py$ to reach Python objects from an R
chunk, and r. to reach R objects from a Python chunk!

Funnily enough you can also run R in Python in R:

Pythonception!

More resources on {reticulate}:

LTs

wkwk_soprano: Creating network graphs with R!

It’s been a while since @wkwk_soprano used R (5 years!) but he’s come
back with aplomb by talking about network graphs at Tokyo.R! Network
graphs are used in all sorts of fields of study including physics,
chemistry, linguistics, and the social sciences. In industry you might
see them as part of a recommendation graph between a customer and
products on sale. Frustrated by the fact that he didn’t have a fun data
set to use the {network} package on, @wkwk_soprano decided to create
his own data set based on his favorite manga, One Piece!

By counting the number of times two characters appeared together in the
same panel of the manga, he slowly built up a co-occurrence matrix of
all the characters from Volume 1 to Volume 23. It took him about one
hour per volume to create this data set. Now that’s dedication!
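
As a rough illustration of what building such a matrix involves, here is
a small sketch. The panel annotations are made up (they are not
@wkwk_soprano’s data), the function names are hypothetical, and I am
assuming each panel is recorded as a simple list of character names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical annotations: each inner list names the characters in one panel.
panels = [
    ["Luffy", "Zoro"],
    ["Luffy", "Nami", "Zoro"],
    ["Nami", "Usopp"],
    ["Luffy", "Zoro"],
]

def count_cooccurrences(panels):
    """Count how many panels each unordered pair of characters shares."""
    counts = Counter()
    for panel in panels:
        # sorted(set(...)) removes duplicates and gives each pair one fixed order
        for pair in combinations(sorted(set(panel)), 2):
            counts[pair] += 1
    return counts

def to_matrix(counts, names):
    """Expand the pair counts into a symmetric matrix ordered by `names`."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for (a, b), n in counts.items():
        matrix[index[a]][index[b]] = n
        matrix[index[b]][index[a]] = n
    return matrix

counts = count_cooccurrences(panels)
names = sorted({name for panel in panels for name in panel})
matrix = to_matrix(counts, names)

print(names)
for name, row in zip(names, matrix):
    print(name, row)
```

The resulting matrix can be read as a weighted adjacency matrix, which
is the kind of input network packages typically accept.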

You can find the gum-gum fruits of his labor here.

After creating the data, @wkwk_soprano wanted to do some analysis on it,
such as graph embedding via DeepWalk or Large-scale Information Network
Embedding (LINE). There’s actually an R package called Rline that
implements this method, but he found that it was difficult to install
and hadn’t been updated in a while, so he went with the original C++
implementation from Jian Tang et al. The result was a distributed
representation of all the characters in the data.

Lastly, @wkwk_soprano wanted to find similarities between One Piece
characters, so he computed cosine similarity using this code snippet,
which allows you to extract the top ‘N’ similar items from network
embedding matrices. Taking a look at some popular characters, he was
somewhat disappointed in the results, as from his extensive knowledge of
the story he knew some of the character similarities just weren’t right!
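
The snippet he used is not reproduced here, so the following is only a
generic sketch of the top ‘N’ lookup by cosine similarity. The
two-dimensional embedding vectors and the item names are invented for
illustration, and the function names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (assumes neither is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top_n_similar(embeddings, query, n=3):
    """Return the n items most similar to `query`, best match first."""
    scores = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query  # skip the query itself
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]

# Made-up embeddings, one vector per item.
embeddings = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [-1.0, 0.0],
}

for name, score in top_n_similar(embeddings, "A", n=2):
    print(name, round(score, 3))
```

Because cosine similarity ignores vector length and only compares
direction, the ranking depends entirely on how well the embedding
captured the relationships in the first place.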

More resources on network analysis in R:

gepuro: Translating tidyverse.org into Japanese!

After being involved in the Japanese translation of Feature Engineering
for Machine Learning: Principles and Techniques for Data Scientists,
@gepuro thought about trying his hand at translating tidyverse.org
into Japanese!

In recent months there have been big changes in major tidyverse
packages such as {tidyr} and {ggplot2} with accompanying articles to
boot. These articles, especially the new pivot functions vignette, are
the ones @gepuro and fellow TokyoR community members such as @Atsusy
have started working on in the past few weeks. To do the translation
there are three key steps:

  1. Create a GitHub account
  2. Log into GitLocalize
  3. Access the specific GitLocalize repo where your translation project
    is located

GitLocalize looks like this:

Once you’re done, you create a “Review Request” which is checked by the
maintainer @gepuro for any errors. He’ll receive the “Review Request”
as a Pull Request on the R Lang Document JA repo and if everything is
OK it’ll be merged in!

There are other ways to contribute to the project as well such as:

  • Helping to make the text sound more naturally Japanese
  • Creating the blogdown website for the Japanese translation
  • Creating a vocab list of common R terminology in Japanese to use as
    a reference
  • and more!

I enjoyed this talk as it was similar to the talk by Riva Quiroga on
translating the “R for Data Science” book and R data sets into Spanish
that I heard at useR! 2019 a few weeks ago (the talk is covered in my
blog post here). If you’re good at English and Japanese you can join the
#translation channel on the Tokyo.R Slack!

In other news, there was an announcement that this year’s Japan.R
Conference will be on December 7th!

Other Talks

Food, Drinks, and Conclusion

TokyoR happens almost monthly and it’s a great way to mingle with
Japanese R users as it’s the largest regular meetup here in Japan. We’re
finally taking a break next month, so the next meetup will be on
September 28 and it will be a special session on {Shiny}!

Talks in English are also welcome so if you’re ever in Tokyo come join
us!
