Last year I decided to take out the domain Choroplethr.com. I used it to host information about Choroplethr, my suite of R packages for mapping... The post New Version of Choroplethr.com! appeared first on AriLamstein.com.

Richard McElreath inquires: I was helping a colleague recently fix his MATLAB code by using log_sum_exp and log1m tricks. The natural question he had was, “where do you learn this stuff?” I checked Numerical Recipes, but the statistical parts are actually pretty thin (at least in my 1994 edition). Do you know of any books/papers The post Where do...
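For readers who have not met the two tricks named above, here is a minimal sketch in Python (the original discussion concerns MATLAB code; the function bodies below are an illustration, not the code from that exchange). `log_sum_exp` computes log(sum(exp(v))) by factoring out the largest term so the exponentials cannot overflow, and `log1m` computes log(1 - x) via `math.log1p` so that precision is kept when x is tiny.

```python
import math


def log_sum_exp(log_values):
    """Return log(sum(exp(v))) for a non-empty sequence, avoiding overflow."""
    m = max(log_values)
    if math.isinf(m):
        # Either every term is -inf (the sum is 0) or some term is +inf.
        return m
    # Subtracting the maximum keeps every exponent at or below zero.
    return m + math.log(sum(math.exp(v - m) for v in log_values))


def log1m(x):
    """Return log(1 - x), accurate even when x is very close to 0."""
    return math.log1p(-x)


# math.exp(1000.0) overflows, but the shifted computation is fine.
print(log_sum_exp([1000.0, 1000.0]))

# math.log(1 - 1e-20) evaluates to 0.0 because 1 - 1e-20 rounds to 1.0.
print(log1m(1e-20))
```

The same idea carries over to any language that offers a `log1p` function and a way to take the maximum of a vector.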

Some time ago, Maëlle Salmon published a very nice post showing how she scraped her mathematical family tree from the Mathematics Genealogy Project. Of course I immediately wanted to produce my own! I am not a mathematician myself, but one of my PhD supervisors has a PhD in mathematics. Which makes me the indirect descendant of a long lineage...

It’s a challenge for an experienced user to remember what it was like to be totally new to R and come up with explanations that don’t draw on understanding developed subsequently. Terminology with which you have become very familiar is, in fact, jargon. So I asked a novice, Elliot, to explain a piece of code… Continue reading R Code...

If you’re new to modelling in R and don’t know what this title means, you definitely want to look into doing it. I’ve always been a fan of converting model outputs to real-life quantities of interest. For example, I like to supplement a logistic regression model table with predicted probabilities for a given set of … Continue reading "Finalfit...

Problem How do I change the name of just one column in a data frame? Context This is a simple one that keeps coming up. Sometimes, whoever put together my data decided to capitalize the first letter of some column names and not others. Sometimes I’ve merged several data frames together and I need to … Continue reading "Changing...

Generally speaking, if the code does any simulations, it is good practice to set a seed to make the code reproducible. Setting a seed ensures that the same (pseudo-)random numbers will be generated each time the script is executed. Surprisingly, I found very few posts dedicated to any convention, best practice, or routine of setting a seed in...
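The effect of a seed can be sketched in a few lines. The excerpt is about R, where `set.seed()` plays this role; the example below uses Python's standard library instead, and the function name `simulate_mean` and the seed value 2018 are made up for illustration.

```python
import random


def simulate_mean(n, seed):
    """Average of n standard-normal draws from a generator seeded with `seed`."""
    # A local generator keeps the example independent of global random state.
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(draws) / n


first = simulate_mean(1000, seed=2018)
second = simulate_mean(1000, seed=2018)

# Same seed, same sequence of pseudo-random numbers, same result.
print(first == second)  # prints True
```

Passing the seed as an argument, rather than setting it once at the top of a script, is one possible convention; it makes each simulation reproducible on its own.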

I’ll be giving talks and workshops at the following three upcoming conferences; hope to meet some of you there! From 15th to 17th October 2018, I’ll be in London for the M-cubed conference. My talk about Explaining complex machine learning models with LIME will take place on October 16. Traditional machine learning workflows focus heavily on model training and optimization; the...

Evolutionary biologists are increasingly using R for building, editing and visualizing phylogenetic trees. The reproducible code-based workflow and comprehensive array of tools available in packages such as ape, phangorn and phytools make R an ideal platform for phylogenetic analysis. Yet the many different tree formats are not well integrated, as pointed out in a recent post. The standard data structure for phylogenies in R is the “phylo” object, a memory...

rquery and rqdatatable are new R packages for data wrangling, either at scale (in databases, or big data systems such as Apache Spark) or in-memory. They speed up both execution (through optimizations) and development (through a good mental model and up-front error checking) for data wrangling tasks. Win-Vector LLC‘s John Mount will be speaking on … Continue reading John...

Here is the course link. Course Description Python and R have seen immense growth in popularity in the "Machine Learning Age". They both are high-level languages that are easy to learn and write. The language you use will depend on your background and field of study and work. R is a language made by and for statisticians, whereas Python is a...

Though Python is usually favored over R for system administration tasks, R is actually quite useful in this regard. In this post we’re going to talk about using R to create, delete, move, and obtain information on files. How to get and change the current working directory Before working with files, it’s usually The post R: How...

Let’s welcome the viridis palette into the new version of {ggplot2}! Viri-what? viridis is one of the favorite color palettes of one of the members of the team (guess who). The viridis palette was first developed for the Python package matplotlib, and has been implemented in R since. The strengths of this palette are that: plots are beautiful (which...

One of the most difficult and most critical parts of implementing data science in business is quantifying the return-on-investment or ROI. As a data scientist in an organization, it’s of chief importance to show the value that your improvements bring. In this article, we highlight three reasons you need to learn the Expected Value Framework, a framework that connects...

Today we’ve had our workshop on “R for trial and model-based cost-effectiveness analysis”, at UCL. I really enjoyed the whole day — we had several interesting presentations and very lively discussion. In fact, all presenters have agreed to make their slides available, which I’ll put on the workshop webpage. One of the cool outputs is actually that we’ll use that...

In case you missed them, here are some articles from June of particular interest to R users. An animated visualization of global migration, created in R by Guy Abel. My take on the question, Should you learn R or Python for data science? The BBC and Financial Times use R — without post-processing — for publication graphics. "Handling Strings...

In my last post, I pointed out the importance of a local community and described how the Delhi NCR useR group came about. It’s been a month since we started the group and I’m excited to announce that our first meetup is scheduled to take place on 14th July 2018, i.e. this coming Saturday. The venue for the event is SocialCops headquarters...

Cone plots (also known as 3-D quiver plots) represent vector fields defined in some region of the 3-D space. A vector field associates to each point of coordinates (x, y, z) a vector of components (u, v, w). In this post, we’ll explore how Plotly’s cone plots can be used to visualize atmospheric wind 💨, magnetic fields, a trajectory of the...

In case you hadn’t spotted it already, a major new release of ggplot2 hit CRAN last week. Among the many new features, the one that caught my attention was support for simple features. Or in other words, it’s now really easy to plot spatial data in ggplot2. On Friday afternoon I spent a couple of hours playing with this new...

The file contains 2017 face-to-face post-election survey responses along with explanatory notes. Read the Stata DTA file into R with these two lines: The...

hete is the wild west of analytics. no wrong answers yet - Bill Lattner Intro Generally lift models use a Qini coefficient to measure the performance of a model. However this metric is generally an indirect measure of what the user wants to achieve: profit. In this post I’ll discuss a measure of profitability in lift models. A neural network can...

The odds ratio always confounds: while it may be constant across different groups or clusters, the risk ratios or risk differences across those groups may vary quite substantially. This makes it really hard to interpret an effect. And then there is inconsistency between marginal and conditional odds ratios, a topic I seem to be visiting frequently, most recently last...
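A small worked example makes the first point concrete. The baseline risks (0.1 and 0.5) and the odds ratio of 2 below are hypothetical numbers chosen for illustration, not values from the post; the sketch only shows that a single odds ratio implies different risk ratios and risk differences depending on the baseline risk.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


def risk_from_odds(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)


def effect_measures(baseline_risk, odds_ratio):
    """Risk ratio and risk difference implied by an odds ratio at a baseline risk."""
    treated_risk = risk_from_odds(odds(baseline_risk) * odds_ratio)
    return {
        "treated_risk": treated_risk,
        "risk_ratio": treated_risk / baseline_risk,
        "risk_difference": treated_risk - baseline_risk,
    }


# Two hypothetical groups sharing the same odds ratio of 2.
for baseline in (0.1, 0.5):
    result = effect_measures(baseline, odds_ratio=2.0)
    print(baseline, round(result["risk_ratio"], 3), round(result["risk_difference"], 3))
```

With a baseline risk of 0.1 the implied risk ratio is 20/11 (about 1.82), while at a baseline risk of 0.5 it is 4/3 (about 1.33), even though the odds ratio is 2 in both groups.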

The Spatial Equilibrium concept is well known to urban economists. In a nutshell, it states that in equilibrium there are no rents to be gained by changing locations. Ed Glaeser begins Chapter 2 of his book: “Cities, Agglomeration, and Spatial Equilibrium” with the well known Alonso-Muth-Mills model. In this post, I want to summarize it briefly following Ed Glaeser...