## Wind in Netherlands

November 15, 2015
By

In climate change discussions, everybody talks about temperature. But weather is much more than that. There are at least rain and wind as directly experienced qualities, and air pressure as a measurable quantity. In the Netherlands, some observation station...

## Bioenergetics in R Workshop

November 15, 2015
By

It was just brought to my attention that there will be a workshop at the upcoming Midwest Fish and Wildlife Conference (Grand Rapids, MI) on the Bioenergetics 4.0 shiny app. The announcement from here (where there is a registration link) is ...

## Retrieving Data from Google Books with `ngramr`

November 14, 2015
By

Karl Marx is one of the most famous founding fathers of modern sociology, with a popularity peak in 1975-6 and a decline ever since. Introduction: Google has a tool for tracking the frequency of words or phrases across its vast collection of scanned texts, Google Books. The Google Ngram Viewer reports data and graphs the frequency of words encountered...

November 14, 2015
By

One of the highlights of my recent east coast trip was meeting Ezra Haber Glenn, the author of the acs package in R. The acs package is my primary tool for accessing census data in R, and I was grateful to spend time with its author. My goal was to learn how to “take the next step” in...

## Correlation and Linear Regression

November 14, 2015
By

Before going into complex model building, looking at relations in the data is a sensible step to understand how your different variables interact. Correlation looks at trends shared between two variables, and regression looks at the causal relation between a predictor (independent variable) and a response (dependent variable). Correlation: As mentioned above, correlation looks at global movement...
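As a rough illustration of the two ideas named in this teaser, the sketch below computes a Pearson correlation coefficient and a least-squares line by hand. It uses Python's standard library rather than R, and the function names `pearson_r` and `ols_fit` as well as the sample data are made up for illustration; none of it is taken from the original post.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: shared linear trend between x and y, in [-1, 1]."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ols_fit(x, y):
    """Least-squares line y = intercept + slope * x (x predictor, y response)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: a roughly linear relation with some noise.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print("correlation:", pearson_r(x, y))
print("slope, intercept:", ols_fit(x, y))
```

Note the asymmetry: swapping `x` and `y` leaves the correlation unchanged, while the fitted regression line changes, because regression treats one variable as the predictor and the other as the response.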

## Linear model with time series random component

November 14, 2015
By

What do auto-correlated residuals do to your linear model? For training purposes I wanted to illustrate the dangers of ignoring time series characteristics of the random part of a classical linear regression, and I came up with this animation to do it: I like this, because it shows how easy it is to fit something that looks to be...
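The animation itself is not reproduced here. As a stand-in, the following sketch simulates the same kind of danger under assumptions of my own: the noise follows an AR(1) process, the predictor is a plain time index, and the true slope is zero. It compares how much the fitted slope really varies across simulations with the standard error that classical regression reports. The function names and parameter values are hypothetical, and the code is Python with the standard library only, not the author's R code.

```python
import random
from math import sqrt


def ols_slope_and_se(x, y):
    """Return the OLS slope and its classical (independence-based) standard error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, sqrt(rss / (n - 2) / sxx)


def ar1_noise(n, phi, rng):
    """Stationary AR(1) noise: e[t] = phi * e[t-1] + white noise, with |phi| < 1."""
    e = [rng.gauss(0.0, 1.0) / sqrt(1.0 - phi ** 2)]
    for _ in range(n - 1):
        e.append(phi * e[-1] + rng.gauss(0.0, 1.0))
    return e


def simulate(phi, n=100, reps=300, seed=1):
    """Return (empirical SD of the fitted slopes, mean reported standard error)."""
    rng = random.Random(seed)
    x = list(range(n))
    slopes, reported = [], []
    for _ in range(reps):
        y = [1.0 + e for e in ar1_noise(n, phi, rng)]  # true slope is 0
        slope, se = ols_slope_and_se(x, y)
        slopes.append(slope)
        reported.append(se)
    mean_slope = sum(slopes) / reps
    empirical_sd = sqrt(sum((s - mean_slope) ** 2 for s in slopes) / (reps - 1))
    return empirical_sd, sum(reported) / reps


for phi in (0.0, 0.8):
    sd, se = simulate(phi)
    print(f"phi={phi}: empirical SD of slope={sd:.5f}, mean reported SE={se:.5f}")
```

With independent noise (`phi=0.0`) the two numbers should roughly agree; with strongly autocorrelated noise the reported standard error tends to be far too small, which is why a spurious trend can look convincing.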

## James Bond movies

November 14, 2015
By

James Bond: Do you expect me to talk? Auric Goldfinger: No, Mr. Bond, I expect you to die! I’m a big James Bond fan, so naturally I went to watch the new Bond movie Spectre which – spoiler alert! – is pretty bad. It also got me to reminisce about the good Bond films of the past....

## What it means to be a US Veteran Today

November 13, 2015
By

Six easy graphs that tell a big story: 1. You represent a much smaller portion of the American people than veterans in the 1980s. 2. You currently have the highest risk of being classified as poor for any time period since 1980. Since 2005, the rate of pov...

## Blog Post at Pluralsight

November 13, 2015
By

The final post in the three-part series is now up at Pluralsight. The series is geared towards business users, so if you have some friends that you are encouraging to set aside their spreadsheets and take R for a spin, send them this way! http://blog....

November 13, 2015
By

By Paulin Shek Working at Mango is generally busy, fun, but also at times, quite surreal. Lunchtime conversation amongst the consultants can get quite animated, especially when Andy Nicholls, the Head of Consultancy, finds a topic that he has a …

## In case you missed it: October 2015 roundup

November 13, 2015
By

In case you missed them, here are some articles from October of particular interest to R users. A video from the PASS 2015 conference in Seattle shows R running within SQL Server 2016. The preview for SQL Server 2016 includes Revolution R Enterprise (as SQL Server R Services). A way of dealing with confounding variables in experiments: instrumental variable...

## Annotables: R data package for annotating/converting Gene IDs

November 13, 2015
By

I work with gene lists on a nearly daily basis. Lists of genes near ChIP-seq peaks, lists of genes closest to a GWAS hit, lists of differentially expressed genes or transcripts from an RNA-seq experiment, lists of genes involved in certain pathways, etc. And lots of times I’ll need to convert these gene IDs from one identifier to another....

## Network Analysis Training (Szkolenie z analizy sieciowej)

November 13, 2015
By

We are organizing a two-day workshop on network analysis in R, to be held on December 2-3, 2015. The workshop will be in Polish. For more information and registration see this page. Social network analysis (SNA),

## Applied Statistical Theory: Quantile Regression

November 13, 2015
By

This is part two of the ‘applied statistical theory’ series that will cover the bare essentials of various statistical techniques. As analysts, we need to know enough about what we’re doing to be dangerous and explain approaches to others. It’s not enough to say “I used X because the misclassification rate was low.” Standard linear

## Let’s meet on SatRdays: the link between RUGs and conferences

November 13, 2015
By

I am always very happy to attend local R meetups and international R conferences, as these are great opportunities to meet other R users, developers, rock stars and friends from all around the world/GH/SO/Twitter etc., listen to inspiring presentati...

## importance sampling with infinite variance

November 12, 2015
By

“In this article it is shown that in a fairly general setting, a sample of size approximately exp(D(μ|ν)) is necessary and sufficient for accurate estimation by importance sampling.” Sourav Chatterjee and Persi Diaconis arXived yesterday an exciting paper where they study the proper sample size in an importance sampling setting with no variance. That’s right,
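To give a feel for what the quoted sample-size rule implies, here is a small sketch that evaluates exp(D) for two normal distributions. It assumes D is the Kullback-Leibler divergence of the target relative to the proposal, which is my reading of the notation and not something the teaser states. The helper names `kl_normal` and `suggested_sample_size` and the example distributions are hypothetical, and the code is plain Python rather than R.

```python
import math


def kl_normal(mean_p, sd_p, mean_q, sd_q):
    """KL divergence D(P || Q) for P = N(mean_p, sd_p^2) and Q = N(mean_q, sd_q^2)."""
    return (
        math.log(sd_q / sd_p)
        + (sd_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * sd_q ** 2)
        - 0.5
    )


def suggested_sample_size(kl):
    """Order of magnitude exp(D) from the quoted rule (assumed reading of D)."""
    return math.exp(kl)


# Hypothetical setup: standard normal proposal, target shifted by `shift`.
for shift in (1.0, 2.0, 4.0):
    d = kl_normal(shift, 1.0, 0.0, 1.0)
    print(f"shift={shift}: D={d:.2f}, exp(D)={suggested_sample_size(d):.1f}")
```

Because the divergence grows with the square of the shift in this setup, the suggested sample size grows very quickly as the proposal moves away from the target.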

## H2O World 2015

November 12, 2015
By

by Joseph Rickert The second annual H2O World conference finished up yesterday. More than 700 people from all over the US attended the three-day event, which was held at the Computer History Museum in Mountain View, California, a venue that sits well within the blast radius of ground zero for Data Science in Silicon Valley. This...

## Graph from Sparse Adjacency Matrix

November 12, 2015
By

I spent a decent chunk of my morning trying to figure out how to construct a sparse adjacency matrix for use with graph.adjacency(). I'd have thought that this would be rather straightforward, but I tripped over a few subtle issues with the Matrix package. My biggest problem (which in retrospect seems rather trivial) was...

## magrittr: The best thing to have ever happened to R?

November 11, 2015
By

By Aimee Gott At EARL 2014 I saw Hadley Wickham using the pipe operator from Stefan Milton Bache’s magrittr whilst also presenting the functionality of dplyr. I remember thinking at the time that this was going to be the new …

## Best practices for handling packages in R projects

November 11, 2015
By

by Andrie de Vries For much of my data science work, I want to have the very latest package from CRAN or GitHub. However, once any work finds its way onto a production server (where it runs on a regular schedule), I want my environment to be stable. Most importantly, for these projects I want to ensure I have reproducible...

## Using MonetDB[Lite] with real-world CSV files

November 11, 2015
By

MonetDBLite (for R) was announced/released today and, while the examples they provide are compelling, there’s a “gotcha” for potential new folks using SQL in general and SQL + MonetDB + R together. The toy example on the site shows dumping mtcars with dbWriteTable and then doing things. Real-world CSV files have headers and commas (MonetDB

## Bootstrapping standard errors for difference-in-differences estimation with R

November 10, 2015
By

I’m currently working on a paper (with my colleague Vincent Vergnat, who is also a PhD candidate at BETA) where I want to estimate the causal impact of the birth of a child on hourly and daily wages as well as yearly worked hours. For this we are using non-parametric difference-in-differences (henceforth DiD) and thus have...

## Free Webinar: Learn to Map Unemployment Data in R

November 10, 2015
By

Last month I ran my first webinar (“Make a Census Explorer with Shiny”). About 100 people showed up, and feedback from the participants was great. I also had a lot of fun myself. Because of this, I’ve decided to do one more webinar before my free trial with the webinar service ends. Here are the...

## Le Monde puzzle [#937]

November 10, 2015
By

A combinatorial Le Monde mathematical puzzle that resembles many earlier ones: Given a pool of 30 interns allocated to three-person night shifts, is it possible to see 31 consecutive nights such that (a) all the shifts differ and (b) there is no pair of shifts with a single common intern? In fact, the constraint there

## The Data Science Industry: Who Does What (Infographic)

November 10, 2015
By

Nowadays, the data science field is hot, and it is unlikely that this will change in the near future. While a data-driven approach is finding its way into all facets of business, companies are fiercely fighting for the best data analytics skills available in the market, and salaries for data science roles...

## fluent-r: a new R analytics integration library for JVM developers

November 10, 2015
By

by David Russell, fluent-r developer fluent-r is a new R analytics integration library for JVM application developers that improves upon existing solutions for integrating R analytics services delivered by popular open source R integration servers DeployR and OpenCPU. The fluent-r library provides a natural-language DSL alongside a simple API that can be used to replace or complement existing use...

## Interactive charts using htmlwidgets

November 10, 2015
By

This was a deck used in my presentation to the Inland Northwest R User Group this past Friday (November 6, 2015). The introduction of htmlwidgets has opened up a wide range of options for R users without having the need to pick up on Java...