## Accessing iNaturalist data

March 26, 2014
The iNaturalist project is a really cool way to both engage people in citizen science and collect species occurrence data. The premise is pretty simple: users download an app for their smartphone and can then easily georeference any specimen they see, uploading it to the iNaturalist website. It lets users turn casual observations into meaningful...

## RProtoBuf 0.4.1

March 25, 2014
A new bug-fix release, 0.4.1, of RProtoBuf is now on CRAN. RProtoBuf provides GNU R bindings for the Google Protocol Buffers ("Protobuf") data encoding library used and released by Google, and deployed as a language and operating-system agno...

## R 101: Summarizing Data

March 25, 2014
When working with large amounts of data that is structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature that is used for this purpose. While not as “efficient” in relation to Excel pivot tables, R also

## Practical Data Science with R: Release date announced

March 25, 2014
It took a little longer than we’d hoped, but we did it! Practical Data Science with R will be released on April 2nd (physical version). The eBook version will follow soon after, on April 15th. You can preorder the pBook now on the Manning book page. The physical version comes with a complimentary eBook version

## Using R: quickly calculating summary statistics from a data frame

March 25, 2014
A colleague asked: I have a lot of data in a table and I’d like to pull out some summary statistics for different subgroups. Can R do this for me quickly? Yes, there are several pretty convenient ways. I wrote about this in the recent post on the barplot, but as this is an important

## A Thumbnail History of Ensemble Methods

March 25, 2014
By Mike Bowles. Ensemble methods are the backbone of machine learning techniques. However, it can be a daunting subject for someone approaching it for the first time, so we asked Mike Bowles, machine learning expert and serial entrepreneur, to provide some context. Ensemble Methods are among the most powerful and easiest to use of predictive analytics algorithms and R...

## Interactive Discovery of Research Affiliates JoPM Paper

March 25, 2014
In my previous post, More on Rebalancing | With Data from Research Affiliates, I did some really basic visualizations, but I thought this data would be great for some more powerful interactive discovery using an interesting JavaScript SQL-like query l...

## Filtering Data with L2 Regularisation

March 25, 2014
I have just finished reading Momentum Strategies with L1 Filter by Tung-Lam Dao. The smoothing results presented in this paper are interesting and I thought it would be cool to implement the L1 and L2 filtering schemes in R. We’ll start with the L2 scheme here because it has an exact solution and I will

## Wright Map Tutorial – Part 3

March 25, 2014
In this part of the tutorial, we’ll show how to load ConQuest output to make a CQmodel object and then WrightMaps. We’ll also show how to turn deltas into thresholds. All the example files here are available in the /inst/extdata folder of the GitHub repository. If you download the latest version of the package, they should be in a folder...

March 25, 2014
Sankey diagrams are great for visualising flows from one set of data values to another. Although named after Irish Captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 to show the energy efficiency of a steam engine, the best known Sankey diagram is probably Charles Minard's Map of Napoleon's Russian Campaign...