## Visualizing principal components with R and Sochi Olympic Athletes

March 27, 2014

Principal Components Analysis (PCA) is used as a dimensionality reduction method. Here we explain PCA step by step using data about Sochi Olympic curlers. It is hard to visualize a high-dimensional space. When I took linear algebra, the book and teachers spoke about it as if it were easy to visualize a hyperspace, but...

## Alluvial diagrams

March 27, 2014

A parallel coordinates plot is one of the tools for visualizing multivariate data. Every observation in a dataset is represented by a polyline that crosses a set of parallel axes corresponding to the variables in the dataset. You can create such plots in R using the function parcoord in the MASS package. For example, we can create such
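As a rough illustration of the idea described above, the sketch below computes the points of each observation's polyline rather than drawing them. It is written in plain Python, not R, and it does not reproduce the parcoord function. The helper name `parallel_coordinates`, the sample data, and the choice to rescale each variable to the range 0 to 1 are all assumptions made for this example.

```python
def parallel_coordinates(rows):
    """Return one polyline per observation as (axis_index, scaled_value) points.

    Assumption: each variable is min-max rescaled to [0, 1] so that all
    axes share the same height; a constant variable is placed at 0.5.
    """
    columns = list(zip(*rows))          # one tuple of values per variable
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    polylines = []
    for row in rows:
        line = []
        for axis, value in enumerate(row):
            span = highs[axis] - lows[axis]
            scaled = (value - lows[axis]) / span if span else 0.5
            line.append((axis, scaled))
        polylines.append(line)
    return polylines


# Hypothetical data: three observations of three variables.
data = [
    (5.1, 3.5, 1.4),
    (7.0, 3.2, 4.7),
    (6.3, 3.3, 6.0),
]

polylines = parallel_coordinates(data)
for line in polylines:
    print(line)
```

Each inner list is one polyline: joining its points in order gives the line that crosses axis 0, axis 1, and axis 2 for that observation.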

## New Shiny website launched; Shiny 0.9 released

March 27, 2014

We’re excited to introduce to you our new website for Shiny: shiny.rstudio.com! We’ve included articles on many Shiny-related topics, dozens of example applications, and an all-new tutorial for getting started. Whether you’re a beginner or expert at Shiny, we hope that having these resources available in one place will help you find the information you need.

## R User Group Activity for Q1 2014

March 27, 2014

by Joseph Rickert. Worldwide R user group activity for the first quarter of 2014 appears to be way up compared to previous years, as the following plot shows. The plot was built by counting the meetings on the Revolution Analytics R Community Calendar. R users continue to value live, in-person events and face-to-face meetings with their peers. Moreover,...

## GIS in R: Part 1

March 27, 2014

I messed around with R for years without really learning how to use it properly. I think it’s because I could always throw my hands up when the going got tough and run back and cling to the skirts of Excel or JMP or Systat. I finally learned how to use R when I needed to

## Filtering Data with L1 Regularisation

March 27, 2014

A few days ago I posted about Filtering Data with L2 Regularisation. Today I am going to explore the other filtering technique described in the paper by Tung-Lam Dao. This is similar to the filter discussed in my previous post, but uses a slightly different objective function: where the regularisation term now employs the L1

## sjPlot 1.3 available #rstats #sjPlot

March 27, 2014

I just submitted my package update (version 1.3) to CRAN. The download is already available (currently source, binaries follow). While the last two updates included new functions for table outputs (see here and here for details on these functions), the current update only provides small helper functions as new functions. The focus of this update

## Evolution of Code

March 27, 2014

Recently, while scraping some data from the college football data warehouse site, I started to notice the evolution of my code. To preface this, I am definitely not a trained programmer, just a self-taught junkie who enjoys doing it when I have time. ...

## Seasonal Unit Roots

March 26, 2014

As discussed in the MAT8181 course, there are – at least – two kinds of non-stationary time series: those with a trend, and those with a unit-root (they will be called integrated). Unit root tests cannot be used to assess whether a time series is stationary, or not. They can only detect integrated time series. And the same holds...

## Visualising Pandas DataFrames With IPythonBlocks – Proof of Concept

March 26, 2014

A few weeks ago I came across IPythonBlocks, a Python library developed to support the teaching of Python programming. The library provides an HTML grid that can be manipulated using simple programming constructs, presenting the outcome of the operations in a visually meaningful way. As part of a new third level OU course we’re putting

## Give your R charts that Wes Anderson style

March 26, 2014

I'm a big fan of Wes Anderson's movies. I love the quirky characters and stories, the distinctive cinematography, and the unique visual style. Now you can bring some of that style to your own R charts, by making use of these Wes Anderson-inspired palettes. Just choose your favourite Wes Anderson film or short: Install the wesanderson palettes package,...

## Using R: quickly calculating summary statistics (with dplyr)

March 26, 2014

I know I’m on about Hadley Wickham’s packages a lot. I’m not the president of his fan club, but if there is one I’d certainly like to be a member. dplyr is going to be a new and improved ddply: a package that applies functions to, and does other things to, data frames. It is also

## MCMC for Econometrics Students – Part IV

March 26, 2014

This is the fourth in a sequence of posts designed to introduce econometrics students to the use of Markov Chain Monte Carlo (MCMC, or MC²) simulation methods for Bayesian inference. The first three posts can be found here, here, and here, and I'll assume that you've read them already. The emphasis throughout is on the...

## How to open an SPSS file into R

March 26, 2014

R is a powerful system for statistical analysis and data visualization. However, it’s not exactly user-friendly for data storage, so for some time yet your data will be archived using Excel, SPSS or similar programs. How to open into R …

## Accessing iNaturalist data

March 26, 2014

The iNaturalist project is a really cool way to both engage people in citizen science and collect species occurrence data. The premise is pretty simple: users download an app for their smartphone, and can then easily georeference any specimen they see, uploading it to the iNaturalist website. It lets users turn casual observations into meaningful...

## RProtoBuf 0.4.1

March 25, 2014

A new bug-fix release, 0.4.1, of RProtoBuf is now on CRAN. RProtoBuf provides GNU R bindings for the Google Protocol Buffers ("Protobuf") data encoding library used and released by Google, and deployed as a language and operating-system agno...

## R 101: Summarizing Data

March 25, 2014

When working with large amounts of data that is structured in a tabular format, a common operation is to summarize that data in different ways using specific variables. In Microsoft Excel, pivot tables are a nice feature that is used for this purpose. While not as “efficient” in relation to Excel pivot tables, R also

## Practical Data Science with R: Release date announced

March 25, 2014

It took a little longer than we’d hoped, but we did it! Practical Data Science with R will be released on April 2nd (physical version). The eBook version will follow soon after, on April 15th. You can preorder the pBook now on the Manning book page. The physical version comes with a complimentary eBook version.

## Using R: quickly calculating summary statistics from a data frame

March 25, 2014

A colleague asked: I have a lot of data in a table and I’d like to pull out some summary statistics for different subgroups. Can R do this for me quickly? Yes, there are several pretty convenient ways. I wrote about this in the recent post on the barplot, but as this is an important
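The colleague's question, summary statistics per subgroup, can be sketched in a few lines. The example below is plain Python, not the R approach the post goes on to describe. The function name `summarize_by_group`, the column names, and the sample records are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_group(records, group_key, value_key):
    """Group records by one field and summarise another field per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[group_key]].append(record[value_key])

    summary = {}
    for name, values in groups.items():
        summary[name] = {
            "n": len(values),
            "mean": mean(values),
            # The sample standard deviation needs at least two values.
            "sd": stdev(values) if len(values) > 1 else float("nan"),
        }
    return summary


# Hypothetical table: one measurement per row, labelled by subgroup.
records = [
    {"treatment": "control", "weight": 10.0},
    {"treatment": "control", "weight": 12.0},
    {"treatment": "drug", "weight": 15.0},
    {"treatment": "drug", "weight": 19.0},
]

summary = summarize_by_group(records, "treatment", "weight")
for group, stats in summary.items():
    print(group, stats)
```

The pattern is split, apply, combine: split the rows by subgroup, apply the summary functions to each piece, and combine the results into one table.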

## A Thumbnail History of Ensemble Methods

March 25, 2014

By Mike Bowles. Ensemble methods are the backbone of machine learning techniques. However, it can be a daunting subject for someone approaching it for the first time, so we asked Mike Bowles, machine learning expert and serial entrepreneur, to provide some context. Ensemble Methods are among the most powerful and easiest to use of predictive analytics algorithms and R...

## Interactive Discovery of Research Affiliates JoPM Paper

March 25, 2014

In my previous post More on Rebalancing | With Data from Research Affiliates, I did some really basic visualizations, but I thought this data would be great for some more powerful interactive discovery using an interesting javascript SQL-like query l...

## Filtering Data with L2 Regularisation

March 25, 2014

I have just finished reading Momentum Strategies with L1 Filter by Tung-Lam Dao. The smoothing results presented in this paper are interesting and I thought it would be cool to implement the L1 and L2 filtering schemes in R. We’ll start with the L2 scheme here because it has an exact solution and I will
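To make "exact solution" concrete, here is a minimal sketch under a stated assumption: that the L2 scheme takes the Hodrick-Prescott form, minimising sum((y[t] - x[t])^2) + lam * sum((x[t-1] - 2*x[t] + x[t+1])^2). The excerpt does not give the objective, so this form, the function names, and the sample series are assumptions for illustration. Under that form the minimiser solves the linear system (I + lam * D'D) x = y, where D is the second-difference matrix. The code is plain Python rather than R and uses simple Gaussian elimination in place of a linear algebra library.

```python
def second_difference_matrix(n):
    """Build the (n-2) x n matrix D whose rows compute x[t] - 2*x[t+1] + x[t+2]."""
    rows = []
    for t in range(n - 2):
        row = [0.0] * n
        row[t], row[t + 1], row[t + 2] = 1.0, -2.0, 1.0
        rows.append(row)
    return rows


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def l2_filter(y, lam):
    """Return the smoothed series x solving (I + lam * D'D) x = y (assumed objective)."""
    n = len(y)
    d = second_difference_matrix(n)
    system = [
        [(1.0 if i == j else 0.0) + lam * sum(row[i] * row[j] for row in d)
         for j in range(n)]
        for i in range(n)
    ]
    return solve(system, list(y))


# Hypothetical noisy upward trend.
noisy = [1.0, 2.4, 2.9, 4.2, 4.8, 6.1, 7.0]
smooth = l2_filter(noisy, lam=10.0)
print([round(v, 3) for v in smooth])
```

Larger values of `lam` penalise curvature more heavily and push the result towards a straight line, while `lam = 0` returns the input unchanged. A series that is already exactly linear has zero second differences, so the filter leaves it untouched.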

## Wright Map Tutorial – Part 3

March 25, 2014

In this part of the tutorial, we’ll show how to load ConQuest output to make a CQmodel object and then WrightMaps. We’ll also show how to turn deltas into thresholds. All the example files here are available in the /inst/extdata folder of the GitHub repository. If you download the latest version of the package, they should be in a folder...

March 25, 2014

Sankey diagrams are great for visualising flows from one set of data values to another. Although named after Irish Captain Matthew Henry Phineas Riall Sankey, who used this type of diagram in 1898 to show the energy efficiency of a steam engine, the best-known Sankey diagram is probably Charles Minard's Map of Napoleon's Russian Campaign...

## Which states are the most concerned by gun crime?

March 24, 2014

I recently discovered the Capitol Words API and have had some fun playing around with it. One of the categories in the API allows you to search for the words spoken by the senators of each state in the USA, and I was interested in finding out the number of times the word “gun” was recorded on a

## WrightMap Tutorial – Part 2

March 24, 2014

Plotting Multidimensional & Polytomous Models. Remember: you can find the other parts of the tutorial here: Part 1: Plotting Unidimensional Dichotomous Models; Part 3: Using ConQuest Output & Making Thresholds. Multidimensional models: In Part 1, we reviewed how to install the package from GitHub and how to customize unidimensional and dichotomous models. Now in Part 2, we’ll look at graphing some...

## Free eBook on Big Data and Data Science

March 24, 2014

The fine folks behind the Big Data Journal have just published a new e-book, Big Data: Harnessing the Power of Big Data Through Education and Data-Driven Decision Making. (Note: Adobe Flash is required to view the e-book.) In the e-book, you'll find the following technical papers on the topics of Big Data, Data Science, and R: Data Science and...

## Hack, a template for improving code reliability

March 24, 2014

My sole prediction for 2014 has come true: Facebook have announced the Hack language (if you don’t know that HHVM is the Hip Hop Virtual Machine, you are obviously not a trendy developer). This language does not follow the usual trend in that it looks useful, rather than being fashion fluff for corporate developers to