The five element ninjas approach to teaching design matrices

April 25, 2016

Design matrices unite seemingly disparate statistical methods, including linear regression, ANOVA, multiple regression, ANCOVA, and generalized linear modeling. As part of a hierarchical Bayesian modeling course that we offered this semester, we wanted our students to learn about design matrices to facilitate model specification and parameter interpretation. Naively, I thought that I could spend a few minutes in class reviewing matrix...

Read more »

Webinar April 28: Effective Graphs with Microsoft R Open

April 25, 2016

Naomi Robbins, author of Creating More Effective Graphs and Forbes contributor, has teamed up with her daughter, Dr Joyce Robbins, to present a new webinar this Thursday, April 28: Creating Effective Graphs with Microsoft R Open. The webinar will demonstrate how to create a variety of useful graphics with R: comparisons, distributions, trends over time, relationships, divisions of a whole,...

Read more »

Bring Your Data to Life with googleVis and R – Tutorial

April 25, 2016

In this new demonstration, you will learn how to use the googleVis package in R to create some beautiful interactive charts. The charts created here are inspired by the work of data guru Hans Rosling. Step-by-step, you will transform development statistics into moving bubbles that help quickly tell a powerful story. GoogleVis provides an interface between R and the...

Read more »

Free Workshop: Mapping Open Data in R

April 25, 2016

On May 17 I will be running a free workshop titled Mapping Open Data in R. The workshop will be in Berkeley, CA and is open to the public. If you have an interest in the subject matter then I hope that you will attend! Here is a sample of the maps that people will learn to...

Read more »

yorkr ranks IPL batsmen and bowlers

April 25, 2016

Here is a short post which ranks IPL batsmen and bowlers. These are based on match data from Cricsheet. Ranking batsmen and bowlers in IPL is more challenging as the players can belong to different teams in different years. Hence I create a combined data frame of the batsmen and bowlers regardless of their IPL

Read more »

Learning R for Data Visualization [Video]

April 25, 2016

Last year Packt asked me to develop a video course to teach various techniques of data visualization in R. Since I love the idea of video courses and tutorials, and I also enjoy plotting data, I readily agreed. The result is this course, published last ...

Read more »

The one machine learning concept you need to know

April 25, 2016

Machine learning is hard. Some people spend weeks, months, even years trying to learn machine learning without any success. They play around with datasets, buy books, compete on Kaggle, but ultimately make little progress. One of the big problems is that many people just want to “dive in and build something.” I admire the ambition...

Read more »

Fast csv writing for R

April 25, 2016

Guest post by Matt Dowle. This post was first published on the H2O blog; please go there to leave a comment. R has traditionally been very slow at reading and writing csv files of, say, 1 million rows or more. Getting data into R is often the first task a user needs to do and if they have a poor...

Read more »

Missing Value Treatment

April 25, 2016

Missing values in data are a common phenomenon in real-world problems. Knowing how to handle missing values effectively is a required step to reduce bias and to produce powerful models. Let's explore various options for dealing with missing values and how to implement them. Data prep and pattern: Let's use the BostonHousing...

Read more »

Simulating Continuous-Time Markov Chains with simmer (part 2)

April 25, 2016

In part one, we simulated a simple CTMC. Now, let us complicate things a bit. Remember the example problem there: A gas station has a single pump and no space for vehicles to wait (if a vehicle arrives and the pump is not available, it leaves). Vehicles arrive at the gas station following a Poisson process with...

Read more »

Candlestick charts using Plotly and Quantmod

April 24, 2016

This post is dedicated to creating candlestick charts using Plotly’s R-API. For more information on candlestick charts visit www.stockcharts.com. We’ll also showcase Plotly’s awesome new range selector feature!

Read more »

Create Amazing Looking Backtests With This One Wrong–I Mean Weird–Trick! (And Some Troubling Logical Invest Results)

April 22, 2016

This post will outline an easy-to-make mistake in writing vectorized backtests–namely in using a signal obtained at the end of … Continue reading →

Read more »

R Courses at Newcastle

April 22, 2016

Over the next two months I’m running a number of R courses at Newcastle University. May 2016: Predictive Analytics (May 10th, 11th); Bioconductor (May 16th – 20th); Advanced programming (May 23rd, 24th). June 2016: R for Big Data (June 8th); Interactive graphics with Shiny (June 9th). Since these courses are on advanced topics, numbers are limited

Read more »

Microsoft R Open 3.2.4 now available

April 22, 2016

Microsoft R Open 3.2.4, Microsoft's enhanced distribution of R, is now available for download from mran.microsoft.com. This update is based on R 3.2.4-revised, and includes several improvements and some minor bug fixes from the R Core Group. Improvements include long-vector support for the smooth function, a new stringsAsFactors option when using rbind with data frames, and better rounding...

Read more »

New: Spanish and French Translations of Introduction to R

April 21, 2016

The team here at DataCamp is thrilled to announce that we now offer free Spanish and French translations of our most popular course, Introduction to R. Best of all, the courses are free as a part of our open course offering! By using in-browser coding challenges you will experiment with the different aspects of the R language in...

Read more »

WrightMap Tutorial 4 – More Flexibility Using the person and item side…

April 21, 2016

WrightMap Tutorial 4 - More Flexibility: Using the person and item side functions. Introduction: Version 1.2 of the WrightMap package allows you to directly access the functions used for drawing the person and item sides of the map, allowing more flexible item-person maps. The parts can be put together on the same plot using the split.screen function. Calling the...

Read more »

Introducing fidlr: FInancial Data LoadeR

April 21, 2016

fidlr is an RStudio addin designed to simplify the process of downloading financial data from various providers. This initial version is a wrapper around the getSymbols function in the quantmod package and only Yahoo, Google, FRED and Oanda are supported. I will probably add functionality over time. As usual with those things, just a kind reminder: “THE SOFTWARE

Read more »

Principal curves example (Elements of Statistical Learning)

April 21, 2016

The bit of R code below illustrates the principal curves methods as described in The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman (Ch. 14; the book is freely available from the authors' website). Specifically, the code generates some bivariate data that have a nonlinear association, initializes the principal curve using the first (linear) principal … Continue reading...

Read more »

Get ready for R/Finance 2016

April 21, 2016

by Joseph Rickert. R/Finance 2016 is less than a month away and, as always, I am very much looking forward to it. In past years, I have elaborated on what puts it among my favorite conferences even though I am not a finance guy. R/Finance is small, single track and intense with almost no fluff. And scattered among the...

Read more »

A simple proof that the p-value distribution is uniform when the null hypothesis is true

April 20, 2016

Someone asked this question in my linear modeling class: why is it that the p-value has a uniform distribution when the null hypothesis is true? The proof is remarkably simple. First, notice that when a random variable Z comes from a $Uniform(0,1)$ distribu...
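
The excerpt breaks off before the argument itself, so what follows is only a sketch of one standard version, not necessarily the post's own. It assumes that the test statistic $T$ has a continuous, strictly increasing distribution function $F$ under the null hypothesis, and that the p-value is the upper-tail probability $p = 1 - F(T)$; both the notation and these assumptions are introduced here for illustration. For $0 < u < 1$:

$$
\begin{aligned}
P(F(T) \le u) &= P\big(T \le F^{-1}(u)\big) = F\big(F^{-1}(u)\big) = u, \\
P(p \le u) &= P\big(1 - F(T) \le u\big) = P\big(F(T) \ge 1 - u\big) = 1 - (1 - u) = u.
\end{aligned}
$$

The first line says that $F(T)$ is itself $Uniform(0,1)$; the second shows that the p-value inherits the same distribution. For discrete test statistics the equality generally fails and only $P(p \le u) \le u$ holds.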

Read more »

an integer programming riddle

April 20, 2016

A puzzle on The Riddler this week ends up as a standard integer programming problem. Removing the little story around the question, it boils down to optimising 200a+100b+50c+25d under the constraints 400a+400b+150c+50d≤1000, b≤a, a≤1, c≤8, d≤4, and (a,b,c,d) all non-negative integers. My first attempt was a brute force R code since there are only
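
For illustration, here is a minimal brute-force sketch of the problem as stated above. It is not the author's code, which the excerpt does not show, and it assumes that "optimise" means maximise (minimising would be trivial, since all-zero is feasible). The variable names `grid` and `best` are hypothetical.

```r
# Enumerate every candidate (a, b, c, d) within the stated upper bounds.
# b <= a and a <= 1 together imply b <= 1, so b ranges over 0:1.
grid <- expand.grid(a = 0:1, b = 0:1, c = 0:8, d = 0:4)

# Keep only the combinations satisfying the budget and ordering constraints.
feasible <- with(grid, 400 * a + 400 * b + 150 * c + 50 * d <= 1000 & b <= a)
grid <- grid[feasible, ]

# Evaluate the objective (assumed to be maximised) and extract the optimum.
grid$value <- with(grid, 200 * a + 100 * b + 50 * c + 25 * d)
best <- grid[grid$value == max(grid$value), ]
print(best)  # a = 1, b = 0, c = 3, d = 3 with value 425
```

With only 2 × 2 × 9 × 5 = 180 candidates, exhaustive enumeration is instantaneous; a dedicated integer programming solver would only become worthwhile for much larger bounds.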

Read more »

Pride and Prejudice and Z-scores

April 20, 2016

You might think literary criticism is no place for statistical analysis, but given digital versions of the text you can, for example, use sentiment analysis to infer the dramatic arc of an Oscar Wilde novel. Now you can apply similar techniques to the works of Jane Austen thanks to Julia Silge's R package janeaustenr (available on CRAN). The package...

Read more »

Installing SQL Server ODBC drivers on Ubuntu (in Travis-CI)

April 20, 2016

Did you know you can now get SQL Server ODBC drivers for Ubuntu? Yes, no, maybe? It’s ok even if you haven’t since it’s pretty new! Anyway, this presents me with an ideal opportunity to standardise my SQL Server ODBC connections across the operating systems I use R on, i.e. Windows and Ubuntu. My first...

Read more »

R editor improvements for the next release of Bio7

April 20, 2016

20.04.2016 For the upcoming release of Bio7 I worked hard to improve the R editor features. So I added some new features and improvements to assist in the creation of R scripts in Bio7. One of the highlights is the newly integrated dynamic code analysis when writing an R script. Here a short overview of

Read more »

Data Exploration with Tables exercises

April 20, 2016

The table() function is intended for use during the Data Exploration phase of Data Analysis. The table() function performs categorical tabulation of data. In the R programming language, “categorical” variables are also called “factor” variables. The tabulation of data categories allows for Cross-Validation of data. Thereby, finding possible flaws within a dataset, or possible flaws

Read more »

Le Monde puzzle [#959]

April 19, 2016

Another of those arithmetic Le Monde mathematical puzzles: Find an integer A such that A is the sum of the squares of its four smallest divisors (including 1) and an integer B such that B is the sum of the third powers of its four smallest factors. Are there such integers for higher powers? This begs

Read more »

Notes from 2nd Bayesian Mixer Meetup

April 19, 2016

Last Friday the 2nd Bayesian Mixer Meetup (@BayesianMixer) took place at Cass Business School, thanks to Pietro Millossovich and Andreas Tsanakas, who helped to organise the event. First up was Davide De March talking about the challenges in biochemistry experimentation, which are often characterised by complex and...

Read more »

R’s Growth Continues to Accelerate

Each year I update the growth in R’s capability on The Popularity of Data Analysis Software. And each year, I think R’s incredible rate of growth will finally slow down. Below is a graph of the latest data, and as … Continue reading →

Read more »

Exploring NYC Taxi Data with Microsoft R Server and HDInsight

April 19, 2016

As I mentioned yesterday, Microsoft R Server is now available for HDInsight, which means that you can now run R code (including the big-data algorithms of Microsoft R Server) on a managed, cloud-based Hadoop instance. Debraj GuhaThakurta, Senior Data Scientist, and Shauheen Zahirazami, Senior Machine Learning Engineer at Microsoft, demonstrate some of these capabilities in their analysis of 170M taxi...

Read more »
