Images as x-axis labels

June 2, 2016

Open-source software is awesome. If I found that a piece of closed-source software was missing a feature that I wanted, well, bad luck. I probably couldn't even tell if it was actually missing or if I just didn't know about it...

R for Publication by Page Piccinini: Lesson 2 – Linear Regression

June 2, 2016

This is our first lesson where we actually learn and use a new statistic in R. For today’s lesson we’ll be focusing on linear regression. I’ll be taking for granted some of the set-up steps from Lesson 1, so if you haven’t done that yet be sure to go back and do it. By the Lesson 2: Linear...

A demonstration of vtreat data preparation

June 1, 2016

This article is a demonstration of the use of the R vtreat variable preparation package followed by caret controlled training. In previous writings we have gone to great lengths to document, explain and motivate vtreat. That necessarily gets long and unnecessarily feels complicated. In this example we are going to show what building a predictive model …

Le Monde puzzle [#964]

June 1, 2016

A not so enticing Le Monde mathematical puzzle: Find the minimal value of a five-digit number divided by the sum of its digits. This can be formalised as finding the minimum of N/(a+b+c+d+e) when N is written abcde, and solved by brute force. Using a rough approach to finding the digits of a five-digit number, the...

Reference semantics in R

June 1, 2016

Question: I recently got a mail from Václav on reference semantics in data.tree, reading as follows: Dear Christoph, I am rather inexperienced when it comes to environments in R and henceforth I apologize if my question is basic; however, my colleagues are no better than me to answer my question. I would have a question iro...

Covcalc: Shiny App for Calculating Coverage Depth or Read Counts for Sequencing Experiments

June 1, 2016

How many reads do I need? What's my sequencing depth? These are common questions I get all the time. Calculating how much sequence data you need to hit a target depth of coverage, or the inverse, what's the coverage depth given a set amount of sequenci...

Trisurf Plots in R using Plotly

June 1, 2016

In this post we’ll show how to create Triangular Surface Plots in R. This post is based on timelyportfolio’s gist. Sections: Moebius Strip; 2D Surface over a disk; Chopper from python.

Scripting Loops In R

June 1, 2016

An R programmer can determine the order in which commands are processed by using the control statements repeat{}, while(), for(), break, and next. Answers to the exercises are available here. Exercise 1: The repeat{} loop processes a block of code until the condition specified by the break statement (which is mandatory within the repeat{} loop)...

EARL London 2016 – Speakers Announced

June 1, 2016

LONDON, 13th – 15th SEPTEMBER 2016. EARL is an exciting cross-sector conference dedicated to the real business usage of R: one day of workshops and two days devoted to the most innovative R implementations by the world’s leading practitioners. …

MilanoR meeting | Call for presentations

June 1, 2016

A MilanoR meeting is an occasion to bring together R users from the Milano area to share R tips and experience: the next one will be Thursday, October 27th. We are looking for volunteers to present at the next meeting: if you feel you have something to input or you can recommend someone, please contact us!

NLP on NPR’s Commencement Addresses

June 1, 2016

Vocativ did an interesting analysis of the President’s State of the Union (SOTU) speeches. They showed that across the past couple hundred years and many Presidents, SOTU speeches have been targeted at audiences with lower and lower education levels. Vocativ’s in-print interpretation of the downward sloping trend was that speeches have gotten less sophisticated. Their recommended share-tweet for the article...

Recent presentations

June 1, 2016

The last month or so has been a whirlwind of awesomeness with a veritable bevy of user group and conference talks on my part! I thought I would share the materials with you and provide some brief thoughts on how each presentation went. Sessions: SQL Saturday Exeter: Stats 101; London Business Analytics (LBAG): ...

Scientific RMarkdown

May 31, 2016

Recently, in my own little scientific community bubble, there was increasing interest in markdown and its use for science. As a big fan of markdown and especially rmarkdown, I created the following cheat sheet and shared it at a couple of events. Sinc...

heatmaply: interactive heat maps (with R)

May 31, 2016

I am pleased to announce heatmaply, my new R package for generating interactive heat maps, based on the plotly R package. tl;dr: By running the following 3 lines of code: install.packages("heatmaply"); library(heatmaply); heatmaply(mtcars, k_col = 2, k_row = 3) %>% layout(margin = list(l = 130, b = 40)) you will get this output in your browser …

Happy New Year, Mr. President. Data and Sentiment Analysis of Presidential New Year Speeches

May 31, 2016

By Salvino A. Salvaggio. At a moment when many are preparing for the December 31st evening cocktail, the End of Year speech of the President of the Italian Republic is broadcast right on time at 8:30pm. A tradition which came to be with the constitutional establishment...

Principal Components Regression in R: Part 3

May 31, 2016

By John Mount, Ph.D., Data Scientist at Win-Vector LLC. In her series on principal components analysis for regression in R, Win-Vector LLC's Dr. Nina Zumel broke the demonstration down into the following pieces: Part 1: the proper preparation of data and use of principal components analysis (particularly for supervised learning or regression); Part 2: the introduction of y-aware...

Predictive Bookmaker Consensus Model for the UEFA Euro 2016

May 31, 2016

(By Achim Zeileis) From 10 June to 10 July 2016 the best European football teams will meet in France to determine the European Champion in the UEFA European Championship 2016 tournament. For the first time 24 teams compete, expanding the format from 16 teams as in the previous five Euro tournaments. For forecasting the winning probability of each team...

Understanding beta binomial regression (using baseball statistics)

May 31, 2016

Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing. In this series we’ve been using the empirical Bayes method to estimate batting averages of baseball players. Empirical Bayes is useful here because when we...

QGIS, Open Source GIS & R

May 31, 2016

Today’s post is by Kurt Menke, the owner of Bird’s Eye View GIS, a GIS consultancy. Kurt also wrote the book Mastering QGIS. In my latest course (Shapefiles for R Programmers) I briefly introduce people to QGIS. Kurt’s post below gives you a roadmap for learning more. I come to this blog from a slightly different...

How to use data analysis for machine learning (example, part 1)

May 31, 2016

In my last article, I stated that for practitioners (as opposed to theorists), the real prerequisite for machine learning is data analysis, not math. One of the main reasons for making this statement is that data scientists spend an inordinate amount of time on data analysis. The traditional statement is that data scientists “spend 80%...

Principal Components Regression, Pt. 3: Picking the Number of Components

May 30, 2016

In our previous note we demonstrated Y-Aware PCA and other y-aware approaches to dimensionality reduction in a predictive modeling context, specifically Principal Components Regression (PCR). For our examples, we selected the appropriate number of principal components by eye. In this note, we will look at ways to select the appropriate number of principal components in …

On ranger respect.unordered.factors

May 30, 2016

It is often said that “R is its packages.” One package of interest is ranger, a fast parallel C++ implementation of random forest machine learning. Ranger is a great package and at first glance appears to remove the “only 63 levels allowed for string/categorical variables” limit found in the Fortran randomForest package. Actually this appearance is …

zoo time series exercises

May 30, 2016

The zoo package consists of methods for totally ordered indexed observations. It aims at performing calculations on irregular time series of numeric vectors, matrices and factors. The zoo package interfaces to all other time series packages on CRAN. This makes it easy to pass time series objects between zoo and other time series...

From a (set.)seed grows a mighty dataset

May 30, 2016

Can you predict the output from this code? Okay, the first bit is straightforward; it’s a function that puts two strings together into one. The next two lines appear to provide a random integer to the set.seed function then sample...

satRday Event in Cape Town

May 30, 2016

This blog post was first published on EXEGETIC ANALYTICS’s blog and kindly re-posted on Data Science Africa. We are planning to host one of the three inaugural satRday conferences in Cape Town during 2017. The R Consortium has committed to funding three of these events: one will be in Hungary, another will be somewhere in the USA and the third...

Solving Math Puzzles with data.tree

May 29, 2016

I got a note from Karim Lahrichi, who even thinks about math when he’s supposed to be drinking beer. The bar puzzle they were trying to solve goes like this: Using all of the numbers 1, 3, 4, 6 exactly once, and any combination of: addition, subtraction, multiplication and division (and parentheses to group operations however you...

Visualizing Bootstrapped Stepwise Regression in R using Plotly

May 29, 2016

We have all used stepwise regression at some point. Stepwise regression is known to be sensitive to initial inputs. One way to mitigate this sensitivity is to repeatedly run stepwise regression on bootstrap samples. R has a nice package called bootStepAIC which (from its description) “Implements a Bootstrap procedure to investigate the variability of model...

the random variable that was always less than its mean…

May 29, 2016

Although this is far from a paradox when realising why the phenomenon occurred, it took me a few lines to understand why the empirical average of a log-normal sample is apparently a biased estimator of its mean. And why the biased plug-in estimator does not appear to present a bias. The picture below compares two

Introduction to R for Data Science :: Session 5 [Strings in R]

May 29, 2016

Welcome to Introduction to R for Data Science Session 5: Structuring Data: String manipulation in R. The course is co-organized by Data Science Serbia and Startit. You will find all course material (R scripts, data sets, SlideShare presentations, readings) on...
