Regression tree using Gini’s index

January 27, 2013

In order to illustrate the construction of a regression tree (using the CART methodology), consider the following simulated dataset,

    set.seed(1)
    n=200
    X1=runif(n)
    X2=runif(n)
    P=.8*(X1<.3)*(X2<.5)+
      .2*(X1<.3)*(X2>.5)+
      .8*(X1>.3)*(X1<.85)*(X2<.3)+
      .2*(X1>.3)*(X1<.85)*(X2>.3)+
      .8*(X1>.85)*(X2<.7)+
      .2*(X1>.85)*(X2>.7)
    Y=rbinom(n,size=1,P)
    B=data.frame(Y,X1,X2)

with one dichotomous variable (the variable of interest, Y), and two continuous ones (the explanatory ones X1 and X2).

    tail(B)
    Y...
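As a rough illustration of what a CART-style split search does with Gini's index on data like this, here is a minimal sketch. It assumes the binary-outcome impurity 2p(1-p) and a coarse grid of candidate thresholds; the helper names `gini` and `split_gini` are hypothetical and are not taken from the post.

```r
# Simulated data, following the same recipe as the excerpt
set.seed(1)
n <- 200
X1 <- runif(n)
X2 <- runif(n)
P <- .8*(X1<.3)*(X2<.5) + .2*(X1<.3)*(X2>.5) +
  .8*(X1>.3)*(X1<.85)*(X2<.3) + .2*(X1>.3)*(X1<.85)*(X2>.3) +
  .8*(X1>.85)*(X2<.7) + .2*(X1>.85)*(X2>.7)
Y <- rbinom(n, size = 1, P)

# Gini impurity of a 0/1 vector: 1 - p^2 - (1-p)^2 = 2p(1-p)
gini <- function(y) {
  if (length(y) == 0) return(0)
  p <- mean(y)
  2 * p * (1 - p)
}

# Size-weighted impurity of the two children created by splitting x at s
split_gini <- function(y, x, s) {
  left <- y[x <= s]
  right <- y[x > s]
  (length(left) * gini(left) + length(right) * gini(right)) / length(y)
}

# Score a grid of candidate thresholds on each explanatory variable
thresholds <- seq(0.05, 0.95, by = 0.05)
score_X1 <- sapply(thresholds, function(s) split_gini(Y, X1, s))
score_X2 <- sapply(thresholds, function(s) split_gini(Y, X2, s))

# The lowest weighted impurity marks the best split on this grid
best_X1 <- thresholds[which.min(score_X1)]
best_X2 <- thresholds[which.min(score_X2)]
```

A full tree would repeat this search recursively inside each child node; the sketch only scores the first split.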

Read more »

Tracking Number of Historical Clusters

January 26, 2013

In the prior post, Optimal number of clusters, we looked at methods of selecting the number of clusters. Today, I want to continue with the clustering theme and show the historical Number of Clusters time series using these methods. In particular, I will look at the following methods of selecting the optimal number of clusters: Minimum number of clusters

Read more »

ggplot2 multiple boxplots with metadata

January 26, 2013

Recently I was asked for advice on how to plot values with an additional attached condition separating the boxplots. This turns out to be ugly in base graphics, but amazingly simple in ggplot2.

Read more »

Learning R using a Chemical Reaction Engineering Book: Part 3

January 26, 2013

In case you missed previous parts, the links to them are listed below. Part 1 Part 2 In this part, I tried to recreate the examples in section A.2.3 of the computational appendix in the reaction engineering book (by Rawlings and … Continue reading →

Read more »

Learning R using a Chemical Reaction Engineering Book: Part 2

January 26, 2013

In case you missed part 1, you can view it here. In this part, I tried to recreate the examples in section A.2.2 of the computational appendix in the reaction engineering book by Rawlings and Ekerdt. Solving a nonlinear system of equations … Continue reading →

Read more »

Waiting for an API request to complete

January 26, 2013

Dealing with API tokens in R: In my previous post I showed an example of calling the Phylotastic taxonomic name resolution API Taxosaurus here. When you query their API they give you a token which you use later to retrieve the result (see examples on their page above). However, you don't know when the query will be...
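One common way to handle a token-then-retrieve workflow like this is to poll until the result is ready. The following is a minimal base-R sketch under stated assumptions: `poll_until` and `fake_check` are hypothetical names, and the checker stands in for whatever call would actually fetch the result for a token. It does not use the Taxosaurus API itself.

```r
# Call check_fn() repeatedly until it returns something other than NULL,
# sleeping between attempts and giving up after max_tries.
poll_until <- function(check_fn, interval = 1, max_tries = 10) {
  for (i in seq_len(max_tries)) {
    result <- check_fn()
    if (!is.null(result)) return(result)
    Sys.sleep(interval)
  }
  stop("no result after ", max_tries, " attempts")
}

# Stand-in for a real "retrieve by token" request (assumption):
# it reports "not ready" (NULL) twice, then returns a value.
attempts <- 0
fake_check <- function() {
  attempts <<- attempts + 1
  if (attempts < 3) NULL else "resolved names"
}

res <- poll_until(fake_check, interval = 0.1, max_tries = 5)
```

Capping the number of attempts keeps a stalled query from blocking the session forever.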

Read more »

Code Pollution With Command Prompts

January 26, 2013

This is not the first time I have ranted about command prompts, but I cannot help ranting about them whenever I see them in source code. In short, a piece of source code with command prompts is like a bag of cooked shrimps in the market -- it does not make sense, and an otherwise good thing is...

Read more »

Learning R using a Chemical Reaction Engineering Book: Part 1

January 25, 2013

Chemical Reactor Analysis and Design Fundamentals by J.B. Rawlings and J. G. Ekerdt is a textbook for studying Chemical Reaction Engineering. The popular open source package Octave has its origins in the reaction engineering course offered by Prof. Rawlings. This book … Continue reading →

Read more »

Lambda.r 1.1.0 released

January 25, 2013

This is a quick post to announce lambda.r version 1.1.0 is released and available on CRAN. This release has a … Continue reading »

Read more »

A simple way to cluster music

January 25, 2013

In my last blog, I discussed the tuneR functions that provide an option to transcribe musical notes from audio frequencies. In this blog, I’ll write about functions for comparison of audio spectrum distributions, available in the seewave library. The idea I … Continue reading →

Read more »

Slides and replay of my “Using R with Hadoop” webinar now available #rstats #hadoop

January 25, 2013

I owe a big “thank you” to all of you who attended my webinar yesterday “Using R with Hadoop”. Revolution Analytics partnered with us at Think Big Analytics to produce the webinar, and I owe them thanks as well. For those of you who missed it, the slides and replay are now available from Revolution

Read more »

R and foreign characters

January 25, 2013

Working with Russian characters can be mind-numbingly frustrating. This is true for R, as for other applications, so below I've written out my top five tricks for making Russian inputs work in R; I believe they should be transferable to most other languages....

Read more »

Shiny 0.3.0 released

January 25, 2013

Version 0.3.0 of Shiny is now available on CRAN. This version of Shiny has several new features and bug fixes. Some of the changes are under the hood: for example, Shiny now uses a more efficient algorithm for scheduling the execution of reactive functions. There are also some user-facing changes: for example, the new runGitHub()

Read more »

Video: Using R with Hadoop

January 25, 2013

If you weren't one of the almost 2000 people who signed up for yesterday's webinar "Using R with Hadoop", the replay and slides are now available. During the webinar, Jeffrey Breen (Principal at Think Big Academy) talked about extracting analytics from data in Hadoop and covered: how to use R and Hadoop; Hadoop streaming; various R packages and RHadoop...

Read more »

Regressions 101: “Significance”

January 25, 2013

SETUP (CAN BE SKIPPED) We start with data (how was it collected?) and the hope that...

Read more »

Effects of forest management on a woodland salamander in Missouri

January 25, 2013

Effects of experimental forest management on a terrestrial, woodland salamander in Missouri by Daniel J. Hocking, Grant M. Connette, Christopher A. Conner, Brett R. Scheffers, Shannon E. Pittman, William E. Peterman, and Raymond D. Semlitsch. This is the first post … Continue reading →

Read more »

Object Orientation in R – Notes from a novice

January 25, 2013

Having posted some code to Git a few days ago and having been wholly dissatisfied with it, I began to do what I often do with code I don’t like. I started re-writing it bigger and weirder and more philosophically pure. Part of this search for Platonic code led me to explore object oriented programming

Read more »

Tensor Algebra: Efficient Operations on Multidimensional Arrays with R

January 25, 2013

Multidimensional arrays are ubiquitous. Any complex problem with multivariate observables readily creates a need to represent the corresponding data in multidimensional arrays. Most practitioners would choose to apply operations on the...

Read more »

Resolving species names when you have a lot of them

January 25, 2013

taxize use case: Resolving species names when you have a lot of them. Species names can be a pain in the ass, especially if you are an ecologist. We ecologists aren't trained in taxonomy, yet we often end up with huge species lists. Of course we want to correct any spelling errors in the names, and get the newest...

Read more »

Time series cross-validation 5

January 24, 2013

The caret package for R now supports time series cross-validation! (Look for version 5.15-052 in the news file). You can use the createTimeSlices function to do time-series cross-validation with a fixed window, as well as a growin...
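To make the difference between the two windowing schemes concrete, here is a minimal base-R sketch of rolling-origin index slices. It is an illustration only, not caret's implementation: `time_slices` and its arguments are hypothetical names, and a window that is not fixed is assumed to always start at the first observation.

```r
# Build train/test index sets for rolling-origin evaluation.
# fixed_window = TRUE  : training window slides and keeps the same length
# fixed_window = FALSE : training window always starts at observation 1 and grows
time_slices <- function(n, initial_window, horizon = 1, fixed_window = TRUE) {
  stopifnot(initial_window >= 1, horizon >= 1, initial_window + horizon <= n)
  stops <- seq(initial_window, n - horizon)
  starts <- if (fixed_window) stops - initial_window + 1 else rep(1, length(stops))
  list(
    train = Map(seq, starts, stops),
    test  = Map(seq, stops + 1, stops + horizon)
  )
}

# Ten observations, five used for the first fit, forecasting two steps ahead
fixed   <- time_slices(10, initial_window = 5, horizon = 2, fixed_window = TRUE)
growing <- time_slices(10, initial_window = 5, horizon = 2, fixed_window = FALSE)

fixed$train[[2]]    # observations 2 to 6
growing$train[[2]]  # observations 1 to 6
fixed$test[[2]]     # observations 7 to 8
```

Each test set lies strictly after its training set, which is the property that ordinary random k-fold splitting would break for time series.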

Read more »

local package delays

January 24, 2013

When Jean-Michel and I left Berlin, a month ago, I really thought we were that close to sending the new edition of Bayesian Core. Alas, we are not done yet for a series of reasons: leaving for India did not give me enough time to complete the help manual, some codes from the original version

Read more »

R PMML Support: Data Transformations

January 24, 2013

R and PMML Export: R is becoming the tool of choice for many data scientists. It is no wonder that many commercial and open-source statistical tools are also embracing R. Predictive Models: A set of robust predictive analytic techniques is...

Read more »

Visualizing threaded conversation volume and intensity

January 24, 2013

As a researcher interested in information flows in digital environments I’m often interested in finding patterns in social trace data. For this discussion we can think of digital social trace data as the text that people post into threaded topics on forums, like on Reddit or a Wiki Talk page on Wikipedia. One way to

Read more »

Storing a Function in a Separate File in R

January 24, 2013

If you're going to be using a function across several different R files, you might want to store the function in its own file. If you want to name the function in its own file: This is probably the best option in general, if only because you may want to p...
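A minimal sketch of this idea using base R's `source()`. The file name `my_functions.R` and the function `add_one` are made up for illustration, and the file is written to a temporary directory only so the example is self-contained.

```r
# Create a separate file that defines and names one function
path <- file.path(tempdir(), "my_functions.R")
writeLines("add_one <- function(x) x + 1", path)

# Any other script can load the definition by sourcing that file
source(path)

result <- add_one(41)
result  # 42
```

In practice the function file would live alongside the scripts that use it, and each script would start with a `source()` call pointing at it.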

Read more »

Votamatic predicted the election with R

January 24, 2013

While Nate Silver got a lot of the attention for correctly forecasting the US presidential election, other forecasters were just as successful. Drew Linzer used the R language to build the statistical model behind votamatic.org, and was able to predict the outcome of the election months before most pundits. Drew's model initially relied mostly on fundamental quantities: the president’s...

Read more »

No more ascii-art

January 24, 2013

At least five R packages will turn your regression models into pretty latex tables: texreg, xtable, apsrtable, memisc, and stargazer. This is very nice if you happen to be a latex document or its final reader, but it’s not so great if you’re making those models to start with. What if you wanted to see

Read more »

Writing Better Statistical Programs in R

January 24, 2013

A while back a friend asked me for advice about speeding up some R code that they’d written. Because they were running an extensive Monte Carlo simulation of a model they’d been developing, the poor performance of their code had become an impediment to their work. After I looked through their code, it was clear

Read more »
