3887 search results for "git"

Update to Graphing Non-Proportional Hazards in R

December 30, 2012

Update 1 February 2013: I've moved all of the functionality described in this post into an R package called simtvc. Have a look. It is much easier to use. This is a quick update for a previous post on Graphing Non-Proportional Hazards in R. In the previous post I showed how to simulate and graph 1,000...


Integration of R, RStudio and Hadoop in a VirtualBox Cloudera Demo VM on Mac OS X

December 29, 2012

Motivation: I was inspired by Revolution's blog and the step-by-step tutorial from Jeffrey Breen on setting up a local virtual instance of Hadoop with R. However, that tutorial describes the implementation using VMware's application. One downside of VMware is that it's not free. I know most people, including me, like to hear the words open-source and free,...


Row-wise summary curves in faceted ggplot2 figures

December 29, 2012

I really enjoy reading the Junk Charts blog. A recent post made me wonder how easy it would be to add summary curves to small-multiple plots, assuming the “small multiples” to summarize were the X component of a ggplot2::facet_grid(Y ~ X) …


High-Dimensional Microarray Data Sets in R for Machine Learning

December 29, 2012

Much of my research in machine learning is aimed at small-sample, high-dimensional bioinformatics data sets. For instance, here is a paper of mine on the topic. A large number of papers proposing new machine-learning methods that target high-dimensional data use the same two data sets and consider few others. These data sets are the 1) Alon colon cancer...


Men who stare at needles

December 29, 2012

Buffon's needle problem is a question first posed in the 18th century by Georges-Louis Leclerc, Comte de Buffon: What is the probability that a needle thrown at a lined sheet of paper will cross a line? This problem can be used to estimate π. If we set the needle length and the line distance = 1, the estimator can be calculated...
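
As a rough illustration of the idea in this excerpt (not the linked post's own code, which is in R), here is a minimal Python sketch. It assumes needle length and line spacing are both 1, in which case the standard result is that a needle crosses a line with probability 2/π, so π can be estimated as 2 × throws / crossings. The function name `estimate_pi` and its parameters are hypothetical. The needle's direction is drawn by rejection sampling in the unit disc so that the simulation does not itself rely on the value of π.

```python
import random


def estimate_pi(n_throws, seed=None):
    """Estimate pi with Buffon's needle (needle length = line spacing = 1)."""
    rng = random.Random(seed)
    crossings = 0
    for _ in range(n_throws):
        # Distance from the needle's centre to the nearest line (spacing is 1).
        x = rng.uniform(0.0, 0.5)
        # Draw a uniformly random direction by rejection sampling in the unit disc.
        while True:
            u = rng.uniform(-1.0, 1.0)
            v = rng.uniform(-1.0, 1.0)
            r2 = u * u + v * v
            if 0.0 < r2 <= 1.0:
                break
        # Sine of the acute angle between the needle and the lines.
        sin_theta = abs(v) / r2 ** 0.5
        # The needle crosses a line if its half-length projection reaches it.
        if x <= 0.5 * sin_theta:
            crossings += 1
    if crossings == 0:
        raise ValueError("no crossings observed; increase n_throws")
    # P(cross) = 2 / pi, so pi is approximately 2 * throws / crossings.
    return 2.0 * n_throws / crossings


if __name__ == "__main__":
    print(estimate_pi(200_000, seed=1))
```

The estimate converges slowly: the error shrinks roughly with the square root of the number of throws, so many throws are needed for even two correct decimals.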


Clustering with selected Principal Components

December 28, 2012

In the Visualizing Principal Components post, I looked at the Principal Components of the companies in the Dow Jones Industrial Average index over 2012. Today, I want to show how we can use Principal Components to create Clusters (i.e., form groups of similar companies based on their distance from each other). Let’s start by loading


Find Duplicate Files Using R

December 28, 2012

This is a simple script to search a directory tree for all files with duplicate content. It …
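
The excerpt does not show the script itself, and the original is written in R. As a hedged sketch of the general technique, the Python example below walks a directory tree, hashes each file's bytes with SHA-256, and reports groups of paths that share a digest. The names `file_digest` and `find_duplicates` are hypothetical, and the choice of SHA-256 is an assumption, not something stated in the post.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under `root` whose contents are identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            by_digest[file_digest(path)].append(path)
    # Keep only digests shared by more than one file.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]


if __name__ == "__main__":
    for group in find_duplicates("."):
        print(group)
```

On large trees it is usually cheaper to group files by size first and hash only those whose sizes collide; that step is omitted here to keep the sketch short.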


Label placement with spplot and lattice

December 28, 2012

The maptools package includes new functions to label points and lines. Line labelling: the lineLabel function produces and draws text …


Who Survived on the Titanic? Predictive Classification with Parametric and Non-parametric Models

December 24, 2012

I recently read a really interesting blog post about trying to predict who survived on the Titanic with standard GLM models and two forms of non-parametric classification tree (CART) methodology. The post was featured on R-bloggers, and I think it's worth a closer look. The basic idea was to figure out which of these three


Make a Christmas Tree in R with random ornaments/presents

December 24, 2012

Happy holidays! Link to Gist
