Machine Learning Ex2 – Linear Regression

March 22, 2011
By

Thanks to this post, I found OpenClassroom. In addition, thanks to Andrew Ng and his lectures, I took my first course in machine learning. These videos are quite easy to follow. Exercise 2 requires implementing the gradient descent algorithm to model data with linear regression.
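As a rough illustration of what such an exercise involves, here is a minimal sketch of batch gradient descent for a one-variable linear regression. The data are synthetic, and the learning rate alpha and the iteration count are assumptions chosen for this toy example, not values taken from the post or the course.

```r
# Minimal sketch: batch gradient descent for simple linear regression.
# Synthetic data (an assumption); the exercise uses its own dataset.
set.seed(1)
x <- runif(50, 2, 8)
y <- 0.75 + 0.06 * x + rnorm(50, sd = 0.05)

X <- cbind(1, x)     # design matrix with an intercept column
theta <- c(0, 0)     # initial parameters
alpha <- 0.02        # learning rate (hypothetical choice for this data)
m <- length(y)

for (i in 1:20000) {
  # gradient of the mean squared-error cost J(theta) = sum((X theta - y)^2) / (2m)
  grad <- t(X) %*% (X %*% theta - y) / m
  theta <- theta - alpha * grad
}

theta_gd <- as.vector(theta)
theta_lm <- unname(coef(lm(y ~ x)))   # closed-form least squares, for comparison
print(rbind(theta_gd, theta_lm))
```

If the loop has converged, the two rows should agree closely; a learning rate that is too large for the scale of x makes the iteration diverge instead.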

Read more »

JCGS 20th anniversary

March 22, 2011
By

For its 20th anniversary, JCGS offers free access to papers, including Andrew’s discussion paper Why tables are really much better than graphs. (Another serious ending for an April fool joke!) Incidentally (or rather coincidentally), I received today the great news that our Using parallel computation to improve Independent Metropolis-Hastings based estimation paper is accepted by

Read more »

Day #9 Using R in Knime nodes

March 22, 2011
By

First you need to create a workflow in Knime. This is what I used. I loaded in the Iris data, renamed the tables for further use in my scripts and showed a view, or first did an R snippet to show a view afterwards. Once this is done, make sure your R-B...

Read more »

Using R for Introductory Statistics 6, Simulations

March 21, 2011
By

R can easily generate random samples from a whole library of probability distributions. We might want to do this to gain insight into the distribution's shape and properties. A tricky aspect of statistics is that results like the central limit theorem come with caveats, such as "...for sufficiently large n...". Getting a feel for how...
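A small simulation along these lines, offered only as a sketch: it draws repeated samples from an exponential distribution and looks at how the sample means behave. The choice of distribution, the sample size n and the number of replications are assumptions made for illustration.

```r
# Sketch: simulate sample means to get a feel for the central limit theorem.
set.seed(42)
n <- 30          # size of each sample (assumed)
reps <- 10000    # number of simulated samples (assumed)

# Each replicate is the mean of n draws from a skewed Exp(1) distribution
means <- replicate(reps, mean(rexp(n, rate = 1)))

# For Exp(1) the population mean is 1 and the sd of the sample mean is 1/sqrt(n)
print(c(mean = mean(means), sd = sd(means), theory_sd = 1 / sqrt(n)))

# Compare the histogram of means with the normal approximation
hist(means, breaks = 40, freq = FALSE, main = "Means of 30 exponential draws")
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE)
```

Rerunning with a smaller n (say 3) shows the skew of the underlying distribution creeping back into the histogram.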

Read more »

Statistics forum

March 21, 2011
By

The ASA is launching a new blog called the Statistics Forum, managed by Andrew Gelman and to which I will periodically contribute items that may induce some amount of discussion within the community, like the first entry by Michael Lavine on testing. (Meaning I will double-post on the Og and on the Statistics Forum, if

Read more »

A 3D Version of R’s curve() Function

March 21, 2011
By

I like exploring the behavior of functions of a single variable using the curve() function in R. One thing that seems to be missing from R’s base functions is a tool for exploring functions of two variables. I asked for examples of such a function on Twitter today and didn’t get any answers, so I
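One way such a tool could look, as a sketch only: the helper below is hypothetical (the name curve3d and its arguments are made up here and are not the function from the post). It evaluates a two-variable function on a grid with outer() and draws the surface with base R's persp().

```r
# Hypothetical helper, not the function from the post:
# evaluate f on an n-by-n grid and draw the surface with persp().
curve3d <- function(f, from = c(0, 0), to = c(1, 1), n = 41, ...) {
  x <- seq(from[1], to[1], length.out = n)
  y <- seq(from[2], to[2], length.out = n)
  z <- outer(x, y, f)   # f must be vectorised over both arguments
  persp(x, y, z, theta = 30, phi = 30, ...)
  invisible(list(x = x, y = y, z = z))
}

res <- curve3d(function(x, y) sin(x) * cos(y),
               from = c(-pi, -pi), to = c(pi, pi))
```

Returning the grid invisibly keeps the call usable for its plot alone while still letting the values be inspected afterwards.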

Read more »

Updated infochimps R package, includes several new APIs

March 21, 2011
By

Recently, the good folks at Infochimps.com rolled out a series of new APIs to add to their already impressive set of data resources. I have been in a perpetual state of catch-up since the new year, so I have only now got around to adding some of these new APIs to the infochimps R package. Here

Read more »

Code Highlighter for R in WordPress

March 21, 2011
By

First of all, welcome to my blog! I will write posts about trading, quantitative and algorithmic trading, programming and everything else that is on my mind. Feel free to comment and suggest how to improve this blog. Thank you! Well, one of th...

Read more »

RStudio Keyboard Shortcut Reference PDF

March 21, 2011
By

I recently started using RStudio, the amazing new IDE for R. You can view all of RStudio's keyboard shortcuts by going to the help menu, but I made this printable reference for myself and thought I'd share it. I only included the Windows shortcuts, and...

Read more »

NBA Analysis: Coming Soon!

March 21, 2011
By

I decided to spend a few hours this weekend writing the R code to scrape the individual statistics of NBA players (2010-11 only). I originally planned to write up a few NBA-related analyses, but a friend was visiting from out …

Read more »

Day #8 Calling R in java

March 21, 2011
By

What is JRI? An additional library and plugin for Eclipse to call R functions in a Java application. And according to Wikipedia it’s a now obsolete API for invoking native C++ calls from Java that has long been supplanted by Java Native Interfa...

Read more »

Looking at the "Curse of Dimensionality" with R, foreach, and lattice

March 20, 2011
By

Here are the results of a "Curse of Dimensionality" homework assignment for Terran Lane's Introduction to Machine Learning class. Pretty pictures, interesting results, and a good exercise in explicit parallelism with R. It's neat to see distance scaling linearly with standard deviation, and linearly with the Lth-root...
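A much smaller experiment in the same spirit, as a sketch under stated assumptions: it uses plain base R (no foreach or lattice), Gaussian points, and Euclidean distance only, none of which is taken from the assignment itself. It relies on the standard fact that the squared distance between two independent Gaussian points in d dimensions has expectation 2 * d * sigma^2.

```r
# Sketch: mean Euclidean distance between pairs of Gaussian points,
# as a function of dimension d and standard deviation sigma.
set.seed(7)
mean_dist <- function(d, sigma = 1, n = 2000) {
  a <- matrix(rnorm(n * d, sd = sigma), ncol = d)
  b <- matrix(rnorm(n * d, sd = sigma), ncol = d)
  mean(sqrt(rowSums((a - b)^2)))   # average over n independent pairs
}

dims <- c(1, 4, 16, 64)
res <- sapply(dims, mean_dist)

# sqrt(2 * d) is the large-d approximation for sigma = 1
print(cbind(dims, mean_distance = res, approx = sqrt(2 * dims)))

# Doubling sigma should roughly double the mean distance
ratio <- mean_dist(16, sigma = 2) / mean_dist(16, sigma = 1)
print(ratio)
```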

Read more »

Fast(ish) extraction of exon locations from a BED12 file using data.table

March 20, 2011
By

Here is a fast R function to extract exon locations from a BED12 file. Note that fast is a relative term: the function below is fast enough for me but may not be fast enough for others :) Anyway, a BED12 file typically has locations of genomic features (t...

Read more »

Machine Learning Ex5.2 – Regularized Logistic Regression

March 20, 2011
By

Exercise 5.2 improves the Logistic Regression implementation done in Exercise 4 by adding a regularization parameter that reduces the problem of over-fitting. We will be using Newton's Method.

Data

Here's the data we want to fit.

    # linear regression
    # load the data
    mydata = read.csv("http://spreadsheets.google.com/pub?key=0AnypY27pPCJydHZPN2pFbkZGd1RKeU81OFY3ZHJldWc&output=csv", header = TRUE)
    # plot the data
    plot(mydata$u, mydata$v,, xlab="u", ylab="v")
    points(mydata$u,...
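To make the idea of regularized logistic regression with Newton's method concrete, here is a self-contained sketch. It is not the post's code: the data are synthetic, the function name newton_logit is made up for this example, and the lambda values are arbitrary. The intercept is left unpenalized, which is a common convention assumed here.

```r
# Sketch: L2-regularised logistic regression fitted with Newton's method.
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper (name and defaults are assumptions)
newton_logit <- function(X, y, lambda = 1, iters = 25) {
  m <- nrow(X)
  theta <- rep(0, ncol(X))
  reg <- diag(c(0, rep(1, ncol(X) - 1)))   # do not penalise the intercept
  for (i in seq_len(iters)) {
    h <- as.vector(sigmoid(X %*% theta))
    grad <- as.vector(t(X) %*% (h - y)) / m +
      (lambda / m) * as.vector(reg %*% theta)
    # Hessian: X' diag(h(1-h)) X / m plus the penalty term
    H <- t(X) %*% (X * (h * (1 - h))) / m + (lambda / m) * reg
    theta <- theta - solve(H, grad)        # Newton update
  }
  theta
}

# Synthetic two-class data with a roughly circular boundary
set.seed(3)
u <- rnorm(200)
v <- rnorm(200)
y <- as.numeric(u^2 + v^2 + rnorm(200, sd = 0.5) < 1)
X <- cbind(1, u, v, u^2, u * v, v^2)       # second-order polynomial features

th1 <- newton_logit(X, y, lambda = 1)
th10 <- newton_logit(X, y, lambda = 10)

pred <- as.numeric(sigmoid(X %*% th1) > 0.5)
acc <- mean(pred == y)
print(round(cbind(lambda_1 = th1, lambda_10 = th10), 3))
print(acc)
```

A larger lambda shrinks the non-intercept coefficients, which is the mechanism by which the penalty counters over-fitting.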

Read more »

Bertrand’s paradox [R details]

March 19, 2011
By

Some may have had reservations about the “randomness” of the straws I plotted to illustrate Bertrand’s paradox, as they were all going North-West/South-East. I had actually made an inversion between cbind and rbind in the R code, which explained this non-random orientation. Above is the corrected version, which sounds “more random” indeed. (And using

Read more »

How to: Binomial regression models in R

March 19, 2011
By

Ever wondered how to predict success or failure as a function of other variables? Here's a quick tutorial on binomial regression in R.
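For a flavour of what this looks like in practice, here is a minimal sketch using glm() with a binomial family. The variable names (dose, success) and the simulated data are assumptions for illustration and do not come from the tutorial.

```r
# Sketch: binomial (logistic) regression on simulated success/failure data.
set.seed(10)
dose <- rep(1:6, each = 20)                 # hypothetical predictor
p_true <- plogis(-3 + 0.9 * dose)           # true success probability
success <- rbinom(length(dose), size = 1, prob = p_true)

fit <- glm(success ~ dose, family = binomial(link = "logit"))
print(summary(fit)$coefficients)

# Predicted probability of success at a new predictor value
p_hat <- predict(fit, newdata = data.frame(dose = 3.5), type = "response")
print(p_hat)
```

Using type = "response" returns a probability; the default for predict() on a glm is the linear predictor on the logit scale.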

Read more »

New GenABEL Website, and more *ABEL software

March 18, 2011
By

The *ABEL suite of R packages and software for genetic analysis has grown substantially since the appearance of GenABEL and the previously mentioned ProbABEL R packages. There are now a handful of useful R packages and other software utilities facilita...

Read more »

How to display scatter plot matrices with R and lattice

In lattice, there is a function called splom for the display of scatter plot matrices. For large datasets, the panel.hexbinplot from the hexbin package is a better option than the default panel. As an example, let’s use some meteorological data from MAPA-SIAR:

    library(solaR)
    library(hexbin)
    aranjuez <- readMAPA(prov=28, est=3, start='01/01/2004', end='31/12/2010')
    aranjuezDF <- subset(as.data.frame(getData(aranjuez)), select=c('TempMedia', 'TempMax',

Read more »

Flying off the Rack: R and the web in 2011

March 18, 2011
By

If there is ever a time to learn R and web application development, it is now...in the age of Big Data. The upcoming release of R 2.13 will provide basic functionality for developing R web applications on the desktop via the internal HTTP server, but t...

Read more »

Some upcoming R courses

March 18, 2011
By

A couple of quick notes about some upcoming R courses:

In Vancouver, Canada, R trainer Isabella Ghement is presenting two R courses:

- An Introduction to the Statistical Software Package R, 8:30am-4:30pm, March 30-31, 2011, Vancouver, B.C., Canada (http://www.ghement.ca/RworkshopMarch30and31_2011.html)
- Advanced Statistical Modeling Using the Statistical Software Package R, 8:30am-4:30pm, May 5-6, 2011, Vancouver, B.C., Canada (http://www.ghement.ca/RworkshopMay5and6_2011.html)

And in Seattle, Washington...

Read more »

More fun with sed

March 18, 2011
By

So I have this strange date and time string, which I would like to convert to a “useable” date, i.e., something that a spreadsheet programme or R can work with. It looks like this (MON has 3 chars): ddMONyr:hh:mm:ss The …
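The post works with sed; as an alternative sketch, a string in that ddMONyr:hh:mm:ss layout can also be parsed directly in R with strptime(). The example timestamp below is made up, and the month abbreviation is matched against the current locale, so an English (or C) locale is assumed.

```r
# Sketch: parse a ddMONyr:hh:mm:ss string in R rather than with sed.
stamp <- "18MAR11:09:30:00"   # made-up example value

# %d day, %b abbreviated month name, %y two-digit year, then the time
parsed <- as.POSIXct(strptime(stamp, format = "%d%b%y:%H:%M:%S", tz = "UTC"))

iso <- format(parsed, "%Y-%m-%d %H:%M:%S")
print(iso)
```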

Read more »

Machine Learning Ex5.1 – Regularized Linear Regression

March 18, 2011
By

Exercise 5.1 improves the Linear Regression implementation done in Exercise 3 by adding a regularization parameter that reduces the problem of over-fitting. Over-fitting occurs especially when fitting a high-order polynomial, which we will try to do here.

Data

Here are the points we will make a model from:

    # linear regression
    mydata = read.csv("http://spreadsheets.google.com/pub?hl=en_GB&hl=en_GB&key=0AnypY27pPCJydGhtbUlZekVUQTc0dm5QaXp1YWpSY3c&output=csv", header = TRUE)
    # view data
    plot(mydata)

http://al3xandr3.github.com/img/ml-ex51-data.png
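The effect of the regularization parameter on a high-order polynomial fit can be sketched with the regularized normal equations. This is an illustration on synthetic points, not the post's code or data; the helper name ridge_fit and the lambda values are assumptions.

```r
# Sketch: regularised least squares for a 5th-order polynomial,
# solved with the normal equations (intercept left unpenalised).
set.seed(5)
x <- seq(-1, 1, length.out = 7)
y <- sin(2 * x) + rnorm(7, sd = 0.2)      # synthetic points (assumed)

X <- outer(x, 0:5, `^`)                   # columns 1, x, x^2, ..., x^5

# Hypothetical helper: theta = (X'X + lambda * R)^(-1) X'y
ridge_fit <- function(X, y, lambda) {
  R <- diag(c(0, rep(1, ncol(X) - 1)))
  solve(t(X) %*% X + lambda * R, t(X) %*% y)
}

thetas <- sapply(c(0, 1, 10), function(l) ridge_fit(X, y, l))
colnames(thetas) <- c("lambda=0", "lambda=1", "lambda=10")
print(round(thetas, 3))
```

With lambda = 0 this is ordinary least squares; as lambda grows, the higher-order coefficients are pulled towards zero and the fitted curve becomes smoother.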

Read more »

The housing bubble by city

March 17, 2011
By

The housing bubble by city. Miami sailed high and fell far. Detroit rose modestly but dropped more than it went up. Dallas held steady. DC is enjoying a bit of renewed growth, but are it and New York yet to fall?

Read more »

The story behind the software: the case of R.

Nowadays, whatever our area of application of statistics, we are required to know how to program. Conventional software packages such as SPSS and STATA are somewhat limited, updates are not very frequent, and their price can be...

Read more »

More, Please!

March 17, 2011
By

Thanks to Jim, I’ve been using R in the shell more and more – in concert with vi. It’s been fun, and nice to integrate my workflows all on the server (although I haven’t had to do much graphing yet – I’m sure I’ll start kvetching then and return to a nice GUI). One thing

Read more »