Machine Learning Ex2 – Linear Regression

March 22, 2011
Thanks to this post, I found OpenClassroom. In addition, thanks to Andrew Ng and his lectures, I took my first course in machine learning. These videos are quite easy to follow. Exercise 2 requires implementing the gradient descent algorithm to model data with linear regression.
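
As a rough illustration of what the exercise asks for, here is a minimal sketch of batch gradient descent for a one-variable linear regression. It is not the post's code: the data are simulated, and the learning rate `alpha` and the iteration count `n_iter` are assumed values chosen so that the loop converges on this particular data.

```r
# Minimal sketch: batch gradient descent for simple linear regression.
# The simulated data, alpha, and n_iter are illustrative assumptions.
set.seed(1)
x <- runif(50, 2, 8)
y <- 0.75 + 0.06 * x + rnorm(50, sd = 0.05)

X <- cbind(1, x)      # design matrix with an intercept column
m <- length(y)
theta <- c(0, 0)      # start from zero coefficients
alpha <- 0.02         # learning rate (assumed)
n_iter <- 20000       # number of iterations (assumed)

for (i in seq_len(n_iter)) {
  # Gradient of the cost J(theta) = sum((X theta - y)^2) / (2 m)
  grad <- t(X) %*% (X %*% theta - y) / m
  theta <- theta - alpha * as.vector(grad)
}

theta_gd <- theta
theta_lm <- unname(coef(lm(y ~ x)))   # closed-form least squares, for comparison
```

Comparing `theta_gd` with `theta_lm` is a quick sanity check: with a small enough learning rate and enough iterations, the two should agree closely.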

JCGS 20th anniversary

March 22, 2011
For its 20th anniversary, JCGS offers free access to papers, including Andrew’s discussion paper Why tables are really much better than graphs. (Another serious ending for an April fool joke!) Incidentally (or rather coincidentally), I received today the great news that our Using parallel computation to improve Independent Metropolis-Hastings based estimation paper is accepted by

Day #9 Using R in Knime nodes

March 22, 2011
First you need to create a workflow in Knime. This is what I used. I loaded in the Iris data, renamed the tables for further use in my scripts and showed a view, or first did an R snippet to show a view afterwards. Once this is done, make sure your R-B...

Using R for Introductory Statistics 6, Simulations

March 21, 2011
R can easily generate random samples from a whole library of probability distributions. We might want to do this to gain insight into the distribution's shape and properties. A tricky aspect of statistics is that results like the central limit theorem come with caveats, such as "...for sufficiently large n...". Getting a feel for how...
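
To get a feel for the "sufficiently large n" caveat, a small simulation helps. The sketch below is an assumption-laden example rather than the post's own code: it draws sample means from an exponential distribution (a hypothetical choice of skewed distribution) and measures how skewed the means still are for a small and a large sample size.

```r
# Sketch: how quickly do sample means of a skewed distribution look normal?
# The exponential distribution and the sample sizes are illustrative choices.
set.seed(42)

sample_means <- function(n, reps = 5000) {
  # Each replicate draws n exponential(rate = 1) values and takes their mean
  replicate(reps, mean(rexp(n, rate = 1)))
}

# Sample skewness: roughly 0 for a symmetric, normal-looking distribution
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

means_small <- sample_means(5)
means_large <- sample_means(100)

skew_small <- skewness(means_small)
skew_large <- skewness(means_large)
```

Calling `hist(means_small)` and `hist(means_large)` shows the same thing visually: the means for n = 5 are still clearly right-skewed, while those for n = 100 are much closer to a bell shape.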

Statistics forum

March 21, 2011
The ASA is launching a new blog called the Statistics Forum, managed by Andrew Gelman and to which I will periodically contribute items that may induce some amount of discussion within the community, like the first entry by Michael Lavine on testing. (Meaning I will double-post on the Og and on the Statistics Forum, if

A 3D Version of R’s curve() Function

March 21, 2011
I like exploring the behavior of functions of a single variable using the curve() function in R. One thing that seems to be missing from R’s base functions is a tool for exploring functions of two variables. I asked for examples of such a function on Twitter today and didn’t get any answers, so I
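
One possible shape for such a tool, using only base R, is sketched here. The function name `curve3d_sketch` and its arguments are hypothetical and are not taken from the post; the sketch simply evaluates a vectorized function on a grid with `outer()` and hands the result to `persp()`.

```r
# Hypothetical helper (not the post's code): evaluate f(x, y) on a grid
# and draw the surface with persp().
curve3d_sketch <- function(f, from = c(0, 0), to = c(1, 1), n = 41,
                           plot = interactive(), ...) {
  x <- seq(from[1], to[1], length.out = n)
  y <- seq(from[2], to[2], length.out = n)
  z <- outer(x, y, f)   # f must be vectorized over both arguments
  if (plot) persp(x, y, z, theta = 30, phi = 25, ...)
  invisible(list(x = x, y = y, z = z))
}

grid <- curve3d_sketch(function(x, y) sin(x) * cos(y),
                       from = c(-pi, -pi), to = c(pi, pi))
```

The grid is returned invisibly so the evaluated values can be reused, for example with `contour(grid$x, grid$y, grid$z)`.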

Updated infochimps R package, includes several new APIs

March 21, 2011
Recently, the good folks at Infochimps.com rolled out a series of new APIs to add to their already impressive set of data resources. I have been in a perpetual state of catch-up since the new year, so I have only now got around to adding some of these new APIs to the infochimps R package. Here

Code Highlighter for R in WordPress

March 21, 2011
First of all, welcome to my blog! I will write posts about trading, quantitative and algorithmic trading, programming and everything else that is on my mind. Feel free to comment and give suggestions on how to improve this blog. Thank you! Well, one of th...

RStudio Keyboard Shortcut Reference PDF

March 21, 2011
I recently started using RStudio, the amazing new IDE for R. You can view all of RStudio's keyboard shortcuts by going to the help menu, but I made this printable reference for myself and thought I'd share it. I only included the Windows shortcuts, and...

NBA Analysis: Coming Soon!

March 21, 2011
I decided to spend a few hours this weekend writing the R code to scrape the individual statistics of NBA players (2010-11 only). I originally planned to write up a few NBA-related analyses, but a friend was visiting from out …

Day #8 Calling R in java

March 21, 2011
What is JRI? An additional library and plugin for Eclipse to call R functions in a Java application. And according to Wikipedia it’s a now obsolete API for invoking native C++ calls from Java that has long been supplanted by Java Native Interfa...

Looking at the "Curse of Dimensionality" with R, foreach, and lattice

March 20, 2011
Here are the results of a "Curse of Dimensionality" homework assignment for Terran Lane's Introduction to Machine Learning class. Pretty pictures, interesting results, and a good exercise in explicit parallelism with R. It's neat to see distance scaling linearly with standard deviation, and linearly with the Lth-root...

Fast(ish) extraction of exon locations from a BED12 file using data.table

March 20, 2011
Here is a fast R function to extract exon locations from a BED12 file. Note that fast is a relative term: the function below is fast enough for me, but may not be fast enough for others :) Anyway, a BED12 file typically has locations of genomic features (t...
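
For readers unfamiliar with the format, the idea can be sketched in base R (this is not the post's data.table function, and no claim is made about its speed). In BED12, each row stores comma-separated `blockSizes` and `blockStarts`, with the starts given relative to `chromStart`. The two-row table below is made-up example data and contains only the columns the sketch needs.

```r
# Sketch (base R, not the post's data.table code): expand BED12 blocks
# into one row per exon. The input rows are made-up example data.
bed <- data.frame(
  chrom       = c("chr1", "chr2"),
  chromStart  = c(1000L, 5000L),
  name        = c("txA", "txB"),
  blockCount  = c(2L, 3L),
  blockSizes  = c("100,200,", "50,60,70,"),
  blockStarts = c("0,400,", "0,100,300,"),
  stringsAsFactors = FALSE
)

# Split "a,b,c," into integer vectors (strsplit drops the trailing empty field)
split_ints <- function(s) lapply(strsplit(s, ","), as.integer)

sizes  <- split_ints(bed$blockSizes)
starts <- split_ints(bed$blockStarts)
idx    <- rep(seq_len(nrow(bed)), lengths(sizes))  # source row of each exon

exons <- data.frame(
  chrom = bed$chrom[idx],
  name  = bed$name[idx],
  start = bed$chromStart[idx] + unlist(starts),
  end   = bed$chromStart[idx] + unlist(starts) + unlist(sizes),
  stringsAsFactors = FALSE
)
```

Each exon start is the feature start plus the block offset, and each end is that start plus the block size, following the usual 0-based, half-open BED convention.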

Machine Learning Ex5.2 – Regularized Logistic Regression

March 20, 2011
Exercise 5.2 improves the Logistic Regression implementation done in Exercise 4 by adding a regularization parameter that reduces the problem of over-fitting. We will be using Newton's Method. Data Here's the data we want to fit. # linear regression # load the data mydata = read.csv("http://spreadsheets.google.com/pub?key=0AnypY27pPCJydHZPN2pFbkZGd1RKeU81OFY3ZHJldWc&output=csv", header = TRUE) # plot the data plot(mydata$u, mydata$v, xlab="u", ylab="v") points(mydata$u,...
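
The general technique can be sketched as follows. This is a minimal sketch under stated assumptions, not the post's implementation: the data are simulated rather than loaded from the spreadsheet, the feature map is a small hand-picked polynomial, and `lambda = 1` and the iteration count are assumed values. The function name `newton_logistic` is hypothetical.

```r
# Sketch: Newton's method for L2-regularized logistic regression.
# Simulated data; lambda and n_iter are assumed values.
set.seed(7)
n <- 200
u <- rnorm(n)
v <- rnorm(n)
y <- as.numeric(u^2 + v^2 + rnorm(n, sd = 0.3) < 1)  # 1 inside a noisy circle
X <- cbind(1, u, v, u^2, v^2, u * v)                 # small polynomial feature map

sigmoid <- function(z) 1 / (1 + exp(-z))

newton_logistic <- function(X, y, lambda = 1, n_iter = 25) {
  m <- nrow(X)
  theta <- rep(0, ncol(X))
  R <- diag(ncol(X))
  R[1, 1] <- 0                          # do not penalize the intercept
  for (i in seq_len(n_iter)) {
    h <- as.vector(sigmoid(X %*% theta))
    grad <- (t(X) %*% (h - y) + lambda * R %*% theta) / m
    # Hessian: X' W X / m plus the penalty, with W = diag(h * (1 - h))
    H <- (t(X) %*% (X * (h * (1 - h))) + lambda * R) / m
    theta <- theta - as.vector(solve(H, grad))
  }
  theta
}

theta_hat <- newton_logistic(X, y, lambda = 1)
accuracy <- mean((sigmoid(X %*% theta_hat) > 0.5) == y)
```

Larger values of `lambda` shrink the non-intercept coefficients more strongly, which smooths the decision boundary at the cost of a looser fit to the training data.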

March 19, 2011
Some may have had reservations about the “randomness” of the straws I plotted to illustrate Bertrand’s paradox, as they were all going North-West/South-East. I had actually made an inversion between cbind and rbind in the R code, which accounted for this non-random orientation. Above is the corrected version, which sounds “more random” indeed. (And using

How to: Binomial regression models in R

March 19, 2011
Ever wondered how to predict success or failure as a function of other variables? Here's a quick tutorial on binomial regression in R.
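
For a flavour of what such a model looks like, here is a minimal sketch using `glm()` with a binomial family. The data are simulated from an assumed logistic model, so the variable names and coefficients are illustrative only and do not come from the tutorial.

```r
# Sketch: binomial (logistic) regression on simulated data.
set.seed(123)
n <- 500
x <- rnorm(n)
p <- plogis(-0.5 + 1.2 * x)              # assumed true success probability
success <- rbinom(n, size = 1, prob = p)

fit <- glm(success ~ x, family = binomial(link = "logit"))
coefs <- coef(fit)

# Predicted probability of success at x = 1
p_hat <- predict(fit, newdata = data.frame(x = 1), type = "response")
```

`summary(fit)` reports the coefficients on the log-odds scale; `exp(coefs)` converts them to odds ratios.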

New GenABEL Website, and more *ABEL software

March 18, 2011
The *ABEL suite of R packages and software for genetic analysis has grown substantially since the appearance of GenABEL and the previously mentioned ProbABEL R packages. There are now a handful of useful R packages and other software utilities facilita...

How to display scatter plot matrices with R and lattice

In lattice, there is a function called splom for the display of scatter plot matrices. For large datasets, the panel.hexbinplot from the hexbin package is a better option than the default panel. As an example, let’s use some meteorological data from MAPA-SIAR: library(solaR) library(hexbin) aranjuez <- readMAPA(prov=28, est=3, start='01/01/2004', end='31/12/2010') aranjuezDF <- subset(as.data.frame(getData(aranjuez)), select=c('TempMedia', 'TempMax',

Flying off the Rack: R and the web in 2011

March 18, 2011
If there is ever a time to learn R and web application development, it is now...in the age of Big Data. The upcoming release of R 2.13 will provide basic functionality for developing R web applications on the desktop via the internal HTTP server, but t...

Some upcoming R courses

March 18, 2011
A couple of quick notes about some upcoming R courses: In Vancouver, Canada, R trainer Isabella Ghement is presenting two R courses: An Introduction to the Statistical Software Package R, 8:30am-4:30pm, March 30-31, 2011, Vancouver, B.C., Canada (http://www.ghement.ca/RworkshopMarch30and31_2011.html) Advanced Statistical Modeling Using the Statistical Software Package R, 8:30am-4:30pm, May 5-6, 2011, Vancouver, B.C., Canada (http://www.ghement.ca/RworkshopMay5and6_2011.html); And in Seattle, Washington...

More fun with sed

March 18, 2011
So I have this strange date and time string, which I would like to convert to a “useable” date, i.e., something that a spreadsheet programme or R can work with. It looks like this (MON has 3 chars): ddMONyr:hh:mm:ss The …
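
The post works with sed; for comparison, the same conversion can be sketched directly in R. The helper name `parse_odd_datetime` is hypothetical, and the sketch assumes English three-letter month abbreviations, a two-digit year in the range 2000-2099, and UTC as the time zone, none of which is stated in the excerpt.

```r
# Sketch: parse strings of the form ddMONyr:hh:mm:ss, e.g. "18MAR11:09:30:00".
# Assumes English month abbreviations and two-digit years meaning 20yy.
parse_odd_datetime <- function(s) {
  day  <- substr(s, 1, 2)
  mon  <- match(toupper(substr(s, 3, 5)), toupper(month.abb))  # locale-independent
  year <- paste0("20", substr(s, 6, 7))
  time <- substr(s, 9, 16)
  iso  <- sprintf("%s-%02d-%s %s", year, mon, day, time)
  as.POSIXct(iso, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

parsed <- parse_odd_datetime(c("18MAR11:09:30:00", "02JAN10:23:59:59"))
```

Matching against `month.abb` instead of relying on `%b` keeps the parsing independent of the session locale.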

Machine Learning Ex5.1 – Regularized Linear Regression

March 18, 2011
Exercise 5.1 improves the Linear Regression implementation done in Exercise 3 by adding a regularization parameter that reduces the problem of over-fitting. Over-fitting occurs especially when fitting a high-order polynomial, which we will try to do here. Data Here are the points we will make a model from: # linear regression mydata = read.csv("http://spreadsheets.google.com/pub?hl=en_GB&hl=en_GB&key=0AnypY27pPCJydGhtbUlZekVUQTc0dm5QaXp1YWpSY3c&output=csv", header = TRUE) # view data plot(mydata)
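
To make the idea concrete, here is a minimal sketch of regularized least squares for a fifth-order polynomial, solved with the regularized normal equations. It is not the post's code: the seven data points are simulated, the `lambda` values are assumptions, and the helper name `fit_regularized` is hypothetical.

```r
# Sketch: regularized least squares for a 5th-order polynomial.
# Simulated data; the lambda values are illustrative assumptions.
set.seed(3)
x <- seq(-1, 1, length.out = 7)
y <- sin(2 * x) + rnorm(7, sd = 0.2)
X <- outer(x, 0:5, `^`)                 # columns 1, x, x^2, ..., x^5

fit_regularized <- function(X, y, lambda) {
  R <- diag(ncol(X))
  R[1, 1] <- 0                          # leave the intercept unpenalized
  # Solve (X'X + lambda * R) theta = X'y
  as.vector(solve(t(X) %*% X + lambda * R, t(X) %*% y))
}

theta_0  <- fit_regularized(X, y, lambda = 0)    # ordinary least squares
theta_1  <- fit_regularized(X, y, lambda = 1)
theta_10 <- fit_regularized(X, y, lambda = 10)
```

As `lambda` grows, the non-intercept coefficients shrink toward zero, so the fitted curve becomes flatter and less prone to chasing noise in the seven points.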

The housing bubble by city

March 17, 2011
The housing bubble by city. Miami sailed high and fell far. Detroit rose modestly but dropped more than it went up. Dallas held steady. DC is enjoying a bit of renewed growth, but are it and New York yet to fall?

The story behind the software: the case of R.

Nowadays, whatever our area of application of statistics, we need to know how to program. Conventional software packages such as SPSS and STATA are somewhat limited, their updates are not that frequent, and their price can be...