This is a trivial but very useful tip:
> x=data.frame(a=1:4, c=5)
> x
  a c
1 1 5
2 2 5
3 3 5
4 4 5
> x
  a c
1 1 5
> x
1 2 3 4
> x
  a
1 1
2 2
3 3
4 4
where you can see that: to avoid a become a vector, rather than a...
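The indexing expressions in the excerpt above did not survive, so the following is only a minimal sketch of what the tip most likely refers to. It assumes the point is base R's `drop` argument to `[`, which controls whether a single-column selection collapses to a vector; the variable names `v` and `d` are illustrative.

```r
# Same data frame as in the excerpt
x <- data.frame(a = 1:4, c = 5)

# Selecting a single column with [ , j] collapses to a plain vector by default
v <- x[, 1]
is.data.frame(v)

# Assumption: the tip is about drop = FALSE, which keeps the data frame structure
d <- x[, 1, drop = FALSE]
is.data.frame(d)
```

Selecting with a single index, as in `x["a"]`, also returns a one-column data frame.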

R news and tutorials contributed by (552) R bloggers

If you want more info about clustering, I have another post about "Clustering analysis and its implementation in R". Here is the link: http://onetipperday.blogspot.com/2012/04/clustering-analysis-2.html
Several R functions in this...

One of my colleagues and I were talking about doing some text analysis (which, by the way, I have never done before), for which the first issue is to get text into R. Not any text, but files that can be accessed …

I’d like to explore further the capabilities of my statistical packages to get data online and load it into memory instead of downloading each dataset by hand. In the end, I found this task pretty easy, but it kept me out of bed for one night trying to find the most efficient way to loop across

Data mining is not only statistics, even if statistics is the most recognized academic component of it. It also includes data cleaning, machine learning and data visualization. The scarce factor is the ability to understand that data and extract value ...

This is a piece of code I implemented in 2004, which was supposed to be part of an R package for multivariate testing (to be named, rather creatively, mvttests). Time has flown and I still haven’t got around to implementing the said package, but people keep asking me for the varcomp function, so here it is, for

I have just finished reading this book by Bill Bolstad (University of Waikato, New Zealand) which a previous ‘Og post pointed out when it appeared, shortly after our Introducing Monte Carlo Methods with R. My family commented that the cover was nicer than those of my own books, which is true. Before I launch into

This is a piece of code I implemented in 2004, which was supposed to be part of an R package for multivariate testing (to be named, rather creatively, mvttests). Time has flown and I still haven’t got around to implementing the said package, but people keep asking me for the sphericity.test function, so here it is, for

There are times when we need to write a function that makes changes to a generic data frame that is passed as an argument. Let’s say, for example, that we want to write a function that converts to factor any …
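The excerpt is cut off before it says what gets converted, so the sketch below makes an assumption: it converts every character column to a factor. The helper name `chars_to_factors` and the example data are hypothetical and not taken from the original post.

```r
# Hypothetical helper: convert every character column of a data frame to factor.
# Which columns the original post targets is unknown; character columns are an assumption.
chars_to_factors <- function(df) {
  stopifnot(is.data.frame(df))
  # df[] <- ... replaces the columns while keeping the data frame structure
  df[] <- lapply(df, function(col) if (is.character(col)) factor(col) else col)
  df
}

d <- data.frame(id = 1:3, grp = c("a", "b", "a"), stringsAsFactors = FALSE)
out <- chars_to_factors(d)
str(out)
```

Because R passes arguments by value, the function works on a copy: `d` is unchanged, and the caller has to keep the returned data frame.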

There have been some exciting developments in the Deducer ecosystem over the summer which should go into CRAN release in the next few months. Today I'm going to give a quick sneak peek at an Open Street Map - R connection with accompanying GUI. This post will just show the non-GUI components. The first part of the

With respect to the multinomial logit model, the performance difference between the two packages is quite large, based on this post.

Once one starts writing more R code, the need for consistency increases, as it facilitates managing larger projects and their maintenance. There are several style guides or suggestions for R; for example, Andrew Gelman’s, Hadley Wickham’s, Bioconductor’s and this one. …

If you use R and haven’t discovered Sweave, then go and find out about it. It enables R code and plots to be incorporated into a document so the analysis and report can be combined in a single document. …

Today I want to discuss a connection between Risk, Return and Analyst Ratings. Let’s start by defining our universe of stocks: the 30 stocks from the Dow Jones Industrial Average (^DJI) index. For each stock I will compute the number of Upgrades and Downgrades, Risk, and Return in 2010:2011. I will run a linear regression and

If you dig around enough on Amazon.com, you can find some pretty odd products (like the Badonkadonk tank now sadly unavailable). Attached to these products you can often find a new form of comedy: the funny Amazon review. The products that attract such attention can be hard to fathom: this gallon of milk has more than 1,000 reviews. (Sample:...

In case you missed them, here are some articles from September of particular interest to R users. The deadline to enter the "R Applications" contest with $20,000 in prizes is October 31. The RHadoop Project, a new collection of open-source R packages from Revolution Analytics, makes it possible to write map-reduce jobs in R to analyze huge data sets...

One question I get a lot is how to read large data frames into R. There are some useful tricks that can save you both time and memory when reading large data frames, but I find that many people are not aware of them. Of course, your ability to read...
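The excerpt stops before naming the tricks, so the following is a sketch under assumptions rather than the author's list. It uses two arguments documented for base R's `read.table` family, `colClasses` and `nrows`, which spare R from guessing column types and help it allocate memory up front. The temporary file and its columns are made up for illustration.

```r
# Write a small CSV to a temporary file so the sketch is self-contained
path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:5,
                     value = c(0.1, 0.2, 0.3, 0.4, 0.5),
                     label = letters[1:5]),
          path, row.names = FALSE)

# Read only a few rows first to learn the column classes
head_rows <- read.csv(path, nrows = 3, stringsAsFactors = FALSE)
classes <- vapply(head_rows, class, character(1))

# Re-read the whole file with explicit column classes and a row-count hint
full <- read.csv(path, colClasses = classes, nrows = 5)
unlink(path)
```

On a file this small the gain is nil; the pattern is meant for files large enough that type guessing and repeated reallocation become noticeable.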

I have a bunch of time series whose power spectra (FFT via R's spectrum() function) I've been trying to visualize in an intuitive, aesthetically appealing way. At first, I just used lattice's bwplot, but the spacing of the X-axis here really matters. ...

The plot.table function in the Systematic Investor Toolbox is a flexible table drawing routine. plot.table has a simple interface and takes the following parameters:
- plot.matrix – matrix with data you want to plot
- smain – text to draw in (top, left) cell; default value is blank string
- highlight – Either TRUE/FALSE to indicate if you want to

On Thursday October 13, Hong Ooi from ANZ (Australia and New Zealand Banking Group) will give a webinar presentation on Successful Uses of R (along with SAS and Excel) in Banking. We've covered Hong's use of R for credit risk analysis here on the blog before, and in next week's webinar he'll take an in-depth look at applying R...