RcppArmadillo 0.3.2.3

Conrad released version 3.2.3 of Armadillo a few days ago, and the corresponding RcppArmadillo package 0.3.2.3 is now on CRAN. (For those keeping score, 3.2.1 never was a full release, and 3.2.2 contained fixes for a build issue that did not affect the ...

A big list of the things R can do

July 2, 2012

R is an incredibly comprehensive statistics package. Even if you just look at the standard R distribution (the base and recommended packages), R can do pretty much everything you need for data manipulation, visualization, and statistical analysis. And for everything else, there are more than 5000 packages on CRAN and other repositories, and the big-data capabilities of Revolution R Enterprise...

precise pangolin (Ubuntu 12.04)

July 2, 2012

Following the crash of my hard drive right before leaving Kyoto, I bought a cheap Compaq Presario CQ57 and reinstalled Ubuntu 12.04 on it over the weekend (so as to have a laptop available before leaving for Australia…). It took about one hour to install from the DVD, and everything seems to be working out of the box. The...

Graphics Artifacts from Quarterly Commentary

July 2, 2012

For my Q2 2012 commentary, I tried multiple graphs to illustrate the disconnect between the US stock markets and the rest of the world. I think I finally settled on this simple Excel bar graph populated with Bloomberg data, but I thought some might lik...

Project Euler — problem 11

July 2, 2012

It’s been a while since I last solved a Project Euler problem; I have been busy. Now I’m back and continuing with the next problem, which asks us to find a maximum. Let’s take a look at the 11th problem: What …

Citing R or SAS

July 2, 2012

One of us recently read a colleague's first draft of a paper, in which she had written: "All analyses were done in R 2.14.0." We assume we're preaching to the converted here when we say that the enormous amount of work that goes into R needs to be re...

My first competition at Kaggle

July 2, 2012

For me, Kaggle is becoming a social network for data scientists, much as stackoverflow.com or github.com is for programmers. If you are a data scientist, machine learner or statistician, you are better off having a profile there; otherwise you do not exist. Nevertheless, I wouldn’t bet on the rosy future for data scientists that journalists suggest (sexy job for next...

Popularity of R continues

July 2, 2012

No doubt those who read my blog know that the tools I use for my Industrial Engineering and Operations Research work rely heavily on open source software. That is why I try to support as many open source projects as possible, such as COIN-OR, G...

Moving beyond hopeless graphics

July 2, 2012

I was at a talk a while ago where the speaker presented tables with 4, 5, 6, even 8 significant digits even though, as is usual, only the first or second digit of each number conveyed any useful information. A graph would be better, but even if you’re too lazy to make a plot, a bit...

Random portfolios versus Monte Carlo

July 2, 2012

What is the difference between Monte Carlo (as it is usually defined in finance) and random portfolios? The idea of “Monte Carlo” is very simple: it is a fancy word for “simulation”. As usual, it is all too possible to find incredibly muddied explanations of such a simple …

Simple distribution plot in R

July 2, 2012

Plot the distribution of a sample as bars and add a histogram line to visualize the sample's characteristics.

MatLab, SAS, STATA, SPSS, Excel users: Try R, damn it!

July 2, 2012

Having worked with a multitude of statistical packages during my career, I feel able to evaluate many of them. I first used Excel for my calculations, as most ordinary users do. I like the idea behind a spreadsheet and the combination of data and click-to-do functions. Nevertheless, I’ve often...

Olive vs. Sunflower oil Spectra – 002 (ChemoSpec)

July 1, 2012

I added another data set, “sunflower oil”, to import together with the olive oil into the ChemoSpec R package. As I showed before in a video (Preparing spectra to import into ChemoSpec), every sample was acquired with a NIR instrument (in transmitta...

Visualizing uncertainty using Jackknife

July 1, 2012

Once again, I (re)discovered last week at the Rmetrics conference that old tools can be extremely useful for illustrating complex ideas, like uncertainty in financial markets and stock prices. For instance, a 99.5% quantile: we look for the scena...

FAO statistical areas in Google Earth

July 1, 2012

Some time ago I wrote a blog post describing how to convert ICES and NAFO statistical areas, originally shapefiles, into a Google Earth-readable format (ICES, NAFO). These areas are the primary fisheries areas upon which nominal fisheries catch statistics have be...

Colored 3D Map

July 1, 2012

In my previous post, I showed how to make a 3D view of an area using the persp function. However, I felt this was not a complete representation, especially for digital elevation. While looking for some references for my presentation on the use of R for...

Modeling Trick: Masked Variables

July 1, 2012

A primary problem data scientists face again and again is how to properly adapt or treat variables so that they are the best possible components of a regression. Some analysts at this point delegate control to a shape-choosing system like neural nets. I feel such a choice gives up far too much statistical rigor, transparency and...

Step up your R capabilities with new tools for increased productivity

July 1, 2012

I guess a lot of us use many tools to accomplish various things in our everyday lives. There is the (not that uncommon) case where you have to build something that others will use in their everyday business life to gain insights and information and/or make decisions. The basic implementation scenario here would be to build an Excel workbook where...

Rchievement of the day #3: Bloggin’ from R

July 1, 2012

I have become a complete knitr addict of late and have been using it regularly in combination with RStudio’s R Markdown support. In fact, I wrote this post using it! It then dawned on me how great it would be if I …

Getting numpy data into R

June 30, 2012

The other day, I found myself confronted with a large number of large files, which were provided in (gzip-)compressed ASCII format (which R reads directly via gzfile() connections) as well as (compressed) numpy files. The numpy files can be read very ...

Fun with the googleVis Package for R

June 30, 2012

Packages such as ggplot and lattice can produce some great charts and visualizations, but googleVis is tough to beat for interactive charts to share on the web. Click on the image below to open up the HTML page. This was all done in R! I will warn you that it is too easy to …

R is fun

June 30, 2012

As mentioned in Universal portfolio, part 6, the wealth reported in Table 8.4 of Universal Portfolios could not be reproduced. Another observation is that the random weight vectors reported in Table 8.4 are shown in descending lex...

Coefficient Plots in R

June 30, 2012

One popular trend in presenting results is the "coefficient plot," an alternative to the table of regression coefficients. I am seeing this a little more often in political science research and have received a few requests for code, so I …

Simple and heuristic optimization

June 29, 2012

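This week, at the Rmetrics conference, there was an interesting discussion about heuristic optimization. The starting point was simple: in complex optimization problems (here we mean problems with a lot of local maxima, for instance), we do not necessarily need extremely advanced algorithms that converge extremely fast if we cannot ensure that they reach the optimum. Converging extremely fast, with a...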

igraph and SNA: an amateur’s dabbling

June 29, 2012

I’ve been playing with the igraph package a bit lately (see my previous post) and wanted to revisit a problem I had looked at in the past. The basic gist of the problem is this: Students in a class are asked …

SYTYCD: where do these terrific dancers come from?

June 29, 2012

It’s Saturday midnight and I’m already sleepy. However, after several hours, I finally got this Google geographic map embedded in my post. Aha!!! This is about the 20 finalists from the 9th season of So You Think You Can Dance. I count the states …

Wrap-up on Blogging with R Markdown and tumblr

June 29, 2012

This is a wrap-up post summarizing a few of the issues I’ve found so far with blogging on tumblr with R Markdown. Among them: tumblr puts a 1Mb cap on its HTML editor. Fair warning: when I tried eating my own dogfood while writing the previous posts, I found that I had to manually upload all those pretty screenshots of the tumblr interface. For some...