Bayes fits the data less closely than maximum likelihood

March 1, 2010
Lluis Bermudez writes: I'm from the University of Barcelona and I've been using the "arm" package to obtain posterior estimates of glm parameters. I usually worked with the "glm" function, but I need more than a point estimate. The problem is that when using...

End of the month investment

March 1, 2010
It is known that the first day of the month provides a bullish edge. According to Quantifiable Edges, not all months are equal. So I ran a test on the S&P 500 index, from January 1980 until February 2010. It is true: March isn’t the best month to run this strategy. Only 3 months have significant results

How to use mcsm

February 27, 2010
Within the past two days, I received this email: Dear Prof. Robert, I have just bought your recent book, Introducing Monte Carlo Methods with R. Although I have checked your web page for the R programs (bits of the code in the book, codes for generating the figures, etc. – not the package available

Calculating LT50 (median lethal temperature, aka LD50) quickly in R

February 27, 2010
Say you’ve got a bunch of survival/mortality data from an experiment. Maybe you exposed batches of snails to various high temperatures for a few hours, and recorded the number alive and dead in each batch at the end. Now you’d like to repor...
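
A minimal sketch of one common way to get such an estimate, assuming a logistic regression on the alive/dead counts. The data and variable names below are made up for illustration and are not taken from the post:

```r
# Hypothetical data: batches of 20 snails held at each temperature (deg C)
temp  <- c(30, 32, 34, 36, 38, 40)
alive <- c(20, 19, 15, 9, 3, 0)
dead  <- c(0, 1, 5, 11, 17, 20)

# Logistic regression on the alive/dead counts per batch
fit <- glm(cbind(alive, dead) ~ temp, family = binomial)

# Survival probability is 0.5 where the linear predictor is zero:
# b0 + b1 * LT50 = 0  =>  LT50 = -b0 / b1
b <- coef(fit)
lt50 <- unname(-b[1] / b[2])
lt50
```

If a standard error is needed as well, `MASS::dose.p(fit, p = 0.5)` reports the same point estimate together with one.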

Be Careful Searching Python Dictionaries!

February 27, 2010
For my talk on High Performance Computing in R (which I had to reschedule due to a nasty stomach bug), I used Wikipedia linking data, an adjacency list of articles and the articles to which they link. This data was linked from DataWrangling and was originally created by Henry Haselgrove. The dataset is small on disk, but I needed...

An interesting paper

February 27, 2010
Ben Bolker has an interesting paper (outline of a paper) comparing different approaches to estimating GLMMs in the R environment, which is very helpful for what I am doing right now. The paper points out the following options for fitting GLMMs in R: glmer, glmmML, glm...

oro.dicom 0.2.4

February 26, 2010
The R package oro.dicom is a major revision, and improvement, on the previous package DICOM.  New features include

• Increased speed
• Reading implicit value representations (VR's)
• Parsing SequenceItem tags (undefined lengths are allowed)
• Integration with oro.nifti to convert DICOM to NIfTI
Provided below is a straightforward application of the oro.dicom package to an...

Steve Miller on R at Predictive Analytics World

February 26, 2010
At the Information Management blog, Steve Miller has provided two great reviews (here and here) of last week's Predictive Analytics World conference, including a recap of the Bay Area User's Group meeting featuring John Chambers. (My personal highlight from John's talk? A photograph of the very first sketch of what was to become the S system, which ultimately begat...

Because it’s Friday: Visualizing an email chain

February 26, 2010
We've all been there: someone sends an email to a mailing list with a Reply-To directing responses back to the mailing list. Before long, someone replies (unwittingly, to everyone) to ask to be taken off the list. Soon the entire affair devolves into an endless cycle of requests to unsubscribe and pleas to stop mailing the entire...

R tip: Finding the location of minimum and maximums

February 26, 2010
I can never remember this R command, so I am going to post it here, which probably means I will always remember it and never have to look it up here again.

I sometimes want to find the location of a minimum or maximum value in a vector, so I can look up the corresponding position in another vector, or...
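
In base R this is the job of `which.min()` and `which.max()`. The snippet below is a minimal sketch with made-up vectors (`temps` and `days` are hypothetical names, not from the post):

```r
# Hypothetical data: daily temperatures and the matching day labels
temps <- c(12.1, 9.4, 15.8, 9.4, 14.0)
days  <- c("Mon", "Tue", "Wed", "Thu", "Fri")

i_min <- which.min(temps)  # index of the first minimum
i_max <- which.max(temps)  # index of the first maximum

days[i_min]  # label at the position of the minimum
days[i_max]  # label at the position of the maximum

# which.min()/which.max() return only the first match;
# use which() with a comparison to get every tied position
all_min <- which(temps == min(temps))
```

Note that ties are silently resolved in favour of the first position, so the `which()` form is the safer choice when duplicates matter.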

R and Sudoku solvers: Plus ca change…

February 25, 2010
Christian Robert blogged about a particularly heavy-handed solution to last Sunday's Sudoku puzzle in Le Monde. That had my sympathy as I like evolutionary computing methods, and his chart is rather pretty. From there, this spread to the REvolutions blog, where David Smith riffed on it and showed the actual puzzle. That didn't stop things, as Christian blogged once more about...

Welcome, Robin!

February 25, 2010
Robin Ryder started his new blog with his different solutions to Le Monde puzzle of last Saturday (about the algebraic sum of products…), solutions that are much more elegant than my pedestrian rendering. I particularly like the one based on the Jacobian of a matrix! (Robin is doing a postdoc in Dauphine and CREST—under my

Responding to the Flowingdata GDP Graph Challenge

February 25, 2010
Nathan Yau of Flowingdata put up a challenge earlier today to improve upon a graph showing government spending as a percentage of GDP, published in the Economist. The underlying data wasn’t available. So I put on my graph-to-numbers glasses and pulled out some data. Here it is in case you want to have a

Nutritional supplements efficacy score – Graphing plots of current studies results (using R)

February 25, 2010
In this post I showcase a nice bar-plot and a balloon-plot listing recommended nutritional supplements, according to how much evidence exists for their benefits. Scroll down to see it (and click here for the data behind it). The gorgeous blog “Information Is Beautiful” recently published an eye-candy post showing a “balloon race” image...

Solving Sudoku with Simulated Annealing

February 25, 2010
How long would it take you to solve this devilishly hard Sudoku puzzle (from Le Monde)? You could do it the old-fashioned way -- with a pencil -- but Xi'an decided to solve it by programming a simulated annealing solver in R. The algorithm works by first guessing a solution at random -- filling in the empty cells above...

inkblot: an alternative to stacked bar graphs

February 25, 2010
Sometimes it is not easy to get useful information from a stacked bar chart, see for instance
this blogpost at Support Analytics.

So-called inkblot charts, as discussed at Kaiser Fung's
Junk Charts, allow the reader to focus on the evolution
of a time series.

Now, how to make this kind of chart with R? I asked on
StackOverflow....

Interaction plot from cell means

February 24, 2010
I needed to produce a few interaction plots for my book in R and, while the interaction.plot() function is useful, it has a couple of drawbacks. First, the default output isn't very pretty. Second, it works from the raw data, whereas I often need plot...

FFT (Fast Fourier Transform) of time series — promises and pitfalls towards trading

February 24, 2010
Fig 1. FFT-transformed time series (EBAY) reconstructed with the first three and twenty harmonics, respectively.

I see quite a few traders interested in advanced signal processing techniques. It is often instructive to see why they may or may not be useful....
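
A reconstruction from the first few harmonics can be sketched with base R's `fft()`. The helper `keep_harmonics()` and the synthetic series below are assumptions for illustration only, not the post's code or the EBAY data:

```r
# Keep the mean and the first k harmonics of a real series, zero the rest.
# Assumes k < length(x) / 2.
keep_harmonics <- function(x, k) {
  n <- length(x)
  X <- fft(x)
  # Bin 1 is the mean; bins 2..(k+1) are harmonics 1..k;
  # bins n, n-1, ... hold their complex conjugates
  keep <- c(1, 1 + seq_len(k), n + 1 - seq_len(k))
  X[-keep] <- 0
  Re(fft(X, inverse = TRUE)) / n  # R's inverse FFT is unnormalized
}

# Hypothetical series: two sinusoids plus noise
set.seed(1)
idx <- seq_len(256)
x <- sin(2 * pi * 2 * idx / 256) + 0.5 * sin(2 * pi * 15 * idx / 256) +
  rnorm(256, sd = 0.2)

smooth3  <- keep_harmonics(x, 3)   # captures only the slow component
smooth20 <- keep_harmonics(x, 20)  # also captures the 15-cycle component
```

Keeping more harmonics always tracks the in-sample data more closely, which says nothing about how well the reconstruction extrapolates.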

ggplot2: Plotting Dates, Hours and Minutes

February 24, 2010
Plotting time series with dates on the x-axis and times on the y-axis can be a bit tricky in ggplot2. However, with a little trick this problem can be easily overcome. Let’s assume that I wanted to plot when the sun rises in London in 2010. The sunriset function in the maptools package calculates the sunrise times using algorithms provided

PoRtable…

February 24, 2010
Jobless as I might be, I do have some clients for data analysis. I try not to visit them in their office coz then things get really slow and time-consuming. When I can’t escape this, the worst thing is tuning data and software with client. So, I have a USB with portable versions of my

Object types in R: The fundamentals

February 24, 2010
If you're a self-taught R programmer, you've probably grappled with the different kinds of objects you can use in the language. When should you use a list instead of a vector? What's the difference between a factor and character vector? These questions are easier to answer when you have some of the basics of R's object types down pat,...
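
As a small illustration of those two questions (a sketch with made-up values, not taken from the post):

```r
# An atomic vector holds a single type, so mixed input is coerced
v <- c(1, "a", TRUE)     # becomes a character vector
# A list keeps each element's own type
l <- list(1, "a", TRUE)

class(v)          # "character"
sapply(l, class)  # "numeric" "character" "logical"

# A factor stores integer codes plus a set of levels;
# a character vector stores the strings themselves
ch <- c("lo", "hi", "lo")
f  <- factor(ch)
levels(f)      # levels are sorted alphabetically by default
as.integer(f)  # the underlying integer codes
```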

SoilWeb iPhone App: Beta-Testers?

February 23, 2010
iPhone App Screenshot rev 0.2 - icon

iPhone App Screenshot rev 0.2 - in Fresno

The application is now...

Reminder: useR! 2010 abstracts due Monday

February 23, 2010
Don't forget, if you're planning to attend the R user conference useR! 2010 and are going to present a talk (and if not, why not?), abstracts are due for submission this coming Monday, March 1. That's also the deadline for early-bird registrations, so if you haven't registered yet, now is the time. useR! 2010: The R User Conference

Numerical Integration/Differentiation in R: FTIR Spectra

February 23, 2010
Stumbled upon an excellent example of how to perform numerical integration in R. Below is an example of piece-wise linear and spline fits to FTIR data, and the resulting computed area under the curve. With a high density of points, it seems like the linear approximation is most efficient and sufficiently accurate. With very large...
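
The two approaches can be compared on a toy curve. This is a minimal sketch: the sampled peak below is a hypothetical stand-in for a spectrum, not the FTIR data from the post:

```r
# Hypothetical stand-in for a spectrum: a smooth peak sampled on a dense grid
x <- seq(0, 10, length.out = 201)
y <- exp(-(x - 5)^2 / 2)

# Piece-wise linear (trapezoidal) area under the curve
area_linear <- sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)

# Cubic spline through the same points, integrated numerically
f <- splinefun(x, y)
area_spline <- integrate(f, lower = min(x), upper = max(x))$value

c(linear = area_linear, spline = area_spline)
```

With points this dense the two estimates should agree closely, which is consistent with the post's remark that the linear approximation is sufficiently accurate.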