Intelligent Enterprise: You Can Predict that R Will Succeed

March 3, 2010

Analyst David Stodder at Intelligent Enterprise also noted the activity around R at the recent Predictive Analytics World conference in San Francisco, and he reviews his impressions in a column today. In fact, he attributes the increasing prominence of predictive analytics to R: Possibly the most important factor influencing the spread of predictive analytics is the growing popularity of...

Read more »

Arrange multiple ggplot2 plots in the same image window

March 3, 2010

In a previous tutorial I showed you how to create plots faceted by the level of a third variable using ggplot2. A commenter asked about using faceted plots and viewports and reminded me of this function I found in the ggplot2 Google group. The arrange function below is similar to using par(mfrow=c(r,c)) in base graphics to put more than...

Read more »

Augmented support for complex survey designs in R

March 3, 2010

We'll get back to code examples later this week, but wanted to let you know about an R package with updated functionality in the meantime. The appropriate analysis of sample surveys requires incorporation of complex design features, including stratification, clustering, weights, and finite population correction. These can be addressed in SAS and R for many common models. Section...

Read more »

Analyzing Google’s Winter Olympics Search Traffic with R

March 2, 2010

The Official Google Blog today includes an analysis of Google's search traffic related to the recently concluded Winter Olympics, correlating various high-profile events with searches from particular countries. For example, traffic from the United States shows the expected diurnal cycle but with prominent peaks for the opening ceremony and the hockey matches featuring the USA team: It's not specifically stated...

Read more »

MySQL alum Zack Urlocker joins REvolution’s board

March 2, 2010

As you might have heard from this morning's press release, we've just welcomed a new member to REvolution's board of directors: Zack Urlocker. Zack has an impeccable open-source pedigree: until recently, he was responsible for engineering and marketing at MySQL, the wildly successful open-source database company recently acquired by Oracle (via its acquisition of Sun). Zack is also a...

Read more »

ACM Data Mining Camp, March 20

March 2, 2010

Following last year's successful unconference on data mining, the Bay Area Association for Computing Machinery (ACM) will again host the 2010 ACM Data Mining Camp on March 20 in San Jose, CA. The event is free and runs from 11:15am - 7:30pm, with an optional 2-hour pre-camp training in the morning. (REvolution Computing is a proud sponsor of this...

Read more »

The Economist reports on the information explosion

March 1, 2010

The current edition of The Economist includes a "special report on managing information", targeting the issue of the information explosion / data deluge / whatever you want to call it these days. It includes the usual attributes of the problem: data is being collected faster than we can store it, astronomers are creating petabytes of data daily, the usual....

Read more »

REvolution Computing hiring parallel computing developer

March 1, 2010

We're looking for a programmer with experience in high-performance computing and the R system to work on the ParallelR suite and other data-analysis projects. Sound like anyone you know? Check out the details at the link below. REvolution Computing careers: Parallel Computing Developer

Read more »

Example 7.24: Sampling from a pathological distribution

March 1, 2010

Evans and Rosenthal consider ways to sample from a distribution with density given by: f(y) = c e^(-y^4) (1+|y|)^3, where c is a normalizing constant and y is defined on the whole real line. Use of the probability integral transform (section 1.10.8) is not feasible in this setting, given the complexity of inverting the cumulative distribution function. The Metropolis-Hastings algorithm is a Markov...

Read more »
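
As a minimal sketch of the idea in Example 7.24, and not the code from the original post, a random-walk Metropolis-Hastings sampler for this density could look like the following in R. The standard normal proposal, the chain length, the starting value, and the names log_f and y_chain are assumptions made for illustration. Only the unnormalized density is needed, because the constant c cancels in the acceptance ratio.

```r
set.seed(1)

# Unnormalized log density: log f(y) = -y^4 + 3 * log(1 + |y|) + log(c)
log_f <- function(y) -y^4 + 3 * log1p(abs(y))

n_iter  <- 10000              # chain length (assumed)
y_chain <- numeric(n_iter)
y_chain[1] <- 0               # starting value (assumed)

for (i in 2:n_iter) {
  current  <- y_chain[i - 1]
  proposal <- current + rnorm(1)   # symmetric random-walk proposal (assumed)
  # With a symmetric proposal the acceptance ratio is f(proposal) / f(current)
  log_ratio <- log_f(proposal) - log_f(current)
  if (log(runif(1)) < log_ratio) {
    y_chain[i] <- proposal         # accept the move
  } else {
    y_chain[i] <- current          # reject and stay put
  }
}

# Fraction of iterations in which the chain moved
acceptance_rate <- mean(diff(y_chain) != 0)
```

A histogram of y_chain (for example, hist(y_chain, breaks = 50)) gives a rough picture of the target density, which is symmetric about zero.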

Bayes fits the data less closely than maximum likelihood

March 1, 2010

Lluis Bermudez writes: I'm from the University of Barcelona and I've been using the "arm" package to obtain posterior estimates of glm parameters. I usually worked with the "glm" function, but I need more than a point estimate. The problem is that when using...

Read more »

End of the month investment

March 1, 2010

It is known that the first day of the month provides a bullish edge. According to Quantifiable Edges, not all months are equal. So I ran a test on the S&P 500 index from January 1980 until February 2010. It is true: March isn’t the best month to run this strategy. Only 3 months have significant results

Read more »

How to use mcsm

February 27, 2010

Within the past two days, I received this email: Dear Prof. Robert, I have just bought your recent book on Introducing Monte Carlo Methods with R. Although I have checked your web page for the R programs (bits of the code in the book, codes for generating the figures, etc. – not the package available

Read more »

Calculating LT50 (median lethal temperature, aka LD50) quickly in R

February 27, 2010

Say you’ve got a bunch of survival/mortality data from an experiment. Maybe you exposed batches of snails to various high temperatures for a few hours, and recorded the number alive and dead in each batch at the end. Now you’d like to repor...

Read more »
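
The excerpt above stops before showing any code, so the following is only a sketch of one common way to estimate an LT50 in R, not necessarily the method of the original post. It fits a logistic regression of mortality on temperature with glm and solves for the temperature at which predicted mortality is 50%. The temperatures and counts are made-up illustrative numbers, and all variable names are hypothetical.

```r
# Made-up illustrative data: 20 snails per batch, one batch per temperature
temp  <- c(36, 38, 40, 42, 44)   # exposure temperature (degrees C)
alive <- c(20, 18, 11, 3, 0)     # survivors in each batch
dead  <- 20 - alive              # deaths in each batch

# Logistic regression of mortality on temperature
fit <- glm(cbind(dead, alive) ~ temp, family = binomial)

# Predicted mortality is 50% where the linear predictor is zero:
# b0 + b1 * temp = 0, so LT50 = -b0 / b1
b    <- coef(fit)
lt50 <- unname(-b[1] / b[2])
```

With these illustrative numbers lt50 falls between 40 and 42 degrees, the interval in which observed mortality crosses 50%.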

Be Careful Searching Python Dictionaries!

February 27, 2010

For my talk on High Performance Computing in R (which I had to reschedule due to a nasty stomach bug), I used Wikipedia linking data, an adjacency list of articles and the articles to which they link. This data was linked from DataWrangling and was originally created by Henry Haselgrove. The dataset is small on disk, but I needed...

Read more »

An interesting paper

February 27, 2010

Ben Bolker has an interesting paper (outline of a paper) comparing different approaches to estimating GLMMs in the R environment, which is very helpful for what I am doing right now. The paper pointed out the following options to fit GLMMs using R: glmer, glmmML, glm...

Read more »

oro.dicom 0.2.4

February 26, 2010

The R package oro.dicom is a major revision, and improvement, on the previous package DICOM. New features include: increased speed; uploading only header information (for restricted memory); reading implicit value representations (VRs); parsing SequenceItem tags (undefined lengths are allowed); and integration with oro.nifti to convert DICOM to NIfTI. Provided below is a straightforward application of the oro.dicom package to an...

Read more »

Steve Miller on R at Predictive Analytics World

February 26, 2010

At the Information Management blog, Steve Miller has provided two great reviews (here and here) of last week's Predictive Analytics World conference, including a recap of the Bay Area User's Group meeting featuring John Chambers. (My personal highlight from John's talk? A photograph of the very first sketch of what was to become the S system, which ultimately begat...

Read more »

Because it’s Friday: Visualizing an email chain

February 26, 2010

We've all been there: someone sends an email to a mailing list with a Reply-To directing responses back to the mailing list. Before long, someone replies (unwittingly, to everyone) to ask to be taken off the list. And before long, the entire affair devolves into an endless cycle of requests to unsubscribe and pleas to stop mailing the entire...

Read more »

R tip: Finding the location of minimum and maximums

February 26, 2010

I can never remember this R command, so I am going to post it here, which probably means I will always remember it and never have to look it up here again. I sometimes want to find the location of a minimum or maximum value in a vector, so I can look up the corresponding position in another vector, or...

Read more »
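
The excerpt is cut off before the command is named. Base R's which.min and which.max are the usual tools for this task, so the sketch below assumes those are what the post has in mind. The vectors x and labels are hypothetical examples.

```r
x      <- c(5, 2, 9, 2)          # values to search
labels <- c("a", "b", "c", "d")  # a second vector aligned with x

i_min <- which.min(x)   # index of the first minimum
i_max <- which.max(x)   # index of the first maximum

labels[i_min]           # element of the other vector at the minimum
labels[i_max]           # element of the other vector at the maximum

# which.min returns only the first match; use which() to get all ties
all_min <- which(x == min(x))
```

Here the minimum value 2 occurs twice, so which.min(x) reports only position 2 while all_min holds both positions.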

R and Sudoku solvers: Plus ca change…

February 25, 2010

Christian Robert blogged about a particularly heavy-handed solution to last Sunday's Sudoku puzzle in Le Monde. That had my sympathy as I like evolutionary computing methods, and his chart is rather pretty. From there, this spread on to the REvolutions blog, where David Smith riffed on it and showed the actual puzzle. That didn't stop things as Christian blogged once more about...

Read more »

Welcome, Robin!

February 25, 2010

Robin Ryder started his new blog with his different solutions to Le Monde puzzle of last Saturday (about the algebraic sum of products…), solutions that are much more elegant than my pedestrian rendering. I particularly like the one based on the Jacobian of a matrix! (Robin is doing a postdoc in Dauphine and CREST—under my

Read more »

Responding to the Flowingdata GDP Graph Challenge

February 25, 2010

Nathan Yau of Flowingdata put up a challenge earlier today to improve upon a graph showing government spending as a percentage of GDP, published in the Economist. The underlying data wasn’t available. So I put on my graph-to-numbers glasses and pulled out some data. Here it is in case you want to have a

Read more »

Nutritional supplements efficacy score – Graphing plots of current studies results (using R)

February 25, 2010

In this post I showcase a nice bar-plot and a balloon-plot listing recommended nutritional supplements according to how much evidence exists for their benefits. Scroll down to see it (and click here for the data behind it). The gorgeous blog “Information Is Beautiful” recently published an eye-candy post showing a “balloon race” image...

Read more »

Solving Sudoku with Simulated Annealing

February 25, 2010

How long would it take you to solve this devilishly hard Sudoku puzzle (from Le Monde)? You could do it the old-fashioned way -- with a pencil -- but Xi'an decided to solve it by programming a simulated annealing solver in R. The algorithm works by first guessing a solution at random -- filling in the empty cells above...

Read more »