schoolmath

March 6, 2010

In connection with the Le Monde puzzle of last week, I was looking for an R function that would give me the prime factor decomposition of any integer. Such a function exists within the package schoolmath, developed by Joerg Schlarmann and Josef Wienand. It is called prime.factor and it returns the prime factors of any
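To show what a prime factor decomposition involves, here is a minimal trial-division sketch. It is written in Python as an illustration, not a reproduction of schoolmath's prime.factor, and the function name `prime_factors` is hypothetical.

```python
def prime_factors(n: int) -> list[int]:
    """Return the prime factors of n (n >= 2) in non-decreasing order, with multiplicity."""
    if n < 2:
        raise ValueError("n must be an integer >= 2")
    factors = []
    d = 2
    # Trial division: a composite n always has a divisor d with d * d <= n.
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1 if d == 2 else 2  # after 2, only odd candidates need testing
    if n > 1:
        # No divisor up to sqrt(n) was found, so the remaining n is prime.
        factors.append(n)
    return factors

print(prime_factors(360))  # [2, 2, 2, 3, 3, 5]
```

Trial division is enough for the small integers that come up in puzzles like this one. For very large integers, a dedicated factorization algorithm would be needed.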

Read more »

Visualizing Drought

March 6, 2010

The impacts of drought depend on time-scale. On short time-scales, drought means dry soil. On long time-scales, it means dry rivers and empty reservoirs. A region may simultaneously experience dry conditions on one time-scale and wet conditions on another, e.g. wet soil but low streamflow, or vice versa. The Standardized Precipitation Index (SPI) is a widely
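The time-scale effect can be illustrated by totalling precipitation over trailing windows of different lengths and standardizing each series. This is a simplified Python sketch, not the SPI itself. The real SPI fits a gamma distribution to the totals and maps its CDF onto a standard normal, while this sketch uses a plain z-score instead. The function names and the precipitation record are hypothetical.

```python
from statistics import mean, stdev

def rolling_totals(precip, k):
    """Precipitation totals over trailing k-month windows (k is the time-scale).

    The first k-1 months have no complete window, so the result is k-1 shorter.
    """
    return [sum(precip[i - k + 1 : i + 1]) for i in range(k - 1, len(precip))]

def simple_drought_index(precip, k):
    """Standardized k-month totals: negative is drier than usual, positive is wetter.

    Simplification (assumption): the real SPI fits a gamma distribution to the totals
    and transforms its CDF to a standard normal; a plain z-score stands in here.
    """
    totals = rolling_totals(precip, k)
    m, s = mean(totals), stdev(totals)
    return [(t - m) / s for t in totals]

# Hypothetical monthly precipitation (mm): a normal year, five dry months, then one wet month.
precip = [50] * 12 + [10] * 5 + [80]
print(simple_drought_index(precip, 1)[-1] > 0)  # True: the last month is wet on a 1-month scale
print(simple_drought_index(precip, 6)[-1] < 0)  # True: the same month is dry on a 6-month scale
```

The same final month is classified as wet or dry depending only on the window length, which is the point the excerpt makes about soil moisture versus streamflow.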

Read more »

Contingency Tables – Fisher’s Exact Test

March 6, 2010

A contingency table is used in statistics to provide a tabular summary of categorical data; each cell in the table is the number of occasions on which a particular combination of variable values occurs together in a set of data. The relationships between variables in a contingency table are often investigated using Chi-squared tests. The simplest contingency

Read more »

Posterior likelihood

March 6, 2010

At the Edinburgh mixture estimation workshop, Murray Aitkin presented his proposal to compare models via the posterior distribution of the likelihood ratio. As already commented in a post last July, the positive aspect of looking at this quantity rather than at the Bayes factor is that the priors are then allowed to be improper if

Read more »

oro.nifti 0.1.3

March 5, 2010

The R package oro.nifti has been released. Medical imaging data, in NIfTI or Analyze formats, may be read in, created from scratch, converted from DICOM (using oro.dicom) and written to a file.

Read more »

InformationWeek on Urlocker

March 5, 2010

InformationWeek today published a profile of Zack Urlocker, the former MySQL executive who recently joined REvolution's board: Former MySQL staffer Zack Urlocker is going to try to do for predictive analytics what he once did for relational database systems: bring open source code to a user population that hasn't necessarily had access to the technology before. REvolution Computing of...

Read more »

Because it’s Friday: Why a Salad Costs More than a Big Mac

March 5, 2010

In the US, at least. Via The Consumerist: Incidentally, the US FDA doesn't publish pyramids like this any more: it's now a garish personalized 2-d triangle with stripes. But at least it doesn't make the error of dimension committed by the left-hand pyramid: that orange section is a hell of a lot larger than 74% of the volume. The...

Read more »

GLMM revisited

March 5, 2010

A short while ago, I reported some discrepancies between the results produced by "lme4" and other R packages as well as Stata. Today I upgraded to the most recent version of "lme4a" and re-ran my model. The error of false convergence disappea...

Read more »

R amusements

March 5, 2010

On a lark, and to kill a bit of time, I was running the R fortune command looking for references to SAS. Here’s what two successive random fortunes turned up. Can there be two more antipodal opinions about the same product? I laughed out loud.

> fortune('SAS')
There are companies whose yearly license fees to

Read more »

Example 7.25: compare draws with distribution

March 5, 2010

In example 7.24, we demonstrated a Metropolis-Hastings algorithm for generating observations from awkward distributions. In such settings it is desirable to assess the quality of draws by comparing them with the target distribution. Recall that the dis...
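One way to make that comparison is to measure the largest gap between the empirical CDF of the draws and the target CDF, which is a Kolmogorov-Smirnov-style distance. The excerpt does not show the target from example 7.24. This Python sketch therefore assumes a standard normal target purely for illustration, and it is not the SAS or R code from the post. The function names are hypothetical.

```python
import math
import random

def metropolis_hastings(log_target, n, start=0.0, step=1.0, seed=1):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal."""
    rng = random.Random(seed)
    x, draws = start, []
    for _ in range(n):
        proposal = x + rng.gauss(0.0, step)
        # The proposal is symmetric, so accept with probability min(1, target(proposal) / target(x)).
        # 1 - random() lies in (0, 1], which keeps log() finite.
        if math.log(1.0 - rng.random()) < log_target(proposal) - log_target(x):
            x = proposal
        draws.append(x)
    return draws

def ks_distance(draws, cdf):
    """Largest vertical gap between the empirical CDF of the draws and the target CDF."""
    xs = sorted(draws)
    n = len(xs)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Assumed target for illustration: the standard normal (log density up to a constant).
draws = metropolis_hastings(lambda x: -0.5 * x * x, 50000)
print(ks_distance(draws, std_normal_cdf))  # small values mean the draws track the target
```

MCMC draws are autocorrelated, so the usual Kolmogorov-Smirnov p-values, which assume independent observations, do not apply directly. Treat the distance as a descriptive diagnostic, alongside plots such as a histogram overlaid with the target density.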

Read more »

Getting data from an image (introductory post)

March 5, 2010

Hi there! This blog will be dedicated to data visualization in R. Why? Two reasons. First, when it comes to statistics, I always start with some exploratory analyses, mostly with plots. And when I handle large quantities of data, it’s nice to make some graphs to get a grasp of what is going on.

Read more »

Accessing Climate Change Data and a Custom Panel Function for Filled Polygons

March 4, 2010

[Figure: GCS Model Grids]

I recently finished some collaborative work with Vishal related to visualizing climate change data for the SEI. This project was funded in part by the California Energy Commission, with additional technical support from the Google Earth Team. One of the final products was an...

Read more »

An email about mixtures

March 4, 2010

As a coincidence, or not, I received the following email just before starting our mixture estimation workshop (the above is Ben Nevis on Monday, whose skyline really looks like a three component mixture!) and giving a discussion on label switching: I am implementing a Markov-Chain Monte Carlo method for Gibbs sampling from a simple mixture

Read more »

Yet Another plyr Example

March 4, 2010

[Figure: another plyr example: quantiles (0.05, 0.25, 0.5, 0.75, 0.95) of DSC by temperature bin]

There are plenty of good examples of how to use functions from the plyr package. Here is one more, demonstrating how to use ddply with a custom function. Note that there...
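The split-apply-combine pattern behind ddply with a custom summary function can be sketched in a few lines of Python. This is an illustration, not the post's R code. The column names `temp_bin` and `dsc` and the data values are hypothetical. The quantile function uses linear interpolation between order statistics, which matches R's default `quantile()` method (type 7).

```python
from collections import defaultdict

def quantile_type7(values, p):
    """Sample quantile by linear interpolation between order statistics (R's default, type 7)."""
    xs = sorted(values)
    h = (len(xs) - 1) * p
    lo = int(h)                      # floor of h, since h >= 0
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

def quantiles_by_group(rows, key, value, probs=(0.05, 0.25, 0.5, 0.75, 0.95)):
    """Split rows on `key`, summarize `value` within each group, and combine the results:
    the split-apply-combine pattern that ddply() implements in R."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {g: [quantile_type7(v, p) for p in probs] for g, v in sorted(groups.items())}

# Hypothetical records: "temp_bin" and "dsc" are illustrative names, not the post's data.
rows = (
    [{"temp_bin": "cool", "dsc": v} for v in (1, 2, 3, 4, 5)]
    + [{"temp_bin": "warm", "dsc": v} for v in (10, 20, 30)]
)
print(quantiles_by_group(rows, "temp_bin", "dsc"))
```

Because the summary function is passed in by name, swapping in a different per-group statistic only means changing `quantile_type7`. ddply gives the same flexibility with an arbitrary R function.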

Read more »

More on the Economist’s special report on big data

March 4, 2010

I totally missed this the other day, but there's much more to that special report on the data deluge in The Economist. (Thanks to readers SB and DN for pointing this out.) There's a total of nine articles in the report (you can find them all in the Related Items box on this page), including a section on business...

Read more »

New Le Monde puzzle

March 3, 2010

When I first read the Le Monde puzzle this weekend, I thought it was even less exciting than the previous one: find and , such that is a multiple of . The solution is obtained by brute-force checking through an R program: and the next solution is (with several values for N). However, while

Read more »

Quality trimming in R using ShortRead and Biostrings

March 3, 2010

I wrote an R function to do soft-trimming, right-clipping FastQ reads based on quality. This function has the option of leaving out sequences trimmed to extinction and will do left-side fixed trimming as well.

#softTrim
#trim first position lower than minQuality and all subsequent positions
#omit sequences that after trimming are shorter than minLength
#left trim to firstBase, (1 implies no left trim)
#input:...
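The trimming rules in those header comments can be sketched for a single read in plain Python. This is not the author's R function, and it does not use ShortRead or Biostrings. The name `soft_trim`, its defaults (min_quality=20, min_length=20) and the Phred+33 encoding are assumptions. The original comments do not say whether the low-quality search starts before or after the left trim; the sketch assumes it starts after.

```python
def soft_trim(seq, qual, min_quality=20, first_base=1, min_length=20, offset=33):
    """Quality-based soft trimming of one FASTQ read (hypothetical Python sketch).

    - Left-trim so the read starts at first_base (1-based; 1 means no left trim).
    - Right-clip at the first remaining base whose Phred score is below min_quality,
      dropping that base and everything after it.
    - Return None when the trimmed read is shorter than min_length.
    Qualities are assumed to be Phred+33 ASCII characters (Sanger / Illumina 1.8+).
    """
    start = first_base - 1
    scores = [ord(ch) - offset for ch in qual]
    # First low-quality base at or after the left-trim point; if none, keep to the end.
    end = next((i for i in range(start, len(scores)) if scores[i] < min_quality), len(seq))
    trimmed_seq, trimmed_qual = seq[start:end], qual[start:end]
    if len(trimmed_seq) < min_length:
        return None  # trimmed to extinction (or too short): leave the read out
    return trimmed_seq, trimmed_qual

# 'I' encodes Phred 40 and '#' encodes Phred 2 under the Phred+33 scheme.
print(soft_trim("ACGTACGTAC", "IIIIII#III", min_length=3))                # ('ACGTAC', 'IIIIII')
print(soft_trim("ACGTACGTAC", "IIIIII#III", first_base=3, min_length=3))  # ('GTAC', 'IIII')
print(soft_trim("ACGTACGTAC", "IIIIII#III", min_length=8))                # None
```

Returning None for reads that fall below the minimum length means the caller can filter a whole batch with a simple comprehension, which mirrors the "leave out sequences trimmed to extinction" option.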

Read more »

Example of plotting a serial position curve in R

March 3, 2010

A while ago I co-wrote a chapter for an introductory psychology textbook, Essential Psychology: A Concise Introduction. This is a book edited and written by members of the department where I work. My contribution was the chapter on huma...

Read more »

R-bloggers (with ~50 blogs) has just crossed the 1000 subscribers mark!

March 3, 2010

I am very happy to discover that so many of you readers are interested in the content that bloggers are posting about R. Over the past few months, 50 bloggers have come together in this place to share with all of us what they write about R. If you wish, you can see all of the articles they wrote in...

Read more »

Intelligent Enterprise: You Can Predict that R Will Succeed

March 3, 2010

Analyst David Stodder at Intelligent Enterprise also noted the activity around R at the recent Predictive Analytics World conference in San Francisco, and he reviews his impressions in a column today. In fact, he attributes the increasing prominence of predictive analytics to R: Possibly the most important factor influencing the spread of predictive analytics is the growing popularity of...

Read more »

Arrange multiple ggplot2 plots in the same image window

March 3, 2010

In a previous tutorial I showed you how to create plots faceted by the level of a third variable using ggplot2. A commenter asked about using faceted plots and viewports and reminded me of this function I found in the ggplot2 Google group. The arrange function below is similar to using par(mfrow=c(r,c)) in base graphics to put more than...

Read more »

Augmented support for complex survey designs in R

March 3, 2010

We'll get back to code examples later this week, but wanted to let you know about an R package with updated functionality in the meantime. The appropriate analysis of sample surveys requires incorporation of complex design features, including stratification, clustering, weights, and finite population correction. These can be addressed in SAS and R for many common models. Section...
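To make some of those design features concrete, here is a Python sketch of the textbook stratified estimator of a population mean. Each stratum is weighted by its population share, and each variance term carries the finite population correction. The sketch covers stratification, weights and the FPC only; it ignores clustering. It is not the API of the updated R package, which the excerpt does not name. The function name and the data are hypothetical.

```python
from statistics import mean, variance

def stratified_mean(samples, pop_sizes):
    """Stratified estimate of a population mean and its standard error.

    samples:   {stratum: values from a simple random sample in that stratum (at least 2)}
    pop_sizes: {stratum: N_h, the number of population units in that stratum}
    Strata are weighted by their population shares W_h = N_h / N, and each stratum's
    variance term carries the finite population correction (1 - n_h / N_h).
    """
    N = sum(pop_sizes.values())
    estimate, var = 0.0, 0.0
    for h, ys in samples.items():
        n_h, N_h = len(ys), pop_sizes[h]
        W_h = N_h / N
        estimate += W_h * mean(ys)
        var += W_h ** 2 * (1 - n_h / N_h) * variance(ys) / n_h
    return estimate, var ** 0.5

# Hypothetical data: strata of 30 and 10 units, with 3 and 2 units sampled.
est, se = stratified_mean({"a": [1, 2, 3], "b": [10, 12]}, {"a": 30, "b": 10})
print(est, se)  # 4.25, about 0.468
```

When a stratum is sampled completely (n_h equal to N_h), its FPC term is zero and the stratum contributes no sampling variance. Ignoring the correction would overstate the uncertainty in that case.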

Read more »

Analyzing Google’s Winter Olympics Search Traffic with R

March 2, 2010

The Official Google Blog today includes an analysis of Google's search traffic related to the recently-concluded Winter Olympics, correlating various high-profile events with searches from particular countries. For example, traffic from the United States shows the expected diurnal cycle but with prominent peaks for the opening ceremony and the hockey matches featuring the USA team: It's not specifically stated...

Read more »

MySQL alum Zack Urlocker joins REvolution’s board

March 2, 2010

As you might have heard from this morning's press release, we've just welcomed a new member to REvolution's board of directors: Zack Urlocker. Zack has an impeccable open-source pedigree: until recently, he was responsible for engineering and marketing at MySQL, the wildly successful open-source database company recently acquired by Oracle (via its acquisition of Sun). Zack is also a...

Read more »

ACM Data Mining Camp, March 20

March 2, 2010

Following last year's successful unconference on data mining, the Bay Area Association for Computing Machinery (ACM) will again host the 2010 ACM Data Mining Camp on March 20 in San Jose, CA. The event is free and runs from 11:15am - 7:30pm, with an optional 2-hour pre-camp training in the morning. (REvolution Computing is a proud sponsor of this...

Read more »

The Economist reports on the information explosion

March 1, 2010

The current edition of The Economist includes a "special report on managing information", targeting the issue of the information explosion / data deluge / whatever you want to call it these days. It includes the usual attributes of the problem: data is being collected faster than we can store it, astronomers are creating petabytes of data daily, the usual....

Read more »

REvolution Computing hiring parallel computing developer

March 1, 2010

We're looking for a programmer with experience in high-performance computing and the R system to work on the ParallelR suite and other data-analysis projects. Sound like anyone you know? Check out the details at the link below. REvolution Computing careers: Parallel Computing Developer

Read more »