## Text Editors in The Lord of the Rings

July 28, 2011
Prompted by a passing thought about TextMate, I thought I’d make a comprehensive, accurate, unbiased, and irrefutable survey of text editors by way of comparison to locations in The Lord of the Rings. TextMate: Minas Tirith. A once-great but now decaying city. Only the King has the power to renew it, but he is long absent, indeed...

## Challenge alert — material identification

July 28, 2011
We are starting yet another series of posts: challenge alerts. This series is intended to share news about machine learning or data mining challenges that may be interesting to the members of our community, possibly with a brief introduction to the problem. So if you hear about a contest, notify us on Skewed distribution. Today

## Le Monde puzzle [#29]

July 28, 2011
This week, the puzzle from the weekend edition of Le Monde was easy to state: in the sequence (8+17n), is there a 6th power? A 7th? An 8th? If so, give the first occurrence. So I first wrote R code for a function testing whether an integer is any power: (The function returns the
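As a rough sketch of one way to attack the puzzle (this is not the author's function, which tests whether an integer is a power), one can work with residues modulo 17 instead. For a positive integer k, k^p belongs to the sequence exactly when k^p mod 17 equals 8, and residues repeat with period 17, so only k = 1, ..., 17 need checking. The function name `first_power` and its arguments are hypothetical, chosen for illustration only.

```r
# Hypothetical helper: smallest base k such that k^p is a term of a + m*n (n >= 0).
# Returns NULL when no p-th power can occur in the sequence.
first_power <- function(p, a = 8, m = 17) {
  for (k in 1:m) {
    r <- 1
    # Compute k^p mod m step by step so intermediate values stay small
    for (i in seq_len(p)) r <- (r * k) %% m
    if (r == a %% m) {
      return(c(k = k, n = (k^p - a) / m))
    }
  }
  NULL
}

first_power(6)  # k = 6,  n = 2744     (6^6  = 46656     = 8 + 17 * 2744)
first_power(7)  # k = 15, n = 10050551 (15^7 = 170859375 = 8 + 17 * 10050551)
first_power(8)  # NULL: no 8th power is congruent to 8 modulo 17
```

Checking residues rather than scanning the sequence term by term keeps the search to at most 17 candidates per exponent.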

## Pattern Recognition: forward Boxplot Trajectories using R

July 28, 2011
Although the following discussion can apply to the Quantitative Candlestick Pattern Recognition series, it is addressing the same issue as any basic conditional type system -- how and when to exit.  The following is one way to visualize and think ...

## Program for useR! 2011 available

July 28, 2011
The final program for the worldwide user conference, useR! 2011, is now available as a downloadable booklet (PDF, 7Mb). Revolution Analytics is very proud to sponsor this annual gathering of R users from around the world, and the program includes an outstanding lineup of speakers from the R Core Group, package developers, users in industry and academia, and the...

July 28, 2011
It always strikes me as curious that some posts get a lot of love on Twitter, while others get many more shares on Facebook: What accounts for this difference? Some of it is surely site-dependent: maybe one blogger has a Facebook page but not a Twitter account, while another has these roles reversed. But even

## Getting rid of white space at the beginning and end of a string

July 28, 2011
There are situations where we are working with character strings extracted from various sources, and it can be annoying when there is whitespace at the beginning and/or end of the strings. This whitespace can cause problems when attempting to sort, subset, or perform various other common operations. The stringr package has a handy function str_trim
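A minimal sketch of the idea, using made-up strings: the base R calls below need no extra packages, and the `stringr::str_trim` call only runs if stringr happens to be installed. The vector `x` and the variable names are assumptions for illustration.

```r
# Example strings with stray leading and trailing whitespace (invented data)
x <- c("  cherry  ", "banana  ", "  apple")

# Base R: remove whitespace at the start (^\s+) or at the end (\s+$)
trimmed <- gsub("^\\s+|\\s+$", "", x)

# trimws() (base R >= 3.2.0) does the same job
trimmed_ws <- trimws(x)

# With stringr installed, str_trim() trims both sides by default
if (requireNamespace("stringr", quietly = TRUE)) {
  trimmed_stringr <- stringr::str_trim(x)
}

# Untrimmed strings sort by their leading spaces; trimmed ones sort by content
sort(x)
sort(trimmed)
```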

## I can’t resist a word cloud: now using R!

July 28, 2011
The wordcloud package is word clouds for R with a difference: they look great. Of course, having just analysed online coverage of the ISMB conference, I had to run all 6 906 comments from the 2008-2011 meetings through some code. If you followed along via the Sweave code, I went as far as generating the

## More S&P 500 correlation

July 28, 2011
Here are some additions to the previous post on S&P 500 correlation. Correlation distribution: Before, we only looked at mean correlations. However, it is possible to see more of the distribution than just the mean. Figures 1 and 2 show several quantiles: 10%, 25%, 50%, 75%, 90%. Figure 1: Quantiles of 50-day rolling correlation of ...

## Displaying Missouri sex offender/child day care facility proximity map using batchgeo.com

July 28, 2011
Computer Assisted Reporting This is the last of four articles about analyzing distances between sex offenders and child daycare centers in Missouri as part of a joint project with KSHB NBC Action News in Kansas City. The previous two articles gave deta...

## Marketing optimization using the nonlinear minimization function nlm

July 28, 2011
Guest post by Bob Agnew. Introduction: Marketing optimization consists of assigning offers to prospects in order to maximize total expected profit subject to a few general linear constraints and the requirement that a prospect receives at most one offer. What distinguishes these problems is their sheer size. With millions of prospects, brute force linear solvers are unsuitable. ...

## Computing distance matrix between Missouri sex offenders and child daycare facilities

July 28, 2011
Computer Assisted Reporting This is the third of four articles about analyzing distances between sex offenders and child daycare centers in Missouri as part of a joint project with KSHB NBC Action News in Kansas City. The previous article explained how...

## Core not in CiRM

July 27, 2011
Despite not enjoying this year the optimal environment of CiRM, we are still making good progress on the revision (or the R vision) of Bayesian Core. In the past two days, we went over Chapters 1 (Introduction), 2 (Normal Models), 5 (Capture-Recapture Experiments), and 6 (Mixture Models), with Chapters 3 (Regression), 4 (Generalised Linear Models)

## Creating Financial Instrument metadata in R

July 27, 2011
(This is a guest post by Ilya Kipnis) When trading stocks in a single currency, instrument metadata can be safely ignored because the multiplier is 1 and the currencies are all the same. When doing analysis on fixed income products, options, futures, or other complex derivative instruments, the data defining the properties of these instruments becomes critical to tasks...

## Join the Reserves

July 27, 2011
Most forget that the tremendous macro imbalances caused by the 10 trillion in foreign reserves are just a 14-year-old phenomenon, but the results have been and will be profound. The buying started after the Asia Pacific collapse of 1997, and the As...

## Analysis of ISMB coverage at FriendFeed: 2008 – 2011

July 27, 2011
ISMB/ECCB 2011 was held between July 15-19 this year and as in previous years, FriendFeed was used to cover the meeting. Last year, I wrote a post about how to use R to analyse the coverage. I was planning something similar for 2011 when I thought: we have 4 years of ISMB at FriendFeed now

## A Currency Graph

July 27, 2011
Here's a graph in which nodes (and edges) represent currencies (and exchange rates): `library(igraph)` `currencies <- factor(c("EUR", "USD", "JPY", "GBP"))` `df <- subset(expand.grid(from=currencies, to=currencies), from != to)` `GetExchangeRate...`

## Adding Sweave.sty and Rd.sty to your LaTeX path in Mac OS X

July 27, 2011
Okay, this is extremely specific, but it is a headache that must be solved if you plan to create or modify R extensions (aka R source code packages). This is because R requires that you also build documentation through a process that is ...

## Rcpp 0.9.6

July 27, 2011
A new maintenance release, version 0.9.6 of Rcpp, went onto CRAN and into Debian earlier today. This release contains a fix which helps the RcppEigen package (mentioned previously on this blog), as well as an addition which permits user-defined fina...

## A slice of infinity

July 27, 2011
Peng Yu sent me an email about the conditions for convergence of a Gibbs sampler: The following statement mentions convergence. But I’m not familiar what the regularity condition is. “But it is necessary to have a finite probability of moving away from the current state at all times in order to satisfy the regularity conditions on which

## How big block trades affect stock market prices?

July 27, 2011
I will be giving a presentation on “Optimal transaction cost” in Vilnius on 16 August. While preparing the presentation and looking for an optimal execution solution, a natural question arose: does the size of the trade affect the stock market price? I’m sure you would say 100% yes. Well, you would be right, but what is

## The Stats Clinic

July 27, 2011
Here at HSL we have a lot of smart kinda-numerate people who have access to a lot of data. On a bad day, kinda-numerate includes myself, but in general I’m talking about scientists who have done an introductory stats course, but not much else. When all you have is a t-test, suddenly everything looks

## An efficient way to do dataset intersection

July 27, 2011
The main message is to use "match" to get the indices of the needed rows and then select the rows by index, instead of selecting by row names, which is much slower. Here is an example: In the example above, we know that rows with the same value in the 2nd column have the same values in the columns from the 4th to the end. So, instead...
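The post's own example is not reproduced in this excerpt, so the following is only a small sketch of the technique on invented data: `match` returns the integer positions of the wanted keys, and those positions are then used to subset the rows. The data frame `big`, its columns, and the key vector `wanted` are all assumptions for illustration.

```r
set.seed(1)

# Invented data: 10,000 rows keyed by an id column, also used as row names
big <- data.frame(
  id = sprintf("id%05d", 1:10000),
  value = rnorm(10000),
  stringsAsFactors = FALSE
)
rownames(big) <- big$id

# Keys of the rows we want to pull out
wanted <- c("id00042", "id09999", "id00007")

# match() gives the integer position of each key (NA if a key is absent)
idx <- match(wanted, big$id)
by_index <- big[idx, ]

# Row name lookup returns the same rows; this is the approach
# the post describes as much slower
by_name <- big[wanted, ]

stopifnot(identical(by_index$value, by_name$value))
```

To compare the two approaches on your own data, wrap each subsetting call in `system.time()`.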

## Social reception for R enthusiasts at Joint Statistics Meetings

July 27, 2011
At the JSM 2011 conference in Miami on Monday, August 1, Revolution Analytics will be hosting a cocktail reception for R users and anyone interested in R. From 5:30-7:30 at Emeril's Miami Beach House (in the JSM host hotel), we'll have appetizers, drinks, and the opportunity to socialize with R users from around the world. There will also be...

## Word Cloud in R

July 27, 2011
A word cloud (or tag cloud) can be a handy tool when you need to highlight the most commonly cited words in a text using a quick visualization. Of course, you can use one of the several on-line services, such as wordle or tagxedo,...

## RStudio available at cloudnumbers.com

RStudio™ is an integrated development environment (IDE) for the statistical software R (www.r-project.org). It combines an intuitive user interface with powerful coding tools to help you get the most out of R. cloudnumbers.com provides researchers and companies with access to resources to perform high performance calculations in the cloud. One often used application at cloudnumbers.com’s

## Bayesian Core and loose logs

July 26, 2011
Jean-Michel (aka Jean-Claude!) Marin came for a few days so that we could make late progress on the revision of our book Bayesian Core towards a Use R! version. In one of the R programs in the mixture chapter, we were getting improbable answers, until we found an R mistake in the shape of which