Yesterday Gareth pointed me to this article on the BBC website. The underlying story has to do with Meredith Kercher's murder and the subsequent trial involving mainly her flat-mate Amanda Knox, in Perugia (Italy). As often in these grue...

Integrating Documentation and Calculation This post is a first in that I've authored it using RStudio. I would guess most people who work in computational finance or quantitative risk are at least familiar with R. Unfortunately R as...

R users know it can be finicky in its requirements and opaque in its error messages. The beginning R user is often happy to discover that R-help, a mailing list for R problems with a large and active user base, has existed since 1997. Then, the beginning R user wades into the waters, asks…

by Thomas Dinsmore On April 26, SAS published on its website an undated Technical Paper entitled Big Data Analytics: Benchmarking SAS, R and Mahout. In the paper, the authors (Allison J. Ames, Ralph Abbey and Wayne Thompson) describe a recent project to compare model quality, product completeness and ease of use for two SAS products together with open source...

Whenever somebody mentions “The Simpsons” it always stirs up feelings of nostalgia in me. The characters, uproarious gags, zingy one-liners, and edgy animation all contributed towards making, arguably, the greatest TV show ever. However, it’s easy to forget that as a TV show “The Simpsons” is still ongoing, in its twenty-fourth season no less. For me, and

R DAY ON 24/05/2013 IN PARIS - MUSEUM NATIONAL D'HISTOIRE NATURELLE. COME AND SHARE YOUR (LACK OF) KNOWLEDGE OF R! On the programme: chemistry, automated reports, Gaussian mixtures, spatial analysis, network analysis, R interface, botanical atlas, databases, text analysis and evolutionary biology. Registration:...

A client has a specific audit they perform quarterly across 200 of their manufacturing plants. The audit has 8 distinct sections examining the different areas of the plant (shipping, receiving, storage, packaging, etc.). Instead of having one cumulative final score, the audit displays a final score for each section. I wanted to examine the distribution of

New .deb packages for R 3.0.0 on Raring Ringtail (13.04) are available on both CRAN and my Launchpad PPA. Some notes for this release. The initial build for Raring Ringtail did not include Tcl/Tk support. This issue has been addressed and...

In the feature selection chapter, we describe several search procedures ("wrappers") that can be used to optimize the number of predictors. Some techniques were described in more detail than others. Although we do describe genetic algorithms and how they can be used for reducing the dimensions of the data, this is the first of a series of blog posts that...

Tree methods such as CART (classification and regression trees) can be used as alternatives to logistic regression. They offer a way to show the probability of belonging to any hierarchical group. The following is a compilation of many of the key R packages that cover trees and forests. The goal here

Every once in a while I encounter a problem that requires the use of calculus. This can be quite bothersome since my brain has refused over the years to retain any useful information related to calculus. Most of my formal training in the dark arts was completed in high school and has not been covered

I believe that the NY Times interactive feature 512 Paths to the White House is one of the best visualizations of all time. It is even better when we have details on the process of creating this marvel. Although the graphic is not suited for other data sources (please tell me if this is not...

Major retailers like Williams Sonoma use UpStream Software for marketing analytics, including revenue attribution, targeting, and optimization. In the video below Tess Nesbitt (senior statistician at UpStream) describes how she uses Revolution R Enterprise and Hadoop to figure out the impact of various marketing channels (for example direct mail, email offers, and catalogs) on consumer retail sales. (The slides...

Revolution R Enterprise 6.2 is now available, and includes many new features that enhance the performance, scalability and enterprise readiness of R. On May 1, product manager Thomas Dinsmore will give an overview of the new features in a 30-minute webinar. You can register for the webinar (and the post-webinar slides and replay) at the link below. Revolution Analytics...

After reading the arXiv paper by Korattikara, Chen and Welling, I wondered about the expression of the acceptance step of the Metropolis-Hastings algorithm as a mean of log-likelihoods over the sample. More specifically the long sleepless nights at the hospital led me to ponder the rather silly question of the impact of replacing mean by

I was trying to change a few levels in my factor variable by simply coercing characters onto that factor variable, but it didn't seem to work. data(iris); iris$Species <- rep("Random", 71) ## Warning: invalid factor level, NAs generated iris$Species ## setosa ...
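The warning quoted above is what R raises when a string that is not an existing level is assigned into a factor. A minimal sketch of one way around it follows, assuming the goal was to relabel the first 71 values of a copy of iris$Species (the index range 1:71 and the variable name g are assumptions, not taken from the post):

```r
data(iris)

# Work on a copy so the original data frame is left untouched
g <- iris$Species

# Register the new level first; assigning an unknown level gives NA plus a warning
levels(g) <- c(levels(g), "Random")

# Assumption: the first 71 observations are the ones to relabel
g[1:71] <- "Random"

# Drop levels that no longer occur (here "setosa", whose 50 rows were all relabelled)
g <- droplevels(g)

table(g)
```

An alternative with the same effect is to convert to character, edit the values, and rebuild the factor with factor().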

I got intrigued by the numbers presented in this news article talking about the re-trial in the Amanda Knox case. The defendants, accused and initially convicted of murder, were acquitted on appeal when the judge ruled that the forensic evidence was insufficiently conclusive. The appeals judge ignored the forensic scientist's advice to retest a DNA sample,...

When investment skill is simulated, it is often presented as if it is obvious how to do it. Maybe I’m wrong, but I don’t think it’s obvious. Previously: in “Simple tests of predicted returns” we saw that prediction quality need not look like what you would find in a textbook. For example, there was a …

Often I like to reduce the alpha value (level of transparency) of colours to identify patterns of over-plotting when displaying lots of data points with R. So, here is a tiny function that allows me to add an alpha value to a given vector of colours, e.g. an RColorBrewer palette, using col2rgb and rgb, which has...
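A function of the kind described might look roughly like the sketch below. It is not the post's own code: the name add_alpha and its argument names are assumptions, and only the base R functions col2rgb and rgb mentioned in the excerpt are used.

```r
# Hypothetical helper: add an alpha (transparency) value to a vector of colours.
# alpha = 0 is fully transparent, alpha = 1 is fully opaque.
add_alpha <- function(col, alpha = 1) {
  stopifnot(is.numeric(alpha), alpha >= 0, alpha <= 1)
  # col2rgb() returns a 3 x n matrix of red, green, blue values in 0..255;
  # rescale to 0..1 to match rgb()'s default maxColorValue
  rgb_vals <- col2rgb(col) / 255
  rgb(rgb_vals[1, ], rgb_vals[2, ], rgb_vals[3, ], alpha = alpha)
}

# Usage: half-transparent versions of named and hex colours
cols <- c("red", "#1B9E77", "blue")
transparent_cols <- add_alpha(cols, alpha = 0.5)
transparent_cols
```

The result is a character vector of "#RRGGBBAA" strings that can be passed to the col argument of base plotting functions.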

Introduction Recently, I introduced the golden section search method, a special way to save computation time by modifying the bisection method with the golden ratio, and I illustrated how to minimize a cusped function with this script. I also wrote an R function to implement this method and an R script on how to apply this
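For readers who have not seen the method, here is a minimal sketch of golden section search in R. It is not the author's function: the name golden_section_min, its arguments, and the test function cusped are assumptions chosen for illustration, and the objective is assumed to be unimodal on the starting interval.

```r
# Hypothetical implementation of golden section search for a unimodal function
golden_section_min <- function(f, lower, upper, tol = 1e-6) {
  ratio <- (sqrt(5) - 1) / 2            # inverse golden ratio, about 0.618
  x1 <- upper - ratio * (upper - lower) # left interior point
  x2 <- lower + ratio * (upper - lower) # right interior point
  f1 <- f(x1)
  f2 <- f(x2)
  while (upper - lower > tol) {
    if (f1 < f2) {
      # Minimum lies in [lower, x2]; the old x1 becomes the new x2
      upper <- x2
      x2 <- x1; f2 <- f1
      x1 <- upper - ratio * (upper - lower)
      f1 <- f(x1)
    } else {
      # Minimum lies in [x1, upper]; the old x2 becomes the new x1
      lower <- x1
      x1 <- x2; f1 <- f2
      x2 <- lower + ratio * (upper - lower)
      f2 <- f(x2)
    }
  }
  (lower + upper) / 2
}

# Assumed example of a cusped function, with its minimum (a cusp) at x = 2
cusped <- function(x) abs(x - 2)^(2 / 3)
x_min <- golden_section_min(cusped, lower = 0, upper = 5)
x_min
```

Because one interior point is reused at every step, each iteration costs a single new function evaluation, which is where the saving in computation comes from. The method also needs no derivative, so the cusp is not a problem.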

