R and the web (for beginners), Part I: How is the local nuclear plant doing?

June 21, 2012

One of the things I especially like about R is its ability to easily access and process data from the web. If you are new to R, or have never used it to access data from the Internet, here is the first part of a little series of posts with examples to ...

An example on sentiment analysis with R

June 21, 2012

by Yanchang Zhao, RDataMining.com. There is a nice example on sentiment analysis with R at <http://viksalgorithms.blogspot.com.au/2012/06/tracking-us-sentiments-over-time-in.html>. In the example, the Wikileaks cable corpus is analyzed to track US sentiment towards other countries and their presidents over time. The example describes …

FDA: R OK for drug trials

June 21, 2012

In a poster (PDF) presented at the UseR 2012 conference, FDA biostatistician Jae Brodsky reiterated the FDA policy regarding software used to prepare submissions for drug approvals with clinical trials: Sponsors may use R in their submissions. The FDA does not endorse or require any particular software to be used for clinical trial submissions, and there are no regulations...

Normalising data within groups

June 21, 2012

Occasionally it proves useful to normalise data. By this I mean to scale it between zero and one. Admittedly, most people frown on this, but there are papers out there with this method in use*. How do we go about this? It's a very simple formula to calculate: y' = y/sqrt(sum(y^2)). So we square all
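
The formula in the excerpt above can be tried out with a minimal sketch, written here in Python rather than R. The data and the function name `normalise_within_groups` are hypothetical and not taken from the post. Note that dividing by sqrt(sum(y^2)) makes the squared values of each group sum to one, so the results fall between zero and one only when the data are non-negative.

```python
import math
from collections import defaultdict


def normalise_within_groups(rows):
    """Apply y' = y / sqrt(sum(y^2)) separately within each group.

    rows is a list of (group, value) pairs; the result keeps the input order.
    """
    # Collect the sum of squares for every group.
    sum_sq = defaultdict(float)
    for group, value in rows:
        sum_sq[group] += value * value

    # Euclidean norm per group.
    norms = {group: math.sqrt(total) for group, total in sum_sq.items()}

    # Divide each value by its group's norm; an all-zero group stays at zero.
    return [
        (group, value / norms[group] if norms[group] else 0.0)
        for group, value in rows
    ]


# Hypothetical example data: two groups, "a" and "b".
rows = [("a", 3.0), ("a", 4.0), ("b", 6.0), ("b", 8.0)]
result = normalise_within_groups(rows)
for group, scaled in result:
    print(group, scaled)
```

Both groups end up on the same scale even though the raw values in group "b" are twice as large, which is the point of normalising within groups rather than across the whole data set.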

The stimuli-as-fixed-effect fallacy

June 21, 2012

Neuroskeptic has just blogged on a new paper by Judd, Westfall and Kenny, "Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem". I can't access the original pap...

Solving Big Problems with Oracle R Enterprise, Part I

June 21, 2012

Abstract: This blog post will show how we used Oracle R Enterprise to tackle a customer’s big calculation problem across a big data set. Overview: Databases are great for managing large amounts of data in a central place with rigorous enterprise-level controls.  R is great for doing advanced computations.  Sometimes you need to...

Experimental Design: Problem Set

June 21, 2012

QUESTIONS: The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. The following data have been collected: Mixing Technique, Tensile Strength (lb/in²) ...

The Great Julia RNG Refactor

June 21, 2012

Many readers of this blog will know that I’m a big fan of Bayesian methods, in large part because automated inference tools like JAGS allow modelers to focus on the types of structure they want to extract from data rather than worry about the algorithmic details of how they will fit their models to data.

Will Tiger Woods catch Jack Nicklaus? And a discussion of the virtues of using continuous data even if your goal is discrete prediction

June 21, 2012

I know next to nothing about golf. My mini-golf scores typically approach the maximum of 7 per hole, and I’ve never actually played macro-golf. I did publish a paper on golf once (A Probability Model for Golf Putting, with Deb Nolan), but it’s not so rare for people to publish papers on topics they know ...

To R or not to R, and other events

June 21, 2012

New events: To R, or not to R, that is the question. The Statistical Computing Section of the Royal Statistical Society presents a one-day event on 2012 June 29. The details of the day. See in particular the abstract for “Teaching statistics: a pain in the R?” by Andy Field; it involves a sheepdog …

Body Weight in the United States – Part 3, "Contributing Factors"

June 20, 2012

Carbs: In Part 2 of this series, micro-nutrients were cited as a non-factor for weight gain. This is not the case with macro-nutrients (carbohydrates, fats, proteins, water). While fats, proteins and water are essential (without them you could no...

R Workshop: Introducing Slidify – HTML5 slides from R markdown

June 20, 2012

Thursday, June 28th, 2012, 19h (new evening time!). Tomson House: 650 McTavish, H3A 1Y2, Montréal, QC (new social setting!). guRu: Ramnath Vaidyanathan (McGill University). Ramnath Vaidyanathan will introduce the group to slidify, his brand new R package. From the slidify website: “The objective of slidify is to make it easy to create reproducible

UseR 2012 highlights

June 20, 2012

The eighth annual R user conference, UseR! 2012, has come and gone — and what an event it was! I've been to five useR! conferences so far, and each one improves upon the last. This year's conference at Vanderbilt was the best so far: an outstanding location (my first visit to Nashville, a great city), excellent facilities (the lecture...

Simulation and resampling

June 20, 2012

In financial applications one frequently comes across the need to draw samples according to an assumed distribution. This could be because one wants to simulate stock prices for a Monte Carlo simulation, to price an option payout or to generate …

The R-Podcast Episode 8: Visualization with ggplot2

June 20, 2012

I’m happy to present this jam-packed episode of the R-Podcast dedicated to using the ggplot2 package for visualization. This episode will have a companion screencast released in the next few days. I use data from the Hockey Summary Project to demonstrate how to create a series of boxplots of NHL regular season attendance for each

Color Palettes in RGB Space

June 20, 2012

Introduction: I've recently been interested in how to communicate information using color. I don't know much about the field of Color Theory, but it's an interesting topic to me. The selection of color palettes, in particular, has been a topic I've been faced with lately. I downloaded 18 different sequential color palettes from Cynthia Brewer's

Euro 2012: End of Group Stage

June 20, 2012

Time for an update of the plots. Here are the teams still left in the competition. This is the group stratification. Finally, the busy plot.

Factor Attribution

June 19, 2012

I came across a very descriptive visualization of the Factor Attribution that I will replicate today. There is the Three Factor Rolling Regression Viewer at the mas financial tools web site that performs rolling window Factor Analysis of the “three-factor model” of Fama and French. The factor returns are available from the Kenneth R French:

useR 2012: impressions, tutorials

June 19, 2012

First of all, useR 2012 (the 8th International R User Conference) was, hands down, the best-organized conference I’ve had the luck to attend. The session chairs kept everything moving on time, tactfully but sternly; the catering was delicious and varied; …

Pricing options on multiple assets (part 1) with trees

June 19, 2012

I am a big fan of trees. They are a very nice way to see how pricing works for financial derivatives. And with a matrix-based language (R, for instance), it is extremely simple to compute almost everything, even options on multiple assets. Let us see how ...

Notes from A Recent Spatial R Class I Gave

June 19, 2012

Below is a link to a pdf (compiled with the amazing knitr package) and some accompanying data for a recent short course I gave on basic spatial data import/analysis/visualization in R. The class was only two hours and some of the participants were bein...

Time Series Data Library now on DataMarket

June 19, 2012

The Time Series Data Library is a collection of about 800 time series that I have maintained since about 1992, and hosted on my personal website. It includes data from a lot of time series textbooks, as well as many other series that I’ve either collected for student projects or helpful people have sent to me. I’ve now moved...

Correction to intergraph update

June 19, 2012

It turned out that I wrote the last post on the “intergraph” package too hastily. After some feedback from CRAN maintainers and some deliberation, I decided to release the updated version of the “intergraph” package under the original name (so no new package “intergraph0”) with version number 1.2. This version relies on legacy “igraph” version 0.5, which

CIO.com: R is a Big Data open-source technology to watch

June 19, 2012

CIO.com recently published its list of 9 open-source technologies to watch. Hadoop is first on the list, and second up is the R Project: R is an open source programming language and software environment designed for statistical computing and visualization. R was designed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand beginning in 1993...

A wrapper for R’s data() function

June 19, 2012

The workflow for statistical analyses is discussed in several places. Often, it is recommended to: never change the raw data, but transform it; keep your analysis reproducible; separate functions and data; and use the R package system as an organizing structure. In some recent projects I tried an S4 class approach for this workflow, which I want to present and discuss. It makes use of...

Where are the Fat Tails?

June 19, 2012

In Crazy RUT, I started to explore why the moving average strategy has failed for the last 2 decades on the Russell 2000.  I still do not have an answer, but I thought looking at skewness and kurtosis might help explain some of the challenge of be...

google R style guide

June 19, 2012

After writing several hundred lines of R code, I have started to pay some attention to my coding style. Fortunately, I found an R style guide on Google Code. Surprisingly, R is among the most popular programming languages, such as C++, Objective-C, Python, Java and HTML. I didn’t realize …

For those interested in knitr with Rmarkdown to beamer slides

June 19, 2012

Seeing as more people were interested in how I created my slides for the R conference than in what was actually on them, I posted my source and commands to GitHub. I used knitr with Rmarkdown source to convert to markdown that went into pandoc to create beamer slides. Enjoy! https://gist.github.com/2955183
