Quick-R Gets a Blog

November 21, 2011

After maintaining the Quick-R website (R tutorials and jumpstart) for the past 5 years, I’ve decided to add a blog so that I can go into more detail on topics related to practical data analysis. The statMethods blog will contain articles …

Asynchrony in market data

November 21, 2011

Be careful if you have global daily data. The issue: Markets around the world are open at different times. November 21 for the Tokyo stock market is different from November 21 for the London stock market. The New York stock market has yet a different November 21. The effect: The major effect is that correlations …

Popular Baby Names Walk-Through Part 1 – Web Scraping and ggplotting

November 20, 2011

This is the first walk-through I have posted. Reading these types of posts has been incredibly helpful as I have been learning R and other useful tools in the Unix universe. Hopefully you find it helpful. First, I have been watching Google Python Video...

Indexing Nested Lists

November 20, 2011

I’ve long searched for a somewhat efficient approach to indexing nested lists and/or environments and here’s my best solution so far. For me, being able to compute such an index is the crucial part in order to actually manage such nested structures (which are very helpful in a lot of scenarios where formal classes are …

Cross Pollination from Systematic Investor

November 20, 2011

After reading the fine article Style Analysis from Systematic Investor and What we can learn from Bill Miller and the Legg Mason Value Trust from Asymmetric Investment Returns, I thought I should combine the two in R with the FactorAnalytics package. ...

Matrix Performance in R

November 20, 2011

I've been working on an example of the new Graph Template Language from SAS. As I don't have direct access to SAS 9.2, I've been developing via email with a friend who does. In the meantime, I thought I would start to investigate some of the performance properties of R. I work in the financial risk industry and I often...

Interactive presentations with deck.js

November 20, 2011

Data analysis is often an iterative and interactive process. However, when I present on this subject, I often feel limited by the presentation software I use. It doesn't matter if I use LaTeX/PDF, PowerPoint or Keynote. In all cases it is either ver...

Tikz absolute positioning

November 20, 2011

When working with a tikz drawing within a LaTeX document we might want to locate an object using an absolute position on the page rather than leaving LaTeX to make the decision for us. The use of nodes and the current.page label in conjunction with some other parameters attached to the tikz drawing will allow us

RcppArmadillo 0.2.30 (and 0.2.29)

November 20, 2011

A few days ago, Conrad Sanderson released the first pre-release version of what will be Armadillo 2.4.*, giving it the 2.3.91 release handle. We folded this into RcppArmadillo release 0.2.30, with Romain making a few adjustments to our template stru...

CloudStat: Learn & Do R Language on the Cloud

November 19, 2011

Hi, my fellow useRs! I’m making a web-based R Language platform (http://cloudst.at/) for my students. My aim is to flatten the learning curve for R and make collaboration easier. With CloudStat, there is no more download, installation, update and mai...

Keep your files in sync for free

November 19, 2011

It is not uncommon to have two computers at work, four at home and a server out on the wild, wild internet (that's what we have, anyway ... wait, we forgot one in London). How to keep all these files in sync? Here are our file synchronization tips.

Data is everywhere!

November 19, 2011

I was writing earlier today that I am getting really fed up with using the same datasets over and over again. Of course, using the same data over time with different methods (e.g., look at this) works really well for comparison purposes, but we can still use other data in a web world. For example, you

Public vote open for Mendeley-PLoS Binary Battle: vote rOpenSci!

November 19, 2011

http://www.surveygizmo.com/s3/722753/Mendeley-PLoS-Binary-Battle-Public-Vote

randu dataset, part 2

November 19, 2011

In my last post I plotted the randu dataset to show that all its points lie on 15 parallel planes. But I was not fully satisfied with the solution and decided to show this numerically. It can be done in four steps: identifying four points lying...
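The 15-plane claim can also be checked with a small Python sketch. This is not the post's four-step method (which is cut off in the excerpt) and it does not use R's randu dataset; it assumes the classic RANDU recurrence x[k+1] = 65539 * x[k] mod 2^31 and regenerates a stream from a seed. The function names `randu_stream` and `plane_indices` are made up for illustration. Because 65539^2 is congruent to 6 * 65539 - 9 modulo 2^31, every consecutive triple satisfies 9*x0 - 6*x1 + x2 = k * 2^31 for an integer k between -5 and 9, which gives at most 15 planes.

```python
# Illustrative sketch: regenerate a RANDU stream and check its plane structure.
# Assumption: the classic RANDU recurrence, not the 400 rows of R's randu data.
M = 2**31   # RANDU modulus
A = 65539   # RANDU multiplier (2**16 + 3)


def randu_stream(seed, n):
    """Return n successive RANDU integers generated from an odd seed."""
    values = []
    x = seed
    for _ in range(n):
        x = (A * x) % M
        values.append(x)
    return values


def plane_indices(values):
    """Return the set of k with 9*x0 - 6*x1 + x2 == k * M over consecutive triples.

    Each k labels one plane 9*u0 - 6*u1 + u2 = k in the unit cube (u = x / M).
    """
    ks = set()
    for x0, x1, x2 in zip(values, values[1:], values[2:]):
        total = 9 * x0 - 6 * x1 + x2
        if total % M != 0:
            raise ValueError("triple does not satisfy the RANDU relation")
        ks.add(total // M)
    return ks


planes = plane_indices(randu_stream(seed=1, n=10_000))
print(len(planes), "distinct planes:", sorted(planes))
```

The check uses exact integer arithmetic, so no rounding tolerance is needed; with the rounded values in a stored dataset, a tolerance would be required instead.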

Plotting randu dataset

November 18, 2011

Recently I stumbled on the help description of the randu data from the datasets package. It contains pseudorandom numbers that are flawed. Help says that "In three dimensional displays it is evident that the triples fall on 15 paralle...

Let the Lagging Lead

November 18, 2011

THIS IS NOT INVESTMENT ADVICE AND WILL PROBABLY WIPE OUT ALL YOUR MONEY IF PURSUED. While exploring utilities, I discovered a strange phenomenon that I do not yet fully understand, but I attribute to the business cycle. If I dust o...

Analyzing birth rates from census data from RevoScaleR

November 18, 2011

In yesterday's webinar, "New Features in Revolution R Enterprise 5.0 to Support Scalable Data Analysis", Sue Ranney demonstrated the features of the RevoScaleR big data analysis package included with Revolution R Enterprise. In the webinar, she showed how to use the rxImport function to import big data sets from SAS, SPSS or ODBC, how to use the rxDataStep function...

My talk on doing phylogenetics in R

November 18, 2011

I gave a talk today on doing very basic phylogenetics in R, including getting sequence data, aligning sequence data, plotting trees, doing trait evolution stuff, etc. Please comment if you have code for doing Bayesian phylogenetic inference in R. ...

Why balloons are better than balls (in urn schemes)

November 18, 2011

The following is taken from a work in progress: The Polya urn is a heuristic associated with Dirichlet process mixtures. We present the scheme in a modified format, using balloons instead of balls, where the probability of drawing a balloon from the urn is proportional to its volume. Balloons are preferred because their volume may
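The volume-proportional draw can be sketched in Python. The excerpt is cut off before it gives the author's scheme, so everything beyond "draw with probability proportional to volume" is an assumption: the sketch borrows the standard Polya urn update for Dirichlet process mixtures, using a hypothetical "new balloon" slot of fixed volume `alpha` and unit inflation on each draw. The names `draw_index` and `polya_balloon_urn` are illustrative only.

```python
import random


def draw_index(volumes, rng):
    """Pick an index with probability proportional to its volume."""
    return rng.choices(range(len(volumes)), weights=volumes, k=1)[0]


def polya_balloon_urn(n_draws, alpha=1.0, seed=0):
    """Simulate n_draws from a balloon version of the Polya urn.

    Assumption (not from the source): slot 0 is a placeholder of fixed
    volume alpha. Drawing it adds a new balloon of volume 1; drawing any
    other balloon inflates that balloon by 1.
    """
    rng = random.Random(seed)
    volumes = [alpha]
    for _ in range(n_draws):
        i = draw_index(volumes, rng)
        if i == 0:
            volumes.append(1.0)   # a new balloon enters the urn
        else:
            volumes[i] += 1.0     # an existing balloon grows
    return volumes[1:]            # drop the placeholder slot


clusters = polya_balloon_urn(n_draws=100, alpha=2.0, seed=42)
print(len(clusters), "balloons with volumes", clusters)
```

With unit inflation this reduces to the usual ball-counting urn; volumes become useful once the increments are allowed to be non-integer, which the sketch would support by changing the `1.0` constants.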

htmlToText(): Extracting Text from HTML via XPath

November 18, 2011

Converting HTML to plain text usually involves stripping out the HTML tags whilst preserving the most basic of formatting. I wrote a function to do this which works as follows (code can be found on GitHub): The above uses an XPath approach to achieve its goal. Another approach would be to use a regular expression. These

FBS Coaches Avg. Salary

November 18, 2011

Of course, a few days before I leave for a much-needed vacation, USA Today released their updated NCAA coaching salary database. For sports junkies, there’s an unlimited number of analyses and visualizations that can be done on the data. I took a quick break from packing to condense the data to a csv and

Style Analysis

November 17, 2011

During the final stage of the asset allocation process we have to decide how to implement our desired allocation. In many cases we will allocate capital to mutual fund managers who will invest money according to their fund’s mandate. Usually there is no perfect relationship between asset classes and fund managers. To determine the true

Spinner Doctor

November 17, 2011

The setup: Dan Meyer, a (former?) math teacher with some extraordinary ideas, has a nifty concept for teaching expected values: “So one month before our formal discussion of expected value, I’d print out this image, tack a spinner to it, …

Revolution Newsletter: November 2011

November 17, 2011

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full November edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. R Training from Hadley Wickham: The R guru (and author of ggplot2, plyr and several...

GEO2R: Web App to Analyze Gene Expression in GEO Datasets Using R

November 17, 2011

Gene Expression Omnibus is NCBI's repository for publicly available gene expression data with thousands of datasets having over 600,000 samples with array or sequencing data. You can download data from GEO using FTP, or download and load the data direc...

Using neural network for regression

November 17, 2011

Artificial neural networks are commonly thought to be used just for classification because of the relationship to logistic regression: neural networks typically use a logistic activation function and output values from 0 to 1 like logistic regression. However, the worth …

Bayesian vs. Frequentist Intervals: Which are more natural to scientists?

November 17, 2011

I don't know, of course, because the evidence at hand is based on my experience. But, I'll leave the reader to consider whether these observations generalize. Proponents of Bayesian statistical inference argue that Bayesian credible intervals are more intuitive than the frequentist confidence intervals, because the Bayesian inference is a probability statement about a parameter.
