Bank of America Merrill Lynch Bond Returns on St. Louis Fed

May 4, 2011

After all my complaining about proprietary data, the St. Louis Federal Reserve announced today the availability of Bank of America Merrill Lynch Bond Indices on their FRED site. The data is limited in scope and duration, but accessibility especi...

Read more »

Using R for Map-Reduce applications in Hadoop

May 4, 2011

Data Scientist Antonio Piccolboni recently published this comparison of the various languages and interfaces available for programming Big Data analysis tasks in the map-reduce framework. The interfaces he reviewed included: Java Hadoop (mature and efficient, but verbose and difficult to program); Cascading (brings an SQL-like flavor to Java programming with Hadoop); Pipes/C++ (a C++ interface to programming on Hadoop)...

Read more »

R Exercise with USDA Data

May 4, 2011

After the helpful comment by Bradley on my post Commodity Index Estimators, How about the National Agricultural Statistics Service (NASS)? Looks like they have information for prices received back to 1908 for many agricultural goods (http://www.nass.u...

Read more »

PLINK/SEQ for Analyzing Large-Scale Genome Sequencing Data

May 4, 2011

PLINK/SEQ is an open source C/C++ library for analyzing large-scale genome sequencing data. The library can be accessed via the pseq command line tool, or through an R interface. The project is developed independently of PLINK, but its syntax will be f...

Read more »

Whassup with glm()?

May 4, 2011

We're having a problem with starting values in glm(). A very simple logistic regression with just an intercept and a very simple starting value (beta=5) blows up....
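A minimal sketch of the kind of call the excerpt describes, assuming made-up data: the vector `y` below is hypothetical and is not the post's data. `start` is the glm() argument that supplies initial coefficient values. Whether the second fit actually diverges may depend on the data and the R version, so no output is shown.

```r
# Hypothetical data: 10 successes and 5 failures (an assumption, not the post's data).
y <- rep(c(1, 0), times = c(10, 5))

# Intercept-only logistic regression with glm()'s default starting values.
fit_default <- glm(y ~ 1, family = binomial(link = "logit"))

# Same model, but with the intercept started at beta = 5 as in the excerpt.
# tryCatch() keeps the script running if the fit stops with an error.
fit_start5 <- tryCatch(
  glm(y ~ 1, family = binomial(link = "logit"), start = 5),
  error = function(e) e
)

# For this model the maximum-likelihood intercept is the logit of mean(y).
coef(fit_default)
qlogis(mean(y))

# Compare with whatever the beta = 5 start produced.
if (inherits(fit_start5, "glm")) coef(fit_start5) else conditionMessage(fit_start5)
```

Comparing the two coefficients (and any warnings R prints) is a quick way to see whether a starting value has sent the fit away from the closed-form answer.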

Read more »

Again with Ledoit-Wolf and factor models

May 4, 2011

We come closer to a definitive answer on the relative merit of Ledoit-Wolf shrinkage versus a statistical factor model for variance matrices. Previously: This post builds on the post entitled "A test of Ledoit-Wolf versus a factor model". That post depended on some posts previous to it. New information: Previously we generated random portfolios with …

Read more »

Invisible blogs!

May 4, 2011

Julien just signaled an intermittent disappearance of the posts on the ‘Og, depending on the operating system: Ubuntu 10.10 seems to be working (most of the time!) while Mac and Windows are having problems… This is beyond my abilities, I have contacted WordPress support, maybe they are working on some new feature, maybe I once

Read more »

Day #35 replacing characters

May 4, 2011

Today I had a meeting with Emmanuel. He is a guy from inside Janssen who is very good with R-scripts. He made a lot of great plots which I had to use for our reports. During the meeting we came to a conclusion that all the difficult R-scripting he did,...

Read more »

bigkmeans also works well for ordinary matrix objects: The biganalytics package

May 4, 2011

bigmemory is an excellent package for handling big matrices in R. There are several sister packages provided by "The Bigmemory Project": biganalytics for analysis, bigtabulate for tabulation, bigalgebra for linear algebra functionality, synchronicity for synchronization via mutexes and interprocess communication and message passing. biganalytics provides a few functions for analysis: linear regression model, generalized linear regression model, and...

Read more »

Extension to mtable function

May 4, 2011

Here are some useful extensions to the "mtable" function in the memisc package.

Read more »

Guide to Getting Started with R: 2011 Update

May 4, 2011

In mid-2009, I wrote a post on getting started with R. A lot has happened in the world of R over the last two years. New books, videos, online documentation, blogs and other resources have emerged. New community structures have emerged. As such I'v...

Read more »

How to learn R

May 3, 2011

Over at the R community site inside-R.org, Revolution's Joseph Rickert has published a How-To guide with tips for new users on How to Learn R, with links to resources for R books, blogs and courses. Check it out at the link below. Inside-R: How to Learn R

Read more »

Putting Robust Standard Errors into LaTeX Tables: An Extension of mtable

May 3, 2011

I recently discovered the mtable() command in the memisc library and its use with toLatex() to produce nicely formatted summary tables for lm and glm objects. Once you have your linear model objects, all you need is one command...

Read more »

Fun with twitteR: Osama bin Laden tweets

May 3, 2011

I thought it would be fun to play around with the R package twitteR, an R API into Twitter. I decided to take the most prominent news story of the past few days, Osama bin Laden’s death, to see …

Read more »

Running R on an iPhone/iPad with RStudio

May 3, 2011

This topic has been widely discussed on a lot of forums. To make a long story short, running R natively on an iDevice (meaning iPhone/iPad) is disabled by its OS, unless it is jailbroken. The steps for the installation through Cydia are described in this R wiki, or this post. But there are some limitations,

Read more »

CPI and US 10y Treasury Extreme –> System Idea

May 3, 2011

When I see extremes, I feel compelled to explore. The US 10y Treasury yield is at an extreme versus the annualized 3 month CPI rate of change. Of course, I have to try to build a system around the idea. While this 3 mont...

Read more »

Day #34 Detailing graphs

May 3, 2011

Today mostly consisted of adding details or changing certain aspects of my graphs. For example, I had to reverse the y-axis on my levelplot, circleplot, … which wasn’t so easy at first. But after a bit of googling I found out I had to rev...

Read more »

Day #32-33 reporting, R and Birt

I’m starting on my real R-Scripts now. We got an assessment for some Reports and they needed my graphs and Veerle’s Reports. So the tasks are: Well distribution plot, Quality plate control, Surface Plot, Heatmap, CirclePlot, and more to come...

Read more »

To attach() or not attach(): that is the question

May 3, 2011

R objects that reside in other R objects can require a lot of typing to access. For example, to refer to a variable x in a dataframe df, one could type df$x. This is no problem when the dataframe and variable names are short, but can become burdensom...
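As a small illustration of the trade-off, here is a sketch using a made-up data frame. The names `df` and `x` follow the excerpt; the values, and the `with()` alternative, are additions for illustration and not taken from the post.

```r
# Hypothetical data frame; df and x follow the names used in the excerpt.
df <- data.frame(x = c(1, 2, 3), y = c(2, 4, 6))

# 1. Explicit access: unambiguous, but verbose when names are long.
m_dollar <- mean(df$x)

# 2. with(): evaluates the expression using df's columns as variables.
m_with <- with(df, mean(x))

# 3. attach(): puts df's columns on the search path until detach() is called.
attach(df)
m_attach <- mean(x)
detach(df)
```

All three calls compute the same mean. The difference is scope: `attach()` changes what bare names resolve to for the rest of the session until `detach()`, and a global variable with the same name would take precedence over the attached column, while `with()` limits the effect to a single expression.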

Read more »

Treebase trees from R

May 3, 2011

Treebase is a great resource for phylogenetic trees, and has a nice interface for searching for certain types of trees. However, if you want to simply download a lot of trees for analyses (like that in Davies et al.), then you want to be able to access...

Read more »

Kaggle Competition Walkthrough: Introduction

May 3, 2011

Kaggle is a site for participating in predictive analytics competitions. It is also a great resource for learning how to build powerful predictive models, and the Overfitting competition provides a good introduction to the common tools used by a predic...

Read more »

For happy-R blogging

May 3, 2011

You may notice that I don’t have that many posts on my blog, and they are all about R. The paucity of my posts makes me a bit sad, but not much, really… What makes me (or better, used to make me) sad is that posts of R code (used to) look awful. However, your code

Read more »

Playing with robots

May 3, 2011

My son would be extremely proud if I told him I can spend hours building robots. Well, my robots are not as fancy as Dr Tenma's, but they usually do what I ask them to do. For instance, it is extremely simple to build a robot with R, to extract dat...

Read more »

Estimate Gene Diversity

May 3, 2011

I provide here an R function to estimate gene diversity of diallelic sites (e.g. SNPs), given allele frequencies at each segregating site. The function takes three input parameters:

maf: a numeric value (or vector) representing minor allele frequency at each site. Default is 0.5.
nreads: size of each resampling experiment. Default is 10000.
nreplicates: Number
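The post's own function is not shown in this excerpt. The following is only a hedged sketch of what an estimator with these three parameters could look like. It assumes binomial resampling of `nreads` alleles per site and Nei's unbiased estimator, n/(n-1) * 2p(1-p) for two alleles. The function name `gene_diversity` and the default `nreplicates = 100` are assumptions, not taken from the post.

```r
# Hypothetical sketch, NOT the post's function: resampling-based gene diversity
# for diallelic sites. The name and the nreplicates default are assumptions.
gene_diversity <- function(maf = 0.5, nreads = 10000, nreplicates = 100) {
  n <- nreads
  per_site <- vapply(maf, function(p) {
    # Resample n alleles per replicate and count the minor allele.
    minor <- rbinom(nreplicates, size = n, prob = p)
    p_hat <- minor / n
    # Nei's unbiased gene diversity for two alleles, averaged over replicates.
    mean(n / (n - 1) * 2 * p_hat * (1 - p_hat))
  }, numeric(1))
  # Average across segregating sites.
  mean(per_site)
}

set.seed(42)
gene_diversity(maf = c(0.5, 0.1, 0.25))
```

With large `nreads` the estimate for a single site should sit close to the expected heterozygosity 2p(1-p), for example near 0.5 when `maf = 0.5`.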

Read more »