Improving data quality with deducorrect

July 1, 2011

Does your raw numerical data suffer from typos? sign errors? variable swaps? rounding errors? You may be able to fix all that with the deducorrect package. Today, we (that is Edwin de Jonge, Sander Scholtus and myself) uploaded the 1.0-0 release to CR...


Low-hanging R Optimizations on Ubuntu

July 1, 2011

A friend of mine recently brought to my attention the fact that the default R install is way too generic and thus sub-optimal. While I didn’t go all the way to rebuilding everything from scratch, I did find a few cheap steps one can take to help things a little. Simply install the libatlas3gf-base package. That’s


The stringr package just turned 0.5

July 1, 2011

(Re-posted from a post made by Hadley Wickham to the mailing list) # About the stringr package Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent...


Weighting and prediction in sample surveys

July 1, 2011

A couple of years ago Rod Little was invited to write an article for the diamond jubilee of the Calcutta Statistical Association Bulletin. His article was published with discussions from Danny Pfefferman, J. N. K. Rao, Don Rubin, and myself. Here it all is. I'll paste my discussion below, but it's worth reading the others' perspectives too. Especially...


A third year of entries!

July 1, 2011

Contrary to previous reports, we started blogging after our book was published, with the conceit that we were adding examples to the book. Today marks the second anniversary of the book's appearance and of the blog. To celebrate, we're turning over o...


RcppArmadillo 0.2.25

Following a series of pre-releases, Armadillo version 2.0.0 was announced by Conrad Sanderson earlier in the week. As it happens, it contained another minor build regression so version 2.0.1 followed the next day. We created versions 0.2.24 and 0.2...


Last Time That Happened

June 30, 2011

I bought a lottery ticket yesterday. I hardly ever buy them. The last time I did I lost a dollar. Actually, every time I've bought a ticket I've lost.  Yesterday I was at the local gas station in line behind a bloke who had a comprehensive folder ...


Beating Kenneth French Small – High

June 30, 2011

With 148 pageviews over the last 24 hours, my post Kenneth French Gift to the Finance World has been popular relative to most of my other posts.  I think the popularity is due to Kenneth French’s notoriety and the amazing outperformance of Small...


Mapping SNPs to Genes for GWAS Enrichment Analysis

June 30, 2011

There are several tools available for conducting a post-hoc analysis of GWAS data looking for enrichment of significant SNPs using literature or pathway based resources. Examples include GRAIL, ALLIGATOR, and WebGestalt among others (see SNPath R Pac...


R 2.13.1 scheduled for July 8

June 30, 2011

The R Core team announced today that the next update to R, version 2.13.1, will be released on July 8. Core team member Peter Dalgaard noted: The 2.13.0 release has been quite solid, but some people expect an x.y.1 to roll out on larger installations for the next academic year. Of course, there have also been a sampling of...


Cash Might be Your Tail Risk

June 30, 2011

Just like James Montier Ode to the Joy of Cash and David Merkel Got Cash?, I think cash is an extremely powerful tool.  Of the 3 ingredients (land, labor, and capital) of the economy, capital (cash) is most scarce at the end of a crisis or recessi...


Calculate LCM of ‘n’ consecutive natural numbers using R


Well, I shall hit the nail right on the head and not beat around the bush. I am taking programming lessons on R from my pro bro (Utkarsh Upadhyay) who agreed to teach me only if I would disseminate my learning (a paranoia all the open-source advocates ...


Don’t stop being a statistician once the analysis is done

June 30, 2011

I received an email from the Royal Statistical Society asking if I wanted to submit a 400-word discussion to the article, Vignettes and health systems responsiveness in cross-country comparative analyses by Nigel Rice, Silvana Robone and Peter C. Smith. My first thought was No, I can’t do it, I don’t know anything about health systems


Clarke and Ainsworth’s BIOENV and BVSTEP (and BIO-BIO etc…)

June 30, 2011

Nonmetric Multidimensional Scaling (NMDS) plot of vegetation sample dissimilarities with best correlating environmental variables (left) and species (right) plotted as vectors (datasets "varespec" and "varechem" from the package ...


Winsorization

June 30, 2011

Winsorization replaces extreme data values with less extreme values. But why? Extreme values sometimes have a big effect on statistical operations. That effect is not necessarily a good effect. One approach to the problem is to change the statistical operation; this is the field of robust statistics. An alternative solution is to just change …
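
As a rough illustration of the idea in this excerpt (not code from the post itself, which is not reproduced here), the following Python sketch clamps the k smallest and k largest values to the nearest value that is kept. The function name `winsorize` and the count-based parameter `k` are assumptions made for this example.

```python
def winsorize(values, k=1):
    """Replace the k smallest and k largest values with the nearest retained value.

    Hypothetical helper for illustration; `k` counts values per tail.
    """
    n = len(values)
    if k < 0 or 2 * k >= n:
        raise ValueError("k must be non-negative and leave at least one value unclamped")
    ordered = sorted(values)
    low = ordered[k]           # smallest value that is kept as-is
    high = ordered[n - 1 - k]  # largest value that is kept as-is
    # Clamp every value into [low, high], preserving the original order.
    return [min(max(v, low), high) for v in values]


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 1000]
print(winsorize(data, k=1))
```

With `k=1`, the outlier 1000 is pulled down to 9 and the minimum 1 is pulled up to 2, so the mean is no longer dominated by a single extreme value.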


Kenneth French Gift to the Finance World

June 29, 2011

Kenneth French gives one of the best gifts to the finance world at his website http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html.  I am surprised I have waited so long to write a post about this wonderful resource.  Aft...


Two browsers for R help documentation

June 29, 2011

The same excellent documentation for R commands is available through two different help browsers: text and HTML. Let’s see how each looks and works, and how to switch the default. Look and feel: here is how both look for …


roll calls, ideal points, 112th Congress

June 29, 2011

Now that classes are over, I took a little time to update my scripts that update the analysis of Congressional roll calls in close to real time.   Links appear at the top of the blog.   As of about 15 minutes ago, we’re up to 77 non-unanimous roll calls in the 112th Senate.  


A simple ggplot2 scatterplot

June 29, 2011

Here’s a bit of code used to produce one of the figures in my recent paper dealing with modeling rocky intertidal snail body temperatures. This was my first foray into ggplot2, and it only involved a few hours of head-scratching. The plot is a co...


Putting together multinomial discrete regressions by combining simple logits

June 29, 2011

When predicting 0/1 data we can use logit (or probit or robit or some other robust model such as invlogit (0.01 + 0.98*X*beta)). Logit is simple enough and we can use bayesglm to regularize and avoid the problem of separation. What if there are more than 2 categories? If they’re ordered (1, 2, 3, etc),


Stata 12 embraces structural equation models

June 28, 2011

Stata 12 has just been announced. The software will start shipping by the end of July. A key new feature introduced in the new version is the module for structural equation models (SEM), a staple tool in marketing, psychology, and several other research disciplines. LISREL and AMOS have...


Saving Chunks of SSURGO Data in SoilWeb for Google Earth

June 28, 2011

SoilWeb is an interactive, multifaceted interface to USDA-NCSS soil survey information. Our SoilWeb application for Google Earth streams soil map units and point data as you navigate across the lower 48 states. Currently, our system imposes a 30,000 ...


Synctex with Sweave/pgfSweave in TeXShop/TeXWorks

June 28, 2011

Ever been editing an .Rnw (Sweave) file and tried to sync a pdf with the source in TeXShop (or TeXWorks) and had it open the .tex file? This is because the synctex information (in the .synctex.gz file) is messed up. Both TeXShop and TeXWorks support synctex, which means that if everything is groovy, we should


Benchmarking Revolution R for data mining

June 28, 2011

The blog Heuristically Andrew puts Revolution R through its paces by running some benchmarks versus open-source R for data mining applications. The benchmarks set out to answer the following question: I recently upgraded my notebook (where I often use R for data mining) and was faced with two questions: for the fastest speed for building models, do I use...


p-Values for Cointegration Tests With Breaks in the Data

June 28, 2011

In an earlier post I went through some econometrics that involved the problem of testing for multivariate cointegration in the case where there are one or more trend-breaks or level-breaks in the time-series data.  Specifically, I talked about the modified Trace tests introduced by Johansen et al. (2000), and I mentioned the really nice discussion of the application of these tests...


Visualizing Periodic Data

June 28, 2011

Yesterday the Princeton machine learning reading group went through a paper by Tukey on “Some graphic and semigraphic displays”. One issue we talked about at length was Tukey’s idiosyncratic approach to visualizing periodic data in a circular format to emphasize the connections between the “start” and the “end” of the data set. Allison Chaney pointed


Slideshow of Graphs since TimelyPortfolio’s November Inception

June 28, 2011

I have had a lot of fun blogging at Timely Portfolio over the last 7 months.  Here are all the graphs that I have shown.  Thanks especially to R.


Monitoring Sources of Bond Returns with ML/BAC Corporate OAS and CPI

June 28, 2011

In response to the nice comment requesting an update to Monitoring Sources of Bond Return and also longer history, I thought I would update the original and then rerun with CPI to give a longer time series.  For even longer history back to 1919, s...
