Example 9.27: Baseball and shrinkage

April 16, 2012

To celebrate the beginning of the professional baseball season here in the US and Canada, we revisit a famous example of using baseball data to demonstrate statistical properties. In 1977, Bradley Efron and Carl Morris published a paper about the Jame...

Benford’s Law

April 16, 2012

Here is a quick quiz. If you visit the Wikipedia page List of countries by GDP, you will find three lists ranking the countries of the world in terms of their Gross Domestic Product (GDP), each list corresponding to a different source of the data. If you pick the list according to the CIA (let’s
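
For readers who want to try the quiz numerically: Benford's Law says that a leading digit d (1 to 9) is expected with probability log10(1 + 1/d). The Python sketch below is not taken from the post; the function names are illustrative, and it assumes the quiz concerns the leading digits of the GDP figures.

```python
import math


def benford_probability(d: int) -> float:
    """Expected share of values whose leading digit is d under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


def leading_digit(x: float) -> int:
    """First significant digit of a nonzero number."""
    x = abs(x)
    if x == 0:
        raise ValueError("zero has no leading digit")
    # Rescale into [1, 10) so the integer part is the leading digit.
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


if __name__ == "__main__":
    for d in range(1, 10):
        print(d, round(benford_probability(d), 3))
```

Applying `leading_digit` to each value in one of the GDP lists and tabulating the results gives observed frequencies to compare against `benford_probability`.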

Information flows like water

April 16, 2012

Guiding a ship, it takes more than your skill. Spark: David Rowe’s Risk column this month is about data leverage. The idea is that you are leveraging your data if you are using it to answer questions that are too demanding of information. The piece reminded me of a talk that Dave gave a few …

Borrowing Ideas from Timely Portfolio

April 15, 2012

I want to highlight two great visualization techniques I discovered by reading the fine blog from Timely Portfolio. The first method is based on the lm System on Nikkei with New Chart. Let’s visualize the Strategy’s Long/Short/Not Invested periods by highlighting the underlying (i.e. buy & hold) with green/red/gray. The following sample code implements this

Significance Test for Kendall’s Tau-b

April 15, 2012

A variation of the standard definition of the Kendall correlation coefficient is necessary in order to deal with data samples with tied ranks. It is known as Kendall’s tau-b coefficient and is more effective in determining whether two non-parametric data samples with ties are correlated.
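
To make the role of ties concrete, here is a minimal Python sketch of the tau-b coefficient itself (not the significance test, and not code from the post). It uses the standard form (C - D) / sqrt((C + D + Tx)(C + D + Ty)), where C and D count concordant and discordant pairs and Tx, Ty count pairs tied only in x or only in y. The function name is illustrative.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length samples, computed in O(n^2)."""
    if len(x) != len(y):
        raise ValueError("samples must have the same length")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from every count
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denominator = math.sqrt((untied + ties_x_only) * (untied + ties_y_only))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a sample is constant")
    return (concordant - discordant) / denominator
```

Without ties the denominator reduces to n(n-1)/2 and the result equals the ordinary Kendall coefficient.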

The Popularity of Statistical Packages

April 15, 2012

No matter what your favourite statistical package is, you'll find this post by Robert Muenchen highly informative. Robert concludes that: "By most of the measures discussed here, R is competing well with the commercial software vendors. However, I advise not over generalizing from this data. SAS and SPSS continue to dominate the corporate world and Stata is doing quite...

ggplot2 Time Series Heatmaps

April 15, 2012

How do you easily get beautiful calendar heatmaps of time series in ggplot2? E.g., from MarginTale. I was impressed by the lattice-based implementation from Paul Bleicher of Humedica, which you can find referenced in http://blog.revolutionanalytics.c...

The R-Podcast Episode 5: Basic Package Management

April 15, 2012

After a brief delay, here’s episode 5 of the R-Podcast. In this episode: R 2.15.0 released, listener feedback, and discussion on basic package management. I discuss helpful resources for finding packages, installation procedures, and how to determine what packages are installed in your R system, among other considerations. If you are interested in providing a

Registration for R/Finance 2012 is Open

April 15, 2012

Registration has been open for a while, but I wanted to point out the pre-conference seminars. Registrations are strong this year, so if you’re interested you’ll need to sign up before they sell out. Register here… As you probably know by now, the fourth annual R/Finance conference for applied finance using R will be held

Visualization of Reading Level Frequency by Congressional Bill Stage

April 15, 2012

Here’s a fun example of how you might use my data on Congressional bill length and complexity. Imagine you want to understand the empirical distribution of Flesch-Kincaid reading level for Congressional bills and how this distribution is related to …

R can write R code, too

April 14, 2012

In a recent blog post by CMastication, a little meme puzzle is presented with the introduction that a preschooler could solve it in 5-10 minutes, a programmer in an hour. I took the bait. The original problem goes like this: …

Linguistic Notation Inside of R Plots!

April 14, 2012

So, I've been playing around with learning knitr, which is a Sweave-like R package for combining LaTeX and R code into one document. There's almost no learning curve if you already use Sweave, and I find a lot of knitr's design and usage to be a lot nicer. I wasn't going to make a blog post or tutorial about...

Sweeping through data in R

April 14, 2012

How do you apply one particular row of your data to all other rows? Today I came across a data set which showed the revenue split by product and location. The data was formatted to show only the split by product for each location and the overall split by...
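
The idea of applying one row to all the others can be sketched in a few lines. The example below is written in Python for illustration and is not the post's code; R users would presumably reach for the built-in `sweep()` function. The helper name `sweep_rows` and the numbers in the usage example are hypothetical.

```python
def sweep_rows(table, reference, op):
    """Combine every row of `table` element-wise with the `reference` row."""
    if any(len(row) != len(reference) for row in table):
        raise ValueError("every row must have the same length as the reference")
    return [[op(value, ref) for value, ref in zip(row, reference)]
            for row in table]


if __name__ == "__main__":
    # Hypothetical product shares per location (rows) and overall (reference).
    by_location = [[0.5, 0.3, 0.2],
                   [0.2, 0.5, 0.3]]
    overall = [0.4, 0.4, 0.2]
    # Ratio of each location's share to the overall share, product by product.
    for row in sweep_rows(by_location, overall, lambda a, b: a / b):
        print(row)
```

Passing the operation as a function keeps the helper general: subtraction centres each row on the reference, while division rescales it.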

Implementing the Exact Binomial Test in Julia

April 14, 2012

One major benefit of spending my time recently adding statistical functionality to Julia is that I’ve learned a lot about the inner guts of algorithmic null hypothesis significance testing. Implementing Welch’s two-sample t-test last week was a trivial task because of the symmetry of the null hypothesis, but implementing the exact binomial test has proven
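
As a rough illustration of why the binomial case is less symmetric, here is a short sketch in Python (not the author's Julia code). It assumes one common two-sided convention: sum the probabilities of all outcomes that are no more likely than the observed one. The function names and the small relative tolerance are choices made for this sketch.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_test(successes: int, trials: int, p: float = 0.5) -> float:
    """Two-sided exact p-value for H0: success probability equals p."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    if not 0 <= p <= 1:
        raise ValueError("p must lie between 0 and 1")
    probabilities = [binom_pmf(k, trials, p) for k in range(trials + 1)]
    # Small relative tolerance so outcomes tied with the observed one
    # are not dropped because of floating point rounding.
    threshold = probabilities[successes] * (1 + 1e-7)
    p_value = sum(q for q in probabilities if q <= threshold)
    return min(1.0, p_value)
```

When p is not 0.5 the distribution is skewed, so the outcomes that count as "at least as extreme" on the far side of the mean cannot be found by mirroring the observed count; they have to be located by comparing probabilities, as above.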

Instrumental Variables without Traditional Instruments

April 14, 2012

Typically, regression models in empirical economic research suffer from at least one form of endogeneity bias. The classic example is economic returns to schooling, where researchers want to know how much increased levels of education affect income. Estimation using a simple linear model, regressing income on schooling, alongside a bunch of control variables, will typically

Plotting conditional densities

April 14, 2012

Recently I read the post Comparing all quantiles of two distributions simultaneously on R-bloggers. In the post, the author plots two conditional density plots on one graph. I often use such a plot to visualize conditional densities of score...

Introduction to Markov Chains and modeling DNA sequences in R

April 13, 2012

Markov chains are probabilistic models that can be used to model sequences given a probability distribution. They are also very useful for characterizing certain parts of a DNA or protein string given, for example, a bias t...
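
A first-order Markov chain over a DNA alphabet is fully described by its transition probabilities, which can be estimated by counting adjacent pairs of bases. The sketch below is in Python rather than R and is not taken from the post; the function name and the example sequence are illustrative.

```python
from collections import Counter


def transition_matrix(sequence: str, alphabet: str = "ACGT"):
    """Estimate first-order transition probabilities from one sequence.

    Returns a nested dict so that matrix[a][b] is the estimated probability
    that base b follows base a. Rows for bases that never occur with a
    successor are left as all zeros.
    """
    pair_counts = Counter(zip(sequence, sequence[1:]))
    matrix = {}
    for a in alphabet:
        row_total = sum(pair_counts[(a, b)] for b in alphabet)
        matrix[a] = {
            b: (pair_counts[(a, b)] / row_total if row_total else 0.0)
            for b in alphabet
        }
    return matrix


if __name__ == "__main__":
    estimated = transition_matrix("ACGTACGGTCA")  # made-up toy sequence
    for base, row in estimated.items():
        print(base, row)
```

Comparing matrices estimated from different regions of a sequence is one simple way to see whether a region has a distinctive composition.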

knitr Performance Report-Attempt 1

April 13, 2012

I get very excited about new R packages, but rarely is my excitement so fulfilled as with knitr. Even with no skill, I have already been able to adapt the example Yihui Xie provides in his knitr Graphics Manual into a crude first version of a per...

Floating Point Arithmetic and The Descent into Madness

April 13, 2012

While I should confess upfront that I’ve always had a weaker command of the details of floating point arithmetic than I feel I ought to have, this sort of thing still blows my mind when I stumble upon it. These moments invariably make me realize that floating point math will simply never satisfy my naive

Case Study: Network visualization with data from a 360° feedback – often wasted potential!

April 13, 2012

I assume that the reader of this paper knows the 360-degree method (also known as multi-source feedback or management feedback). The case reported is authentic. A total of 128 people participated as feedback receivers. Several thousand questionnaires were filled from …

[not] Le Monde puzzle (solution)

April 13, 2012

Following the question on dinner table permutations on StackExchange (mathematics) and the reply that the right number was six, provided by hardmath, I was looking for a constructive solution for building the resolvable 2-(20,5,1) covering. A few hours later, hardmath again came up with an answer, found in the paper Equitable Resolvable Coverings by van

R Statistics Mobile Console (iPhone)

April 13, 2012

I’m trying to make a mobile version of R-GUI. Here is one (tested with an iPhone emulator); it is not so beautiful, but it still works. Try it at CloudStat Mobile. Below are the screenshots: Homepage, Web-based R Console, Statistical Apps Directory (R Apps) ...

RDieHarder 0.1.2

April 13, 2012

RDieHarder is an R package providing access to the DieHarder battery of tests for random number generators developed by Robert G. Brown and others. DieHarder had been updated to version 3.1.1 a while back, and I had been a little behind with updating...

R’s continued growth in academia

April 13, 2012

Bob Muenchen has recently updated his report on the popularity of statistical software. With the updated analysis, we see that the R community remains as strong as ever: the number of contributed R packages continues its exponential growth rate, R maintains its dominance in online discussion, and has 20x the content of other statistics packages on social programming sites...

CORRGRAM: Correlation Matrix (Wavelengths)

April 13, 2012

With the "Corrgram" package we can see patterns that help us recognize possible inter-correlations in a big matrix. For example, we can look at the correlation of every wavelength with respect to all the others. This way we can see the high correlation...

One app, three languages

April 13, 2012

This past week at work I had the opportunity to code the same algorithm using each of the three scientific programming/scripting languages I'm familiar with: Matlab, Python, R. The list above is the order that the (re)-coding was done and serves as a beginnin...

Surveys measure what people do, not what people think

April 13, 2012

In my previous post, I wrote about ways scale choice could distort the ways survey results portray the things they are supposed to measure. This certainly isn’t a new issue – researchers who use surveys often go to great lengths to ensure that their surveys are valid and reliable, which in this context usually means

Oracle R Enterprise 1.1 Download Available

April 13, 2012

Oracle just released the latest update to Oracle R Enterprise, version 1.1. This release includes the Oracle R Distribution (based on open source R, version 2.13.2), an improved server installation, and much more. The key new features include: ...

Oracle R Distribution 2-13.2 Update Available

April 13, 2012

Oracle has released an update to the Oracle R Distribution, an Oracle-supported distribution of open source R. Oracle R Distribution 2-13.2 now contains the ability to dynamically link the following libraries on both Windows and Linux: The I...
