CLT Standard Normal Generator

April 2, 2010

I’ve found this standard normal random number generator in a number of places, including one of Paul Wilmott’s books. The idea is that we can use the Central Limit Theorem (CLT) to easily generate values distributed according to a standard normal distribution by using the sum of 12 uniform random

Read more »
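
A minimal Python sketch of the CLT trick, assuming independent Uniform(0, 1) draws: the sum of 12 of them has mean 12 × 0.5 = 6 and variance 12 × 1/12 = 1, so subtracting 6 gives an approximately standard normal value. The function name `clt_standard_normal` is hypothetical, not taken from the post or from Wilmott's book.

```python
import random


def clt_standard_normal(rng=random):
    """Approximate one N(0, 1) draw as (sum of 12 uniforms) - 6.

    `rng` can be the `random` module itself or a seeded random.Random
    instance; both provide .random() returning a float in [0, 1).
    """
    return sum(rng.random() for _ in range(12)) - 6.0


if __name__ == "__main__":
    rng = random.Random(42)  # seeded so the run is reproducible
    draws = [clt_standard_normal(rng) for _ in range(10_000)]
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

Every draw falls in [-6, 6], so the tails are thinner than those of a true normal; when tail behaviour matters, a library generator such as `random.gauss` is the safer choice.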

Lookup Performance in R

April 2, 2010

Rumor has it that Joe Adler, author of the O’Reilly book R in a Nutshell, has joined LinkedIn as a data scientist. But that has not kept him from pumping out some interesting content over at OReilly.com. His latest article is about lookup performance in R. He does a great job giving code

Read more »

Opening Statements on Markov Chain Monte Carlo

April 1, 2010

This quarter I am TAing UCLA’s Statistics 102C, Introduction to Monte Carlo Methods, for Professor Qing Zhou. This course did not exist when I was an undergraduate, and I think it is pretty rare to teach Monte Carlo methods (minus the bootstrap, if you count that) or MCMC to undergrads. I am excited about this class because to me, MCMC...

Read more »

Frank Harrell’s Regression Modeling Strategies Course Handouts

April 1, 2010

The previously mentioned Regression Modeling Strategies short course taught by Frank Harrell is nearly over. Here are the handouts (PDF) from the course. Keep an eye out here; I'll be writing a few more posts in the near future on topics Frank covered...

Read more »

Quantile LOESS – Combining a moving quantile window with LOESS (R function)

April 1, 2010

In this post I will provide R code that implements the combination of a repeated running quantile with the LOESS smoother to create a type of “quantile LOESS” (i.e., “Local Quantile Regression”). This method is useful when the need arises to fit a robust and resistant (this needs to be verified) smoothed line for a quantile (an example for such a...

Read more »
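
The moving-quantile half of the method can be sketched in Python as below, assuming a centered window that is truncated at the ends of the series and the linear-interpolation quantile rule that R's `quantile()` uses by default (type 7). The helper `running_quantile` is hypothetical, and the LOESS smoothing that the post applies to its output is not reproduced here.

```python
import math


def running_quantile(values, half_width, q):
    """Quantile q of each centered window values[i - half_width : i + half_width + 1].

    Windows are truncated at the ends of the series, so the first and last
    points use fewer observations. Interpolation follows the type-7 rule:
    position h = q * (n - 1) in the sorted window, linear between neighbours.
    """
    if half_width < 0 or not 0.0 <= q <= 1.0:
        raise ValueError("half_width must be >= 0 and q must lie in [0, 1]")
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        window = sorted(values[lo:hi])
        pos = q * (len(window) - 1)
        below = math.floor(pos)
        above = min(below + 1, len(window) - 1)
        frac = pos - below
        out.append(window[below] + frac * (window[above] - window[below]))
    return out


if __name__ == "__main__":
    series = [3, 9, 4, 5, 12, 6, 7, 15, 8, 9]  # illustrative data only
    print(running_quantile(series, 2, 0.5))   # running median
    print(running_quantile(series, 2, 0.9))   # running 90th percentile
```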

Because it’s Thursday: Epidemiology of the Undead

April 1, 2010

Noted statistician Andrew Gelman has teamed up with occultist George Romero to address the most serious public-health threat of our time: Zombies. They've published a paper in the journal Biomastika, "How many zombies do you know?" to propose the use of indirect survey methods to measure outbreaks of the undead: Abstract: The zombie menace has so far been studied...

Read more »

Plots in R and the ImageJ visualization

April 1, 2010

If you plot data in R and would like to display the same data in the ImageJ view, you need to transfer the data matrix to ImageJ. The first thing you will notice is that the image data is displayed rotated, because of the approach Bio7 uses to transfer data back and forth

Read more »

abbreviating personality measures in R: a tutorial

March 31, 2010

A while back I blogged about a paper I wrote that uses genetic algorithms to abbreviate personality measures with minimal human intervention. In the paper, I promised to put the R code I used online, so that other people could download and use it. I put off doing that for a long time, because the

Read more »

Social Media Analytics Research Toolkit (SMART@znmeb) Is Moving Into Private Beta

March 31, 2010

Download "Getting Started with the Social Media Analytics Research Toolkit" (pdf, 1.25 megabytes). Download the Social Media Analytics Research Toolkit. My Social Media Analytics Research Toolkit is about to move into private beta. What's in the release?...

Read more »

How ideological is Google?

March 31, 2010

Adam Bonica, a grad student in political science at NYU, recently published a ranking of the political slant of various professions, based on the amount and recipient (Republican or Democratic) of political donations by lawyers, lobbyists, physicians and many other occupations. This paper (PDF) gives the complete analysis, but the chart below (created using the ggplot2 graphics package in...

Read more »

Why isn’t my 2X Ultra ETF keeping pace with the market and what is path asymmetry (R ex)?

March 31, 2010

I've been reading a few articles lately, lambasting ultra ETFs for not keeping up with markets and ascribing the problem to weird unexplainable reasons such as portfolio derivative re-balancing and negative drift. I thought it would be nice to revisit...

Read more »
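
One common explanation of the gap is daily rebalancing: a 2X fund doubles each day's return, and compounding doubled daily returns is not the same as doubling the cumulative return. A minimal Python sketch, assuming a frictionless fund with no fees or tracking error; the helper `compound` and the two-day path are hypothetical, not taken from the post.

```python
def compound(daily_returns, leverage=1.0):
    """Growth of $1 in a fund that resets to `leverage` x exposure every day."""
    value = 1.0
    for r in daily_returns:
        value *= 1.0 + leverage * r
    return value


# Hypothetical two-day path: the index rises 10%, then falls 10%.
path = [0.10, -0.10]
index = compound(path)               # 1.10 * 0.90
ultra = compound(path, leverage=2)   # 1.20 * 0.80
print(f"index: {index - 1:+.2%}, 2X fund: {ultra - 1:+.2%}, "
      f"twice the index return: {2 * (index - 1):+.2%}")
```

On this path the index loses 1% while the 2X fund loses 4%, not 2%: the up-then-down sequence costs the leveraged fund more than twice what it costs the index.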

Predicting April month return

March 31, 2010

Bespoke blogged about average monthly returns of the DJI and emphasized April. Before jumping on that information, let’s check some weak points. First, that post presents only average returns; we need at least the extreme points (min and max) and confidence ranges. Second, the market normally has an upward trend, and we need to get rid of

Read more »

Lotka-Volterra model ~ intro

March 30, 2010

Many people know about the Lotka-Volterra model (i.e., the predator-prey model) in ecology. This model portrays two species, the predator (y) and the prey (x), interacting with each other in a limited space. The prey grows at a linear rate and gets eaten by the predator at a fixed rate. The predator gains a certain

Read more »
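
The rate symbols in the excerpt above were lost, so the sketch below uses the textbook form of the model with assumed names: prey growth rate `alpha`, predation rate `beta`, predator gain `delta` and predator death rate `gamma`. The parameter values in the usage lines are illustrative assumptions too. It integrates the two equations in Python with a classic fourth-order Runge-Kutta step.

```python
def lotka_volterra(x, y, alpha, beta, delta, gamma):
    """Right-hand side of the predator-prey equations.

    dx/dt = alpha*x - beta*x*y   (prey: growth minus predation)
    dy/dt = delta*x*y - gamma*y  (predator: gain from prey minus deaths)
    """
    return alpha * x - beta * x * y, delta * x * y - gamma * y


def simulate(x0, y0, params, dt=0.01, steps=1000):
    """Integrate with fourth-order Runge-Kutta; returns [(x, y), ...]."""
    x, y = x0, y0
    path = [(x, y)]
    for _ in range(steps):
        k1 = lotka_volterra(x, y, *params)
        k2 = lotka_volterra(x + dt / 2 * k1[0], y + dt / 2 * k1[1], *params)
        k3 = lotka_volterra(x + dt / 2 * k2[0], y + dt / 2 * k2[1], *params)
        k4 = lotka_volterra(x + dt * k3[0], y + dt * k3[1], *params)
        x += dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        y += dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        path.append((x, y))
    return path


if __name__ == "__main__":
    # Assumed parameters: alpha=1.0, beta=0.1, delta=0.075, gamma=1.5.
    path = simulate(10.0, 5.0, (1.0, 0.1, 0.075, 1.5))
    prey = [x for x, _ in path]
    print(f"prey ranges from {min(prey):.1f} to {max(prey):.1f} over t = 0..10")
```

RK4 is used rather than a plain Euler step because Euler tends to make the closed predator-prey cycles spiral outward over long runs.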

Some Code for Dumping Data from Twitter Gardenhose

March 30, 2010

Gardenhose is a Streaming API feed that continuously sends a sample (roughly 15%, according to Ryan Sarver at 140tc in September 2009) of all tweets to feed recipients. This is some code for dumping the tweets to files named by date and hour. It is in PHP, which is not my favorite language, but it works nonetheless. I received...

Read more »

TTR_0.20-2 on CRAN

March 30, 2010

An updated version of TTR is now on CRAN. It fixes a couple of bugs and includes a couple of handy tweaks. Here are the full contents of the CHANGES file: TTR version 0.20-2. Changes from version 0.20-1. NEW FEATURES: Added VWAP and VWMA (thanks to Brian Peterson...

Read more »
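
For readers new to the two added indicators, here is a plain-Python sketch of their usual textbook definitions, assuming VWAP is the volume-weighted mean price over the data supplied and VWMA applies the same weighting over a rolling window of `n` periods. This is not TTR's code, and its argument handling may differ from TTR's `VWAP()` and `VWMA()`.

```python
def vwap(prices, volumes):
    """Volume-weighted average price: sum(p * v) / sum(v)."""
    if len(prices) != len(volumes):
        raise ValueError("prices and volumes must have the same length")
    total_volume = sum(volumes)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    return sum(p * v for p, v in zip(prices, volumes)) / total_volume


def vwma(prices, volumes, n):
    """Rolling VWAP over the last n observations (None until n are available)."""
    out = []
    for i in range(len(prices)):
        if i + 1 < n:
            out.append(None)
        else:
            start = i + 1 - n
            out.append(vwap(prices[start:i + 1], volumes[start:i + 1]))
    return out


if __name__ == "__main__":
    prices = [10.0, 10.5, 10.2, 10.8, 11.0]  # illustrative data only
    volumes = [100, 250, 80, 300, 120]
    print(vwap(prices, volumes))
    print(vwma(prices, volumes, 3))
```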

Scientists misusing Statistics

March 30, 2010

In ScienceNews this month, there's a controversial article exposing the fact that results claimed to be "statistically significant" in scientific articles aren't always what they're cracked up to be. The article, titled "Odds Are, It's Wrong", is interesting, but I take a bit of an issue with the sub-headline, "Science fails to face the shortcomings of Statistics". As it...

Read more »

Example 7.30: Simulate censored survival data

March 30, 2010

To simulate survival data with censoring, we need to model the hazard functions for both time to event and time to censoring. We simulate both event times from a Weibull distribution with a scale parameter of 1 (this is equivalent to an exponential ra...

Read more »
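
The setup described above (independent event and censoring times, with the earlier one observed) can be sketched in Python with the standard library's `random.weibullvariate(alpha, beta)`, where `alpha` is the scale and `beta` the shape. The helper name, the default shape of 1 (which makes both times exponential) and the example scales are assumptions for illustration, not the post's values.

```python
import random


def simulate_censored(n, event_scale, censor_scale, shape=1.0, seed=None):
    """Simulate right-censored survival data (hypothetical helper).

    Returns a list of (observed_time, event) pairs, where observed_time is
    the earlier of the event and censoring times and event is 1 if the
    event happened first, 0 if the subject was censored.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t_event = rng.weibullvariate(event_scale, shape)
        t_censor = rng.weibullvariate(censor_scale, shape)
        observed = min(t_event, t_censor)
        event = 1 if t_event <= t_censor else 0
        data.append((observed, event))
    return data


if __name__ == "__main__":
    for time, event in simulate_censored(5, event_scale=1.0, censor_scale=3.0, seed=1):
        print(f"time={time:.3f} event={event}")
```

With shape 1, `event_scale=1` and `censor_scale=3`, both times are exponential, so the event should come first about (1/1) / (1/1 + 1/3) = 75% of the time.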

Smoothing time series with R

March 29, 2010

Smoothing is a statistical technique that helps you to spot trends in noisy data, and especially to compare trends between two or more fluctuating time series. It's a useful visualization tool that I'm pleased to see cropping up more and more in statistical graphics on the Web -- it's now a staple in econometric charts and is heavily used...

Read more »
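
The simplest smoother of this kind is a centered moving average. A minimal Python sketch, assuming an evenly spaced series and a window of `2 * k + 1` points; the helper name is hypothetical, and this is not necessarily the smoother the post itself uses.

```python
def moving_average(series, k):
    """Centered moving average over 2*k + 1 points.

    Positions where the window would run past either end of the series
    are left as None rather than padded.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    n = len(series)
    out = [None] * n
    for i in range(k, n - k):
        window = series[i - k:i + k + 1]
        out[i] = sum(window) / len(window)
    return out


if __name__ == "__main__":
    noisy = [3, 5, 4, 6, 8, 7, 9, 12, 10, 11]  # illustrative data only
    print(moving_average(noisy, 2))
```

A wider window (larger `k`) gives a smoother trend line at the cost of more lag and more undefined points at the ends.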

Looking for Software Paths in Windows Registry

March 28, 2010

When we want to call external programs in R under Windows, we often need to know the paths of these programs. For instance, we may want to know where ImageMagick is installed, as we need the convert (convert.exe) utility to convert images to other formats, or where OpenBUGS is installed because we need this path

Read more »

Example 7.29: Bubble plots colored by a fourth variable

March 27, 2010

In Example 7.28, we generated a bubble plot showing the relationship among CESD, age, and number of drinks, for women. An anonymous commenter asked whether it would be possible to color the circles according to gender. In the comments, we showed simp...

Read more »

Finance::YahooQuote 0.24

March 26, 2010

Having espoused rule number one in regression testing in the post about yesterday's bug fix upload 0.23, we can now add rule number zero: Do not introduce a new error by omitting the trailing semicolon. I guess it shows that I don't really program in...

Read more »

Rcpp 0.7.11

March 26, 2010

A new version, 0.7.11, of Rcpp is awaiting inclusion into CRAN and Debian. It is also available from here. This version fixes a somewhat serious bug uncovered by Doug Bates when working with vectors of strings. We also added a few new accessor function...

Read more »

‘R’ = dna.translate("AGG") . A custom C function for R, My notebook.

March 26, 2010

In the following post, I will show how I've implemented a custom C function for R. This C function translates a DNA sequence into a protein. I'm very new to 'R', so feel free to make any comment about the code. C code: The data in 'R' are stored in an opaque stru...

Read more »
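
The post does this in C; as a compact illustration of the same lookup, here is a Python sketch of codon-by-codon translation using the standard genetic code (NCBI translation table 1) packed into a 64-character string in TCAG order. The function name and the handling of partial codons and non-ACGT bases are assumptions, not the author's choices.

```python
# Standard genetic code (NCBI table 1). A codon's index in AMINO_ACIDS is
# 16*first + 4*second + third, with T=0, C=1, A=2, G=3; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(dna):
    """Translate a DNA string into one-letter amino acid codes.

    Reads codons from the first base; a trailing partial codon is ignored,
    and any codon containing a character other than A, C, G or T becomes 'X'.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if all(b in BASES for b in codon):
            idx = (16 * BASES.index(codon[0])
                   + 4 * BASES.index(codon[1])
                   + BASES.index(codon[2]))
            protein.append(AMINO_ACIDS[idx])
        else:
            protein.append("X")
    return "".join(protein)


print(translate("AGG"))  # R, matching the post's title
```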

Code Highlights in WordPress

March 26, 2010

I’ve come across a very useful WordPress plugin called WP-Syntax, which highlights code in posts using GeSHi. This plugin is easy to use and adds highlighting simply by putting the appropriate tags around code blocks. For instance, we can make the following R code much more readable by using WP-Syntax. ## Generate 100

Read more »

Predicting Pizza

March 26, 2010

What's the secret to the best pizza in New York? That's what statistical consultant and R user Jared Lander sought to find out, by analyzing the rankings of NY pizza joints at MenuPages.com and building a regression model for ratings based on variables like location, price, number of reviews, and pizza-oven type (gas, coal or wood). Here's a scatterplot...

Read more »