Ripley Facts

June 10, 2013

Normally, this blog contains only technical and scientific posts. This time, however, I would like to share a very interesting phenomenon I came across on the R mailing list(s). I call it 'Ripley Facts' after the prolific statistician, ...

Read more »

Introduction to stable distributions for finance

June 10, 2013

A few basics about the stable distribution. Previously, “The distribution of financial returns made simple” satirized ideas about the statistical distribution of returns, including the stable distribution. Origin: as “A tale of two returns” points out, the log return over a long period of time is the sum of the log returns over the shorter …
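A minimal R sketch of the additivity property mentioned above, using a made-up price series (the numbers are purely illustrative): the log return over the whole period equals the sum of the one-period log returns, because the intermediate log prices cancel. This closure under summation is what makes distribution families that are stable under addition natural candidates for returns.

```r
# Hypothetical daily closing prices, for illustration only
prices <- c(100, 102, 101, 105, 107)

# One-period log returns: r_t = log(P_t / P_{t-1})
daily_log_returns <- diff(log(prices))

# Log return over the whole period: log(P_T / P_0)
period_log_return <- log(prices[length(prices)] / prices[1])

# Telescoping sum: the period log return equals the sum of the daily ones
all.equal(sum(daily_log_returns), period_log_return)
#> [1] TRUE
```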

Read more »

Measures of Absolute Variability

June 10, 2013

Measures of absolute variability deal with the dispersion of the data points. These include the following:
Range - range
Interquartile Range - IQR
Quartile Deviation
Average Deviation
Standard Deviation - sd
These measures of variability restrict to uniform u...
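A minimal R sketch of the measures listed above, on a small made-up vector. Note that range() returns the minimum and maximum rather than their difference, so the range as a single number is diff(range(x)). The quartile deviation is taken here as half the IQR, and the average deviation as the mean absolute deviation about the mean; both are common textbook definitions and are assumptions, since the excerpt does not define them.

```r
x <- c(2, 4, 4, 5, 7, 9)   # hypothetical data points

rng <- diff(range(x))          # range: max - min (range() itself returns c(min, max))
iqr <- IQR(x)                  # interquartile range: Q3 - Q1, using quantile() type 7
qd  <- iqr / 2                 # quartile deviation (semi-interquartile range)
ad  <- mean(abs(x - mean(x)))  # average deviation about the mean
s   <- sd(x)                   # sample standard deviation (denominator n - 1)

c(range = rng, IQR = iqr, QD = qd, AD = ad, SD = s)
```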

Read more »

Sobol Sensitivity Analysis

June 10, 2013

Sensitivity analysis is the task of evaluating the sensitivity of a model output Y to input variables (X1,…,Xp). Quite often, it is assumed that this output is related to the input through a known function f: Y = f(X1,…,Xp). Sobol indices generalize the coefficient of determination in regression. The ith first-order index is the proportion of...
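To make the first-order index concrete: S_i = Var(E[Y | X_i]) / Var(Y). The R sketch below estimates it by Monte Carlo with a pick-freeze estimator of the form given by Saltelli et al. (2010), on a hypothetical linear model whose analytic indices are S1 = 0.2 and S2 = 0.8. The test model and the helper name sobol_first_order are illustrative assumptions, not the method used in the post.

```r
# Hypothetical test model: Y = X1 + 2 * X2, with X1, X2 independent Uniform(0, 1)
# Analytic first-order indices: S1 = (1/12) / (5/12) = 0.2, S2 = 0.8
model <- function(X) X[, 1] + 2 * X[, 2]

sobol_first_order <- function(f, p, n = 1e5) {
  A <- matrix(runif(n * p), ncol = p)   # first independent input sample
  B <- matrix(runif(n * p), ncol = p)   # second independent input sample
  yA <- f(A)
  yB <- f(B)
  total_var <- var(c(yA, yB))           # estimate of Var(Y)
  vapply(seq_len(p), function(i) {
    ABi <- A
    ABi[, i] <- B[, i]                  # A with column i taken from B
    # B and ABi share only X_i, so this estimates Var(E[Y | X_i])
    mean(yB * (f(ABi) - yA)) / total_var
  }, numeric(1))
}

set.seed(1)
S <- sobol_first_order(model, p = 2)
S
```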

Read more »

You Do Not Need to Tell Me I Have A Typo in My Documentation

June 10, 2013

So I just got yet another comment saying "you have a typo in your documentation". While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try Git and GitHub pull requests, which make it possible for y...

Read more »

Using Metadata to find Paul Revere

June 9, 2013

London, 1772. I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty's subjects. This is in connection with the discussion of the role of "metadata" in

Read more »

Why are Birds Dinosaurs?

June 9, 2013

Month after month, one of the most popular posts on the Paleocave blog is the How to Read a Cladogram post I did some time ago. I always intended to follow it up with more cladistic fun. So, hold onto your butts, we’re going to let the dinosaurs loose. Birds are dinosaurs. We’ve all heard

Read more »

Better Neighborhoods with R: Exploring and Analyzing SeeClickFix Data (part 1)

June 9, 2013

The National Day of Civic Hacking took place …

Read more »

Improve The Efficiency in Joining Data with Index

June 9, 2013

When managing big data with R, many people like to use the sqldf package for its friendly interface or choose the data.table package for its lightning speed. However, very few pay special attention to a small detail that can significantly boost the efficiency of these packages: adding an index to the data.frame or data.table. In my

Read more »

Mahout for R Users

June 9, 2013

I have a few posts coming up on Apache Mahout, so I thought it might be useful to share some notes. I came at it primarily as an R coder, with some very rusty Java and C++ somewhere in the back of my head, so that will be my point of reference. I’ve also included …

Read more »

How to read quickly large dataset in R?

June 9, 2013

Medal Allocations at the Comrades Marathon

June 9, 2013

Following up on my previous post regarding attrition rates at Comrades Marathon 2013, here are the statistics I have gathered for medal allocations. There is some interesting history behind the Comrades Marathon medals. For reference, the medals are allocated as follows: Gold medals to the first ten finishers in the men’s race and the ladies’ race;

Read more »

Exploratory Data Analysis: Kernel Density Estimation in R on Ozone Pollution Data in New York and Ozonopolis


Introduction
Recently, I began a series on exploratory data analysis; so far, I have written about computing descriptive statistics and creating box plots in R for a univariate data set with missing values. Today, I will continue this series by analyzing the same data set with kernel density estimation, a useful non-parametric technique for visualizing
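A minimal R sketch of kernel density estimation, assuming the New York ozone data is R's built-in airquality$Ozone (daily measurements in New York, May to September 1973, with missing values); the excerpt does not name its data source. The hand-written kde() is a hypothetical helper that applies the Gaussian-kernel formula directly, with the bandwidth from bw.nrd0(), which is also the default bandwidth rule of base R's density().

```r
# Assumption: the New York ozone data is R's built-in airquality$Ozone (contains NAs)
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

# Gaussian KDE: f_hat(x) = 1 / (n * h) * sum(K((x - x_i) / h)), with K = dnorm
kde <- function(x0, data, h = bw.nrd0(data)) {
  vapply(x0, function(pt) mean(dnorm((pt - data) / h)) / h, numeric(1))
}

grid  <- seq(-50, 250, by = 0.5)
f_hat <- kde(grid, ozone)

# Base R equivalent: density() defaults to kernel = "gaussian" and bw = "nrd0"
d <- density(ozone)
plot(d, main = "Kernel density estimate of ozone")
lines(grid, f_hat, lty = 2)   # hand-rolled estimate, dashed
```

Writing the sum out by hand shows what density() computes; in practice density() is preferable because it evaluates the estimate quickly on a grid.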

Read more »

Quartiles, Deciles, and Percentiles

June 9, 2013

Measures of position such as quartiles, deciles, and percentiles are available through the quantile function. Its arguments include:
x - the data points
probs - the location to measure
na.rm - if FALSE, NA (Not Available) data points are not ignored
na...
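A short R sketch of quartiles, deciles, and percentiles with quantile(), on a made-up vector containing one NA. R's default quantile definition (type = 7) is assumed throughout.

```r
x <- c(12, 15, 11, 18, 20, 14, 16, 19, 13, 17, NA)  # hypothetical data with one NA

quartiles   <- quantile(x, probs = c(0.25, 0.50, 0.75), na.rm = TRUE)
deciles     <- quantile(x, probs = seq(0.1, 0.9, by = 0.1), na.rm = TRUE)
percentiles <- quantile(x, probs = seq(0.01, 0.99, by = 0.01), na.rm = TRUE)

quartiles  # 25%: 13.25, 50%: 15.5, 75%: 17.75

# With the default na.rm = FALSE, quantile() stops with an error because x has an NA
try(quantile(x))
```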

Read more »

Estimating Finite Mixture Models with Flexmix Package

June 9, 2013

In my post on 06/05/2013 (http://statcompute.wordpress.com/2013/06/05/estimating-composite-models-for-count-outcomes-with-fmm-procedure), I showed how to estimate finite mixture models, e.g. zero-inflated Poisson and 2-class finite mixture Poisson models, with the FMM and NLMIXED procedures in SAS. Today, I am going to demonstrate how to achieve the same results with the flexmix package in R.

Read more »

Quick and Simple D3 Network Graphs from R

June 8, 2013

Sometimes I just want to quickly make a simple D3 JavaScript directed network graph with data in R. Because D3 network graphs can be manipulated in the browser–i.e. nodes can be moved around and highlighted–they're really nice for data exploration. They're also really nice in HTML presentations. So I put together a...

Read more »

Mean and Median

June 8, 2013

In R, the mean is computed with the function mean. Consider the scores of 20 MSU-IIT students in a 100-item Stat 101 exam: 70, 78, 66, 65, 50, 53, 48, 88, 95, 80, 85, 84, 81, 63, 68, 73, 75, 84, 49, and 77. Compute and interpret the mean and medi...
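Computing both statistics for the 20 scores listed above takes one call each in R. The outputs in the comments were checked by hand (the scores sum to 1432; the 10th and 11th sorted scores are 73 and 75). Because the mean sits a little below the median, a few low scores (48, 49, 50) pull the average down slightly.

```r
# Scores of the 20 students listed above
scores <- c(70, 78, 66, 65, 50, 53, 48, 88, 95, 80,
            85, 84, 81, 63, 68, 73, 75, 84, 49, 77)

mean(scores)    # arithmetic mean: 1432 / 20
#> [1] 71.6
median(scores)  # even n: average of the 10th and 11th sorted values, (73 + 75) / 2
#> [1] 74
```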

Read more »

Bulk search for domain names using R

June 8, 2013

# There are several domain name servers that allow
# for bulk searching of domain names.
# http://www.godaddy.com/bulk-domain-search.aspx
# http://www.namestation.com/bulk-domain-search
# However, they do not provide any wildcard support
# and instead exp...

Read more »

Matrix Operations

June 8, 2013

Matrix manipulation in R is very useful in linear algebra. Below is a list of common yet important functions for operations on matrices:
Transpose - t
Multiplication - %*%
Determinant - det
Inverse - solve, or ginv of the MASS library
Eigenvalues and ...
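A minimal R sketch of the functions listed above, on a hypothetical symmetric 2 x 2 matrix. MASS is a recommended package that ships with standard R distributions; calling MASS::ginv() uses it without attaching it.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)  # hypothetical symmetric 2 x 2 matrix
B <- matrix(1:4, nrow = 2)

t(B)                 # transpose
A %*% B              # matrix multiplication (* would be element-wise)
det(A)               # determinant: 2 * 3 - 1 * 1 = 5
A_inv <- solve(A)    # inverse of a square, non-singular matrix
A %*% A_inv          # identity matrix, up to rounding
MASS::ginv(A)        # Moore-Penrose generalized inverse; equals solve(A) for invertible A
eigen(A)$values      # eigenvalues, in decreasing order: (5 + sqrt(5)) / 2, (5 - sqrt(5)) / 2
```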

Read more »

R and MongoDB

June 7, 2013

MongoDB is a document-based NoSQL database. Unlike a relational database, which stores data in tables with rigid schemas, MongoDB stores data in documents with dynamic schemas. In the demonstration below, I am going to show how to extract data from MongoDB with R. Before starting the R session, we need to install the MongoDB

Read more »

Hey, I Just did a Significance Test!

June 7, 2013

I’ve seen it happen quite often: the sig test. Somebody simply needs to know the p-value, and that one number will provide all of the information about the study that they need to know. The dataset is presented and the client/boss/colleague/etc. invariably asks “is it significant?” and “what’s the correlation?”. To quote R.A.

Read more »

Robust logistic regression

June 7, 2013

Corey Yanofsky writes: In your work, you’ve robustificated logistic regression by having the logit function saturate at, e.g., 0.01 and 0.99, instead of 0 and 1. Do you have any thoughts on a sensible setting for the saturation values? My intuition suggests that it has something to do with proportion of outliers expected in the ...
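A minimal R sketch of the saturated model described in the question, fitted by maximum likelihood with optim(). The saturation value eps = 0.01 is taken from the excerpt; the helper names and the simulated data are hypothetical, and this is not presented as the method from the original post. One observation links the two parts of the question: if a fraction eps of labels were flipped at random, the observed P(y = 1) would be eps + (1 - 2 * eps) * p, which is exactly the saturated form.

```r
# Robust inverse logit: probabilities saturate at eps and 1 - eps instead of 0 and 1
robust_invlogit <- function(eta, eps = 0.01) eps + (1 - 2 * eps) * plogis(eta)

# Hypothetical helper: maximum-likelihood fit of the saturated model via optim()
fit_robust_logit <- function(x, y, eps = 0.01) {
  X <- cbind(1, x)                            # add an intercept column
  neg_loglik <- function(beta) {
    p <- robust_invlogit(drop(X %*% beta), eps)
    -sum(dbinom(y, size = 1, prob = p, log = TRUE))
  }
  optim(rep(0, ncol(X)), neg_loglik, method = "BFGS")$par
}

# Simulated data (true intercept -0.5, slope 1.5) with 1% of labels flipped as outliers
set.seed(42)
n <- 2000
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))
flip <- sample(n, 20)
y[flip] <- 1 - y[flip]

beta_robust   <- fit_robust_logit(x, y)
beta_standard <- coef(glm(y ~ x, family = binomial))  # ordinary logistic fit, for comparison
```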

Read more »

Crayfish or crawdad? Mapping US dialect variations with R

June 7, 2013

I grew up in Australia, where I learned to speak English. Or so I thought: when I moved overseas to the UK, and especially when I moved to the States, I soon learned these are distinct cultures separated by a common language. Words which I previously had no context for being different anywhere else, such as "runners" ("sneakers"), "lemonade"...

Read more »

The Rcpp Book is now shipping

My book about Rcpp (and its R and C++ integration) is now available from Springer. Amazon still lists it as not-yet-released; I expect this to change in the next few days.

Read more »

Happy Birthday rasterVis!


Two years ago the first version of rasterVis was submitted to R-Forge and some weeks after the first stable version was …

Read more »

A Shiny App Goes Viral

June 7, 2013

I am not sure how many of you have seen this Business Insider article. It is basically about a Shiny app created by Joshua Katz at NC State. It is really fun to play with the Shiny app. With nearly a million Facebook likes, this web app buil...

Read more »

Income Distribution in London

June 7, 2013

Inspired by the Institute for Fiscal Studies' "Where do you fit in" application, where people can find out their position in the UK's income distribution, I wanted to find out what the picture looks like in London. Quite different. If you are in a very high percentile nationwide, high incomes of mainly financial sector employees in London...

Read more »

Symmetric set differences in R

June 7, 2013

My .Rprofile contains a collection of convenience functions and function abbreviations. These are either functions I use dozens of times a day and prefer not to type in full (for example, h, my abbreviation of head()), or problems that I'd rather figure out once, and only once (for example, a between() helper, called as between( 1:10, 5.5, 6.5 ))...
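The post's own helper is not shown in the excerpt; as a minimal sketch, a symmetric set difference (the elements that belong to exactly one of two sets) can be built from base R's union() and setdiff(). The name symdiff is hypothetical.

```r
# Hypothetical helper: elements in x or in y, but not in both
symdiff <- function(x, y) union(setdiff(x, y), setdiff(y, x))

symdiff(1:5, 3:7)
#> [1] 1 2 6 7
```

Like the other base set functions, this drops duplicates, so the result behaves as a set rather than a multiset.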

Read more »

Comrades Marathon Attrition Rate

June 7, 2013

It is a bit of a mission to get the complete data set for this year’s Comrades Marathon. The full results are easily accessible, but come as an HTML file. Embedded in this file are links to the splits for individual athletes. So with a bit of scripting wizardry it is also possible to download

Read more »

Contact us if you wish to help support R-bloggers, and place your banner here.