Biostatisticians… beware of fuzzy researchers!

June 11, 2013

Over the last few days I have been thinking about how researchers could collaborate efficiently with their statistics experts. With the increasing complexity of science, exchanging information can be crucial to getting the best results. But what happens when a … Continue reading →

Thursday: Webinar on video game analytics

June 11, 2013

Video games are big business today: Electronic Arts (EA) generated more than 4 billion dollars in revenue last year, and they're not even the biggest player on the block. In addition to big bucks, video games also generate Big Data: 50 terabytes per day at EA alone. So there's an obvious need to apply predictive analytics to these massive...

R2leaflet (v0.1) – make interactive online maps from R

June 11, 2013

I have been working on a simple R function to take latitude and longitude of points of interest, and text for pop-up labels, and produce an interactive online map. Interactive graphics are incredibly useful in getting people interested in your … Continue reading →

Finally! Tracking CRAN packages downloads

June 11, 2013

The guys from RStudio now provide CRAN download logs (see also this blog post). Great work! I have always asked myself how many people actually download my packages. Now I finally can

R package development

June 11, 2013

Building R packages is not particularly hard, but it can be a bit of a daunting endeavour at the beginning, particularly if you are more of a statistician than a computer scientist or programmer. Some concepts may appear foreign or like red tape, yet man...

Measures of Skewness and Kurtosis

June 10, 2013

Skewness and kurtosis in R are available in the moments package (to install an R package, click here), and these are:

Skewness - skewness
Kurtosis - kurtosis

Example 1. Mirra is interested in the elapsed time (in minutes) she spends riding a tricycle fr...
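
To make the two measures concrete, here is a minimal base R sketch of moment-based skewness and kurtosis. The helper names and the sample values are made up for illustration; the moments package functions named above are the ones the post refers to, and to my knowledge they use these same population-moment definitions, but treat that as an assumption rather than a guarantee.

```r
# Moment-based skewness and kurtosis written out in base R.
# These helper names are illustrative, not part of the moments package.
central_moment <- function(x, k) mean((x - mean(x))^k)

moment_skewness <- function(x) {
  central_moment(x, 3) / central_moment(x, 2)^(3 / 2)
}

moment_kurtosis <- function(x) {
  # Pearson kurtosis (not excess): a normal distribution gives about 3
  central_moment(x, 4) / central_moment(x, 2)^2
}

# Hypothetical sample: made-up times in minutes
times <- c(2, 4, 4, 5, 7, 9, 12)
print(c(skewness = moment_skewness(times), kurtosis = moment_kurtosis(times)))
```

A positive skewness here reflects the longer right tail of the made-up sample; symmetric data would give a value of zero.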

Scenario analysis for option strategies Pt. 2

June 10, 2013

No major improvements since the last post. However, I have got it to the stage where I can share the code and let you try it yourself: https://github.com/afraid2trade/SCENARIO_ANALYSIS.git Once you have downloaded it, the only thing you need to open is "sa_work...

Safe Loading of RData Files

June 10, 2013

Unless you have configured R not to ask, every time you close R or RStudio you are prompted to save your workspace. This saves an RData file to the working directory. The functions save.image() and save() offer a little more … Continue reading →
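
One common way to load an RData file without overwriting objects in the workspace is to load it into a separate environment. The sketch below assumes that approach; the excerpt is truncated, so this is not necessarily what the post itself recommends, and the object and file names are made up.

```r
# Save an object, change it in the workspace, then load the saved copy
# into its own environment so the workspace version is left untouched.
x <- 1:3
path <- tempfile(fileext = ".RData")  # temporary file, illustrative only
save(x, file = path)

x <- "changed"                        # the workspace copy now differs

e <- new.env()
loaded <- load(path, envir = e)       # load() returns the names it restored

print(loaded)  # names of the objects found in the file
print(e$x)     # the saved value, kept apart from the workspace
print(x)       # the workspace value is unchanged
```

Inspecting the environment with ls(e) before copying anything out shows exactly what the file contained.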

Microsoft Office Metadata with R

June 10, 2013

Sometimes I need to retrieve various items of metadata from Microsoft Office files. For the 'old-style' (i.e. '.doc' and '.xls') files perhaps a solution in Python, such as hachoir, was the best way to extract this data from the OLE2 file format -...

Bringing R to the Enterprise – new white paper available

June 10, 2013

Check out this new white paper entitled "Bringing R to the Enterprise - A Familiar R Environment with Enterprise-Caliber Performance, Scalability, and Security." In this white paper, we begin with "Beyond the Laptop" exploring the ability to run R code in the database, working with CRAN packages at the database server, operationalizing R analytics, and...

In case you missed it: May 2013 Roundup

June 10, 2013

In case you missed them, here are some articles from May of particular interest to R users: Billions of geotagged Tweets create a beautiful map of the world when plotted with the ggmap package. A review of Ryan Sheftel's talk at R/Finance, on how he uses R on the trading desk at Credit Suisse. Also, a quick take on...

Where is the R Activity?

June 10, 2013

R has become one of the world’s most widely used

The RStudio CRAN mirror

June 10, 2013

RStudio maintains its own CRAN mirror, http://cran.rstudio.com. The server itself is a virtual machine run by Amazon’s EC2 service, and it syncs with the main CRAN mirror in Austria once per day. When you contact http://cran.rstudio.com, however, you’re probably not talking to our CRAN mirror directly. That’s because we use Amazon CloudFront, a content delivery

Running time

June 10, 2013

Marta and I are doing some re-analysis of our Eurovision contest (some context here and here). We have slightly modified our original model (mostly, I have navigated the mess in Marta's notation - it's OK: I'm not at risk of her mighty wrath, as I've...

Le Monde puzzle [#822]

June 10, 2013

For once, the Le Monde math puzzle is much more easily solved on a piece of paper than in R, even on a plane from Roma: Given a partition of the set {1,…,N} into k groups, one considers the collection of all subsets of the set {1,…,N} containing at least one element from each group. Show

Measure of Relative Variability

June 10, 2013

The measure of relative variability is the coefficient of variation (CV). Unlike measures of absolute variability, the CV is unitless, so it can be used to compare the dispersions of two distributions with different units of measurement. In R, CV ...
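
As a quick illustration, the CV can be sketched in base R as the sample standard deviation divided by the mean. The excerpt is cut off before it shows how the post computes it, so the helper below and the made-up height and weight values are assumptions for illustration only.

```r
# Coefficient of variation: sample standard deviation relative to the mean.
# cv() is a hypothetical helper, not a base R function.
cv <- function(x) sd(x) / mean(x)

# Made-up measurements in different units
heights_cm <- c(160, 165, 170, 175, 180)
weights_kg <- c(55, 60, 70, 80, 85)

print(c(heights = cv(heights_cm), weights = cv(weights_kg)))

# Changing the unit (cm to m) rescales sd and mean alike, so the CV stays the same
print(all.equal(cv(heights_cm), cv(heights_cm / 100)))
```

Because the unit cancels in the ratio, the two CVs can be compared directly even though one sample is in centimetres and the other in kilograms.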

Ripley Facts

June 10, 2013

Normally, this blog contains only technical and scientific posts. But this time I would like to share with you a very interesting phenomenon I came across on the R mailing list(s). I call it 'Ripley Facts' after the prolific statistician, ...

Introduction to stable distributions for finance

June 10, 2013

A few basics about the stable distribution.
Previously: “The distribution of financial returns made simple” satirized ideas about the statistical distribution of returns, including the stable distribution.
Origin: As “A tale of two returns” points out, the log return of a long period of time is the sum of the log returns of the shorter … Continue reading...

Measures of Absolute Variability

June 10, 2013

Measures of absolute variability deal with the dispersion of the data points. These include the following:

Range - range
Interquartile Range - IQR
Quartile Deviation
Average Deviation
Standard Deviation - sd

These measures of variability restrict to uniform u...
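
The measures listed above can be sketched together in base R. The functions range, IQR and sd are the ones named in the excerpt; the quartile deviation (taken here as half the IQR) and the average deviation (taken here as the mean absolute deviation about the mean) use the usual textbook definitions, which is an assumption since the excerpt does not spell them out. The wrapper name is hypothetical.

```r
# Collect the usual measures of absolute variability in one named vector.
# absolute_variability() is an illustrative helper, not a base R function.
absolute_variability <- function(x) {
  c(
    range = diff(range(x)),                      # range() returns c(min, max)
    iqr = IQR(x),                                # default quantile type 7
    quartile_deviation = IQR(x) / 2,             # assumed: half the IQR
    average_deviation = mean(abs(x - mean(x))),  # assumed: about the mean
    sd = sd(x)                                   # sample standard deviation
  )
}

# Made-up sample
scores <- c(1, 2, 3, 4, 5)
print(absolute_variability(scores))
```

All five values are expressed in the unit of the data, which is why they are only comparable across samples measured in the same unit.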

Sobol Sensitivity Analysis

June 10, 2013

Sensitivity analysis is the task of evaluating the sensitivity of a model output Y to input variables (X1,…,Xp). Quite often, it is assumed that this output is related to the input through a known function f: Y = f(X1,…,Xp). Sobol indices generalize the coefficient of determination in regression. The ith first-order index is the proportion of...
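
As a rough illustration, first-order Sobol indices can be estimated by Monte Carlo with a "pick-freeze" scheme: keep one input fixed, resample the others, and compare the two outputs. The sketch below assumes the standard definition Var(E[Y|Xi]) / Var(Y) and a toy model chosen for illustration; the excerpt is truncated, so this is not necessarily the method used in the post.

```r
# Pick-freeze estimate of first-order Sobol indices for a toy model.
# Toy model (an assumption): Y = X1 + 2 * X2 with independent U(0, 1) inputs,
# for which the exact indices are 0.2 and 0.8.
set.seed(1)
n <- 1e5
f <- function(x1, x2) x1 + 2 * x2

x1 <- runif(n)
x2 <- runif(n)
x1_new <- runif(n)  # independent resamples of each input
x2_new <- runif(n)

y <- f(x1, x2)
y_keep1 <- f(x1, x2_new)  # same X1, fresh X2
y_keep2 <- f(x1_new, x2)  # same X2, fresh X1

# Cov(Y, Y with Xi kept) estimates Var(E[Y | Xi])
s1 <- cov(y, y_keep1) / var(y)
s2 <- cov(y, y_keep2) / var(y)

print(c(S1 = s1, S2 = s2))
```

For an additive model like this one the first-order indices sum to one; with interactions between inputs they would sum to less.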

You Do Not Need to Tell Me I Have A Typo in My Documentation

June 10, 2013

So I just got yet another comment saying "you have a typo in your documentation". While I do appreciate these kind reminders, I think it might be a good exercise for those who want to try Git and GitHub pull requests, which make it possible for y...

Using Metadata to find Paul Revere

June 9, 2013

London, 1772. I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty's subjects. This is in connection with the discussion of the role of "metadata" in

Why are Birds Dinosaurs?

June 9, 2013

Month after month, one of the most popular posts on the Paleocave blog is the How to Read a Cladogram post I did some time ago. I always intended to follow it up with more cladistic fun. So, hold onto your butts, we’re going to let the dinosaurs loose. Birds are dinosaurs. We’ve all heard

Better Neighborhoods with R: Exploring and Analyzing SeeClickFix Data (part 1)

June 9, 2013

The National Day of Civic Hacking took place … Continue reading »

Improve The Efficiency in Joining Data with Index

June 9, 2013

When managing big data with R, many people like to use the sqldf package for its friendly interface or choose the data.table package for its lightning speed. However, very few pay special attention to small details that might significantly boost the efficiency of these packages, such as adding an index to the data.frame or data.table. In my

Mahout for R Users

June 9, 2013

I have a few posts coming up on Apache Mahout so I thought it might be useful to share some notes. I came at it as primarily an R coder with some very rusty Java and C++ somewhere in the back of my head so that will be my point of reference. I’ve also included … Continue reading...

How to read quickly large dataset in R?

June 9, 2013

Medal Allocations at the Comrades Marathon

June 9, 2013

Following up on my previous post regarding attrition rates at Comrades Marathon 2013, here are the statistics I have gathered for medal allocations. There is some interesting history behind the Comrades Marathon medals. For reference, the medals are allocated as follows: Gold medals to the first ten finishers in the men’s race and the ladies’ race;

Exploratory Data Analysis: Kernel Density Estimation in R on Ozone Pollution Data in New York and Ozonopolis

Introduction: Recently, I began a series on exploratory data analysis; so far, I have written about computing descriptive statistics and creating box plots in R for a univariate data set with missing values. Today, I will continue this series by analyzing the same data set with kernel density estimation, a useful non-parametric technique for visualizing
