the little half (another Le Monde puzzle)

October 27, 2012

I found this Le Monde puzzle of June 16, which I had stored and then somehow forgotten during my trips to Japan and Australia: There are n beans in a box, with 98≤n≤102. Two players take, at each round, either one bean from the box or “the little half” (i.e. the integral part of the half)


Characterizing a new dataset

In my last post, I promised a further examination of the spacing measures I described there, and I still promise to do that, but I am changing the order of topics slightly.  So, instead of spacing measures, today’s post is about the DataframeSummary procedure to be included in the ExploringData package, which I also mentioned in my last post...


Feature selection and linear modeling

October 27, 2012

(This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers)


.Rhistory

October 27, 2012

Over the last couple of years I've been using R every now and then. When I stumbled upon an interesting topic, and I managed to get a hold of a data set, I tried to make sense of it using R. It's a bit like the Stat Labs approach: I might get started by...


Book Review: R for Business Analytics, A Ohri

October 26, 2012

I've added a recently released book to my list of recommendations (at the Amazon carousel to the right), as I've reviewed a copy provided to me via Springer Publishers. The book is R for Business Analytics, authored by A Ohri...


A Greedy ARMA/GARCH Model Selection

October 26, 2012

An idea that I have been toying with for a while has been to study the effect of a domain-specific optimization strategy in the ARMA+GARCH models. If you recall from this long tutorial, the implemented approach cycles through all models within the specified ranges for the parameters and chooses the best model based on the


R 2.15.2 now available

October 26, 2012

As promised, the source distribution for R 2.15.2 is now available for download from the master CRAN repository. (Binary distributions for Windows, MacOS and Linux will be available from the CRAN mirror network in the coming days.) This latest point-update — codenamed "Trick or Treat" — improves the performance of the R engine and adds a few minor but...


Chris Hamm on using plot.new() for better combined plots

October 26, 2012

At DRUG today, Chris Hamm (email (cahamm at ucdavis dot edu)) showed us an easier way to combine multiple figures into one plot using plot.new, rather than par(mfrow=...). Here’s his script: A Report Generated by knitr # plot.new() # I discovered this plotting method when trying to add an inset figure # to a plot # plot.new is part of the traditional graphics....


Javascript and D3 for R users, part 2: running off the R server instead of Python

October 26, 2012

Thank you all for the positive responses to Basics of JavaScript and D3 for R Users! Quick update: last time we had to dabble in a tiny bit of Python to start a local server, in order to actually run JavaScript …


Plotting correlation ellipses

October 26, 2012

This is an oldie but a goodie. There are a lot of ways to plot multiple bivariate relationships, but this is one of my favorites, courtesy of the R Graph Gallery. https://gist.github.com/819111


NSCB Sexy Stats Version 2

October 25, 2012

This is a revised version of my previous post about the NSCB article. Following a suggestion from Tal Galili, below are the new pie charts and the R code to produce these plots by directly scraping the data from the webpage using XML and RColorBrewer ...


Using FAFSA Data to study Competitors – Part 2

October 25, 2012

I wanted to build upon my previous post and dive a little deeper into the sorts of questions we can answer using the FAFSA data supplied to us by applicants. As a quick overview, students completing the FAFSA for student aid can list up to ten institutions on the form. I consider this the student’s


Modeling Couch Potato strategy

October 25, 2012

I first read about the Couch Potato strategy in the MoneySense magazine. I liked this simple strategy because it was easy to understand and easy to manage. The Couch Potato strategy is similar to the Permanent Portfolio strategy that I have analyzed previously. The Couch Potato strategy invests money in the given proportions among different


Accelerating R code: Computing Implied Volatilities Orders of Magnitude Faster

October 25, 2012

This blog, together with Romain's, is one of the main homes of stories about how Rcpp can help with getting code to run faster in the context of the R system for statistical programming and analysis. By making it easier to get already existing C or C++ code to R, or equally to extend R with new C++...


My Goodness. What a Fat Dataset!

October 25, 2012

Recently at work we got sent a data file containing information on donations to a specific charitable organization, ranging all the way back to the '80s. Usually, when we receive a dataset with a donation history in it, each row …


Allstate compares SAS, Hadoop and R for Big-Data Insurance Models

October 25, 2012

At the Strata conference in New York today, Steve Yun (Principal Predictive Modeler at Allstate's Research and Planning Center) described the various ways he tackled the problem of fitting a generalized linear model to 150M records of insurance data. He evaluated several approaches: Proc GENMOD in SAS; installing a Hadoop cluster; using open-source R (both on the full data...


Notes on a Scandal – When Jimmy beat Katy

October 25, 2012

No, the title doesn’t refer to how Katy Perry suffered at another of Jimmy Savile’s sexual predilections, although these are two of the participants. I’ll get to the details later. Just over a year ago, I reflected on the relative wiki searches of leading female singing celebrities, including Ms Perry. In the light of the


Palettes in R

October 25, 2012

In its simplest form, a palette in R is simply a vector of colors. This vector can include hex triplets or R color names. The default palette can be seen through palette(): > palette("default") # you'll only need this line if you've previ...
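As a rough illustration of the idea above, here is a minimal sketch in base R. The vector name my_colors and the three colors in it are arbitrary choices made for this example, not taken from the original post.

```r
# A palette is just a character vector of colors;
# color names and hex triplets can be mixed freely.
my_colors <- c("steelblue", "#FF7F00", "forestgreen")

# Install the custom palette (the previous palette is returned invisibly).
old <- palette(my_colors)

# Query the active palette. R may report a color under an equivalent
# name or hex code, so compare colors via col2rgb() rather than as strings.
current <- palette()
print(current)

# Integer color indices now refer to positions in the custom palette.
second_rgb <- col2rgb(2)
print(second_rgb)

# Restore the default palette when done.
palette("default")
```

With the custom palette active, a call such as plot(1:3, col = 1:3, pch = 19) would draw its points in those three colors.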


NSCB Sexy Statistics (Unemployment)

October 25, 2012

Recently, my friend posted on her Facebook account about the article published by the National Statistical Coordination Board (NSCB) about poverty and unemployment in the country.  Looking at the report I saw a lot of tables, so I thought why not ...


How fat are your tails?

October 25, 2012

Lately I’ve been thinking about how to measure the fatness of the tails of a distribution. After some searching, I came across the Pareto Tail Index method. This seems to be used mostly in economics. It works by finding the decay rate of the tail. It’s complicated, both in formula and in its R implementation


Congressional ideology by state

October 25, 2012

In a recent post, I illustrated how to add a background geom to your ggplot. While that code worked, and the plot looked fine, it was pointed out to me that I was missing an important aspect of plot layering with ggplot2. Namely, it is not, as I previ...


R function: generate a panel data.table or data.frame to fill with data

October 25, 2012

I have started to work with R and STATA together. I like running regressions in STATA, but I do graphs and setting up the dataset in R. R clearly has a strong comparative advantage here compared to STATA. I was writing a function that will give me a (balanced) panel-structure in R. It then simply


Rcpp modules more flexible

October 25, 2012

Rcpp modules just got more flexible (as of revision 3838 of Rcpp, to become 0.9.16 in the future). Modules have allowed exposing C++ classes for some time now, but developers had to declare custom wrap and as specializations if they wanted their classes to be used as return type or argument type of a C++ function or method....


Nonnegative Matrix Factorization and Recommendor Systems

October 24, 2012

Albert Au Yeung provides a very nice tutorial on non-negative matrix factorization and an implementation in Python. This is based very loosely on his approach. Suppose we have the following matrix of users and ratings on movies: If we use the information above to form a matrix R it can be decomposed into two matrices...


Quick notes from Strata NYC 2012

October 24, 2012

The O'Reilly Strata conferences are always great fun to attend, and this latest installment in New York City is no exception. This one is super-busy though; the conference has been sold out for weeks -- and not just marketing-sold-out, it's fire-department-sold out. It's non-stop conversations and presentations, and it's tough to move through the hallways in between. Nonetheless, I...


R for Ecologists: Permutation Analysis – t-tests

October 24, 2012

You’ve carefully designed your experiment, you’ve meticulously collected your data, and you have a hypothesis to test. Unfortunately, your data is typical of ecology data: small sample sizes, messy, and non-normal. Your ideal test, the t-test, won’t work because of the …


Plotting the debate "Winner"

October 24, 2012

As a Political Scientist, it could not be more gauche to talk about the Presidential debate in terms of a winner and a loser, but the occasion provides the opportunity to show how to do (at least) three really useful things: Directly load price and v...


Displaying Your Data in Google Earth Using R2G2

October 24, 2012

Have you ever wanted to easily visualize your ecology data in Google Earth? R2G2 is a new package for R, available via R CRAN and formally described in this Molecular Ecology Resources article, which provides a user-friendly bridge between R and the Google Earth interface. Here, we will provide a brief introduction to...


Stan for Bayesian Analysis

October 23, 2012

Bayesian analysis has been growing in popularity among ecologists recently, largely due to accessible books such as Models for Ecological Data: An Introduction, Introduction to WinBUGS for Ecologists, and Bayesian Methods for Ecology. Most ecologists with limited programming background have …

