R code to obtain and plot rainfall data for the whole world

If you want to create rainfall maps for the whole world in R there is no readily available code or package to do this. Moreover, data publicly available from research institutions is not generally in plain text format or other familiar formats. Hydrological and climatological studies sometimes require rainfall data over the entire world for long periods of time....

Exploratory Data Analysis – Computing Descriptive Statistics in R for Data on Ozone Pollution in New York City

Introduction: This is the first of a series of posts on exploratory data analysis (EDA). This post will calculate the common summary statistics of a univariate continuous data set – the data on ozone pollution in New York City that is part of the built-in “airquality” data set in R. This is a particularly good data set
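
As a rough illustration of what such a summary might look like, here is a minimal sketch using the built-in airquality data. The choice of statistics and the variable names (ozone_obs, stats, n_missing) are assumptions made for this example, not necessarily what the post itself computes.

```r
# airquality ships with base R (package "datasets"), so nothing needs installing.
ozone <- airquality$Ozone

# The Ozone column contains missing readings: count them, then drop them explicitly.
n_missing <- sum(is.na(ozone))
ozone_obs <- ozone[!is.na(ozone)]

# A handful of common univariate summary statistics (an illustrative selection).
stats <- c(
  n      = length(ozone_obs),
  mean   = mean(ozone_obs),
  median = median(ozone_obs),
  sd     = sd(ozone_obs),
  min    = min(ozone_obs),
  max    = max(ozone_obs),
  iqr    = IQR(ozone_obs)
)

print(n_missing)
print(round(stats, 2))
```

Dropping the missing values once up front, rather than passing na.rm = TRUE to every call, keeps the reported sample size consistent across all the statistics.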

Update to PSID panel builder for R: psidR

May 19, 2013

I just pushed the most recent version of the PSID panel data builder introduced a little while ago. Got some user feedback and made some improvements. The package is hosted on GitHub. News: I added a reproducible example using artificial data which you c...

Conversion between Factor and Dummies in R

May 18, 2013

Sharing my R notes

May 18, 2013

I started working with R 2 1/2 years ago. I remember opening R, closing it, and thinking it was the dumbest thing ever (a command line is not inviting to a non-programmer). Now it’s my constant friend. From the beginning …

Using gdata, for MS Windows users

May 18, 2013

I use both GNU-Linux and Windows systems on a regular basis, so I’m aware of the advantages (more for GNU-Linux in my case) and disadvantages of both. Recently I needed to analyse a database from a remote location, an Excel …

R (Web Server) Solutions – Amplifying Artichokes

May 18, 2013

Every month I see one or more new R-based web server solutions coming onto the market. After looking over some of them, I thought I would share one of my old architecture maps, presented to a client back in early 2009 (good to see the quick spread of scalable...

What is probabilistic truth?

May 18, 2013

I am currently working on a validation metric for binary prediction models, that is, models which make predictions about outcomes that can take on either of two possible states (e.g., dead/not dead, heads/tails, cat in picture/no cat in picture). The most commonly used metric for this class of models is AUC, which assesses the

Recent Changes to caret

May 18, 2013

Here is a summary of some recent changes to caret. Feature updates: train was updated to utilize recent changes in the gbm package that allow for boosting with three or more classes (via the multinomial distribution). The Yeo-Johnson power transformation was added. This is very similar to the Box-Cox transformation, but it does not require the data to be...

Mining the last French presidential debate

May 18, 2013

After reading this post (thanks to its author), I thought it would be interesting to replicate it with some setup specific to the French language, and to see whether we can get a quick view of the debate between Sarkozy and Hollande in the second round of the last presidentia...

Bubble sort tuning

May 18, 2013

I was reading Paul Hiemstra's blog post "Much more efficient bubble sort in R using the Rcpp and inline packages", went back to his first post, "Bubble sort implemented in pure R", and thought, surely we can do it better in pure R. So I...

Interfacing XTide and R

May 17, 2013

XTide is an open-source program that predicts tide heights and current speeds for hundreds of tide and current stations around the United States. It can be used to produce tide predictions in the past and future for a site at your chosen interval (down...

Unit conversion in R

May 17, 2013

Last weekend I submitted an update of my R package datamart to CRAN. It has been more than half a year since the last update; however, there are only minor advances. The package is still in its early stages, and very experimental. One new feature is the function uconv. Think iconv, but instead of converting character vectors between different encodings,...

Chutes & ladders: How long is this going to take?

May 17, 2013

I was playing Chutes & Ladders with my four-year-old daughter yesterday, and I thought, “How long is this going to take?” I saw an interesting mathematical analysis of the game a few years ago, but it seems to be offline, though you can read it via the Wayback Machine. But that didn’t answer my specific

Which Torontonians Want a Casino? Survey Analysis Part 2

May 17, 2013

In my last post I said that I would try to investigate the question of who actually does want a casino, and whether place of residence is a factor in where they want the casino to be built. So, here …

R 3.0.1 released

May 17, 2013

The R core group has quickly followed up with a patch to R version 3. Announced yesterday, R 3.0.1 (code name: "Good Sport") improves serialization performance with big objects, improves reliability for parallel programming and fixes a few minor bugs. (You can find the complete list of changes in the NEWS file.) The source distribution and Windows and Linux...

Revolution Newsletter: May 2013

May 17, 2013

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full May edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Gaming Analytics FTW! Join us on 13Jun13 at 10:00 AM PDT for our webinar...

Strategic Zombie Simulation – Animation

May 17, 2013

# Escape Zombie Land!
# This is a simulation of an escape from a hot zombie zone. It freezes and gives an error if you get killed, so you had best not. You attempt to navigate the zone by constructing waypoints.
# This is not a very clean s...

Innovation Will Never Be At The Push Of A Button

May 17, 2013

Todd Belcher (@toddmetrics), May 16, 2013: "@randyzwitch @benjamingaines @usujason I am envisioning the data science equivalent of an autonomous vehicle pileup."

Recently, I’ve been getting my blood pressure up reading (marketing) articles about “big data” and “data science”. What saddens me about the whole discussion is that there is the underlying premise that Innovation Will Never...

Preferential attachment applied to frequency of accessing a variable

May 17, 2013

If, when writing code for a function, up to the current point in the code distinct local variables have been accessed for reading times (), will the next read access be from a previously unread local variable and if not what is the likelihood of choosing each of the distinct variables (global variables are ignored

Analyzing a simple experiment with heterogeneous variances using asreml, MCMCglmm and SAS

May 17, 2013

I was working with a small experiment which includes families from two Eucalyptus species and thought it would be nice to code a first analysis using alternative approaches. The experiment is a randomized complete block design, with species as a fixed effect and family and block as random effects, while the response variable is growth

Finding patterns in time series using regular expressions

May 17, 2013

Regular expressions are a fantastic tool when you’re looking for patterns in time series. I wish I’d realised that sooner. Here’s a timely example: traditionally, when you have two successive quarters of negative GDP growth, you’re in recession. We have a quarterly GDP time series for Australia, and we want to know how many recessions
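
The idea in this teaser can be sketched as follows: encode each quarter's growth as a single character, then let a regular expression find runs of two or more negative quarters. The growth figures below are made up for illustration (they are not Australian GDP data), and the names growth, signs, and n_recessions are hypothetical; the original post may well do this differently.

```r
# Hypothetical quarterly GDP growth rates (percent); NOT real data.
growth <- c(0.5, -0.2, -0.1, 0.3, 0.4, -0.3, 0.2, -0.1, -0.4, -0.2, 0.6)

# Encode each quarter as one character: "N" for negative growth, "P" otherwise.
signs <- paste(ifelse(growth < 0, "N", "P"), collapse = "")

# A recession, under the rule of thumb above, is a run of two or more "N"s.
matches <- gregexpr("N{2,}", signs)[[1]]

# gregexpr() returns -1 when nothing matches, so handle that case explicitly.
n_recessions   <- if (matches[1] == -1) 0L else length(matches)
start_quarter  <- as.integer(matches)
length_quarters <- attr(matches, "match.length")

print(signs)
print(n_recessions)
print(data.frame(start_quarter, length_quarters))
```

Because the match is greedy, a run of three or more negative quarters counts as a single recession rather than as overlapping two-quarter recessions.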

Analyze More, Program Less: A Webinar about Using SciDB for Computational Finance

May 16, 2013

Paradigm4 presents a webinar about using SciDB for scalable financial analytics. You’ll see how SciDB reaches Big Data scale without forcing you to become a computer scientist—no mapping, no reducing, no concocting parallel algorithms by hand. The webinar will also demonstrate SciDB-R, an R package that lets you remain an R programmer while enjoying the scalable

Social Network Analysis at New Frontiers in Computing 2013

May 16, 2013

by Joseph Rickert

This past Saturday, the New Frontiers in Computing Conference (NFIC 2013), held at Stanford University, explored the theme: Social Network Analysis: It’s Who You Know. The speakers were a well-chosen, eclectic lot who covered a remarkable array of issues in less than a full day. Ian Hersey, former CTO of Attensity, spoke on Lessons from Large-Scale...

A function for comparing groups on a set of variables

May 16, 2013

I'm often in the position of needing to compare groups of either items or participants on some set of variables. For example, I might want to compare recognition of words that differ on some measure of lexical neighborhood density but are matched on wo...

Using RcppProgress to control the long computations in C++

May 16, 2013

Usually you write C++ code with R when you want to speed up some calculations. Depending on the parameters, and especially during development, it is difficult to anticipate the execution time of your computation, so you do not know whether you will have to wait for one minute or for hours. RcppProgress is a tool to help you monitor the...

Statistics vs Data Science vs BI

May 15, 2013

As someone who trained as a statistician, I've always struggled with that title. I love the rigor and insight that Statistics brings to data analysis, but let's face it: Statistics — the name — has always had a bit of a branding problem. Telling someone I was a statistician was more likely to conjure up images of me counting...

Even More JGB Yield Charts with R lattice

May 15, 2013

See the last post for all the details. I just could not help creating a couple more.

Variations on Favorite Plot - Time Series Line of JGB Yields by Maturity

p2 <- xyplot(value ~ date | indexname, data = jgb.melt, type = "l", layout = c(length(unique(jgb.melt$indexname)), ...

Exponential Cache Behavior

May 15, 2013

Guerrilla alumnus Gary Little observed certain fixed-point behavior in simulations where disk IO blocks are updated randomly in a fixed-size cache. For his Python simulation with 10 million entries (corresponding to an allocation of about 400 MB of memory) the following results were obtained:
Hit ratio (i.e., occupied) = 0.3676748
Miss ratio...
