Estimate Probability and Quantile

January 25, 2011
Simple root-finding and one-dimensional integration algorithms were implemented in previous posts. These algorithms can be used to estimate cumulative probabilities and quantiles. Here, the normal distribution is taken as an example.
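Here is a minimal sketch of the idea, written in Python rather than the post's R. The CDF is obtained by integrating the normal density with composite Simpson's rule, and the quantile by bisection on CDF(x) − p. Simpson's rule and bisection are stand-ins chosen here; the original post's own integration and root-finding routines may differ, and every function name below is hypothetical.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (forced even) subintervals.
    If b < a the step is negative and the result is the signed integral."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x): half the mass lies below mu, so add the integral from mu to x."""
    return 0.5 + simpson(lambda t: normal_pdf(t, mu, sigma), mu, x)

def bisection(g, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of g in [lo, hi], assuming g(lo) and g(hi) differ in sign."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if g_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)

def normal_quantile(p, mu=0.0, sigma=1.0):
    """Solve CDF(x) = p; +/- 10 sigma around mu brackets any practical p."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return bisection(lambda x: normal_cdf(x, mu, sigma) - p,
                     mu - 10.0 * sigma, mu + 10.0 * sigma)

print(normal_cdf(1.96), normal_quantile(0.975))
```

Bisection is slow but cannot diverge once the root is bracketed, which suits a monotone CDF; a Newton step using `normal_pdf` as the derivative would converge faster.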

Listening for trends in US baby names over 130 years

January 25, 2011
What happens when you mash together R’s data crunching magic, Festival’s speech synthesis power, and the audio wonders of the venerable music language Csound? You fall even more in love with free and open-source software, and you start hearing sounds like this: A single beat of the above sound represents the top 1000 baby names

Climate Time Series In a Single CSV File: Update 1

January 24, 2011
I am pleased to announce my CTS.csv file, which includes 18 monthly climate time series in one easy-to-access CSV file. This is part of my goal of having a user-friendly way for do-it-yourself citizen climate scientists to …

A twitter feed for new R packages

January 24, 2011
Want to keep up-to-date on the latest R packages released to CRAN? Dirk Eddelbuettel's CRANberries service now tweets the release of new R packages to @CRANberriesFeed, so all you need to do is follow that user on Twitter. R hackers may also be interested to see how this Twitter feed was implemented -- in R, of course. Dirk has...

Review of “R Graphs Cookbook” by Hrishi Mittal

January 24, 2011
Executive summary: extremely useful for new users, informative to even quite seasoned users. Refereeing: once upon a time a publisher asked if I would referee a book (unspecified) about R. In an instance that can only be described as psychotic, I said yes. That bit of insanity turned out to be a good thing. I …

Trends in partisanship by state

January 24, 2011
Matthew Yglesias discusses how West Virginia used to be a Democratic state but is now solidly Republican. I thought it would be helpful to expand this to look at trends since 1948 (rather than just 1988) and all 50 states...

Merge Me Baby One More Time!

January 24, 2011
OK – has this ever happened to you? You are working with a team of collaborators all using a common dataset – maybe from an Agency, an LTER, or someone else’s data altogether. Each of you has some task – incorporating new data, running fancy models and putting the results back into the data for

Example 8.22: latent class modeling using randomLCA

January 24, 2011
In Example 8.21 we described how to fit a latent class model to data from the HELP dataset using SAS and R. Subjects were classified based on their observed (manifest) status on the following variables (on street or in shelter in past 180 days [homele...

Pattern Matching for Transcription Factor Binding Sites

January 24, 2011
I'm trying to search for binding sites for the transcription factor MAF (i.e. TFBS for MAF) in the promoter regions of various genes. I initially started out looking at a precomputed database of binding sites, MAPPER. However, the TFBS models that have ...

R Tutorial Series: One-Way ANOVA with Pairwise Comparisons

January 24, 2011
When we have more than two groups in a one-way ANOVA, we typically want to statistically assess the differences between each group. Whereas a one-way omnibus ANOVA assesses whether a significant difference exists at all amongst the groups, pairwise com...
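The excerpt stops before naming a method, so the following assumes one common choice (Bonferroni adjustment), which is not necessarily the tutorial's. This Python sketch shows the bookkeeping: with k groups there are k(k−1)/2 pairwise comparisons, and each raw p-value is multiplied by that count, capped at 1. The group labels are hypothetical.

```python
from itertools import combinations

def all_pairs(groups):
    """Every unordered pair of group labels: k*(k-1)/2 of them."""
    return list(combinations(groups, 2))

def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

groups = ["A", "B", "C", "D"]  # hypothetical group labels
pairs = all_pairs(groups)
print(len(pairs))  # 6 comparisons for 4 groups
```

The raw p-values themselves would come from whatever per-pair test is used (for example t-tests on each pair of groups); the adjustment only controls the family-wise error rate across all pairs.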

Hello world!

January 24, 2011
I suppose that “Hello World” is the first thing any blogger should write when starting a blog. So here I go: “HELLO WORLD!!!” The aim of this blog is to gather my thoughts and experience around learning R and, hopefully, to get a lot of insights from my readers. Officially, this is my third attempt

Paying interest and the number e

January 24, 2011
Suppose I borrow a dollar from you and I’ll pay you 100% interest at the end of the year. How much money will you have then? $1 * (1 + 1) = $2. What happens if instead the interest is calculated as 50% twice in the year? $1 * (1.5 * 1.5) = $2.25. After …
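Extending the arithmetic above: splitting 100% annual interest into n equal compounding periods gives $1 * (1 + 1/n)^n, which grows with n toward e ≈ 2.71828. A short Python sketch (the function name is hypothetical); the n = 1 and n = 2 rows reproduce the $2 and $2.25 above.

```python
import math

def compound(principal, annual_rate, periods):
    """Value after one year when annual_rate is split into `periods` equal compounding steps."""
    return principal * (1 + annual_rate / periods) ** periods

for n in (1, 2, 4, 12, 365, 1_000_000):
    print(f"{n:>9} periods: ${compound(1.0, 1.0, n):.6f}")
print(f"limit e   = {math.e:.6f}")
```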

Using RClimate To Retrieve Climate Series Data

January 23, 2011
This post shows how to use RClimate.txt to retrieve a climate time series and write a csv file in 5 lines of R script. One of my readers, Robert, wants to be able to download climate time series data and …

Using R for Introductory Statistics, Chapter 5

January 23, 2011
Any good stats book has to cover a bit of basic probability. That's the purpose of Chapter 5 of Using R for Introductory Statistics, starting with a few definitions: Random variable: a random number drawn from a population. A random variable is a variable for which we define a range of possible values and...

Blackbox trading Strategy using Rapidminer and R

January 23, 2011
This is my first post in 2011. It took more work than usual, but I hope it meets expectations. The aim of this tutorial is to generate an algorithm based on black-box trading, with all the necessary elements for evaluation. This is the first of several posts, which will explore the problems and features of...

CRANberries is now tweeting

January 23, 2011
The CRANberries service (which reports on new and updated CRAN packages for the R language and environment) is now tweeting about new packages. Simply follow @CRANberriesFeed to receive these messages. For the technically minded, adding this to the...

STATA: Regular expressions

January 23, 2011
A regular expression allows you to do a moderately fancy search (and replace if you want). So say you wanted to replace all the "Dennis"s in a variable with "Awesome"s, but only if they're at the end of the line. You could try: replace PBFnamevar = r...
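The Stata command in the excerpt is cut off, so it is not reconstructed here. As an illustration of the same idea in Python's `re` module instead, the `$` anchor restricts the match to the end of the string (or of each line, with `re.MULTILINE`). The helper name is hypothetical.

```python
import re

def replace_at_end(text, old, new):
    """Replace `old` with `new` only where it ends the string or a line."""
    pattern = re.escape(old) + r"$"  # escape so `old` is matched literally
    return re.sub(pattern, new, text, flags=re.MULTILINE)

names = ["Dennis", "Dennis Smith", "Mary Dennis", "Dennison"]
print([replace_at_end(n, "Dennis", "Awesome") for n in names])
```

"Dennison" is left alone because "on" follows "Dennis", so the end-of-line anchor does not match there.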

Merging Multiple Data Frames in R

January 23, 2011
Earlier I had a problem that required merging 3 years of trade data, with about 12 csv files per year. Merging all of these data sets with pairwise left joins using the R merge statement worked (especially after correcting some errors pointed out by Ha...
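The post does this with R's `merge`. As a language-neutral illustration, here is a minimal Python sketch of the same pattern: fold a list of tables into one with repeated pairwise left joins on a shared key. The key name `Partner`, the column names and the numbers are made up for the example. Unlike R's `merge`, this sketch does not add suffixes when non-key column names collide; later tables simply overwrite them.

```python
import csv
import io
from functools import reduce

def read_table(text):
    """Parse CSV text into a list of row dicts (values stay strings)."""
    return list(csv.DictReader(io.StringIO(text)))

def left_join(left, right, key):
    """Keep every left row; attach right-hand columns, or None when there is no match."""
    right_cols = [c for c in right[0] if c != key] if right else []
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    out = []
    for row in left:
        # [None] stands in for "no match" so the left row is kept exactly once.
        for match in index.get(row[key], [None]):
            merged = dict(row)
            for c in right_cols:
                merged[c] = match[c] if match is not None else None
            out.append(merged)
    return out

def merge_all(tables, key):
    """Left-join each table onto the running result, starting from the first table."""
    return reduce(lambda acc, t: left_join(acc, t, key), tables)

files = [  # hypothetical stand-ins for per-year CSV files
    "Partner,exports_2008\nCanada,10\nMexico,7\n",
    "Partner,exports_2009\nCanada,12\n",
    "Partner,exports_2010\nMexico,9\nChina,4\n",
]
merged = merge_all([read_table(t) for t in files], "Partner")
for row in merged:
    print(row)
```

Because every join is a left join, a key that appears only in a later file (China here) is dropped; a full outer join would be needed to keep it.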

The Art of Exploratory Data Analysis

This blog is about the art of exploratory data analysis, which is also the subject of my new book, Exploring Data in Engineering, the Sciences, and Medicine (http://www.oup.com/us/ExploringData).  This art is appropriate in situations where y...

Flexibility of R Graphics

January 21, 2011
(Note: scroll all the way down to see 'old code' and 'new more flexible code'.) Recall an older post that presented overlapping density plots using R (Visualizing Agricultural Subsidies by KY County); see the image below. The code I used to produce this plot m...

Posted Question for R Users

January 21, 2011
I recently undertook a project where a colleague had about 12 .csv files that they wanted to merge. Each file had a common (key) variable 'Partner' (which is trading partner) with differing columns (variables) except for the common key variable. Actual...

Hard drive occupation prediction with R – part 2 – Getting the probability distribution

In the first article, we saw a quick-and-dirty method to predict disk space exhaustion when the usage pattern is rigorously linear. We did that by importing our data into R and fitting a linear regression. In this article we will see the problems with that method and deploy a more robust solution. Besides robustness, we will also see how we can generate...

Volcanic Solar Dimming, ENSO and Temperature Anomalies

January 21, 2011
In previous posts I have shown plots of global temperature anomaly, volcano and Nino34 trends (here, here). In this post, I want to further explore the role of volcanic eruptions and Nino34 phases (El Nino, La Nina) on …