Articles by Wesley

Distribution of T-Scores

March 2, 2013 | Wesley

Like most of my post these code snippets derive from various other projects.  In this example it shows a simulation of how one can determine if a set of t statistics are distributed properly.  This can be useful when sampling known populations (e.g. U.S. census or hospital populations) ... [Read more...]

Bootstrap Confidence Intervals

February 1, 2013 | Wesley

Here is an example of nonparametric bootstrapping.  It’s a powerful technique that is similar to the Jackknife. With the bootstrap, however, the approach uses re-sampling. It’s clearly not as good as parametric approaches but it gets the job done. This can be used in a variety of situations ... [Read more...]

Binomial Confidence Intervals

January 22, 2013 | Wesley

This stems from a couple of binomial distribution projects I have been working on recently.  It’s widely known that there are many different flavors of confidence intervals for the binomial distribution.  The reason for this is that there is a coverage problem with these intervals (see Coverage Probability).  A 95% ... [Read more...]

Y2K38: Our Own Mayan Calendar…Again

December 21, 2012 | Wesley

It’s not quite the end of the world as we know it.  We made it through December 21, 2012 unscathed. It’s not going to be the last time we will make it through such a pseudo-calamity.  After all we have built our own end of the world before (e.g. ... [Read more...]

True Significance of a T Statistic

December 17, 2012 | Wesley

The example is more of a statistical exercise that  shows the true significance and the density curve of simulated random normal data.  The code can be changed to generate data using either a different mean and standard deviation or a different distribution altogether. This extends the idea of estimating pi ... [Read more...]

Estimating Pi

December 11, 2012 | Wesley

Recently I’ve been working on some jackknife and bootstrapping problems.  While working on those projects I figured it would be a fun distraction to take the process and estimate pi.  I’m sure this problem has been tackled countless times but I have never bothered to try it using ... [Read more...]

Mean Value from Grouped Data

December 7, 2012 | Wesley

Occasionally, I will get requests from clients to calculate the mean. Most of the time it’s a simple request but from time-to-time the data was originally from grouped data. A common approach is to take the midpoint of each of the groups and just assume that all respondents within ... [Read more...]

Importing Data Into R from Different Sources

December 6, 2012 | Wesley

I have found that I get data from many different sources.  These sources range from simple .csv files to more complex relational databases, to structure XML or JSON files.  I have compiled the different approaches that one can use to easily access these datasets. Local Column Delimited Files This is ... [Read more...]

Plotting Likert Scales

December 4, 2012 | Wesley

Graphs can provide an excellent way to emphasize a point and to quickly and efficiently show important information. Sadly, poor graphs can be a good way to waste space in an article, take up time in a presentation, and waste a lot of ink all while providing little to no ... [Read more...]

Earthquakes Over the Past 7 Days

November 29, 2012 | Wesley

This is a brief example using the maps in R and to highlight a source of data.  This is real-time data and it comes from the U.S. Geological Society.  This shows the location of earthquakes with magnitude of at least 1.0 in the lower 48 states. library(maps) library(maptools) library(... [Read more...]

Hurricane Sandy Land Wind Speed and Kriging

November 28, 2012 | Wesley

NJ Hurricane Sandy Landfall Data These data come from the National Climatic Data Center (NCDC).  Using the above link will download all of the data collected by the NCDC on the day of Hurricane Sandy.  The data can also be obtained directly from the source at http://cdo.ncdc.noaa.... [Read more...]

What Time Is It?

November 28, 2012 | Wesley

A common scenario that I run into is time and how to deal with it. I often will do a  variety of summaries and analysis that need to be measured at different points in time. Whether I want to graph the data or review the results I need to be ... [Read more...]

Using R to Compare Hurricane Sandy and Hurricane Irene

November 3, 2012 | Wesley

Having just lived through two back to back hurricanes (Irene in 2011 and Sandy in 2012) that passed through the New York metro area I was curious how the paths of the hurricanes differed.  I worked up a quick graph in R using data from Unisys.  The data also includes wind speed ... [Read more...]

Mapping Capabilities in R

November 2, 2012 | Wesley

From time-to-time creating a basic map of the United States or other parts of the world to complement some statistical analysis is useful to emphasize a point. The maps package in R provide a good way to produce these these maps.  These maps axes are based on latitude and longitude ... [Read more...]

Text Mining

October 15, 2012 | Wesley

When it comes down to it R does a really good job handling structured data like matrices and data frames. However, its ability to work with unstructured data is still a work in progress. It can and it does handle text mining but the documentation is incomplete and the capabilities ... [Read more...]

Association Rule Learning and the Apriori Algorithm

September 26, 2012 | Wesley

Association Rule Learning (also called Association Rule Mining) is a common technique used to find associations between many variables. It is often used by grocery stores, retailers, and anyone with a large transactional databases. It’s the same way that Target knows your pregnant or when you’re buying an ... [Read more...]

Data Frames and Transactions

September 24, 2012 | Wesley

Transactions are a very useful tool when dealing with data mining.  It provides a way to mine itemsets or rules on datasets. In R the data must be in transactions form.  If the data is only available in a data.frame then to create (or coerce) the data frame to ... [Read more...]

Power Analysis and the Probability of Errors

September 22, 2012 | Wesley

Power analysis is a very useful tool to estimate the statistical power from a study. It effectively allows a researcher to determine the needed sample size in order to obtained the required statistical power. Clients often ask (and rightfully so) what the sample size should be for a proposed project. ... [Read more...]


September 15, 2012 | Wesley

N-Way ANOVA example Two-way analysis of variance is where the rubber hits the road, so to speak. This extends the concepts of ANOVA with only one factor to two factors. When there are two factors this means that there can be an interaction between the two factors that should be ... [Read more...]


September 11, 2012 | Wesley

One-Way ANOVA Analysis of variance is a tool used for a variety of purposes. Applications range from a common one-way ANOVA, to experimental blocking, to more complex nested designs. This first ANOVA example provides the necessary tools to analyze data using this technique. This example will show a basic one-way ... [Read more...]
1 2 3 4

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)