Exploratory Data Analysis: Combining Histograms and Density Plots to Examine the Distribution of the Ozone Pollution Data from New York in R

Introduction This is a follow-up post to my recent introduction of histograms.  Previously, I presented the conceptual foundations of histograms and used a histogram to approximate the distribution of the “Ozone” data from the built-in data set “airquality” in R.  Today, I will examine this distribution in more detail by overlaying the histogram with parametric

Easier Database Querying with R

July 29, 2013

I have a strong distaste for database connection management.  All I want to do when I want to query one of our many databases at work is to simply supply the query, and package the result into an R data.frame or data.table. R has many great database connection tools, including but not limited to RPostgreSQL,

analyze the youth risk behavior surveillance system (yrbss) with r

July 29, 2013

the youth risk behavior surveillance system is the high school edition of the behavioral risk factor surveillance system (brfss), a scientific study of good kids who do bad things.  questions are mostly about sex, drugs, rock and roll, and populat...

BCEA 2.0

July 28, 2013

I know that updating a package too often is not quite good practice, so, given we've released BCEA 1.3-1 just about a month ago, this is way too soon to move forward. But between the last release and now, I've been doing some reading and have made some...

Orthogonal Partial Least Squares (OPLS) in R

July 28, 2013

I often need to analyze and model very wide data (variables >>> samples), and because of this I gravitate to robust yet relatively simple methods. In my opinion, partial least squares (PLS) is a particularly useful algorithm. Simply put, PLS is an extension of principal components analysis (PCA), an unsupervised method for maximizing the variance explained in X,

Classification of the Hyper-Spectral and LiDAR Imagery using R (mostly). Part 1: Result Evaluation

July 28, 2013

Introduction There was the IEEE Data Fusion Contest this spring. This year they wanted people to elaborate on hyper-spectral (142-band imagery) and LiDAR data. The resolution of the data set was about 5 m. There were 2 nominations: best classification and best scientific paper. I work with high-resolution imagery quite often, but classification is a very rare task for me...

Hopfield Networks in Julia

July 28, 2013

As a fun side project last night, I decided to implement a basic package for working with Hopfield networks in Julia. Since I suspect many of the readers of this blog have never seen a Hopfield net before, let me explain what they are and what they can be used for. The short-and-skinny is that

Modeling Match Results in La Liga Using a Hierarchical Bayesian Poisson Model: Part two.

July 28, 2013

In the last blog post I showed my initial attempt at modeling football results in La Liga using a Bayesian Poisson model, but there was one glaring problem with the model: it did not consider the advantage of being the home team. In this post I will show how to fix this! I will also show a way...

A JAGS calculation on pattern of rain January 1906-1915 against 2003-2012

July 28, 2013

Two weeks ago I showed rain data from six stations in the Netherlands from 1906 until now. Last week I showed that the frequency of days with and without rain differed between December 1906-1915 and December 2003-2012. This week I am considering the ...

SAP HANA OData and R

As you might have discovered by now...I love R...it's just an amazing programming language...By now...I have integrated R and SAP HANA via ODBC and via the SAP HANA-R integration...but I have completely left out the SAP HANA OData capabilities. For this ...

H*Wind Analyses Using R

July 27, 2013

I did a (relatively) simple work-up over on Github showing how to load in AOML/NOAA H*Wind analysis data and display it. I used the R program, which is an open-source, community-maintained statistical platform capable of handling – well – just about anything. So what is H*Wind? From AOML/NOAA: Since 1996 the Hurricane Research Division has participated in...

The Secrets of Inverse Brogramming, reprise

July 27, 2013

Brogramming is the art of looking good while you write code. Inverse brogramming is a silly term that I’m trying to coin for the opposite, but more important, concept: the art of writing good looking code. At useR2013 I gave a talk on inverse brogramming in R – for those of you who weren’t there

Science link fest for the week of the 27th

July 27, 2013

Hello Paleoposse! This week I bring you Egyptians and iron meteorites, Neil deGrasse Tyson as the Carl Sagan of our generation and a mild rant about poor quality science reporting. I’m off to enjoy my 30th birthday. Hope you all have an awesome weekend! Solid science: Here’s a story that’s near and dear to my

Whilst reading John Hempton’s post on shorting \$HLF I…

July 27, 2013

only the most active trading days \$HLF (HerbaLife weight-loss supplements / MLM) \$HLF regular history big loss days and big volume days for \$HLF. "Ackman" should instead read "Einhorn". Whilst reading John Hempton’s post on shorting \$HLF I decided to follow along in quantmod. Bronte Capital: It was the night before Christmas… falsifying Bill Ackman’s...

Using Geany for programming in R

July 27, 2013

I like Geany as a no-nonsense Integrated Development Environment (IDE). It is fast, elegant, intuitive, and lets you get your programming job done. (I certainly find it superior to the more popular Gedit.) You can also use it to program in R, and this page will show off some tips for doing that. Execute commands

Network visualization – part 2: Gephi

July 26, 2013

In the second part of my “how to quickly visualize networks directly from R” series, I’ll discuss how to use R and the “rgexf” package to create network plots in Gephi. Gephi is a great network visualization tool that allows …

July 26, 2013

I’m very indebted to the ff and ffbase packages in R. Without them, I probably would have to use some less savoury stats program for the bigger data analysis projects that I do at work. Since I started using ff …

Creating Catch Data from Individual Length Measurements II

July 26, 2013

Note that this is largely a repeat of a previous post (except that I have added a few plots at the bottom) as I am experimenting with being able to write posts here directly from R using the knit2wp() function …

Easy pictograms using R

July 26, 2013

I have been amazed for a while that there is no major stats software offering pictograms. You know the sort of classic infographic I mean: Well, I have been working on an R function to help with this. It’s at …

ggplot2 with Noam Ross theme

July 26, 2013

When I first saw Noam Ross' blog post "The null model for age effects with overdispersed infection", I immediately liked the look of his ggplot2 graphs. I was even more delighted when I discovered that he has made his theme available on github. Even though I am all into rCharts, I still love a beautiful publication...

Evolve your own beats — automatically generating music via algorithms

July 26, 2013

Update: you can find the next post in this series here. I recently went to an excellent music meetup where people spoke about the intersection of music and technology. One speaker in particular talked about how music is now being generated by computer. Music has always fascinated me. It can make us feel emotions in a way few...

Evolve your own beats: automatically generating music via algorithms

July 26, 2013

I recently went to an excellent music meetup where people spoke about the intersection of music and technology. One speaker in particular talked about how music is now being generated by computer. Music has always fascinated me. It can make us feel emotions in a way few media can. Sadly, I have always been unable...

Architect 0.9.3

July 26, 2013

Friday 26 July 2013 - 13:35 Architect is an Eclipse-based cross-platform IDE for R packed with features for advanced R users. Need convincing? Let's take a look at some of Architect's most popular features. Visual Debugger Architect comes eq...

Amount of end-user usage of code in Firefox

July 25, 2013

How much end-user usage does the code in Firefox receive over time? Short answer: The available data is very sparse and lots of hand waving is needed to concoct something. The longer answer is below as another draft section from my book Empirical software engineering with R. As always comments and pointers to more data

SAP HANA and R – Keep shining

Since I discovered Shiny and published my blog A Shiny example - SAP HANA, R and Shiny I always wanted to actually run a Shiny application from SAP HANA Studio, instead of having to call it from RStudio and having to use an ODBC connection. A couple of ...

Revolution Analytics Supports the R Community

July 25, 2013

by Joseph Rickert Early on, Revolution Analytics realized that R is more than just a tool for statistical computing — it is also the culture that has grown up around the use of the tool. The R culture is open and inclusive, competitive but also nourishing. There is a strong sense of community that encourages contribution and growth. We...