# 2660 search results for "ggplot2"

March 1, 2014

OECD.Stat is a commonly used statistics portal in the research world, but there is no easy way (that I know of) to query it straight from R. There are two main benefits of querying OECD.Stat straight from R: 1. Create reproducible analysis (something that is easily lost if you have to download Excel files) 2.

## Who knows the Oscar winners? The betting markets, probably.

February 28, 2014

This is the time of year when everyone likes to speculate on the winners of the Academy Awards, to be announced on Sunday. There are plenty of ways to try and predict which movie is going to win Best Picture or who'll win Best Actress. You could look at the various betting markets and see who the speculators are...

## Simply creating various scatter plots with ggplot #rstats

February 28, 2014

Inspired by these two postings, I thought about including a function in my package for simply creating scatter plots. In my package, there’s a function called sjp.scatter for creating scatter plots. To reproduce these examples, first load the package and then attach the sample data set: The simplest function call is by just providing two

## The tf-idf-Statistic For Keyword Extraction

February 27, 2014

The tf-idf-statistic (“term frequency – inverse document frequency”) is a common tool for the purpose of extracting keywords from a document by not just considering a single document but all documents from the corpus. In terms of tf-idf a word …
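To make the idea in the excerpt above concrete, here is a minimal base-R sketch of tf-idf on a toy corpus. The three documents are invented for illustration, and the weighting `tf * log(N / df)` is only one common variant; the original post is truncated here and may define the statistic differently.

```r
# Toy corpus (hypothetical documents, for illustration only)
docs <- c(d1 = "the cat sat on the mat",
          d2 = "the dog sat on the log",
          d3 = "the cat chased the dog")

# Tokenize on whitespace and build the vocabulary
tokens <- strsplit(tolower(docs), "\\s+")
vocab  <- sort(unique(unlist(tokens)))

# Term frequency: share of each term within a document (rows = documents)
tf <- t(vapply(tokens,
               function(tk) as.numeric(table(factor(tk, levels = vocab))) / length(tk),
               numeric(length(vocab))))
colnames(tf) <- vocab

# Inverse document frequency: terms found in every document get weight 0
df  <- colSums(tf > 0)
idf <- log(length(docs) / df)

# tf-idf: scale each term column by its idf
tfidf <- sweep(tf, 2, idf, `*`)

# Highest-scoring term per document
top_terms <- apply(tfidf, 1, function(row) names(which.max(row)))
```

A word such as "the", which occurs in every document, scores zero, while a word unique to one document rises to the top of that document's ranking.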

## Easily generate correlated variables from any distribution

February 27, 2014

In this post I will demonstrate in R how to draw correlated random variables from any distribution. The idea is simple: 1. Draw any number of variables from a joint normal distribution. 2. Apply the univariate normal CDF of variables to derive pro...
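The two steps in the excerpt above can be sketched in base R as follows. The excerpt is cut off after step 2, so the final step here (applying inverse CDFs of the target distributions) is an assumption about how the approach continues. The correlation `rho` and the exponential and beta marginals are arbitrary choices for illustration.

```r
set.seed(42)
n   <- 10000
rho <- 0.7                                   # assumed target correlation of the normals
sigma <- matrix(c(1, rho, rho, 1), nrow = 2)

# Step 1: draw from a joint normal distribution (via the Cholesky factor)
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(sigma)

# Step 2: apply the univariate normal CDF to get correlated uniforms
u <- pnorm(z)

# Assumed step 3: push the uniforms through the inverse CDFs of the target marginals
x <- qexp(u[, 1], rate = 2)                  # exponential marginal
y <- qbeta(u[, 2], shape1 = 2, shape2 = 5)   # beta marginal

# Rank correlation survives the monotone transforms
rank_cor <- cor(x, y, method = "spearman")
```

Because every transform is monotone, the rank correlation of `x` and `y` matches that of the underlying normals, although the Pearson correlation will generally differ from `rho`.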

## Type I error rates in test of normality by simulation

February 26, 2014

This simulation tests the type I error rates of the Shapiro-Wilk test of normality in R and SAS. First, we run a simulation in R. Notice the simulation is vectorized: there are no "for" loops that clutter the code and slow the simulation. `# type I error` `alpha <- 0.05` `# number of simulations` `n.simulations <- 10000` #...
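A minimal sketch of such a simulation, using the `alpha` and `n.simulations` values visible in the excerpt above. The excerpt is truncated before the sample size is given, so `n <- 30` is an assumption, and `replicate()` stands in for whatever loop-free construct the original post uses.

```r
set.seed(1)

# type I error
alpha <- 0.05
# number of simulations
n.simulations <- 10000
# sample size per simulated data set (assumption: not shown in the excerpt)
n <- 30

# Draw normal samples, so the null hypothesis of normality is true,
# and collect the Shapiro-Wilk p-value from each one
p.values <- replicate(n.simulations, shapiro.test(rnorm(n))$p.value)

# Share of false rejections: the empirical type I error rate
type.I.error <- mean(p.values < alpha)
```

If the test is well calibrated, `type.I.error` should land close to the nominal `alpha`.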

## Unemployment revisited

February 23, 2014

Approximately a year ago I made a post graphing unemployment in Europe and other locations. I have always wanted to do this again, not because the R-code would be so interesting, but just because I wanted to see the plots. As time progressed I attempte...

## The gap between data mining and predictive models

February 20, 2014

The Facebook data science blog shared some fun data explorations this Valentine’s Day in Carlos Greg Diuk’s “The Formation of Love”. They are rightly receiving positive interest in and positive reviews of their work (for example Robinson Meyer’s Atlantic article). The finding is also a great opportunity to discuss the gap between cool data mining

## Shapefile Polygons Plotted on Google Maps Using ggmap in R – Throw some, throw some STATS on that map…(Part 2)

February 20, 2014

Well it’s been long enough since my last post. Had a few things on my plate (vacation, holidays, another holiday, some more holidays, and quite a lot of research). March is almost here but the good news is that I have plenty of work stored up to start serving out some intuitive approaches for learning

February 20, 2014

One of the more tedious parts of working with R is maintaining my R library. To make my R scripts reproducible and sharable, I will install packages if they are not available. For example, the top of my R scripts tends to look something like this: `if(!require(devtools) | !require(ggplot2) | !require(psych) | !require(lme4) | !require(benchmark)) { install.packages(c('devtools','ggplot2','psych','lme4','benchmark')) }` This has worked fine for...
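The pattern quoted in the excerpt above reinstalls every listed package as soon as any one of them is missing. One alternative is to install only the packages that are actually absent. The sketch below is not the approach the truncated post goes on to describe, and the helper names `missing_packages` and `install_if_missing` are hypothetical.

```r
# Return the subset of pkgs that is not yet installed
# (hypothetical helper, not taken from the original post)
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Install only what is missing and invisibly return those names
install_if_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}

# Usage (commented out so the sketch has no side effects):
# install_if_missing(c("devtools", "ggplot2", "psych", "lme4"))
```

Keeping the check separate from the install step makes it possible to see what would be installed without touching the library.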