Monthly Archives: May 2013

Exploratory Data Analysis: Variations of Box Plots in R for Ozone Concentrations in New York City and Ozonopolis

Introduction: Last week, I wrote the first post in a series on exploratory data analysis (EDA). I began by calculating summary statistics on a univariate data set of ozone concentrations in New York City, taken from the built-in data set “airquality” in R. In particular, I talked about how to calculate those statistics when the data

Read more »

Using R to visualize geo optimization algorithms

May 26, 2013

Site optimization is the process of finding an optimal location for a plant or a warehouse to minimize transportation costs and duration. A simple model consists of only one good and has no restrictions on transportation capacity or delivery time. The optimization algorithms are often hard to understand. Fortunately, R is a great tool for making them more comprehensible. The basic...

Read more »

Creating a typical textbook illustration of statistical power using either ggplot or base graphics

May 26, 2013

A common way of illustrating the idea behind statistical power in null hypothesis significance testing is to plot the sampling distributions of the null hypothesis and the alternative hypothesis. Typically, these illustrations highlight the regions that correspond to making a type II error, making a type I error, and correctly rejecting the null hypothesis (i.e. the test's power). In this post...

Read more »

Creating a typical textbook illustration of statistical power using either ggplot or base graphics

May 26, 2013

A common way of illustrating the idea behind statistical power in null hypothesis significance testing is to plot the sampling distributions of the null hypothesis ($H_0$) and the alternative hypothesis ($H_A$). Typically, these illustrations highlight the regions that correspond to making a type II error ($\beta$), type I...
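The regions named in this excerpt can be computed directly. The following is a minimal sketch, not the post's own code, assuming a one-sided z-test of $H_0$: mean = 0 against $H_A$: mean = `mu_a` with known standard deviation; all numeric values and variable names are hypothetical and chosen only for illustration.

```r
# Hypothetical settings (assumptions, not taken from the post)
alpha <- 0.05   # type I error rate
mu_a  <- 1      # mean under the alternative hypothesis H_A
sigma <- 2      # known population standard deviation
n     <- 25     # sample size

# Standard error of the sample mean
se <- sigma / sqrt(n)

# Critical value: reject H_0 when the sample mean exceeds this
crit <- qnorm(1 - alpha, mean = 0, sd = se)

# Type II error: probability of NOT rejecting H_0 when H_A is true
beta <- pnorm(crit, mean = mu_a, sd = se)

# Power: probability of correctly rejecting H_0 when H_A is true
power <- 1 - beta

print(c(critical_value = crit, beta = beta, power = power))
```

In a plot of the two sampling distributions, `crit` is the vertical cut-off: the area of the $H_0$ curve to its right is `alpha`, the area of the $H_A$ curve to its left is `beta`, and the remainder of the $H_A$ curve is the power.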

Read more »

More bubble sort tuning

May 26, 2013
After last week's post on bubble sort tuning, I got an email from Berend Hasselman noting that my 'best' function did not protect against cases n<=2 and that a speed improvement was possible. That made me realize that I should have been profiling t...

Read more »

Test Drive of Parallel Computing with R

May 25, 2013

Today, I did a test run of parallel computing with the snow and multicore packages in R and compared their performance with the single-threaded lapply() function. In the test code below, a data.frame with 20M rows is simulated in an Ubuntu VM with an 8-core CPU and 10 GB of memory. As the baseline, the lapply() function is employed to

Read more »

Revisiting text processing with R and Python

May 25, 2013
Back in 2011, I covered the relative performance of the most popular libraries for text processing in R and Python. In case you can’t guess the answer, Python and NLTK won by a significant margin over R and…

Read more »

Speed trick: Assigning large object NULL is much faster than using rm()!

May 25, 2013
When processing large data sets in R you often also end up creating large temporary objects. In order to keep the memory footprint small, it is always good to remove those temporary objects as soon as possible. When done, removed objects will be deallocated from memory (RAM) the next time the garbage collection runs. Better: Use rm(list="x")...
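The two ways of discarding a temporary object mentioned here can be sketched as follows. This is a minimal illustration, not the post's benchmark: it makes no timing claims, and the object name `tmp` and its size are hypothetical.

```r
# A large temporary object: 1e7 doubles, roughly 76 MB
tmp <- numeric(1e7)
size_mb <- as.numeric(object.size(tmp)) / 1024^2

# Option 1: remove the binding by name; the name no longer exists afterwards
rm(list = "tmp")
removed <- !exists("tmp", inherits = FALSE)

# Option 2: keep the name but drop the reference to the large vector
tmp <- numeric(1e7)
tmp <- NULL
nulled <- is.null(tmp)

# In both cases the memory is only reclaimed when the garbage collector runs;
# gc() triggers a collection explicitly
invisible(gc())
```

After option 1 the name `tmp` is gone, whereas after option 2 it still exists and holds `NULL`, which matters if later code checks for the variable with `exists()`.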

Read more »

HOWTO: X11 Forwarding for Oracle R Enterprise

May 25, 2013

Read more »

Sentiment analysis finds trouble in the Enron emails

May 24, 2013

The Enron email dataset, collected during the FERC investigation of the Enron financial scandal, represents the largest publicly available set of emails. This makes them an ideal testbed for sentiment analysis algorithms. Ikanow's Andrew Strite used the open-source Infinit.e framework and a Hadoop cluster to generate sentiment scores for all of the Enron emails, and then used R to manipulate...

Read more »
