Using R to communicate via a socket connection

May 28, 2013

Occasionally, the need arises to communicate with R via another process. There are packages available to facilitate this communication, but for simple problems, a socket connection may be the answer. Nearly all software languages have a socket communic...


Converting Existing R Scripts to ORE – Getting Started

May 28, 2013

R/Finance 2013 Review

May 28, 2013

It's been one week since the 5th Annual R/Finance conference, and I finally feel sufficiently recovered to share my thoughts. The conference is a two-day whirlwind of applied quantitative finance, fantastic networking, and general geekery. The comments below are based on my personal experience. If I don't comment on a seminar or presentation, it doesn't mean I...

Value at Risk with exponential smoothing

May 28, 2013

More accurate than historical, simpler than GARCH. Previously: We’ve discussed exponential smoothing in “Exponential decay models”. The same portfolios were submitted to the same sort of analysis in “A look at historical Value at Risk”. Issue: Markets experience volatility clustering. As the previous post makes clear, historical VaR suffers dramatically from this. An alternative is …

Interactive presentation with slidify and googleVis

May 28, 2013

Last week I was invited to give an introduction to googleVis at Lancaster University. This time I decided to use the R package slidify for my talk. Slidify, like knitr, is built on Markdown and makes it very easy to create beautiful HTML5 presentations...


The heat is on…. or is it? Trend Analysis of Toronto Climate Data

May 27, 2013

The following is a guest post from Joel Harrison, PhD, consulting Aquatic Scientist. For a luddite like me, this is a big step – posting something on the inter-web. I’m not on Facebook. I don’t know what Twitter is. Hell, I don’t even own a smartphone. But I’ve been a devoted follower of Myles’ blog for some time,...

Updates to the Social Science Starter Kit

May 27, 2013

The Emacs Social Science Starter Kit is a drop-in collection of packages and settings for Emacs 24 aimed at people like me: that is, people doing social science data analysis and writing, using some combination of tools like R, git, LaTeX, Pandoc, perh...


Writing a Minimal Working Example (MWE) in R

May 27, 2013

How to Ask for Help using R: The key to getting good help with an R problem is to provide a minimal working reproducible example (MWRE). Making an MWRE is really easy with R, and it will help ensure that...

Bayesian model II regression

May 27, 2013

Regression is a mainstay of ecological and evolutionary data analysis. For example, a disease ecologist may use body size (e.g. a weight from a scale with measurement error) to predict infection. Classical linear regression assumes no error in covariates; they are known exactly. This is rarely the case in ecology, and ignoring error in covariates can bias regression coefficient...


(Another) introduction to R

May 27, 2013

It’s Memorial Day and my dissertation defense is tomorrow. This week I’m phoning in my blog. I had the opportunity to teach a short course last week that was part of a larger workshop focused on ecosystem restoration. A fellow grad student and I taught a session on Excel and R for basic data analysis.


useR! 2013 conference update

May 27, 2013

Been trying to reach the website for the useR! 2013, the R user conference that will be held July 10-12 2013 at University of Castilla-La Mancha in Spain? There have been some server problems recently, but you can now get access at the official URL, www.r-project.org/useR-2013/. If you're going, be sure to check out the newly-announced Data Analysis Contest,...


Log Transformations for Skewed and Wide Distributions

May 27, 2013

This is a guest article by Nina Zumel and John Mount, authors of the new book Practical Data Science with R. For readers of this blog, there is a 50% discount off the “Practical Data Science with R” book, simply by using the code pdswrblo when reaching checkout (until …

Combinatorial optimization with gaoptim package

May 27, 2013

My recent update of the gaoptim package adds a new function, GAPerm, which can be used to perform combinatorial optimization using the Genetic Algorithm approach. The example below solves a TSP instance with 10 points around a circumference, the...

R / Finance 2013 Recap — and Presentation Slides

The fifth international R/Finance conference was held last weekend. As one of the founding co-organizers, I may well be accused of a little bias, but we think we once again pulled off a very nice and successful weekend-long event. Participants had ki...

Creating a presence-absence raster from point data

May 27, 2013

I’m working on generating species distribution models at the moment for a few hundred species, which means that I’m trying to automate as many steps as possible in R to avoid having to click buttons hundreds of times in ArcView. …

BISON USGS species occurrence data

May 27, 2013

The USGS recently released a way to search for and get species occurrence records for the USA. The service is called BISON (Biodiversity Information Serving Our Nation). The service has a web interface for human interaction in a browser, and two APIs (application programming interfaces) to allow machines to interact with their database. One of the...

Import All Text Files in A Folder with Parallel Execution

May 26, 2013

Sometimes, we might need to import all files, e.g. *.txt, with the same data layout in a folder without knowing each file name and then combine all pieces together. With the old method, we can use the lapply() and do.call() functions to accomplish the task. However, when there are a large number of such files and

Logging Data in R Loops: Applied to Twitter.

May 26, 2013

A problem that many users face in R is storing the output from loop operations. In the case of Twitter, we may be requesting a specified number of the most recent Tweets from a number of Twitter users. Several methods exist for …

Pairwise distances in R

May 26, 2013

For a recent project I needed to calculate the pairwise distances of a set of observations to a set of cluster centers. In MATLAB you can use the pdist function for this. As far as I know, there is no equivalent in the R standard packages. So I looked into writing a fast implementation for


Exploratory Data Analysis: Variations of Box Plots in R for Ozone Concentrations in New York City and Ozonopolis

Introduction: Last week, I wrote the first post in a series on exploratory data analysis (EDA). I began by calculating summary statistics on a univariate data set of ozone concentration in New York City in the built-in data set “airquality” in R. In particular, I talked about how to calculate those statistics when the data

Using R to visualize geo optimization algorithms

May 26, 2013

Site optimization is the process of finding an optimal location for a plant or a warehouse to minimize transportation costs and duration. A simple model only consists of one good and no restrictions regarding transportation capacities or delivery time. The optimizing algorithms are often hard to understand. Fortunately, R is a great tool to make them more comprehensible. The basic...

Creating a typical textbook illustration of statistical power using either ggplot or base graphics

May 26, 2013

A common way of illustrating the idea behind statistical power in null hypothesis significance testing is by plotting the sampling distributions of the null hypothesis and the alternative hypothesis. Typically, these illustrations highlight the regions that correspond to making a type II error, type I error and correctly rejecting the null hypothesis (i.e. the test's power). In this post...

More bubble sort tuning

May 26, 2013

After last week's post, 'bubble sort tuning', I got an email from Berend Hasselman noting that my 'best' function did not protect against cases n<=2 and a speed improvement was possible. That made me realize that I should have been profiling t...

Test Drive of Parallel Computing with R

May 25, 2013

Today, I did a test run of parallel computing with the snow and multicore packages in R and compared the parallelism with the single-thread lapply() function. In the test code below, a data.frame with 20M rows is simulated in an Ubuntu VM with an 8-core CPU and 10 GB of memory. As the baseline, the lapply() function is employed to

Revisiting text processing with R and Python

May 25, 2013

Back in 2011, I covered the relative performance difference of the most popular libraries for text processing in R and Python. In case you can’t guess the answer, Python and NLTK won by a significant margin over R and…
