Overfitting

October 13, 2012

What do you think when you see a model like the one below? Does this strike you as a good model? Or as a bad model? There’s no right or wrong answer to this question, but I’d like to argue that models that are able to match white noise are typically bad things, especially when...

How to choose the right *apply function

October 13, 2012

How to choose the right *apply function: This is an amazing stackoverflow answer to help you decide which of the many *apply functions (apply, lapply, sapply, vapply, mapply, rapply, tapply) is appropriate for the task at hand. I’m planning on doing...
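For a quick feel for how these functions differ, here is a minimal base-R sketch (our own illustration, not code from the linked answer) that calls several of them on small made-up inputs. The variable names are hypothetical.

```r
# Toy inputs, made up for illustration
m <- matrix(1:6, nrow = 2)          # 2 x 3 matrix: rows are 1 3 5 and 2 4 6
l <- list(a = 1:3, b = 4:6)

# apply(): operate over the margins (1 = rows, 2 = columns) of a matrix/array
row_sums <- apply(m, 1, sum)        # 9 12

# lapply(): always returns a list
means_list <- lapply(l, mean)       # list(a = 2, b = 5)

# sapply(): like lapply(), but simplifies the result when it can
means_vec <- sapply(l, mean)        # named numeric: a = 2, b = 5

# vapply(): like sapply(), but the type/length of each result is declared
squares <- vapply(1:3, function(i) i^2, numeric(1))   # 1 4 9

# mapply(): iterate over several vectors in parallel
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)   # 5 7 9

# tapply(): apply a function to subsets defined by a grouping vector
group_sums <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), sum)  # a = 40, b = 60

# rapply(): recurse into nested lists
flat <- rapply(list(1, list(2, 3)), function(x) x * 10, how = "unlist")  # 10 20 30
```

vapply() is often preferred over sapply() inside functions because it fails loudly if a result has an unexpected type or length.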

Compound Poisson and vectorized computations

October 12, 2012

Yesterday, I was asked how to write code to generate compound Poisson variables, i.e. a series of random variables $S = X_1 + \cdots + X_N$ where $N$ is a counting random variable (here Poisson distributed) and where the $X_i$'s are i.i.d. (and ind...
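A minimal vectorized sketch in base R, assuming exponentially distributed $X_i$ (our choice for illustration; the linked post may use a different distribution and a different vectorization). The hypothetical helper compound_sums() avoids an explicit loop by differencing the cumulative sum of all $X$ draws at the block boundaries given by cumsum(N).

```r
# Sum the X draws belonging to each count in N, without a loop.
# N: vector of counts (one per simulated S); X: vector of length sum(N),
# consumed in order (the first N[1] draws go to S[1], the next N[2] to S[2], ...).
compound_sums <- function(N, X) {
  stopifnot(length(X) == sum(N))
  cs <- c(0, cumsum(X))          # cs[k + 1] = X[1] + ... + X[k]
  ends <- cumsum(N)              # index of the last X used by each S
  starts <- ends - N             # index of the X just before each block
  cs[ends + 1] - cs[starts + 1]  # equals 0 whenever N[i] == 0
}

# Simulate n values of S = X_1 + ... + X_N with N ~ Poisson(lambda)
# and X_i ~ Exponential(rate) (distribution assumed for illustration).
rcpois <- function(n, lambda, rate = 1) {
  N <- rpois(n, lambda)
  X <- rexp(sum(N), rate)
  compound_sums(N, X)
}

set.seed(1)
S <- rcpois(1e5, lambda = 3, rate = 1)
mean(S)   # should be close to E[S] = lambda / rate = 3
```

The same distribution could be simulated with sapply() over the Poisson draws; the cumulative-sum trick simply keeps all the work inside vectorized calls.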

Minute by Minute Twitter Sentiment Timeline from the VP debate

October 12, 2012

Background: The data for this graph were collected automatically about every 60 seconds during the VP debate on 10/11/2012, giving an aggregate sample of 363,163 tweets. Duplicate tweets were then removed (because of bots), leaving a final dataset of 81,124 unique tweets (52,303 Biden, 28,821 Ryan).

Color Palettes in HCL Space

October 12, 2012

This is a quick follow-up to my previous post about Color Palettes in RGB Space. Achim Zeileis had commented that it might be more informative to evaluate the color palettes in HCL (polar LUV) space, as that color space more accurately describes how humans perceive color. Perhaps clearer trends would emerge in HCL space...

Creating SVG Plots from R

October 12, 2012

I recently wanted to create a ggplot that I could then 'tweak' further. My solution is to create an .svg file, which can be loaded into a suitable application (I prefer Inkscape) and further edited/tweaked.
# Build an example Plot
library(ggplot2...

Nine lightning talks on R

October 12, 2012

At Tuesday's Bay Area R User Group meetup, nine speakers gave five-minute talks on various aspects of R. Revolution Analytics' Luba Gloukhov was one of the presenters, and also provides the summary of the talks below. Links to the slides are included where available for you to check out. Ariel Faigon: Chrestomathy with R Ariel walked us through his...

Overlay of design matrices in genetic analysis

October 12, 2012

I’ve ignored my quantitative geneticist side of things for a while (at least in this blog), so this time I’ll cover some code I was exchanging with a couple of colleagues who work for other organizations. It is common to …

Using cairographics with ggsave()

October 12, 2012

Whenever possible, I try to save R graphic output in a vector format, typically pdf(). I also like to use the handy ggsave() function to do so, as it streamlines the process, and makes it easy to be consistent across formats. However, at times it is n...

Loading SPSS (.sav) into Stata

October 11, 2012

Most statistical software packages nowadays are able to convert their files to a wide range of other formats. Perhaps that is the reason old converter bundles like SAS Transport and DBMS were discontinued. Interestingly, however, Stata, a quite popular statistical package, still lacks built-in support for exporting and importing files among competing packages like...

Download Stock Price Online with R

October 11, 2012

Revolution Newsletter: September/October 2012

October 11, 2012

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full September/October edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. New R Courses Announced: Two new courses presented by Bob Muenchen (author of R...

Random Name Generator in R

October 11, 2012

Just for the heck of it, let's recreate my Reality TV Show Name Generator in R. This isn't really the sort of thing you'd normally do in R, but we can try out a bunch of different functions this way: random integers/sampling, concatenation, sorting, an...

Pilot Study: Small Town Land Surface Temperature

October 11, 2012

Introduction: Zhang and Imhoff (2010) used NLCD impervious surface area (ISA), Olson biomes, and MODIS Land Surface Temperature (LST) to estimate the magnitude of the urban heat island (UHI) in large cities across the US. Peng employed a similar approach in studying 419 large cities (population greater than 1 million) around the world. Peng’s work suggests a limit...

From holey polygons to convex hulls

October 11, 2012

I only rarely have the occasion to need the convex hull of a set of points, but I love chull(), so I’d like to share an example of how to use it. This Gist also offers a pretty straightforward application of the Split-Apply-Combine strategy (see...
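A small sketch of the same idea in base R with made-up data (not the Gist's code): chull() from grDevices returns the indices of the points lying on the convex hull, and split() / lapply() / do.call(rbind, ...) supply the split-apply-combine step across groups. The helper hull_rows() is hypothetical.

```r
# Hypothetical example data: two groups of points, each a unit square
# plus one interior point
pts <- data.frame(
  group = rep(c("a", "b"), each = 5),
  x = c(0, 1, 1, 0, 0.5,  2, 3, 3, 2, 2.5),
  y = c(0, 0, 1, 1, 0.5,  0, 0, 1, 1, 0.5)
)

# Keep only the rows of d that lie on its convex hull, in hull order
hull_rows <- function(d) d[chull(d$x, d$y), ]

# Split by group, apply chull() to each piece, combine the results
hulls <- do.call(rbind, lapply(split(pts, pts$group), hull_rows))
hulls   # the four corners of each square; interior points are dropped
```

Because chull() returns the hull vertices in order, each group's rows can be passed straight to polygon() to draw the hull.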

Curved arrows in R

October 10, 2012

I briefly investigated how to draw curved arrows in R. Here’s a small piece of the figure that I ultimately created. A Google search for “curved arrows in R” revealed three options: curvedarrow in the diagram package; the internal function igraph.Arrows within the igraph package (mentioned by Gabor Csardi on R-help); and using xspline for...

Know Your Dataset: Specifying colClasses to load up an ffdf

October 10, 2012

When I finally figured out how to successfully use the ff package to load data into R, I was apparently working with relatively pain-free data to load up through read.csv.ffdf (see my previous post). Just this past Sunday, I …

analyze the current population survey (cps) annual social and economic supplement (asec) with r

October 10, 2012

the annual march cps-asec has been supplying the statistics for the census bureau's report on income, poverty, and health insurance coverage since 1948.  wow.  the us census bureau and the bureau of labor statistics (bls) tag-team on this one...

R amongst most popular languages, according to GitHub/StackOverflow data

October 10, 2012

Data Scientist Drew Conway tackles the problem of deciding which programming languages are the most popular in an interesting way: by comparing the number of projects tagged in GitHub with each language, and the number of questions in StackOverflow about the language. The former is a measure of how often a language is used (though, mainly for open source...

2012-6 Working with the gridSVG Coordinate System

October 10, 2012

The gridSVG package exports grid images to an SVG format for viewing on the web. This article describes new features in gridSVG that allow grid coordinate system information to be exported along with the image. This allows the SVG image …

What lens should I buy next? Analysing and graphing a Digikam database using R

October 10, 2012

I use the open source photo management software Digikam (along with other tools such as Gimp and DarkTable). I obviously need very little encouragement to combine my geeky hobbies, so I quickly tried to interrogate Digikam with R, which is easy, ...

Summarizing Circular Data in R: Aspect Angle

October 10, 2012

The orientation of terrain surface (aspect) can have dramatic effects on landscape-scale variation in soil temperature and moisture. Summarizing aspect angle is complicated by the fact that sampled values are measured on a circular scale. The circular ...
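A minimal base-R illustration of why a circular summary is needed (the original post may use different code, e.g. a dedicated circular-statistics package): the mean direction is computed from the average sine and cosine of the angles via atan2(), and the mean resultant length measures how concentrated the angles are. The helper names circ_mean_deg() and circ_resultant_length() are hypothetical.

```r
# Mean direction (degrees, in [0, 360)) of angles given in degrees
circ_mean_deg <- function(deg) {
  rad <- deg * pi / 180
  s_bar <- mean(sin(rad))
  c_bar <- mean(cos(rad))
  m <- atan2(s_bar, c_bar) * 180 / pi
  round(m, 10) %% 360   # round away floating-point noise, wrap into [0, 360)
}

# Mean resultant length: 1 = all angles identical, near 0 = widely dispersed
circ_resultant_length <- function(deg) {
  rad <- deg * pi / 180
  sqrt(mean(sin(rad))^2 + mean(cos(rad))^2)
}

aspects <- c(350, 10)        # two slopes facing roughly north
mean(aspects)                # 180: the arithmetic mean points south
circ_mean_deg(aspects)       # 0: north, as expected
```

Aspect is a compass bearing (clockwise from north) while atan2() uses the mathematical convention, but because the same mapping is applied going in and coming out, the mean direction is returned on the input's scale.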

Simple marimekko/mosaic plots

October 10, 2012

I don’t really care for the name “marimekko” or “mosaic,” but I do like this type of plot as a means of illustrating proportions in nested categorical data, or as an alternative to the parallel time series plots discussed...

S&P 500 sector strengths

October 10, 2012

Which sectors are coherent, and which aren’t? Previously: The post “S&P 500 correlations up to date” looked at rolling mean correlations among stocks. In particular, it looked at rolling mean correlations of stocks within sectors. Of importance to this post is that the sectors used are taken from Wikipedia. Relative correlations: The thought is that …

Review: Kölner R Meeting 5 October 2012

October 10, 2012

The third Cologne R user meeting took place last Friday, 5 October 2012, at the Institute of Sociology. The evening was sponsored by Revolution Analytics, who provided funding which went towards the Kölner R user group Meetup page. We had a good tur...

Exploring phylogenetic tree balance metrics

October 10, 2012

I need to simulate balanced and unbalanced phylogenetic trees for some research I am doing. In order to do this, I do rejection sampling: simulate a tree -> measure tree shape -> reject if not balanced or unbalanced enough. But what is enough? We ne...

Making Color Ramps in Matlab

October 9, 2012

When visualizing an array of data in a heatmap, a good color map makes a world of difference. Thanks to my work in 'omics (i.e. transcriptomics: microarrays and RNASeq), I've looked at a lot of heatmaps over the past couple of years, and generated quite...

A brief script on Geographical data analysis in R

I saw this post and decided to replicate that good example, but with data closer to me, particularly data from my country. So I got the shape data for the capital of my country (you can download the data from here). The data comes from the 2002 CENSO...

Permanent Portfolio – Transaction Cost and better Risk Parity

October 9, 2012

I want to address questions raised in the comments on my last post, Permanent Portfolio – Simple Tools, about the Permanent Portfolio strategy. Specifically: the impact of transaction costs on performance, and a modified version of the risk allocation portfolio that distributes weights across 3 asset classes: stocks (SPY), gold (GLD), and treasuries (TLT), and only invests into cash (SHY)...