March 2013

Using Norms to Understand Linear Regression

March 22, 2013 | John Myles White

Introduction In my last post, I described how we can derive modes, medians and means as three natural solutions to the problem of summarizing a list of numbers, \((x_1, x_2, \ldots, x_n)\), using a single number, \(s\). In particular, we measured the quality of different potential summaries in three ... [Read more...]

Split, Apply, and Combine for ffdf

March 22, 2013 | inkhorn82

Call me incompetent, but I just can’t get ffdfdply to work with my ffdf dataframes.  I’ve tried repeatedly and it just doesn’t seem to work!  I’ve seen numerous examples on stackoverflow, but maybe I’m applying them incorrectly.  Wanting to do some … Continue reading → [Read more...]

Are you a Type I or Type II Data Scientist?

March 22, 2013 | mhornick

The role of Data Scientist has been getting a lot of attention lately. Brendan Tierney's blog post titled Type I and Type II Data Scientists adds an interesting perspective by defining and characterizing two key types of Data Scientist, both of which are needed in an organization. Tierney writes about ... [Read more...]

Modes, Medians and Means: A Unifying Perspective

March 22, 2013 | John Myles White

Introduction / Warning Any traditional introductory statistics course will teach students the definitions of modes, medians and means. But, because introductory courses can’t assume that students have much mathematical maturity, the close relationship between these three summary statistics can’t be made clear. This post tries to remedy that situation ... [Read more...]

Plotting lm and glm models with ggplot #rstats

March 22, 2013 | Daniel

Update I followed the advice from Tim’s comment and changed the scaling in the sjPlotOdds-function to logarithmic scaling. The screenshots below showing the plotted glms have been updated. Summary In this posting I will show how to plot results from … Continue reading → [Read more...]

Maximum Sharpe Portfolio

March 21, 2013 | systematicinvestor

The Maximum Sharpe Portfolio, or Tangency Portfolio, is the portfolio on the efficient frontier at the point where the line drawn from the point (0, risk-free rate) is tangent to the efficient frontier. There is a great discussion about the Maximum Sharpe Portfolio or Tangency Portfolio at quadprog optimization question. In the general case, finding ... [Read more...]

workshop a Padova

March 21, 2013 | xi'an

Needless to say, it is with great pleasure I am back in beautiful Padova for the workshop Recent Advances in statistical inference: theory and case studies, organised by Laura Ventura and Walter Racugno. Esp. when considering this is one of the last places I met with George Casella, in June 2010. ... [Read more...]

Using R: Correlation heatmap with ggplot2

March 21, 2013 | mrtnj

Just a short post to celebrate that I learned today how incredibly easy it is to make a heatmap of correlations with ggplot2 (and reshape2, of course). So, what is going on in that short passage? cor makes a correlation matrix with all the pairwise correlations between variables (twice; plus ... [Read more...]

RMark: data.table merge vs core merge

March 21, 2013 | Xachriel

This is the third post concerning fast merging in R, first here and second here. This time we are going to look at how the merge function from the data.table package works in our case, as requested by Uwe Block. As a reminder, the first post concerns doing a... [Read more...]

R’s Garden of Probability Distributions

March 21, 2013 | Joseph Rickert

by Joseph Rickert If you type ?Distributions at the R console you get a list of the 21 probability distributions included in the stats package that ships with base R. The same list appears in the Introduction to R Manual on CRAN and in most of the many fine introductory books ... [Read more...]

RserveCLI2, a .net client for Rserve

March 20, 2013 | Suraj Gupta

RserveCLI is a .net/cli client for Rserve, created by Oliver M. Haynold. Oliver has done a great job with this project. I forked this project to add features, fix bugs, and do some restructuring. I thought it was a significant enough departure to cre... [Read more...]

NCAA Basketball Visualization

March 20, 2013 | Andy

It is time for the NCAA Basketball Tournament. Sixty-four teams dream big (er…I mean 68…well actually by now, 64) and schools like Iona and Florida Gulf Coast University (go Eagles!) are hoping that Robert Morris's astounding victory in the N.I.T. … Continue reading → [Read more...]

Normalized Frequency of Terrorism in the US

March 20, 2013 | Frank Portman

I’ve been using the Global Terrorism Database a lot lately so I decided to share an interesting plot I made with the data. The GTD provides over 100,000 observations of terrorist incidents between 1970 and 2011. Of these, there are about 2400 observations in the USA. While this is not a large number, ... [Read more...]

Violin plots and regional income distribution

March 20, 2013 | Michael kao

While preparing my slides for statistical graphics, a plot really caught my eye when I was playing around with the data. I started off by plotting the time series of GNI per capita by country, and as expected it got quite messy and incomprehensible.
## Download and manipulate the data
library(FAOSTAT)
raw.lst = getWDItoSYB(indicator = c("NY.GNP.PCAP.CD", "SP.POP.TOTL"))
raw.df = raw.lst[["entity"]]
traw.df = translateCountryCode(raw.df, from = "ISO2_WB_CODE", to = "UN_CODE")
mraw.df = merge(traw.df, FAOregionProfile[, c("UN_CODE", "UNSD_MACRO_REG")])
final.df = mraw.df[!is.na(mraw.df$UNSD_MACRO_REG), ]

## Simple ugly time series plot
ggplot(data = final.df, aes(x = Year, y = NY.GNP.PCAP.CD)) +
    geom_line(aes(col = Country)) +
    labs(x = NULL, y = "GNI per capita")
So I decided to compute the ... [Read more...]
