No Statistical Panacea, Hierarchical or Otherwise

February 13, 2013

Everyone in academia knows how painful the peer-review publication process can be. It’s a lot like democracy, in that it’s the worst system ever invented, except for all the others. The peer-review process does a fair job at promoting good …

Apply Yourself !

February 13, 2013

Hello, and welcome to my debut post! Check the About link to see what this blog intends to accomplish. In this article I discuss a general approach to splitting a data frame on a grouping variable and then performing further operations on each group. A secondary goal is to
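
The split-then-operate pattern described in this excerpt can be sketched in base R. This is only an illustration under stated assumptions: the data frame `df`, its columns `group` and `value`, and the per-group summary are hypothetical and are not taken from the original post.

```r
# Hypothetical data: a measurement 'value' with a grouping variable 'group'
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 4, 6)
)

# Split the data frame into a list of per-group data frames
pieces <- split(df, df$group)

# Apply a function to each piece; here, a one-row summary per group
summaries <- lapply(pieces, function(d) {
  data.frame(group = d$group[1], n = nrow(d), mean_value = mean(d$value))
})

# Combine the per-group results back into a single data frame
result <- do.call(rbind, summaries)
print(result)
```

Keeping the three steps separate makes it easy to swap in a different per-group function without touching the splitting or combining code.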

Multiple Stocks Plot Shiny web application

February 13, 2013

Today, I want to share the Multiple Stocks Plot application (code at GitHub). This is the second application in the series of examples (I plan to share 5 examples) that will demonstrate the amazing Shiny framework and Systematic Investor Toolbox to analyze stocks, make back-tests, and create summary reports. The motivation for this series of

In case you missed it: January 2013 Roundup

February 13, 2013

In case you missed them, here are some articles from January of particular interest to R users. Anthony Damico created an amusing and useful flowchart for finding resources for learning R, especially for survey analysis. All R users: please be counted for the 2013 Rexer Data Miner Survey (R was the #1 software reported in the last survey). Relatedly,...

Out-of-sample one-step forecasts

February 13, 2013

It is common to fit a model using training data, and then to evaluate its performance on a test data set. When the data are time series, it is useful to compute one-step forecasts on the test data. For some reason, this is much more commonly done by people trained in machine learning rather than statistics. If you are...
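
As a rough illustration of one-step forecasts on test data, the sketch below fits an AR(1) model to a training set and then forecasts each test observation from the value observed just before it, with the coefficients held at their training estimates. The simulated series, the 150/50 split, and the AR(1) choice are assumptions made for this example; the original post may use a different model or package.

```r
set.seed(1)

# Simulated AR(1) series (illustrative data, not from the original post)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 200))
train <- y[1:150]
test  <- y[151:200]

# Fit an AR(1) model with a mean term on the training data only
fit <- arima(train, order = c(1, 0, 0))
phi <- coef(fit)[["ar1"]]
mu  <- coef(fit)[["intercept"]]  # arima() reports the series mean under this name

# Each test point is forecast from the previous observed value,
# without re-estimating the coefficients
prev <- c(train[length(train)], test[-length(test)])
one_step <- mu + phi * (prev - mu)

# Out-of-sample one-step root mean squared error
rmse <- sqrt(mean((test - one_step)^2))
print(rmse)
```

The point of the design is that the test data inform each forecast only through the lagged observation, never through the parameter estimates.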

Mason Earles on interfacing R with the Forest Vegetation Simulator

February 13, 2013

Mason Earles gave a great presentation this week at Davis R Users’ Group about linking R with the Forest Vegetation Simulator (FVS). FVS is a model developed by the US Forest Service to simulate forest growth over time. It’s written in FORTRAN and has been around since the 1970s. FVS has recently gone open-source (its...

Parallel execution of randomForestSRC

February 13, 2013

I guess I’m the resident expert on resampling methods at work. I’ve been using bagged predictors and random forests for a while, and have recently been using the randomForestSRC (RF-SRC) package in R (http://cran.r-project.org/web/packages/randomForestSRC). This package merges the two randomForest…

Large claims, and ratemaking

February 13, 2013

During the course, we have seen that it is natural to assume that not only the individual claims frequency can be explained by some covariates, but individual costs too. Of course, appropriate families should be considered to model the distribution of the cost, given some covariates. Here is the dataset we’ll use,
> sinistre=read.table("http://freakonometrics.free.fr/sinistreACT2040.txt",
+ header=TRUE,sep=";")
> sinistres=sinistre...

A Shiny example – SAP HANA, R and Shiny

February 13, 2013

As you may already know...I love R...a fancy, open source statistics programming language. So today, I decided to learn something new using R. There aren't many web servers for R, but there's one that I really like called Rook, which I covered on my blog...

igraph degree distribution: count elements

February 13, 2013

Unfortunately, the degree.distribution() function of the igraph library returns the intensities of the distribution:
> g
> plot(g)
> summary(g)
IGRAPH U--- 10 10 -- Ring graph
attr: name (g/c), mutual (g/x), circular (g/x)
So instead of the number of elements, the density/intensity values are returned:
> degree.distribution(g)
0 0 1
You can easily verify this in the source code...

Stadium / home team effects in making field goals

February 13, 2013

We take on a reader question of whether the stadium / home team matters for making a field goal. We pulled up the data on every field goal since 2002 (over 10,000 of them) and plotted the probability of scoring as a function of the stadium in which the field goal was kicked.

A must-read paper on statistical analysis of experimental data

February 13, 2013

Russ Lyons points to an excellent article on statistical experimentation by Ron Kohavi, Alex Deng, Brian Frasca, Roger Longbotham, Toby Walker, and Ya Xu, a group of software engineers (I presume) at Microsoft. Kohavi et al. write: Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft . . . deployment and mining

#14 A New GGPLOT Template

February 13, 2013

So opts() has now been given the boot, and all the cool kids are using theme() to customise their ggplots. If you’re still on an old version of R then opts() will still work, but if you update (which you should) then it’ll stop working and you’ll have to edit all your code

Sharing my work for “Advanced Methods III”

February 13, 2013

This semester I’m taking the live version of the Data Analysis class by Jeff Leek. His more popular version of the course is available through Coursera.  One of the things that Jeff promotes is reproducibility and sharing code. I share that tendency and thus created a Git repository for my homework and code for the class: lcollado753. I’m...

Single Stock Plot Shiny web application

February 12, 2013

Today, I want to share the Single Stock Plot application (code at GitHub). This is the first application in the series of examples (I plan to share 5 examples) that will demonstrate the amazing Shiny framework and Systematic Investor Toolbox to analyze stocks, make back-tests, and create summary reports. The motivation for this series of

Basic R: rows that contain the maximum value of a variable

February 12, 2013

File under “I keep forgetting how to do this basic, frequently-required task, so I’m writing it down here.” Let’s create a data frame which contains five variables, vars, named A – E, each of which appears twice, along with some measurements: Now, let’s say we want only the rows that contain the maximum values of
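
One base R way to keep only the rows holding each group's maximum is sketched below. The excerpt's own data and code did not survive, so the data frame here, including the column name `value` and its numbers, is a hypothetical stand-in with five groups A to E that each appear twice.

```r
# Hypothetical data: five groups (A-E), each appearing twice
df <- data.frame(
  vars  = rep(c("A", "B", "C", "D", "E"), times = 2),
  value = c(5, 2, 9, 4, 7, 3, 8, 1, 6, 10)
)

# ave() returns, for every row, the maximum of 'value' within that row's group
group_max <- ave(df$value, df$vars, FUN = max)

# Keep the rows whose value equals their group's maximum
max_rows <- df[df$value == group_max, ]
print(max_rows)
```

If a group's maximum is tied across several rows, this keeps all of them.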

Having fun with rgexf package and sigmajs in R

Last week I discovered the R package rgexf, made by George Vega Yon. rgexf is an R library for working with GEXF graph files. This file type represents networks in XML. So, if you have a list of nodes and a data frame of edges (source-target) you can obtain a GEXF file with write.gexf...

Of Meteorites and Men

February 12, 2013

Hello Paleoposse! My name is Ryan Brown and I’m one of the newest victims (bloggers) here at the Paleocave. I made an appearance on Episode 134 where I talked a bit about meteorites and the asteroid mining company, Planetary Resources. I blog over at Glacial Till where, confusingly enough, I do not actually talk about glaciers.

Another Experiment with R and Sweave

February 12, 2013

The R package PApages is a great start towards addressing the very common problem of internal and external reporting in the money management industry.  Advent's APX, Axys, and Black Diamond and the up and coming extremely well-connected and well-f...

Using R — Package Installation Problems

February 12, 2013

This entry is part 3 of 12 in the series Using R. The post titled Installing Packages described the basics of package installation with R. The process is wonderfully simple when everything goes well. But it can be maddening when it …

Review: Kölner R Meeting 6 February 2013

February 12, 2013

The fourth Cologne R user meeting took place last Wednesday at the Institute of Sociology. Thanks to Bernd Weiß for hosting the event and Revolution Analytics for their sponsorship. We had two fantastic talks by Klaus Jacobi and M.eik Michalke. Klaus talked about Eliminating cloud pixels in satellite images via chronological interpolation and Meik presented...

"Document Design and Purpose, Not Mechanics"

February 12, 2013

If you ever write code for scientific computing (chances are you do if you're here), stop what you're doing and spend 8 minutes reading this open-access paper: Wilson et al. Best Practices for Scientific Computing. arXiv:1210.0530 (2012). (Direct link t...

What Analytic Software are People Discussing?

February 12, 2013

by Robert A. Muenchen. How can we measure the popularity or market share of analytic software? One way is to see what people are discussing. I’m in the process of updating my annual article, The Popularity of Data Analysis Software. Below …

R for finance and other upcoming events

February 12, 2013

Featured R for Finance Workshop 2013, March 5-6 in London. The target audience is professionals and academics who wish to learn the basics of the statistical software R and its use in finance. The workshop is led by Ron Hochreiter, Pat Burns and Michael Sun. Details are on the Unicom website. Please reference Burns Statistics …

The Problem with Testing for Heteroskedasticity in Probit Models

February 12, 2013

A friend recently asked whether I trusted the inferences from heteroskedastic probit models. I said no, because the heteroskedastic probit does not allow a researcher to distinguish between non-constant variance and a mis-specified mean function. In particular, my friend had a hypothesis that the variance of the latent outcome (commonly called "y-star") should increase with an

R: Barplot with absolute and relative values

February 12, 2013

In this short tutorial I will show how to add the relative amounts above the bars of a barplot, so that the plot carries both the absolute and the relative information. First, I create some artificial SNPs and TPMT-genotype.
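
A minimal sketch of the idea in base graphics follows. The genotype labels and counts are invented for illustration and are not the tutorial's data; the only assumption about the plotting API is that barplot() returns the bar midpoints, which text() then uses to place the percentage labels.

```r
# Hypothetical genotype counts (absolute values)
counts <- c("A/A" = 12, "A/G" = 6, "G/G" = 2)

# Relative values as percentages of the total
rel <- round(100 * counts / sum(counts), 1)

# barplot() returns the x positions of the bar midpoints;
# extra headroom on the y axis leaves space for the labels
mids <- barplot(counts, ylim = c(0, max(counts) * 1.15), ylab = "Count")

# Write each percentage just above its bar
text(x = mids, y = counts, labels = paste0(rel, "%"), pos = 3)
```

The bar heights carry the absolute counts while the labels carry the relative share, so neither axis needs to be rescaled.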

A handy concatenation operator

February 12, 2013

It may be useful to define a concatenation operator for character strings. Sometimes I find this more intuitive and handy than using paste0 or paste. It also makes your code look better when you have nested paste calls, e.g. paste0("Y~", paste0("z", 1:3, "*x", 1:3, collapse="+")). The drawback is that it may reduce the readability of your code to other
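
Such an operator can be defined in one line. The name `%+%` below is an assumption made for this sketch (the excerpt does not show which name the post uses), and some packages define an operator with the same name, so a different name may be safer in practice.

```r
# A user-defined infix operator that concatenates strings without a separator
`%+%` <- function(a, b) paste0(a, b)

# The nested paste0() example from the text, with the outer call
# replaced by the operator
formula_text <- "Y~" %+% paste0("z", 1:3, "*x", 1:3, collapse = "+")
print(formula_text)
```

Because the operator simply wraps paste0(), it is vectorised in the same way and recycles shorter arguments.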

More visualisation of 2012 NFL Quarterback performance with R

February 12, 2013

In last week’s post I used R heatmaps to visualise the performance of NFL Quarterbacks in 2012. This was done in a 2-step process:
1. Clustering QB performance based on the 12 performance metrics using hierarchical clustering
2. Plotting the performance clusters using R’s pheatmap library
An output from step 1 is the cluster dendrogram

Compute the self excluded sample mean by group

February 12, 2013

egen (a Stata command) computes summary statistics by group and stores them in a new variable. For example, suppose the data has three variables, id, time and y, and we want to compute the mean of y for each id and store it as a new variable mean_y. In Stata, the command would be In
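
In R, both the group mean and the self-excluded (leave-one-out) group mean can be sketched with ave(). This is one possible approach, not necessarily the one the post goes on to show; the data values and the column name `mean_y_excl` are hypothetical, while `id`, `time`, `y` and `mean_y` follow the excerpt.

```r
# Hypothetical panel data with the three variables named in the text
df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = c(1, 2, 3, 1, 2),
  y    = c(2, 4, 6, 10, 20)
)

# Per-row group sum and group size
grp_sum <- ave(df$y, df$id, FUN = sum)
grp_n   <- ave(df$y, df$id, FUN = length)

# Ordinary group mean, stored as a new variable
df$mean_y <- grp_sum / grp_n

# Self-excluded mean: remove the row's own y from the group sum and count.
# Groups with a single row yield NaN, since nothing is left to average.
df$mean_y_excl <- (grp_sum - df$y) / (grp_n - 1)
print(df)
```

Working from the group sum and size avoids recomputing a mean for every row.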
