## New Rcpp page on upcoming events — including Master Class in New York

February 5, 2013

Lots of exciting things are happening with and around Rcpp. I just added a new page about Upcoming Events to the recently-created Rcpp site. This events page has lots to cover: an upcoming talk at Columbia on March 8 (details still TBD), a day-lon...

## MCMSki IV, Jan. 6-8 (9?), 2014, Chamonix (news #3)

February 5, 2013

In case you have not been constantly tracking the changes on the MCMSki IV webpage, here is some news: the number of invited and accepted contributed sessions in the program has increased considerably, to the point of almost filling two parallel sessions for the whole duration of the meeting. This includes an exciting round-table on

## 2011 Census Open Atlas Project

February 5, 2013

This month has seen the release of the 2011 census data for England and Wales at Output Area Level. This offers the possibility to map various attributes about people and places for very small geographic areas. Output Areas represent the most detailed geography for which Census data are released and are the building blocks for

## Tables from R into Word

February 5, 2013

A good-looking table matters! This tutorial shows how to create a neat table in Word by combining knitr and R Markdown. I'll be using my own function, htmlTable, from the Gmisc package. Background: Because most journals that I submit to want...

## Proposed techniques for communicating the amount of information contained in a statistical result

February 5, 2013

A couple of weeks ago, I posted about how much we can expect to learn about the state of the world on the basis of a statistical significance test. One way of framing this question is: if we’re trying to come to scientific conclusions on the basis of statistical results, how much can we update

## Next Kölner R User Meeting: 6 February 2013

February 5, 2013

Quick reminder: The next Cologne R user group meeting is scheduled for tomorrow, 6 February 2013. All details and the agenda are available on the KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here. Thanks also to...

## Tracking Number of Historical Clusters in DOW 30 and S&P 500

February 4, 2013

In the Tracking Number of Historical Clusters post, I looked at how three different methods were able to identify clusters across the universe of 10 major assets. Today, I want to share the impact of clustering on a larger universe. Below I examine the historical time series of the number of clusters in the DOW 30 and

## Visualizing networks in R: arc diagrams and hive plots

February 4, 2013

Arc diagrams are an alternate way of representing two-dimensional graphs. Rather than scattering the nodes across the page connected by straight edges, you can instead arrange the nodes along a one-dimensional axis, and replace the straight edges with arcs between the nodes. While an arc diagram might not give as good a sense of the connections between the nodes...
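
The layout described in this teaser can be sketched in a few lines. This is a minimal illustration, not the post's own code: it is written in Python rather than R, it assumes nodes sit at consecutive integer positions on the axis, and it assumes each edge is drawn as a semicircle above the axis. The names `arc_points` and `arc_layout` are hypothetical.

```python
import math


def arc_points(x_i, x_j, steps=20):
    """Return points on the semicircle above the axis joining (x_i, 0) and (x_j, 0)."""
    center = (x_i + x_j) / 2.0
    radius = abs(x_j - x_i) / 2.0
    points = []
    for s in range(steps + 1):
        theta = math.pi * s / steps  # sweeps from 0 to pi
        points.append((center + radius * math.cos(theta),
                       radius * math.sin(theta)))
    return points


def arc_layout(nodes, edges):
    """Place nodes along a one-dimensional axis and compute one arc per edge.

    Assumption: node order on the axis is simply the order of `nodes`.
    """
    position = {name: float(i) for i, name in enumerate(nodes)}
    arcs = {(a, b): arc_points(position[a], position[b]) for a, b in edges}
    return position, arcs


if __name__ == "__main__":
    pos, arcs = arc_layout(["a", "b", "c", "d"], [("a", "c"), ("b", "d")])
    print(pos)
    print(arcs[("a", "c")][:3])
```

The point lists can be handed to any plotting library; the order of `nodes` determines how many arcs cross, which is why node ordering matters so much for readability in this kind of diagram.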

## Convenience Sample, SRS, and Stratified Random Sample Compared

February 4, 2013

In class today we were discussing several types of survey sampling and we split into groups and did a little investigation. We were given a page of 100 rectangles with varying areas and took 3 samples of size 10. Our first was a convenience sample. We...
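
The three sampling schemes named in the title can be compared with a short simulation. This is a hedged sketch in Python rather than R, and everything in it is an assumption for illustration: the population of rectangle areas is generated at random, the convenience sample is taken to be the first 10 rectangles on the page, and the stratified sample uses two strata split at the median area. The teaser does not say how the class actually defined them.

```python
import random
import statistics


def make_population(n=100, seed=0):
    """Hypothetical population: areas of n rectangles with integer sides 1..10."""
    rng = random.Random(seed)
    return [rng.randint(1, 10) * rng.randint(1, 10) for _ in range(n)]


def convenience_sample(areas, k=10):
    # Assumed stand-in for "whatever is easiest to reach": the first k items.
    return areas[:k]


def simple_random_sample(areas, k=10, rng=None):
    # Every subset of size k is equally likely.
    rng = rng or random.Random()
    return rng.sample(areas, k)


def stratified_sample(areas, k=10, rng=None):
    # Assumed design: two equal-sized strata (small and large areas),
    # with k // 2 draws from the small stratum and the rest from the large one.
    rng = rng or random.Random()
    ordered = sorted(areas)
    half = len(ordered) // 2
    small, large = ordered[:half], ordered[half:]
    return rng.sample(small, k // 2) + rng.sample(large, k - k // 2)


if __name__ == "__main__":
    population = make_population()
    rng = random.Random(42)
    print("population mean:", statistics.mean(population))
    print("convenience:    ", statistics.mean(convenience_sample(population)))
    print("SRS:            ", statistics.mean(simple_random_sample(population, rng=rng)))
    print("stratified:     ", statistics.mean(stratified_sample(population, rng=rng)))
```

Because the two strata are the same size and receive the same number of draws, the plain sample mean is a reasonable estimate of the population mean under this design; with unequal strata the stratum means would need to be weighted.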

## Help needed with sample selection biases

February 4, 2013

We are searching for a graduate student to assist us on a very short assignment about sample selection biases and Heckman Probit models. The help is not needed for estimating the models, but instead for reviewing the scenarios where the use of such models is theoretically appropriate or otherwise. For instance, we are particularly interested in determining if Heck...

## Generating Labels for Supervised Text Classification using CAT and R

February 4, 2013

The explosion in the availability of text has opened new opportunities to exploit text as data for research. As Justin Grimmer and Brandon Stewart discuss in the above paper, there are a number of approaches to reducing human text to …

## Landmine detection revisited; the inverse unicorn problem

February 4, 2013

A couple weeks ago I wrote about an interesting idea to clear landmines using the power of the wind. A reader asked me to comment more on the value of using these wind-powered “Kafons” to do an initial assay of a suspected minefield, an idea I mentioned at the end of my video on the

## An infelicity with Value at Risk

February 4, 2013

More risk does not necessarily mean bigger Value at Risk. Previously “The incoherence of risk coherence” suggested that the failure of Value at Risk (VaR) to be coherent is of little practical importance. Here we look at an attribute that is not a part of the definition of coherence yet is a desirable quality. Thought …

## analyze the survey of income and program participation (sipp) with r

February 4, 2013

if the census bureau's budget was gutted and only one complex sample survey survived, pray it's the survey of income and program participation (sipp).  it's giant.  it's rich with variables.  it's monthly.  it follows households over three, four, now five year panels.  the congressional budget office uses it for their health insurance simulation.  analysts read that sipp has...

## Data Visualization for Education

February 3, 2013

Recently I was invited to give a talk to two cohorts of Strategic Data Project fellows. I was asked to speak about using data visualization to help inform decision-making of policy makers. At the same time, the group had a lot of variation in their int...

## A Grid Search for The Optimal Setting in Feed-Forward Neural Networks

February 3, 2013

The feed-forward neural network is a very powerful classification model in the machine learning context. Since the goodness-of-fit of a neural network is largely determined by the model complexity, it is very tempting for a modeler to over-parameterize the neural network by using too many hidden layers and/or hidden units. As pointed out by Brian

## Japanese Government Bonds (JGB) Total Return Series

February 3, 2013

In a follow-up to Yen and JGBs Short-Term vs Long Term and a series of posts on Japan, I thought the Bloomberg article "Japan Pension Fund’s Bonds Too Many If Abe Succeeds, Mitani Says" was particularly interesting. It is difficult to find a to...

## An Example of Seasonality Analysis

February 3, 2013

Today, I want to demonstrate how easy it is to create a seasonality analysis study and produce a sample summary report. As an example study, I will use S&P Annual Performance After a Big January post by Avondale Asset Management. The first step is to load historical prices and find Big Januaries. All the hard

## The Rcpp Gallery and my *Seinfeld Streak*

February 3, 2013

A good three weeks ago, we introduced the Rcpp Gallery. While this is a joint effort by several of us on the Rcpp team, the backend was conceived and implemented entirely by JJ, who also bootstrapped it with some first content, drawing on posts by Ha...

## Checking validation statistics (Monitor function 030220139)

February 3, 2013

(This article was first published on NIR-Quimiometria, and kindly contributed to R-bloggers.)

## Clustering using dynamic tree cut

February 3, 2013

Summary: Two methods for hierarchical clustering are introduced: (i) dynamic tree cut; and (ii) dynamic hybrid cut. Dynamic tree cut is a top-down algorithm that relies solely on the dendrogram. The algorithm implements an adaptive, iterative process of cluster decomposition …

## Arc Diagrams in R: Les Miserables

February 3, 2013

In this post we will talk about the R package “arcdiagram” for plotting pretty arc diagrams. An arc diagram is a graphical display to visualize graphs or networks in a one-dimensional layout. The main idea is to display nodes along a single axis, while representing the edges or connections …

## XLConnect 0.2-4

February 3, 2013

Mirai Solutions GmbH (http://www.mirai-solutions.com) is very pleased to announce the release of XLConnect 0.2-4, which is available from CRAN. This newest release comes along with a number of new features: Ability to read cached cell values. There is a new …

## For descriptive statistics, values below LLOQ set to …

February 3, 2013

That is what I read the other day. For calculation of descriptive statistics, values below the LLOQ (lower limit of quantification) were set to... Then I wondered, wasn't there a trick in JAGS to incorporate the presence of missing data while es...