New Rcpp page on upcoming events — including Master Class in New York

February 5, 2013

Lots of exciting things are happening with and around Rcpp. I just added a new page about Upcoming Events to the recently-created Rcpp site. This events page has lots to cover: an upcoming talk at Columbia on March 8 (details still TBD), a day-lon...

MCMSki IV, Jan. 6-8 (9?), 2014, Chamonix (news #3)

February 5, 2013

In case you have not been constantly tracking the changes on the MCMSki IV webpage, here is some news: the number of invited and accepted contributed sessions in the program has considerably increased, to the point of almost filling two parallel sessions for the whole duration of the meeting. This includes an exciting round-table on

2011 Census Open Atlas Project

February 5, 2013

This month has seen the release of the 2011 census data for England and Wales at Output Area Level. This offers the possibility to map various attributes about people and places for very small geographic areas. Output Areas represent the most detailed geography for which Census data are released and are the building blocks for many popular products...

Tables from R into Word

February 5, 2013

A good-looking table matters! This tutorial is on how to create a neat table in Word by combining knitr and R Markdown. I'll be using my own function, htmlTable, from the Gmisc package. Background: Because most journals that I submit to want...

Proposed techniques for communicating the amount of information contained in a statistical result

February 5, 2013

A couple of weeks ago, I posted about how much we can expect to learn about the state of the world on the basis of a statistical significance test. One way of framing this question is: if we’re trying to come to scientific conclusions on the basis of statistical results, how much can we update

Next Kölner R User Meeting: 6 February 2013

February 5, 2013

Quick reminder: The next Cologne R user group meeting is scheduled for tomorrow, 6 February 2013. All details and the agenda are available on the KölnRUG Meetup site. Please sign up if you would like to come along. Notes from the last Cologne R user group meeting are available here. Thanks also to...

Tracking Number of Historical Clusters in DOW 30 and S&P 500

February 4, 2013

In the Tracking Number of Historical Clusters post, I looked at how 3 different methods were able to identify clusters across the 10 major asset universe. Today, I want to share the impact of clustering on the larger universe. Below I examined the historical time series of number of clusters in the DOW 30 and

Visualizing networks in R: arc diagrams and hive plots

February 4, 2013

Arc diagrams are an alternate way of representing two-dimensional graphs. Rather than scattering the nodes across the page connected by straight edges, you can instead arrange the nodes along a one-dimensional axis, and replace the straight edges with arcs between the nodes. While an arc diagram might not give as good a sense of the connections between the nodes...
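
The geometry described above can be sketched in a few lines. This is a minimal illustration in Python rather than the post's own R code, and it assumes nodes sit at integer positions on the axis and that each edge is drawn as a semicircle above it; the function names and the example node labels are hypothetical.

```python
import math


def arc_points(x_i, x_j, n_points=20):
    """Return points on the semicircle above the axis joining x_i and x_j."""
    center = (x_i + x_j) / 2.0
    radius = abs(x_j - x_i) / 2.0
    points = []
    for step in range(n_points + 1):
        theta = math.pi * step / n_points  # sweeps from 0 to pi
        points.append((center + radius * math.cos(theta),
                       radius * math.sin(theta)))
    return points


def arc_layout(nodes, edges):
    """Place nodes at 0, 1, 2, ... along one axis and build one arc per edge."""
    position = {name: float(i) for i, name in enumerate(nodes)}
    return {(a, b): arc_points(position[a], position[b]) for a, b in edges}


if __name__ == "__main__":
    # Hypothetical four-node graph, used only to exercise the functions.
    layout = arc_layout(["A", "B", "C", "D"], [("A", "C"), ("B", "D"), ("A", "D")])
    for edge, points in layout.items():
        print(edge, "arc height:", max(y for _, y in points))
```

The height of each arc grows with the distance between its endpoints, so the ordering of nodes along the axis largely decides how readable the diagram is.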

Convenience Sample, SRS, and Stratified Random Sample Compared

February 4, 2013

In class today we were discussing several types of survey sampling and we split into groups and did a little investigation. We were given a page of 100 rectangles with varying areas and took 3 samples of size 10. Our first was a convenience sample. We...
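
The three sampling schemes can be compared with a small simulation. This is a sketch in Python under stated assumptions, not the class exercise itself: the population of 100 rectangle areas is randomly generated, the convenience sample is modelled as favouring larger rectangles (on the assumption that big shapes catch the eye), and the stratified sample uses two strata split at the median area. All function names are hypothetical.

```python
import random
import statistics


def make_rectangles(n=100, seed=1):
    """Hypothetical population: n rectangle areas built from random integer sides."""
    rng = random.Random(seed)
    return [rng.randint(1, 10) * rng.randint(1, 10) for _ in range(n)]


def convenience_sample(areas, k, rng):
    """Assumed convenience behaviour: larger rectangles are more likely to be picked."""
    pool = list(areas)
    chosen = []
    for _ in range(k):
        # Draw one index with probability proportional to area, then remove it.
        idx = rng.choices(range(len(pool)), weights=pool, k=1)[0]
        chosen.append(pool.pop(idx))
    return chosen


def simple_random_sample(areas, k, rng):
    """Simple random sample: every subset of size k is equally likely."""
    return rng.sample(areas, k)


def stratified_sample(areas, k, rng):
    """Split at the median into small and large strata; draw about k / 2 from each."""
    ordered = sorted(areas)
    half = len(ordered) // 2
    small, large = ordered[:half], ordered[half:]
    return rng.sample(small, k // 2) + rng.sample(large, k - k // 2)


if __name__ == "__main__":
    areas = make_rectangles()
    rng = random.Random(42)
    print("population mean area:", statistics.mean(areas))
    for name, draw in [("convenience", convenience_sample),
                       ("SRS", simple_random_sample),
                       ("stratified", stratified_sample)]:
        print(name, "sample mean area:", statistics.mean(draw(areas, 10, rng)))
```

Repeating the draws many times and comparing the spread of the sample means would show how each scheme trades bias against variance for this made-up population.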

Help needed with sample selection biases

February 4, 2013

We are searching for a graduate student to assist us on a very short assignment about sample selection biases and Heckman Probit models. The help is not needed for estimating the models, but instead for reviewing the scenarios where the use of such models is theoretically appropriate or otherwise. For instance, we are particularly interested in determining if Heck...

Generating Labels for Supervised Text Classification using CAT and R

February 4, 2013

The explosion in the availability of text has opened new opportunities to exploit text as data for research. As Justin Grimmer and Brandon Stewart discuss in the above paper, there are a number of approaches to reducing human text to …

Landmine detection revisited; the inverse unicorn problem

February 4, 2013

A couple weeks ago I wrote about an interesting idea to clear landmines using the power of the wind. A reader asked me to comment more on the value of using these wind-powered “Kafons” to do an initial assay of a suspected minefield, an idea I mentioned at the end of my video on the

An infelicity with Value at Risk

February 4, 2013

More risk does not necessarily mean bigger Value at Risk. Previously “The incoherence of risk coherence” suggested that the failure of Value at Risk (VaR) to be coherent is of little practical importance. Here we look at an attribute that is not a part of the definition of coherence yet is a desirable quality. Thought …

analyze the survey of income and program participation (sipp) with r

February 4, 2013

if the census bureau's budget was gutted and only one complex sample survey survived, pray it's the survey of income and program participation (sipp).  it's giant.  it's rich with variables.  it's monthly.  it follows households over three, four, now five year panels.  the congressional budget office uses it for their health insurance simulation.  analysts read that sipp has...

Data Visualization for Education

February 3, 2013

Recently I was invited to give a talk to two cohorts of Strategic Data Project fellows. I was asked to speak about using data visualization to help inform decision-making of policy makers. At the same time, the group had a lot of variation in their int...

A Grid Search for The Optimal Setting in Feed-Forward Neural Networks

February 3, 2013

The feed-forward neural network is a very powerful classification model in the machine learning context. Since the goodness-of-fit of a neural network is largely determined by the model complexity, it is very tempting for a modeler to over-parameterize the neural network by using too many hidden layers and/or hidden units. As pointed out by Brian

Japanese Government Bonds (JGB) Total Return Series

February 3, 2013

In a follow-up to Yen and JGBs Short-Term vs Long Term and a series of posts on Japan, I thought the Bloomberg article "Japan Pension Fund’s Bonds Too Many If Abe Succeeds, Mitani Says" was particularly interesting. It is difficult to find a to...

An Example of Seasonality Analysis

February 3, 2013

Today, I want to demonstrate how easy it is to create a seasonality analysis study and produce a sample summary report. As an example study, I will use S&P Annual Performance After a Big January post by Avondale Asset Management. The first step is to load historical prices and find Big Januaries. All the hard
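
The first step mentioned above, finding Big Januaries and looking at what followed, might be sketched as below. This is an illustrative Python outline, not the post's code: the 5% threshold for a "big" January, the function names, and the toy return series with placeholder year labels are all assumptions, and the numbers are made up rather than real S&P data.

```python
def big_january_years(monthly_returns, threshold=0.05):
    """Return the years whose January return exceeds the (assumed) threshold."""
    return sorted(year for year, returns in monthly_returns.items()
                  if returns[0] > threshold)


def rest_of_year_return(returns):
    """Compound the February-to-December returns of one year."""
    growth = 1.0
    for r in returns[1:]:
        growth *= 1.0 + r
    return growth - 1.0


if __name__ == "__main__":
    # Made-up monthly returns (January first); labels are placeholders, not real years.
    toy = {
        "year_1": [0.06] + [0.01] * 11,
        "year_2": [0.01] + [0.00] * 11,
        "year_3": [0.07] + [-0.005] * 11,
    }
    for year in big_january_years(toy):
        print(year, "rest-of-year return:", round(rest_of_year_return(toy[year]), 4))
```

With real prices, the monthly returns would first be computed from month-end closes, and the summary would then be aggregated across all Big January years.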

The Rcpp Gallery and my Seinfeld Streak

February 3, 2013

A good three weeks ago, we introduced the Rcpp Gallery. While this is a joint effort by several of us on the Rcpp team, the backend was conceived and implemented entirely by JJ who also bootstrapped it with some first content, drawing on posts by Ha...

Checking validation statistics (Monitor function 030220139)

February 3, 2013

(This article was first published on NIR-Quimiometria, and kindly contributed to R-bloggers.)

Clustering using dynamic tree cut

February 3, 2013

Summary: Two methods for hierarchical clustering are introduced: (i) dynamic tree cut; and (ii) dynamic hybrid cut. Dynamic tree cut is a top-down algorithm that relies solely on the dendrogram. The algorithm implements an adaptive, iterative process of cluster decomposition …

Arc Diagrams in R: Les Miserables

February 3, 2013

In this post we will talk about the R package “arcdiagram” for plotting pretty arc diagrams. An arc diagram is a graphical display to visualize graphs or networks in a one-dimensional layout. The main idea is to display nodes along a single axis, while representing the edges or connections …

XLConnect 0.2-4

February 3, 2013

Mirai Solutions GmbH (http://www.mirai-solutions.com) is very pleased to announce the release of XLConnect 0.2-4, which is available from CRAN. This newest release comes along with a number of new features: Ability to read cached cell values. There is a new …

For descriptive statistics, values below LLOQ set to …

February 3, 2013

That is what I read the other day. For calculation of descriptive statistics, values below the LLOQ (lower limit of quantification) were set to … Then I wondered, wasn't there a trick in JAGS to incorporate the presence of missing data while es...

A package for agricultural statistic: FAOSTAT

February 3, 2013

After 8 years of using R, today I finally became a contributor to the community and released my first package, FAOSTAT. The package is designed to provide users with direct access to the FAOSTAT database via R and to support the...
