Assignment operators in R: ‘=’ vs. ‘<-’

November 16, 2010

In R, you can use both ‘=’ and ‘<-’ as assignment operators. So what’s the difference between them, and which one should you use?

What’s the difference?

The main difference between the two assignment operators is scope. It’s easiest to see the difference with an example:

## Delete x (if it exists)
> rm(x)
> mean(x=1:10)
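
The excerpt stops partway through its example, so what follows is only a minimal sketch of the scoping behaviour it describes, not the original post's code. The variable names `result_eq`, `result_arrow`, `x_after_eq` and `x_after_arrow` are made up for illustration.

```r
# Start from a clean slate so the result does not depend on earlier work.
if (exists("x", inherits = FALSE)) rm(x)

# Inside a call, '=' only matches the argument named x; nothing is assigned.
result_eq <- mean(x = 1:10)
x_after_eq <- exists("x", inherits = FALSE)      # FALSE

# Inside a call, '<-' assigns x in the calling environment, then passes it on.
result_arrow <- mean(x <- 1:10)
x_after_arrow <- exists("x", inherits = FALSE)   # TRUE

print(c(result_eq, result_arrow))   # 5.5 5.5
```

Both calls return the same mean; the only difference is whether a variable named x is left behind afterwards.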

Data Science meets Humanities

November 16, 2010

There's an interesting article in the NYT today about the emerging discipline of "digital humanities": extracting digital data from historical archives to answer questions from the Arts and Humanities. From the article: Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical “ism” and start...

In case you missed it: October Roundup

November 16, 2010

In case you missed them, here are some articles from October of particular interest to R users. Reviews of the winners and finalists of the 2010 ggplot2 case study competition. We have published a new article "R is Hot", with interviews from a dozen R users in industry and academia. A new code highlighting tool for displaying R code...

Postdoc in Wharton

November 16, 2010

Just received this email from José Bernardo about an exciting postdoc position in Wharton: POST-DOCTORAL FELLOW – DEPARTMENT OF STATISTICS, THE WHARTON SCHOOL The Department of Statistics at The Wharton School of the University of Pennsylvania is seeking candidates for a Post-Doctoral Fellowship. This research fellowship provides full funding without any teaching requirements at a

Loops in R: Think different

November 15, 2010

Especially among programmers who come to R from other languages, R sometimes gets dinged for the speed of its for loops. But much of the time, where you might have needed an iterative loop in another language to solve a specific task, you don't need a for loop in R at all. Often, there's a pre-built function to...
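
As a small illustration of the point (a sketch with made-up data, not an example from the linked post), here is a sum of squares computed first with an explicit loop and then with R's vectorized arithmetic and the built-in `sum()`:

```r
# Hypothetical input values, chosen only for illustration.
values <- c(2, 4, 6, 8)

# Loop version: accumulate the squares one element at a time.
total_loop <- 0
for (v in values) {
  total_loop <- total_loop + v^2
}

# Vectorized version: square every element at once, then sum.
total_vec <- sum(values^2)

print(c(total_loop, total_vec))   # 120 120
```

The vectorized form is shorter and pushes the iteration into compiled code, which is usually where the speed difference comes from.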

Example 8.14: generating standardized regression coefficients

November 15, 2010

Standardized (or beta) coefficients from a linear regression model are the parameter estimates obtained when the predictors and outcomes have been standardized to have variance = 1. Alternatively, the regression model can be fit and then standardized ...
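
The two routes mentioned in the excerpt can be sketched as follows. This is an illustration only: it assumes the built-in `mtcars` data and the model `mpg ~ wt + hp`, neither of which comes from the original example.

```r
# Assumed example data and model (not from the original post).
vars <- c("mpg", "wt", "hp")

# Route 1: standardize outcome and predictors first, then fit.
std_data <- as.data.frame(scale(mtcars[, vars]))
fit_std <- lm(mpg ~ wt + hp, data = std_data)
beta_direct <- coef(fit_std)[-1]            # drop the intercept

# Route 2: fit on the raw data, then rescale each slope by sd(x) / sd(y).
fit_raw <- lm(mpg ~ wt + hp, data = mtcars)
sd_x <- sapply(mtcars[, c("wt", "hp")], sd)
beta_rescaled <- coef(fit_raw)[-1] * sd_x / sd(mtcars$mpg)

# The two sets of coefficients should agree up to floating-point error.
print(all.equal(unname(beta_direct), unname(beta_rescaled)))
```

Route 2 is convenient when the model has already been fitted, since it needs only the raw slopes and the sample standard deviations.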

Feature selection: All-relevant selection with the Boruta package

November 15, 2010

Feature selection is an important step for practical commercial data mining which is often characterised by data sets with far too many variables for model building. There are two main approaches to selecting the features (variables) we will use for the analysis:...

Isarithmic History of the Two-Party Vote

November 15, 2010

A few weeks ago, I shared a series of choropleth maps of U.S. presidential election returns, illustrating the relative support for Democratic, Republican, and third-party candidates since 1920. The granularity of these county-level results led me to wonder whether it would be possible to develop an isarithmic map of presidential voting using the …

Introducing Monte Carlo in PaRis

November 14, 2010

As already announced on Statisfaction, I will start a short course in English based on Introducing Monte Carlo Methods with R at ENSAE next Tuesday. The slides were written by George Casella for a course he gave in Italy last spring, and he kindly agreed to make them available on slideshare.

ZAT! 2010

November 13, 2010

Tomorrow is the last day to enjoy the first edition of Montpellier's ZAT! (Zones Artistiques Temporaires). I was there this afternoon and tonight, but I found it much more picture worthy tonight: Other people have also taken pictures and sha...

Reporting Standard Errors for USL Coefficients

November 13, 2010

In a recent Guerrilla CaP Group discussion, Baron S. wrote: ....

BS> Using gnuplot against the dataset I gave, I get
BS>    sigma   0.0207163 +/- 0.001323 (6.385%)
BS>    kappa   0.000861226 +/- 5.414e-05 (6.287%)

The Gnuplot output includes the errors for each of the universal scalability law (USL) coefficients. A question about the magnitude of...

My Day at ACM Data Mining Camp III

November 13, 2010

My first time at ACM Data Mining Camp was so awesome that I was thrilled to make the trip up to San Jose for the November 2010 version. In July, I gave a talk at the Emerging Technologies for Online Learning Symposium conference with a faculty member in the Department of Statistics, at the Fairmont. The place was amazing,...

New R Users Group for University of Utah and Research Park

November 13, 2010

I’m organizing a new R Users Group for the University of Utah and Research Park, sponsored by the Study Design and Biostatistics Center. We welcome all to come. The first meeting will be dedicated to finding out what users' needs and abilities are. We also welcome all skill levels. But I will also give a

Know any R blogs in your own language?

November 13, 2010

A happy announcement

Hello everyone. After playing around with the idea of extending R-bloggers to languages other than English, today I went ahead and did it. The new sub-site can be found at: http://www.r-bloggers.com/lang/ So far it offers the content of only 4 bloggers, writing posts about R in Indonesian, Italian, Dutch and Korean.

Asking for help

As opposed to...

Programming with R – Checking Data Types

November 13, 2010

There are a number of useful functions in R that test the variable type or convert between different variable types. These can be used to validate function input to ensure that sensible answers are returned from a function or to ensure that the function doesn’t fail. Following on from a previous post on a simple function
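
A minimal sketch of the idea, using base R's `is.character()`, `is.numeric()` and `as.numeric()`. The function `safe_mean` is a hypothetical helper written for this illustration; it is not the function from the post.

```r
# Hypothetical helper: validate (and, where possible, convert) the input
# before computing on it, so bad input fails with a clear message.
safe_mean <- function(values) {
  if (is.character(values)) {
    # as.numeric() yields NA (with a warning) for strings that are not numbers.
    values <- suppressWarnings(as.numeric(values))
  }
  if (!is.numeric(values) || anyNA(values)) {
    stop("'values' must be numeric or coercible to numeric")
  }
  mean(values)
}

print(safe_mean(c(1, 2, 3)))        # 2
print(safe_mean(c("1.5", "2.5")))   # 2
```

A call such as `safe_mean("abc")` stops with the error message instead of silently returning NA.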

Because it’s Friday: Asteroids

November 12, 2010

A huge mass of rock hurtling in from space could really make a mess of your weekend plans. So it's comforting to know that the world's astronomers are out there keeping an eye out for any potential earth-grazers. See their discoveries over the past 30 years in this beautifully designed animation: Earth crossers are in red; earth approachers are in yellow;...

New R User Group in Cincinnati / Dayton

November 12, 2010

The latest local R user group to join the fold is CinDay RUG, serving the Cincinnati/Dayton area in Ohio. The group was founded by Stu Rodgers, who decided to set it up after posting a query on LinkedIn and finding several other R users in the area. Even if you think there aren't enough like-minded folks in your area...

Update: Forbes wants your R stories by Nov 17

November 12, 2010

I mentioned recently that Forbes is seeking stories about R for a forthcoming issue. Well, the story will now be in the December issue (bumped up from the January issue), so be sure to post your stories about R to the Mean Business blog by November 17. Forbes: Names You Need to Know in 2011: R Data...

Risk-Opportunity Analysis

November 12, 2010

I will be attending Ralph Vince's risk-opportunity analysis workshop in Tampa this weekend.  Drop me a note if you're in the area and would like to meet for coffee / drinks.

What would impressionnists do with R ?

November 12, 2010

I've been playing with images recently, probably inspired by my trip to San Francisco. There was an exhibit at the De Young museum of fine arts with pieces borrowed from the Musée d'Orsay. I did not go to the exhibit because it is easy enough fo...

Bayesian Inference for Latent Gaussian Models

November 12, 2010

An exciting conference in Zurich next February, 02-05. (I think I will attend! And not for skiing reasons!) Latent Gaussian models have numerous applications, for example in spatial and spatio-temporal epidemiology and climate modelling. This workshop brings together researchers who develop and apply Bayesian inference in this broad model class. One methodological focus is on

Speeding up Optmatch while improving match quality

November 12, 2010

“Fast, cheap, correct: Pick two.” Does this phrase apply to statistical matching algorithms? In the case of Optmatch, you can have all three. “Cheap” is easy: it is open source. You can download it for free. Today I’m going to explain how to make the matching process both faster and more substantively relevant using a technique we call...

How to calculate confidence intervals of correlations with R

November 11, 2010

This post sets out how to calculate confidence intervals for correlations using R. Because I often get this question from people unfamiliar with R, it assumes no prior knowledge of R.

Formulas

Online Statsbook has formulas for calculating the confiden...
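
The excerpt is cut off before its formulas, so the following is only a sketch of one standard approach, the Fisher z-transformation, and may not match the post's own code. The helper `cor_ci` and the use of the built-in `mtcars` data are assumptions made for illustration; the sketch also assumes the inputs contain no missing values.

```r
# Hypothetical helper: confidence interval for a Pearson correlation
# via the Fisher z-transformation.
cor_ci <- function(x, y, conf_level = 0.95) {
  r <- cor(x, y)
  n <- length(x)
  z <- atanh(r)                    # Fisher z-transformation of r
  se <- 1 / sqrt(n - 3)            # approximate standard error of z
  crit <- qnorm(1 - (1 - conf_level) / 2)
  tanh(z + c(-1, 1) * crit * se)   # back-transform the limits to the r scale
}

# Assumed example data: two columns of the built-in mtcars data set.
ci_manual <- cor_ci(mtcars$mpg, mtcars$wt)

# Base R's cor.test() reports an interval for Pearson's r as well.
ci_builtin <- as.numeric(cor.test(mtcars$mpg, mtcars$wt)$conf.int)

print(all.equal(ci_manual, ci_builtin))
```

In everyday use `cor.test()` alone is enough; the manual version is shown only to make the steps of the calculation visible.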

RcppArmadillo 0.2.9

November 11, 2010

The new version 0.2.9 of RcppArmadillo has been uploaded to CRAN. The only change is an update of the included Armadillo template library to version 0.9.92 which Conrad released this week. RcppArmadillo makes it easy to write highly efficient and hi...

Remembering on 11/11

November 11, 2010

Today is Veterans Day in the US, and Remembrance Day or Armistice Day elsewhere in the world. Whatever you know this day as, it's a day for remembering the sacrifices of those who served. Drew Conway has commemorated the day in a touching yet saddening way, by visualizing with R the sparse distribution of funds to support the many...

Help Mozilla visualize how people use Firefox

November 11, 2010

You might recall that a couple of weeks ago we posted this chart summarizing the times of day Firefox users switch on Private Browsing mode: The chart, based on data from the Mozilla Test Pilot program, tells an interesting story about the habits of Web users. But what other interesting stories could be told, to reveal more insights into...
