(This article was first published on **Revolutions**, and kindly contributed to R-bloggers)

by Joseph Rickert

The Joint Statistical Meetings (JSM) get underway this weekend in Boston, and Revolution Analytics is again proud to be a sponsor. More than 6,000 statisticians and data scientists from around the world are expected to attend and listen to thousands of presentations. It is true that many talks will be on specialized topics that only statisticians working in a particular field will have the interest and patience to sit through. However, there is evidence that the conference will have something exciting to offer data scientists and statisticians working in industry. Keyword searches yield 79 presentations for Big Data, 29 on Machine Learning, 17 on Data Science, 17 on Data Mining, and 19 related to R. There is more than enough here to fill a data scientist's dance card.

Three must-see presentations under the Big Data keyword are Michael Franklin's presentation on Analyzing Data at Scale with the Berkeley Data Analytics Stack; Hui Jiang et al. on Implementation of Statistical Algorithms in Big Data Platforms; and Tim Hesterberg's talk on Simulation-Based Methods in Statistics Education, and Google Tools. Under the Data Science label, Bill Ruh's invited talk Industrial Internet, an Opportunity for Statisticians to Become Data Scientists looks most inviting. There are also quite a few Data Science talks that indicate some soul-searching within the academic community as to how the statistics curriculum ought to be changed. See, for example, Michael Rappa's talk Data Scientists: How Do We Prepare for the Future? and Johanna Hardin's talk Data Science and Statistics: How Should They Fit into Our Curriculum?

Here is the list of R related presentations:

**Saturday, August 2**

- 8:00 AM - 12:00 PM: Adaptive Tests of Significance Using R and SAS — Professional Development Continuing Education Course (ASA). Instructor: Tom O'Gorman

**Sunday, August 3**

- 8:30 AM - 5:00 PM: Adaptive Methods in Modern Clinical Trials — Professional Development Continuing Education Course (ASA, Biometrics Section). Instructors: Frank Bretz, Byron Jones, and Guosheng Yin
- 4:20 PM: Glassbox: An R Package for Visualizing Algorithmic Models: Max Ghenis and Ben Ogorek and Estevan Flores
- 4:45 PM: Bayesian Enrollment and Event Predictions in Clinical Trials Leveraging Literature Data: Aijun Gao and Fanni Natanegara and Govinda Weerakkody

**Monday, August 4**

- 8:30 AM to 10:20 AM: Do You See What I See? Formal Usability Testing and Statistical Graphics: Marie C. Vendettuoli and Matthew Williams and Susan Ruth VanderPlas
- 8:35 AM: Preparing Students for Big Data Using R and RStudio: Randall Pruim
- 8:35 AM: Does R Provide What Customers Need?: Vipin Arora
- 8:55 AM: Thinking with Data in the Second Course: Nicholas J. Horton and Ben S. Baumer and Hadley Wickham
- 8:55 AM: Doing Reproducible Research Unconsciously: Higher Standard, but Less Work: Yihui Xie
- 12:30 PM to 1:50 PM: Analyzing Umpire Performance Using PITCHf/x: Andrew Swift
- 3:30 PM: The Perfect Bracket: Machine Learning in NCAA Basketball: Sara Stoudt and Loren Santana and Ben S. Baumer

**Tuesday, August 5**

- 10:35 AM: Tools for Teaching R and Statistics Using Games: Brad Luen and Michael Higgins
- 2:00 PM: Multiple Treatment Groups: A Case Study with Health Care Practice and Policy Implications: Alexandra Hanlon and Karen Hirschman and Beth Ann Griffin and Mary Naylor
- 2:05 PM: glmmplus: An R Package for Messy Longitudinal Data: Ben Ogorek and Caitlin Hogan
- 3:30 PM: Give Me an Old Computer, a Blank DVD, and an Internet Connection and I'll Give You World-Class Analytics: Ty Henkaline

**Wednesday, August 6**

- 9:35 AM: Testing Packages for the R Language: Stephen Kaluzny and Lou Bajuk-Yorgan
- 9:50 AM: Using R Analytics on Streaming Data: Lou Bajuk-Yorgan and Stephen Kaluzny
- 10:30 AM to 12:20 PM: Classroom Demonstrations of Big Data: Eric A. Suess
- 10:35 AM: Shiny: Easy Web Applications in R: Joseph Cheng
- 11:00 AM: ggvis: Moving Toward a Grammar of Interactive Graphics: Hadley Wickham
- 3:05 PM: Accessing Data from the Census Bureau API: Alex Shum and Heike Hofmann

**Thursday, August 7**

- 9:20 AM: Predicting Dangerous E. Coli Levels at Erie, Pennsylvania, Beaches with Random Forests in R: Michael Rutter
- 9:25 AM: Beyond the Black Box: Flexible Programming of Hierarchical Modeling Algorithms for BUGS-Compatible Models Using NIMBLE: Perry de Valpine and Daniel Turek and Christopher J. Paciorek and Rastislav Bodik and Duncan Temple Lang

If you are going to JSM please come by booth #303 to say hello. You may also find the mobile apps (Apple or Android) that Revolution Analytics is sponsoring useful, and don't forget to fill out the survey for a chance to win an Apple TV.

Finally, I will be the program chair for Session 401, Monte Carlo Methods, to be held Tuesday, 8/5/2014, from 2:00 PM to 3:50 PM in room CC-101. If you are interested in simulation, be sure to drop in. I have seen the presentations and think they are well worth attending.

To **leave a comment** for the author, please follow the link and comment on his blog: **Revolutions**.


(This article was first published on **RStudio Blog**, and kindly contributed to R-bloggers)

httr 0.4 is now available on CRAN. The httr package makes it easy to talk to web APIs from R.

The most important new features are two new vignettes to help you get started and to help you make wrappers for web APIs. Other important improvements include:

- New `headers()` and `cookies()` functions to extract headers and cookies from responses.
- `status_code()` returns HTTP status codes.
- `POST()` (and `PUT()` and `PATCH()`) now have an `encode` argument that determines how the `body` is encoded. Valid values are "multipart", "form", or "json"; the `multipart` argument is now deprecated.
- `GET(..., progress())` will display a progress bar, useful if you're doing large uploads or downloads.
- `verbose()` gives you considerably more control over the degree of verbosity, and defaults have been selected to be more helpful for the most common cases.
- NULL `query` parameters are now dropped automatically.
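As a quick illustration of the new `encode` argument (a sketch, using httpbin.org as a stand-in test endpoint; it requires a network connection):

```r
library(httr)

# POST a body encoded as JSON; encode = "json" replaces the
# deprecated multipart argument
r <- POST("http://httpbin.org/post",
          body = list(x = 1, y = "a"),
          encode = "json")

status_code(r)  # the response's HTTP status code, e.g. 200
headers(r)      # response headers
cookies(r)      # response cookies
```

Switching `encode` to "form" or "multipart" changes only how the same `body` list is serialized on the wire.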

There are a number of other minor improvements and bug fixes, as described in the release notes.


(This article was first published on **TorinoR.net**, and kindly contributed to R-bloggers)

On **17 September 2014** at 14:00 there will be a **free R for data analysis course**, and starting at 16:30 there will be the **Eighth Torino R net meeting**. Both events will take place at **Campus Luigi Einaudi**, Università degli Studi di Torino.

Torino R net will be a satellite event of the Italian conference on excellence in quality, statistical control and customer satisfaction (Turin, Italy, September 18-19, 2014).


(This article was first published on **TorinoR.net**, and kindly contributed to R-bloggers)

Presentations from the seventh Torino R net meeting are now available online in the Downloads section.

Thank you to all who attended the meeting on Thursday 27th March in Asti, and special thanks to the presenters.


(This article was first published on **Paleocave Blog » R**, and kindly contributed to R-bloggers)

I was recently writing a function which was going to need to deal with NAs in some kind of semi-intelligent way. I wanted to test it with some fake data, meaning that I was going to need a vector with some random NAs sprinkled in. After a few disappointing Google searches and a Stack Overflow post or two that left something to be desired, I sat down, thought for a few minutes, and came up with this.

```r
# create a vector of random values
foo <- rnorm(n = 100, mean = 20, sd = 5)

# randomly choose 15 indices to replace
# this is the step in which I thought I was clever
# because I use which() and %in% in the same line
ind <- which(foo %in% sample(foo, 15))

# now replace those indices in foo with NA
foo[ind] <- NA

# here is our vector with 15 random NAs
foo
```

Not especially game changing but more elegant than any of the solutions I found on the interwebs, so there it is FTW.
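For what it's worth, here is an alternative one-liner (my own sketch, not from the post): sampling indices directly with `sample()` guarantees exactly 15 NAs, whereas matching on values with `%in%` could replace more than 15 elements if the vector happened to contain duplicated values.

```r
# create a vector of random values
foo <- rnorm(n = 100, mean = 20, sd = 5)

# sample() returns 15 distinct positions, so exactly 15 values become NA
foo[sample(length(foo), 15)] <- NA

sum(is.na(foo))  # 15
```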


(This article was first published on **Xi'an's Og » R**, and kindly contributed to R-bloggers)

**S**econd day at the Indo-French Centre for Applied Mathematics and the workshop. Maybe not the most exciting day in terms of talks (as I missed the first two plenary sessions by (a) oversleeping and (b) running across the campus!). However I had a neat talk with another conference participant that led to [what I think are] interesting questions… (And a very good meal in a local restaurant as the guest house had not booked me for dinner!)

**T**o wit: given a target like

$$\pi(\lambda|\mathbf{y}) \propto \lambda e^{-\lambda} \prod_{i=1}^{n} \frac{1-e^{-\lambda y_i}}{\lambda y_i}\,,$$

the simulation of λ can be demarginalised into the simulation of

$$\pi(\lambda,\mathbf{z}|\mathbf{y}) \propto \lambda e^{-\lambda} \prod_{i=1}^{n} e^{-\lambda z_i}\,\mathbb{I}(0<z_i<y_i)\,,$$

where **z** is a latent (and artificial) variable. This means a Gibbs sampler simulating λ given **z** and **z** given λ can produce an outcome from the target (*). Interestingly, another completion is to consider that the z_i's are U(0,y_i) and to see the quantity

$$\lambda e^{-\lambda} \prod_{i=1}^{n} e^{-\lambda z_i}$$

as an unbiased estimator of the target. What's quite intriguing is that the quantity remains the same but with different motivations: (a) demarginalisation versus unbiasedness and (b) z_i Exp(λ) versus z_i U(0,y_i). The stationary distribution is the same, as shown by the graph below, and the core distributions are [formally] the same… but the reasoning deeply differs.

**O**bviously, since unbiased estimators of the likelihood can be justified by auxiliary variable arguments, this is not in fine a big surprise. Still, I had not thought of the analogy between demarginalisation and unbiased likelihood estimation previously.

**H**ere are the R procedures if you are interested:

```r
n = 29
y = rexp(n)
T = 10^5

# MCMC.1: z | lambda is Exp(lambda) truncated to (0, y_i)
lam = rep(1, T)
z = runif(n) * y
for (t in 1:T) {
  lam[t] = rgamma(1, shape = 2, rate = 1 + sum(z))
  z = -log(1 - runif(n) * (1 - exp(-lam[t] * y))) / lam[t]
}

# MCMC.2: z is U(0, y_i), drawn independently of lambda
fam = rep(1, T)
z = runif(n) * y
for (t in 1:T) {
  fam[t] = rgamma(1, shape = 2, rate = 1 + sum(z))
  z = runif(n) * y
}
```
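The inverse-CDF line in MCMC.1 draws each z_i from an Exp(λ) distribution truncated to (0, y_i). As a quick sanity check (my own sketch, with λ and y fixed arbitrarily), every draw should land strictly inside its interval:

```r
set.seed(1)
lambda <- 2
y <- rexp(10)

# inverse-CDF draw from Exp(lambda) truncated to (0, y_i):
# F(z) = (1 - exp(-lambda * z)) / (1 - exp(-lambda * y_i)) on (0, y_i)
z <- -log(1 - runif(10) * (1 - exp(-lambda * y))) / lambda

all(z > 0 & z < y)  # TRUE
```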

Filed under: pictures, R, Running, Statistics, Travel, University life, Wines Tagged: auxiliary variable, Bangalore, demarginalisation, Gibbs sampler, IFCAM, Indian Institute of Science, unbiased estimation


(This article was first published on **Econometrics by Simulation**, and kindly contributed to R-bloggers)

Several blog posts have made mention of the 'magrittr' package, which allows values to be passed to functions in a pipe-style fashion (David Smith).

This stylistic option has several advantages:

1. Reduced need for nested parentheses

2. Order of functional operations now reads from left to right

3. Organizational style of the code may be improved

The library uses a new operator, `%>%`, which basically tells R to take the value on the left and pass it to the right as an argument. Let us see this in action with some text functions.



```r
require('magrittr')

# Let's play with some strings
str1 = "A scratch? Your arm's off."
str2 = "I've had worse."

str1 %>% substr(3, 9)
# [1] "scratch"

str1 %>% strsplit('?', fixed = TRUE)
# [[1]]
# [1] "A scratch"        " Your arm's off."

# Pipes can be chained as well
str1 %>% paste(str2) %>% toupper()
# [1] "A SCRATCH? YOUR ARM'S OFF. I'VE HAD WORSE."

# Let's see how pipes might work with drawing random variables.
# I like to define a function that allows an element-by-element maximization.
vmax <- function(x, maximum = 0) x %>% cbind(0) %>% apply(1, max)

-5:5 %>% vmax
# [1] 0 0 0 0 0 0 1 2 3 4 5

# This is identical to defining the function as:
vmax <- function(x, maximum = 0) apply(cbind(x, 0), 1, max)
vmax(-5:5)

# Notice that the latter formulation uses the same number of parentheses
# and may be more readable.

# However, recently I was drawing data for a simulation in which I wanted to
# draw Nitem values from the quantiles of the normal distribution, censor the
# values at 0, and then randomize their order.
Nitem <- 100
ctmean <- 1
ctsd <- .5

draws <- seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)] %>%
  qnorm(ctmean, ctsd) %>% vmax %>% sample(Nitem)

# While this looks ugly, let's see how much worse it would have been without pipes
draws <- sample(vmax(qnorm(seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)],
                           ctmean, ctsd)), Nitem)

# Both functional sequences are ugly, though I think I prefer the first, which
# I can easily read as: seq is passed to qnorm, passed to vmax, passed to sample.

# A few things to note with the %>% operator. If you want to send the value to
# an argument which is not the first, or to a named argument, use the '.'
mydata <- seq(0, 1, length.out = Nitem + 2)[-c(1, Nitem + 2)] %>%
  qnorm(ctmean, ctsd) %>% vmax %>% sample(Nitem) %>%
  data.frame(index = 1:Nitem, theta = .)

# Also note that the operator's precedence is not as low as you might think
# it should be. Thus:
1 + 8 %>% sqrt
# Returns 3.828427, i.e. 1 + sqrt(8)

# Rather than
(1 + 8) %>% sqrt
# [1] 3
```
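One aside on the `vmax()` definition above: its `maximum` argument is never actually used, since the body hard-codes 0. A version that honors the argument, a small sketch built on base R's vectorized `pmax()`, could be:

```r
# element-wise maximum against an arbitrary floor value
vmax2 <- function(x, maximum = 0) pmax(x, maximum)

vmax2(-5:5)     # 0 0 0 0 0 0 1 2 3 4 5
vmax2(-5:5, 2)  # 2 2 2 2 2 2 2 2 3 4 5
```

Since `pmax()` recycles its arguments and maximizes element by element, no `cbind()`/`apply()` step is needed at all.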


(This article was first published on **Revolutions**, and kindly contributed to R-bloggers)

R is a very powerful language for creating custom data visualizations, but during the development process sometimes you make a mistake and things go horribly wrong. But sometimes serendipity intervenes, and the (unintended) result can be quite beautiful. Accidental aRt, if you will. Curated by Kara Woo and Erika Mudrak, this fantastic Tumblr captures beautiful but unintended examples from R and other data visualization tools. Here are a couple of recent favourites (click for the originals):

I could definitely see some of these hanging on my wall. (Hmm ... I have an idea for this impressive-looking gadget.) Check out the full gallery at the link below.

Tumblr: Accidental aRt
