Boxplots are a good way to get some insight into your data, and while R provides a fine ‘boxplot’ function, it doesn’t label the outliers in the graph. However, with a little code you can add labels yourself: The numbers plotted next to ...
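As a rough illustration of the idea (not the code from the post itself), the sketch below relies on the fact that base R's `boxplot()` invisibly returns a list whose `out` component holds the values drawn as outliers. The data vector `y` and the choice to label each outlier with its position in the vector are assumptions made for this example only.

```r
# Minimal sketch: label boxplot outliers with their index in the data.
# The data are made up for illustration: 50 normal draws plus two
# planted extreme values at positions 51 and 52.
set.seed(1)
y <- c(rnorm(50), 6, -5)

# boxplot() draws the plot and returns its statistics; bp$out contains
# the values that were plotted as outliers.
bp <- boxplot(y, main = "Outliers labelled by index")

# Find the positions of the flagged values in the original vector.
out_idx <- which(y %in% bp$out)

# Write each index just to the right of its point (the single box sits at x = 1).
if (length(out_idx) > 0) {
  text(x = 1, y = y[out_idx], labels = out_idx, pos = 4, cex = 0.8)
}
```

For grouped boxplots the same approach should carry over, with `bp$group` giving the x position of each outlier, although matching values back to rows then needs more care when values are tied.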

I teach some of my lab sections using R, and so I need to create lab handouts that include nicely formatted R commands and R output as an example for the students. These handouts will also include exercises where the students will be writing their own R code, or interpreting the results, or generating figures.

As most people realize, this is probably one of the most data-rich primary campaigns in history, with hundreds of professional pollsters poring over every data point, trying to understand voters’ intentions. So here is another data-rich post to that end. I was glad to discover the University of California at Santa Barbara’s webpage with tons of high-quality data related to the...

Previously in this series: Cleaning and visualizing genomic data: a case study in tidy analysis. In the last post, we examined an available genomic dataset from Brauer et al 2008 about yeast gene expression under nutrient starvation. We learned to tidy it with the dplyr and tidyr packages, and saw how useful this tidied form is for visualizing...

In my course Learn to Map Census Data in R I provide people with a handful of interesting demographics to analyze. This is convenient for teaching, but people often want to search for other demographic statistics. To address that, today I will work through an example of starting with a simple demographic question and using R ...

Last month I ran my first webinar (“Make a Census Explorer with Shiny”). About 100 people showed up, and feedback from the participants was great. I also had a lot of fun myself. Because of this, I’ve decided to do one more webinar before my free trial with the webinar service ends. Here are the ...

by Joseph Rickert. In his new book, The Master Algorithm, Pedro Domingos takes on the heroic task of explaining machine learning to a wide audience and classifies machine learning practitioners into 5 tribes*, each with its own fundamental approach to learning problems. To the 5th tribe, the analogizers, Pedro ascribes the Support Vector Machine (SVM) as its master algorithm....

Last week, in our mathematical statistics course, we saw the law of large numbers (which was proven in the probability course), stating that given a collection of i.i.d. random variables, with ... To visualize that convergence, we can use
> m=100
> mean_samples=function(n=10){
+ X=matrix(rnorm(n*m),nrow=m,ncol=n)
+ return(apply(X,1,mean))
+ }
> B=matrix(NA,100,20)
> for(i in 1:20){
+ B[,i]=mean_samples(i*10)
+ }
> colnames(B)=as.character(seq(10,200,by=10))
> boxplot(B)
It is...
