## Convergence and Asymptotic Results

September 24, 2015
$\overline{X}_n\ \xrightarrow{\text{a.s.}}\ \mathbb{E}(X)$

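Last week, in our mathematical statistics course, we saw the law of large numbers (which was proven in the probability course), stating that, given a collection $\{X_1,\dots,X_n,\dots\}$ of i.i.d. random variables with $\mathbb{E}(|X|)<\infty$, the sample mean satisfies $\overline{X}_n\ \xrightarrow{\text{a.s.}}\ \mathbb{E}(X)$. To visualize that convergence, we can use

```r
m = 100  # number of replicated samples for each sample size
# draw m samples of size n from N(0,1) and return their m sample means
mean_samples = function(n = 10){
  X = matrix(rnorm(n*m), nrow = m, ncol = n)
  return(apply(X, 1, mean))
}
B = matrix(NA, m, 20)
for(i in 1:20){
  B[, i] = mean_samples(i*10)  # column i holds the means for samples of size 10*i
}
colnames(B) = as.character(seq(10, 200, by = 10))
boxplot(B)
```

It is...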
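The boxplots show the spread of $\overline{X}_n$ shrinking as $n$ grows, but almost-sure convergence is a statement about individual sample paths. As a complementary sketch, assuming standard normal draws (so that $\mathbb{E}(X)=0$) and one simulated sequence of length $10^5$ (both choices are ours, not the post's), we can plot the running mean $\overline{X}_1,\overline{X}_2,\dots$ of a single path:

```r
set.seed(1)
n <- 1e5
X <- rnorm(n)                            # i.i.d. N(0,1) draws, so E(X) = 0 (our assumption)
running_mean <- cumsum(X) / seq_len(n)   # running means Xbar_1, ..., Xbar_n
plot(running_mean, type = "l", log = "x",
     xlab = "n", ylab = expression(bar(X)[n]))
abline(h = 0, lty = 2)                   # the limit E(X)
```

The path wanders for small $n$ and then settles near the dashed line, which is what the strong law predicts for almost every path.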

## Rentrez 1.0 released

September 24, 2015
A new version of rentrez, our package for the NCBI's EUtils API, is making its way around the CRAN mirrors. This release represents a substantial improvement to rentrez, including a new vignette that documents the whole package. This post describes some of the new things in rentrez, and gives us a chance to thank some of the people who have contributed to...

## Chinese R conference

September 24, 2015
I will be speaking at the Chinese R conference in Nanchang, to be held on 24–25 October, on “Forecasting Big Time Series Data using R”. Details (for those who can read Chinese) are at china-r.org.

## Running Back and Wide Receiver Gold Mining – Week 3

September 23, 2015
The graphs below summarize the projections from a variety of sources. This week’s summary includes projections from: CBS: CBS Average, Yahoo Sports, NFL, FOX Sports, NumberFire, FantasySharks, ESPN and FantasyFootballNerd. The post Running Back and Wide Receiver Gold Mining – Week 3 appeared first on Fantasy Football Analytics.

## subsetting data in ggtree

September 23, 2015
Subsetting is commonly used in ggtree, for example to separate internal nodes from tips. We may also want to display annotations for specific node(s)/tip(s). Some software may store clade information (e.g. bootstrap values) as internal node labels, and we want to manipulate such information and taxa labels separately.

## Are you headed to Strata? It’s next week!

September 23, 2015
RStudio will again teach the new essentials for doing (big) data science in R at this year’s Strata NYC conference, September 29, 2015 (http://strataconf.com/big-data-conference-ny-2015/public/schedule/detail/44154). You will learn from Garrett Grolemund, Yihui Xie, and Nathan Stephens, who are all working on fascinating new ways to help the R ecosystem keep pace with the challenges facing those who work with data. Topics include: R Quickstart: Wrangle,

## Interpolation and smoothing functions in base R

September 23, 2015
by Andrie de Vries

Every once in a while I try to remember how to do interpolation using R. This is not something I do frequently in my workflow, so I do the usual sequence of finding the appropriate help page with `??interpolate`, which lists these help pages:

- `stats::approx`: Interpolation Functions
- `stats::NLSstClosestX`: Inverse Interpolation
- `stats::spline`: Interpolating Splines

So, the help tells me to...
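A minimal sketch of the base R interpolation functions on four made-up points (the data are our own): `approx()` does linear interpolation at the requested `xout` values, while `approxfun()` and `splinefun()` return interpolating functions that can be called repeatedly.

```r
x <- c(1, 2, 4, 5)
y <- c(10, 20, 40, 30)

# Linear interpolation at x = 3: halfway between (2, 20) and (4, 40)
lin <- approx(x, y, xout = 3)
lin$y                      # 30

# approxfun() / splinefun() return functions instead of values
f_lin <- approxfun(x, y)
f_spl <- splinefun(x, y, method = "fmm")
f_lin(c(1.5, 3))           # 15 30
f_spl(4)                   # an interpolating spline passes through the data points
```

The function-returning variants are convenient when the same interpolant is evaluated at many different points.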

## Kasseler useR Group: Data Science and Networking

September 23, 2015
From October, the Kasseler useR Group meeting will be held on the second Wednesday of each month at 6.30 pm. The events will take place at Science Park Kassel. The Kasseler useR Group supports active exchange of information between R users. Discussions about experiences with R and news of R are appreciated as well as

## Using mutate from dplyr inside a function: getting around non-standard evaluation

September 23, 2015
To edit or add columns to a `data.frame`, you can use `mutate` from the dplyr package: Here, dplyr uses non-standard evaluation in finding the contents for `mpg` and `wt`, knowing that it needs to look in the context of…
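The post's own workaround is not shown in this excerpt. In 2015, dplyr offered standard-evaluation variants such as `mutate_()` (since deprecated); a minimal sketch of the approach that current dplyr versions support, assuming dplyr 0.7 or later and a hypothetical helper `add_ratio()`, passes column names as strings and uses the `.data` pronoun together with `:=` for the new column name:

```r
library(dplyr)

# Hypothetical helper: add a column called `new_col` holding num / den,
# where all three arguments are column names given as strings
add_ratio <- function(df, num, den, new_col) {
  df %>% mutate(!!new_col := .data[[num]] / .data[[den]])
}

res <- add_ratio(mtcars, "mpg", "wt", "mpg_per_wt")
head(res[, c("mpg", "wt", "mpg_per_wt")])
```

Because the column names arrive as ordinary strings, the helper can be driven from loops or user input without relying on non-standard evaluation of bare names.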

## Simulating backtests of stock returns using Monte-Carlo and snowfall in parallel

September 23, 2015
You could say that the following post is an answer/comment/addition to Quintuitive, though I would consider it as a small introduction to parallel computing with snowfall using the thoughts of Quintuitive as an example. A quick recap: Say you create a model that is able to forecast 60% of market directions (that is, in 6
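A minimal, sequential sketch of this kind of Monte-Carlo backtest, under our own assumptions: normally distributed daily returns with a hypothetical mean of 0.0002 and standard deviation of 0.01, 250 trading days per backtest, and a model that calls the direction correctly with probability 0.6, independently each day. Each backtest is independent, so the `vapply()` loop is the part one would hand to snowfall (e.g. `sfLapply()` after `sfInit()`); that parallel step is not shown here.

```r
set.seed(2015)

# Hypothetical single backtest: returns the compounded strategy return
simulate_backtest <- function(n_days = 250, hit_rate = 0.6,
                              mu = 0.0002, sigma = 0.01) {
  r <- rnorm(n_days, mu, sigma)                    # simulated market returns
  correct <- runif(n_days) < hit_rate              # TRUE when the model calls the direction right
  position <- ifelse(correct, sign(r), -sign(r))   # +1 = long, -1 = short
  prod(1 + position * r) - 1                       # compounded return over the period
}

results <- vapply(seq_len(1000), function(i) simulate_backtest(), numeric(1))
summary(results)
mean(results > 0)   # share of simulated backtests that end in profit
```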

## Fitting a neural network in R; neuralnet package

September 23, 2015
Neural networks have always been one of the most fascinating machine learning models in my opinion, not only because of the fancy backpropagation algorithm, but also because of their complexity (think of deep learning with many hidden layers) and structure inspired by the brain. Neural networks have not always been popular, partly because they were,

## Post-doc Researcher in Big Data Analytics!

September 23, 2015
DESPINA Big Data Lab at the Department of Economics and

## More on the Heteroscedasticity Issue

September 22, 2015
In my last post, I discussed R software, including mine, that handles heteroscedastic settings for linear and nonlinear regression models. Several readers had interesting comments and questions, which I will address here. To review: though most books and software assume homoscedasticity, i.e. constancy of the variance of the response variable at all levels of the …
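To make the issue concrete, a minimal sketch on our own simulated data (not the author's software): when the error variance grows with `x`, the classical `lm()` standard errors rest on a constant-variance assumption, whereas the heteroscedasticity-consistent "sandwich" estimator $(X^\top X)^{-1}\left(\sum_i e_i^2 x_i x_i^\top\right)(X^\top X)^{-1}$ (the HC0 variant) does not. If the sandwich package is installed, `sandwich::vcovHC(fit, type = "HC0")` should reproduce the same matrix.

```r
set.seed(1)
n <- 500
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)   # error sd grows with x: heteroscedastic
fit <- lm(y ~ x)

X <- model.matrix(fit)
e <- resid(fit)
bread  <- solve(crossprod(X))             # (X'X)^{-1}
meat   <- crossprod(X * e)                # sum_i e_i^2 x_i x_i'
vc_hc0 <- bread %*% meat %*% bread        # HC0 sandwich covariance
se_hc0 <- sqrt(diag(vc_hc0))
se_ols <- sqrt(diag(vcov(fit)))           # classical SEs, assume constant variance
cbind(se_ols, se_hc0)
```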

## Version 0.9.0 of eeptools released!

September 22, 2015
A long overdue overhaul of my eeptools package for R was released to CRAN today and should be showing up in the mirrors soon. The release notes for this version are extensive as this represents a modernization of the package infrastructure and the reim...

## How do you know if your model is going to work?

September 22, 2015
Authors: John Mount and Nina Zumel. Our four-part article series collected into one piece. Part 1: The problem; Part 2: In-training set measures; Part 3: Out of sample procedures; Part 4: Cross-validation techniques. “Essentially, all models are wrong, but some are useful.” (George Box) Here’s a caricature of a data …

## How do you know if your model is going to work? Part 4: Cross-validation techniques

September 22, 2015
by John Mount and Nina Zumel. In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this...
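A minimal k-fold cross-validation sketch in base R, using the built-in `mtcars` data and the linear model `mpg ~ wt + hp` purely as placeholders (the series itself is not tied to this example): each row is assigned to one of 5 folds, the model is refit with each fold held out, and the out-of-fold RMSE is averaged.

```r
set.seed(42)
k <- 5
# randomly assign each row to one of k roughly equal folds
folds <- sample(rep(seq_len(k), length.out = nrow(mtcars)))

cv_rmse <- vapply(seq_len(k), function(f) {
  train <- mtcars[folds != f, ]
  test  <- mtcars[folds == f, ]
  fit   <- lm(mpg ~ wt + hp, data = train)   # fit only on the training folds
  pred  <- predict(fit, newdata = test)      # score the held-out fold
  sqrt(mean((test$mpg - pred)^2))
}, numeric(1))

cv_rmse        # one RMSE per held-out fold
mean(cv_rmse)  # cross-validated estimate of out-of-sample error
```

Comparing `mean(cv_rmse)` across candidate formulas gives an out-of-sample basis for choosing between models, rather than relying on in-training fit.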

## Parsing a large amount of characters into a POSIXct object

September 22, 2015
When trying to parse a large number of datetime character strings into POSIXct objects, it struck me that strftime and as.POSIXct were actually quite slow. The parsing functions from lubridate were a lot faster. The following benchmark shows…
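A minimal base-R sketch for reproducing this kind of comparison on your own machine, using 100,000 synthetic timestamps (our own choice). Passing an explicit `format` spares R from guessing the format; we make no claim here about which variant is faster, since that depends on the R version and platform. lubridate's `ymd_hms()` could be timed the same way if that package is installed.

```r
# 100,000 synthetic one-second-apart timestamps as character strings
start <- as.POSIXct("2015-09-22 00:00:00", tz = "UTC")
x <- format(start + seq_len(1e5), format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# let as.POSIXct guess the format from the data
t_guess <- system.time(p_guess <- as.POSIXct(x, tz = "UTC"))
# supply the format explicitly
t_fixed <- system.time(p_fixed <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))

rbind(guess = t_guess, fixed = t_fixed)[, "elapsed"]
```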

## Drug Interaction Studies – Statistical Analysis

September 22, 2015
This post is actually a continuation of the previous post, and is motivated by this article that discusses the graphics and statistical analysis for a two treatment, two period, two sequence (2x2x2) crossover drug interaction study of a new treatment versus the standard. Whereas the previous post was devoted to implementing some of the graphics

## Rummaging through dusty books: Maucha diagrams in R

September 22, 2015
Do you know the Maucha diagram? If you are not a Hungarian limnologist, probably not! This diagram was proposed by Rezso Maucha in 1932 as a way to visualise the relative ionic composition of water samples. However, as far as I …

## Upcoming talks in California

September 22, 2015
I’m back in California for the next couple of weeks, and will give the following talk at Stanford and UC Davis: “Optimal forecast reconciliation for big time series data”. Time series can often be naturally disaggregated in a hierarchical or grouped structure. For example, a manufacturing company can disaggregate total demand for their products by country of
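To make "reconciliation" concrete, a minimal sketch of one simple approach, the OLS combination $\tilde{y} = S(S^\top S)^{-1}S^\top \hat{y}$, on a hypothetical two-series hierarchy (Total = A + B) with made-up base forecasts. This only illustrates the idea and is not necessarily the optimal method presented in the talk. $S$ is the summing matrix that maps the bottom-level series to every level of the hierarchy.

```r
# Hypothetical hierarchy: Total = A + B
S <- rbind(c(1, 1),   # Total
           c(1, 0),   # A
           c(0, 1))   # B

y_hat <- c(160, 100, 50)   # made-up base forecasts; incoherent since 100 + 50 != 160

# OLS reconciliation: project onto the space of coherent forecasts
P <- solve(t(S) %*% S) %*% t(S)
y_tilde <- drop(S %*% P %*% y_hat)
y_tilde                    # approximately 156.667 103.333 53.333
```

The reconciled forecasts add up across the hierarchy, and each level's forecast moves only as much as needed to restore coherence in the least-squares sense.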

## Notes from the Kölner R meeting, 18 September 2015

September 22, 2015
Last Friday the Cologne R user group came together for the 15th time. Since its inception over three years ago, the group has evolved from a small gathering in a pub into an active data science community, covering wider topics than just R. Still, R is the link and glue between the different interests. Last Friday's agenda was a...

## How do you know if your model is going to work? Part 4: Cross-validation techniques

September 21, 2015
Authors: John Mount and Nina Zumel. In this article we conclude our four-part series on basic model testing. When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it’s better than the models that …

## EARL London 2015: Our Highlights

September 21, 2015
We were overwhelmed by the positive comments from attendees at last week’s EARL conference in London. We are in the process of collecting survey responses from all delegates, but in the meantime a quick straw poll at Mango …

## Applications of R at EARL 2015

September 21, 2015
The Effective Applications of R (EARL) Conference (held last week in London) is well-named. At the event I saw many examples of R being used to solve real-world industry problems with advanced statistics and data visualization. Here are just a few examples: AstraZeneca, the pharmaceutical company, uses R to design clinical trials, and to predict the ending date of...

## 4 new R jobs (from R-users.com ; 2015-09-21)

September 21, 2015

## Warsaw Meetings of R Users / Warszawskie Spotkania Entuzjastów R

September 20, 2015
With the summer holiday season coming to an end, we are back with Warsaw Meetings of R Users (Warszawskie Spotkania Entuzjastów R). Three meetings ahead: September 26th (this Saturday): let’s start with a data-hack-day (DHD). Using data from the Polish Seym (votes and transcripts), we are going to prepare some nice summaries of the last parliamentary term. …

## Working With SEM Keywords in R

September 20, 2015
The following post is taken from two previous posts from an older blog of mine that is no longer available. These are from several years ago, and related to two critical questions that I encountered. One, how can I automatically generate hundreds of thousands of keywords for a search engine marketing campaign. Two, how can I
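One common way to generate many keyword combinations is a Cartesian product of word lists. A minimal sketch with hypothetical word lists (not necessarily the author's approach) uses base R's `expand.grid()`; with 100 terms in each of three lists, the product already has 1,000,000 combinations.

```r
# Hypothetical word lists for a search engine marketing campaign
modifiers <- c("cheap", "buy")
brands    <- c("acme", "globex")
products  <- c("shoes", "boots")

# every modifier x brand x product combination, one row each
kw <- expand.grid(modifier = modifiers, brand = brands, product = products,
                  stringsAsFactors = FALSE)

# paste the columns row-wise into space-separated keyword phrases
keywords <- do.call(paste, kw)
length(keywords)   # 2 * 2 * 2 = 8
head(keywords)
```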

## Six lines to install and start SparkR on Mac OS X Yosemite

September 20, 2015
I know there are many R users who would like to test out SparkR without all the configuration hassle. With just these six lines you can start SparkR from both RStudio and the command line. One line for Spark and SparkR: Apache Spark is a fast and general-purpose cluster computing system. SparkR is an R package that...