## Reproducible Research at ENAR

March 11, 2013
I gave a talk at the Spring ENAR meetings this morning on some of the technical aspects of creating the book. The session was on reproducible research and the slides are here. I was dinged for not using git for version control (we used Dropbox for simp...

## Lipsyncing for your life: a survival analysis of RuPaul’s Drag Race

March 11, 2013
If you follow me on Twitter, you know that I’m a big fan of RuPaul’s Drag Race. The transformation, the glamour, the sheer eleganza extravaganza is something my life needs to interrupt the monotony of grad school. I was able to catch up on nearly four seasons in a little less than a month, and I’ve been watching the…

## Veterinary Epidemiologic Research: Linear Regression Part 3 – Box-Cox and Matrix Representation

March 11, 2013
In the previous post, I forgot to show an example of the Box-Cox transformation when there’s a lack of normality. The Box-Cox procedure computes the value of λ which best “normalises” the errors.

| λ value | Transformed value of Y |
|---|---|
| 2 | Y² |
| 1 | Y |
| 0.5 | √Y |
| 0 | ln(Y) |
| -0.5 | 1/√Y |
| -1 | 1/Y |
| -2 | 1/Y² |

For example: The plot indicates a log transformation.

Matrix Representation

We can use
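The idea behind choosing a Box-Cox λ can be sketched in a few lines. This is only an illustration under stated assumptions, not the code from the post: it is written in Python rather than R, it uses an intercept-only model instead of a regression, and the names `box_cox`, `profile_loglik` and `best_lambda` as well as the sample data are made up for the example.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y for a given lambda."""
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def profile_loglik(ys, lam):
    """Profile log-likelihood of lambda (up to a constant), assuming an
    intercept-only normal model for the transformed values."""
    n = len(ys)
    zs = [box_cox(y, lam) for y in ys]
    mean = sum(zs) / n
    var = sum((z - mean) ** 2 for z in zs) / n  # maximum-likelihood variance
    jacobian = (lam - 1.0) * sum(math.log(y) for y in ys)
    return -0.5 * n * math.log(var) + jacobian

def best_lambda(ys, grid=(2, 1, 0.5, 0, -0.5, -1, -2)):
    """Pick the lambda on the grid with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_loglik(ys, lam))

# Hypothetical data: evenly spaced values on the log scale, then exponentiated,
# so a log transformation should look best.
ys = [math.exp(-1 + 0.1 * i) for i in range(21)]
print(best_lambda(ys))
```

For this made-up sample the grid search selects λ = 0, which corresponds to the log transformation.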

## Simulating Random Multivariate Correlated Data (Continuous Variables)

March 11, 2013
This is a repost of an example that I posted last year but at the time I only had the PDF document (written in ). I’m reposting it directly into WordPress and I’m including the graphs. From time to time a researcher needs to develop a script or an application to collect and analyze data. They may also need

## Hexadecimal literals in GNU R

March 11, 2013
Recently I have used hexadecimal numbers in GNU R. The way they are parsed surprised me and is inconsistent with Java. As the R Language Definition PDF only briefly mentions hexadecimal numbers, here is what I have found. First I have checked the following c...

## FBit: GitHub repo for posts with R code for this blog

March 11, 2013
This is a test post since I want to improve upon Jeffrey Horner’s strategy for posting R code in Tumblr. The only minor improvement I wanted to try out is hosting the images directly on the web. I mean, right now the images won’t show in RSS readers. I’m not doing anything new at all, just using the...

## Discovering Argon with the 2-Sample t-Test

I learned about Lord Rayleigh’s discovery of argon in my 2nd-year analytical chemistry class while reading “Quantitative Chemical Analysis” by Daniel Harris.  (William Ramsay was also responsible for this discovery.)  This is one of my favourite stories in chemistry; it illustrates how diligence in measurement can lead to an elegant and surprising discovery.  I find

## Is CTA trend following Dead?

March 10, 2013
This i...

## More sequential testing for triangle tests

March 10, 2013
I looked before at triangle tests and at sequential testing in triangle tests (blog entry). In the latter post it was demonstrated that a sequential test is possible at no cost to the desired error of the first kind. The latter because t...

## Analyse Quandl data with R – even from the cloud

March 10, 2013
I recently read two thrilling pieces of news about the really promising time-series data provider called Quandl: “Quandl: A Wikipedia for Time Series Data” and “Quandl package released to CRAN”. With the help of the Quandl R package* (development version...

## Better logging in R (aka futile.logger 1.3.0 released)

March 10, 2013
In many languages logging is now part of the batteries included with a language. This isn’t yet the case in …

## Call for participation: DMApps 2013 – an International Workshop on Data Mining Applications in Industry and Government

March 10, 2013
Call for participation: DMApps 2013 – an International Workshop on Data Mining Applications in Industry and Government, in conjunction with PAKDD 2013, Gold Coast, Australia, April 14, 2013. http://dmapps2013.rdatamining.com To attend the workshop, you need to register for PAKDD 2013 …

## Notes on my R / Git workflow

March 10, 2013
These are some notes on my current R git workflow, which is quite fluid, and git has enough quirks that I usually forget part of it!

Creating Projects

I've used both RStudio and Eclipse. RStudio seems easier to create a 'project' and add a loca...

## Calculating Custom Fantasy Football Projections for Your League using R

March 9, 2013
In prior posts, I have shown how to download fantasy football projections from ESPN, CBS, and NFL.com. In this post, I will demonstrate how to take the projected points from these sources and calculate the projected points for your custom league ... The post Calculating Custom Fantasy Football Projections for Your League using R appeared first on Fantasy Football Analytics.

## Getting flexible with SAP HANA

Most of you might not be aware of a feature introduced in SAP HANA SPS5. This new feature is called "Flexible Tables", which means that you can define a table that will grow depending on your needs. Let's see an example... You define a table with ID, NA...

## MCMSki IV, Jan. 6-8, 2014, Chamonix (news #4)

March 9, 2013
More news about MCMSki IV! Remember, the call is still open for contributed sessions for a few more weeks, till March 20 to be precise (make sure to contact me at [email protected] if you are considering putting one session together). To all those who already submitted a session, thanks a lot, please stay tuned, and

## Analyzing Monthly Expenses with a Pareto Chart

March 9, 2013
This month, ASQ CEO Paul Borawski encourages us to share stories about “quality solutions in unexpected places.” This is such a fun question, because now I’ll be noticing these unexpected gems all

## The Gambling Machine Puzzle

March 9, 2013
This puzzle came up in the New York Times Number Play blog. It goes like this: An entrepreneur has devised a gambling machine that chooses two independent random variables x and y that are uniformly and independently distributed between 0 and 100. He plans to tell any customer the value of x and to ask him

## GSOC 2013: IID Assumptions in Performance Measurement

March 9, 2013
Google Summer of Code for 2013 has been announced and organizations such as R are beginning to assemble ideas for student projects this summer. If you’re an interested student, there’s a list of project proposals on the R wiki. If you’re considering being a mentor, post a project idea on the site soon – project

## Visualizing Risky Words — Part 2

March 9, 2013
This is a follow-up to my Visualizing Risky Words post. You’ll need to read that for context if you’re just jumping in now. Full R code for the generated images (which are pretty large) is at the end. Aesthetics are the primary reason for using a word cloud, though one can pretty quickly recognize what

## Analyzing SimplyStatistics visits info

March 9, 2013
Recently we had to analyze the data of the number of visits per day to SimplyStatistics.org. There were two goals: (1) estimate the fraction of visitors retained after a spike in the number of visitors, and (2) identify any factors that influence the fraction estimated in (1). For me it was a fun project in part because I like SimplyStatistics but also...

## A bit more on sample size

March 8, 2013
In our article “What is a large enough random sample?” we pointed out that if you wanted to measure a proportion to an accuracy “a” with a chance of being wrong of “d”, then an idea was to guarantee you had a sample size of at least: This is the central question in designing opinion polls.
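One standard way to get a guarantee of this kind is Hoeffding's inequality, which gives P(|p̂ − p| ≥ a) ≤ 2·exp(−2na²), so n ≥ ln(2/d) / (2a²) is enough. Whether this is the exact bound the article used is an assumption. The sketch below is in Python, and the function name `hoeffding_sample_size` is made up for illustration.

```python
import math

def hoeffding_sample_size(a, d):
    """Smallest n with 2 * exp(-2 * n * a**2) <= d (Hoeffding bound).

    a: desired accuracy of the estimated proportion (0 < a < 1)
    d: acceptable chance of being off by more than a (0 < d < 1)
    """
    if not (0 < a < 1) or not (0 < d < 1):
        raise ValueError("a and d must both lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / d) / (2.0 * a * a))

# Accuracy of 5 percentage points with at most a 5% chance of being wrong.
print(hoeffding_sample_size(0.05, 0.05))  # prints 738
```

The bound holds for any true proportion, so it is conservative: halving `a` quadruples the required sample, while shrinking `d` only adds a logarithmic amount.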

## R vs. Perl/mySQL – an applied genomics showdown

March 8, 2013
Recently I was given an assignment for a class I'm taking that got me thinking about speed in R. This isn't something I'm usually concerned with, but the first time I tried to run my solution (using plyr's ddply()) it was going to take all night to compute. I consulted the professor that taught...

## Quandl package released to CRAN

March 8, 2013
In a guest post here on February 20, Tammer Kamel introduced us to Quandl, a kind of "wikipedia" of time series data. In the post, Tammer (the founder of Quandl) noted that they were working on an R package to give R users access to Quandl as a data source. That package is now available. It includes the Quandl...

## Comparing quantiles for two samples

March 8, 2013
Recently, for a research paper, I had some samples, and I wanted to compare them. Not to compare their means (by construction, all of them were centered) but their dispersion. And not their variance, but rather their quantiles. Consider the following boxplot type function, where everything here is quantile related (which is not the case for standard boxplot, see http://freakonometrics.hypotheses.org/4138,...

## Data Visualization: Shiny Democratization

March 8, 2013
In organizing Data Visualization DC we focus on three themes: The Message, The Process, The Psychology. In other words, ideas and examples of what can be communicated, the tools and know-how to get it done, and how best to communicate. … The post Data Visualization: Shiny Democratization appeared first on Data Community DC.

## Publishing Stats for Analytic Reuse – FAOStat Website and R Package

March 8, 2013
How can stats and data publishers, from NGOs and (inter)national statistics agencies to scientific researchers, publish their data in a way that supports its analysis directly, as well as in combination with other datasets? Here’s one approach I learned about from Michael Kao of the UN Food and Agriculture Organisation statistics division, FAOStat. At first