## High Resolution Figures in R

March 12, 2013

As I was recently preparing a manuscript for PLOS ONE, I realized that the default resolution of R and RStudio images is insufficient for publication. PLOS ONE requires 300 ppi images in TIFF or EPS (Encapsulated PostScript) format. In R plots …
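As a rough illustration of the 300 ppi requirement mentioned above, the following sketch writes a plot to TIFF and EPS files with base R graphics devices. The file names, the 6 x 4 inch size, and the use of the built-in `cars` data set are assumptions made for this example; the original post may take a different route.

```r
# Minimal sketch: save a figure at 300 ppi as TIFF and as EPS.
# File names, dimensions, and the 'cars' data set are illustrative assumptions.
tiff_file <- file.path(tempdir(), "figure1.tiff")
eps_file  <- file.path(tempdir(), "figure1.eps")

# tiff() is only available if this R build has TIFF support
if (capabilities("tiff")) {
  tiff(tiff_file, width = 6, height = 4, units = "in",
       res = 300, compression = "lzw")
  plot(cars, main = "Stopping distance vs. speed")
  dev.off()  # closing the device writes the file
}

# EPS is a vector format, so no resolution argument is needed
postscript(eps_file, width = 6, height = 4,
           horizontal = FALSE, onefile = FALSE, paper = "special")
plot(cars, main = "Stopping distance vs. speed")
dev.off()
```

With `units = "in"` and `res = 300`, a 6 x 4 inch figure is rendered at 1800 x 1200 pixels.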

## R 101

March 11, 2013

as.character() is your friend. Sometimes when you open a data file (let's say a .csv), a variable will be recognized as a factor when it should be numeric. It is therefore tempting to simply convert the variable to numeric using as.numeric(). Big mistake! If...
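The pitfall described above can be reproduced with a small, made-up factor (the values are purely illustrative). Calling `as.numeric()` directly on a factor returns the internal level codes, not the values the labels represent:

```r
# A factor whose labels look like numbers (illustrative values)
f <- factor(c("10", "20", "10", "35"))

codes  <- as.numeric(f)                # level codes: 1 2 1 3
values <- as.numeric(as.character(f))  # intended values: 10 20 10 35

# An equivalent idiom that converts each level once, then indexes by the factor
values_alt <- as.numeric(levels(f))[f]

print(codes)
print(values)
```

Going through `as.character()` first converts the labels themselves, which is why it is the safer route.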

## Simulating Random Multivariate Correlated Data (Categorical Variables)

March 11, 2013

This is a repost of the second part of an example that I posted last year, but at the time I only had the PDF document (written in ). This is the second example to generate multivariate random associated data. This example shows how to generate ordinal categorical data. It is a little more complex than generating continuous
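One common way to obtain associated ordinal variables is to draw correlated normal values and cut them at thresholds. The sketch below takes that approach; it is not necessarily the method used in the original post, and the correlation, cut points, and category labels are assumptions chosen for illustration.

```r
# Sketch: correlated ordinal variables via thresholded latent normals.
# rho, the cut points, and the labels are illustrative assumptions.
set.seed(42)
n   <- 5000
rho <- 0.6
Sigma <- matrix(c(1, rho,
                  rho, 1), nrow = 2)

# Correlated standard normals: independent draws times the Cholesky factor
Z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(Sigma)

# Cut each latent variable into three ordered categories
cuts   <- c(-Inf, -0.5, 0.5, Inf)
labels <- c("low", "medium", "high")
x1 <- cut(Z[, 1], breaks = cuts, labels = labels, ordered_result = TRUE)
x2 <- cut(Z[, 2], breaks = cuts, labels = labels, ordered_result = TRUE)

print(table(x1, x2))
rank_cor <- cor(as.integer(x1), as.integer(x2), method = "spearman")
print(rank_cor)
```

The rank correlation of the ordinal variables is weaker than the latent correlation because discretizing discards information.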

## Interview with Boulder BI Brain Trust

March 11, 2013

On Friday I traveled to Boulder, CO to update the Boulder BI Brain Trust on the latest news and updates from Revolution R Enterprise. While I was there, I was interviewed by BBBT president Claudia Imhoff. In a wide-ranging chat, we discussed: What's behind the Revolution Analytics momentum over the past year? How Business Intelligence relates to Data Science...

## Simulating Allele Counts in a population using R

March 11, 2013

This post is inspired by the Week 7 lectures of the Coursera course "Introduction to Genetics and Evolution" (I highly recommend this course for anyone interested in genetics, BTW). Professor Noor uses a University of Washington software package called AlleleA1 for try...
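The excerpt is cut off before it describes the simulation itself. As a hedged illustration only, the sketch below simulates allele frequencies under pure genetic drift (a Wright-Fisher model), which is one standard way to simulate allele counts. The function name `simulate_drift` and all parameter values are hypothetical and are not taken from the post.

```r
# Sketch: allele frequency under genetic drift in a diploid population.
# 'simulate_drift' and its arguments are hypothetical names for this example.
simulate_drift <- function(pop_size, p0, generations) {
  n_alleles <- 2 * pop_size            # diploid: two allele copies per individual
  freq <- numeric(generations + 1)
  freq[1] <- p0
  for (g in seq_len(generations)) {
    # Each allele copy in the next generation is drawn independently
    count <- rbinom(1, size = n_alleles, prob = freq[g])
    freq[g + 1] <- count / n_alleles
  }
  freq
}

set.seed(7)
traj <- simulate_drift(pop_size = 100, p0 = 0.5, generations = 200)
plot(traj, type = "l", ylim = c(0, 1),
     xlab = "Generation", ylab = "Frequency of allele A1")
```

Running the function repeatedly with a smaller `pop_size` shows trajectories reaching 0 or 1 sooner, which is the usual signature of drift.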

## Reproducible Research at ENAR

March 11, 2013

I gave a talk at the Spring ENAR meetings this morning on some of the technical aspects of creating the book. The session was on reproducible research and the slides are here. I was dinged for not using git for version control (we used Dropbox for simp...

## Lipsyncing for your life: a survival analysis of RuPaul’s Drag Race

March 11, 2013

If you follow me on Twitter, you know that I’m a big fan of RuPaul’s Drag Race. The transformation, the glamour, the sheer eleganza extravaganza is something my life needs to interrupt the monotony of grad school. I was able to catch up on nearly four seasons in a little less than a month, and I’ve been watching the…

## Veterinary Epidemiologic Research: Linear Regression Part 3 – Box-Cox and Matrix Representation

March 11, 2013

In the previous post, I forgot to show an example of the Box-Cox transformation for when there’s a lack of normality. The Box-Cox procedure computes the value of $\lambda$ that best “normalises” the errors.

| $\lambda$ value | Transformed value of Y |
|---|---|
| 2 | $Y^2$ |
| 1 | $Y$ |
| 0.5 | $\sqrt{Y}$ |
| 0 | $\log(Y)$ |
| -0.5 | $1/\sqrt{Y}$ |
| -1 | $1/Y$ |
| -2 | $1/Y^2$ |

For example: The plot indicates a log transformation.

Matrix Representation

We can use
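The post's own data and code are not part of this excerpt, so the following is only a sketch on simulated data. It uses `MASS::boxcox()` to profile the Box-Cox parameter and then computes the least-squares coefficients in matrix form. The simulated model and all variable names are assumptions made for this example.

```r
# Sketch on simulated data (not the post's data set).
library(MASS)  # provides boxcox()

set.seed(1)
n <- 200
x <- runif(n, 0, 4)
y <- exp(1 + 0.5 * x + rnorm(n, sd = 0.3))  # log(y) is linear in x by construction

# Profile log-likelihood over a grid of lambda values
fit <- lm(y ~ x, y = TRUE)
bc  <- boxcox(fit, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)  # a value near 0 points to a log transformation

# Matrix representation of least squares: solve (X'X) b = X'y
X <- cbind(1, x)
log_y <- log(y)
beta_hat <- solve(crossprod(X), crossprod(X, log_y))

# Same coefficients as lm() on the transformed response
beta_lm <- coef(lm(log_y ~ x))
print(cbind(beta_hat, beta_lm))
```

Setting `plotit = TRUE` draws the profile likelihood with a confidence interval for the parameter, which is the kind of plot the text refers to.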

## Simulating Random Multivariate Correlated Data (Continuous Variables)

March 11, 2013

This is a repost of an example that I posted last year, but at the time I only had the PDF document (written in ). I’m reposting it directly into WordPress and I’m including the graphs. From time to time a researcher needs to develop a script or an application to collect and analyze data. They may also need
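A standard way to generate correlated continuous data is to multiply independent normal draws by the Cholesky factor of the target covariance matrix. The sketch below shows that technique; it may differ from the approach in the original post, and the target correlation matrix is an assumption chosen for illustration.

```r
# Sketch: correlated continuous variables via the Cholesky factor.
# The target correlation matrix is an illustrative assumption.
set.seed(123)
n <- 10000
Sigma <- matrix(c(1.0, 0.6, 0.3,
                  0.6, 1.0, 0.5,
                  0.3, 0.5, 1.0), nrow = 3, byrow = TRUE)

U <- chol(Sigma)                      # upper triangular, t(U) %*% U equals Sigma
Z <- matrix(rnorm(n * 3), ncol = 3)   # independent standard normals
X <- Z %*% U                          # rows of X have covariance Sigma

sample_cor <- cor(X)
print(round(sample_cor, 2))
```

The sample correlations approach the target matrix as `n` grows.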

## Hexadecimal literals in GNU R

March 11, 2013

Recently I have used hexadecimal numbers in GNU R. The way they are parsed surprised me and is inconsistent with Java. As the R Language Definition PDF only briefly mentions hexadecimal numbers, here is what I have found. First I have checked the following c...
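The excerpt ends before the author's examples, so the lines below are only an illustration of how R treats hexadecimal literals, not a reproduction of the post's checks. One difference from Java is that a plain hexadecimal literal in R is a double, so `0xFFFFFFFF` is a large positive number, whereas in Java the same literal is an `int` equal to -1.

```r
# Sketch: how R parses hexadecimal literals (illustrative checks only)
x <- 0x10
print(class(x))    # "numeric": a plain hex literal is a double
print(x == 16)     # TRUE

y <- 0x10L
print(class(y))    # "integer": the L suffix requests an integer

big <- 0xFFFFFFFF
print(big == 4294967295)  # TRUE: no wrap-around to -1 as in Java's int

# Converting between hex strings and numbers
from_string <- strtoi("ff", base = 16L)   # 255
to_string   <- format(as.hexmode(255))    # "ff"
print(from_string)
print(to_string)
```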

## FBit: GitHub repo for posts with R code for this blog

March 11, 2013

This is a test post since I want to improve upon Jeffrey Horner’s strategy for posting R code in Tumblr. The only minor improvement I wanted to try out is hosting the images directly on the web. I mean, right now the images won’t show in RSS readers. I’m not doing anything new at all, just using the...

## Discovering Argon with the 2-Sample t-Test

I learned about Lord Rayleigh’s discovery of argon in my 2nd-year analytical chemistry class while reading “Quantitative Chemical Analysis” by Daniel Harris.  (William Ramsay was also responsible for this discovery.)  This is one of my favourite stories in chemistry; it illustrates how diligence in measurement can lead to an elegant and surprising discovery.  I find

## Is CTA trend following Dead?

March 10, 2013

This i...

## More sequential testing for triangle tests

March 10, 2013

I have previously looked at triangle tests and at sequential testing in triangle tests. The latter post demonstrated that a sequential test is possible without any cost in the desired type I error rate (error of the first kind). The latter because t...
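For context, a basic fixed-sample (non-sequential) triangle test can be sketched as a one-sided binomial test against the guessing probability of 1/3, since a panelist picking the odd sample out of three at random is right one time in three. This is not the sequential procedure the post discusses, and the panel size and number of correct answers below are hypothetical.

```r
# Sketch: fixed-sample triangle test as a one-sided binomial test.
# The counts are hypothetical, not data from the post.
n_panelists <- 30
n_correct   <- 15

result <- binom.test(n_correct, n_panelists, p = 1/3,
                     alternative = "greater")
print(result$p.value)

# The same p-value from the binomial tail probability P(X >= n_correct)
p_tail <- pbinom(n_correct - 1, n_panelists, 1/3, lower.tail = FALSE)
print(p_tail)
```

A sequential design would look at the data at several interim points, which is where control of the type I error becomes the central concern.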

## Analyse Quandl data with R – even from the cloud

March 10, 2013

I recently read two exciting pieces of news about the really promising time-series data provider Quandl: "Quandl: A Wikipedia for Time Series Data" and "Quandl package released to CRAN". With the help of the Quandl R package* (development version...

## Better logging in R (aka futile.logger 1.3.0 released)

March 10, 2013

In many languages logging is now part of the batteries included with a language. This isn’t yet the case in …

## Call for participation: DMApps 2013 – an International Workshop on Data Mining Applications in Industry and Government

March 10, 2013

Call for participation: DMApps 2013, an International Workshop on Data Mining Applications in Industry and Government, held in conjunction with PAKDD 2013, Gold Coast, Australia, April 14, 2013 (http://dmapps2013.rdatamining.com). To attend the workshop, you need to register for PAKDD 2013 …

## Notes on my R / Git workflow

March 10, 2013

These are some notes on my current R/Git workflow, which is quite fluid, and Git has enough quirks that I usually forget part of it!

Creating Projects

I've used both RStudio and Eclipse. RStudio seems to make it easier to create a 'project' and add a loca...

## Calculating Custom Fantasy Football Projections for Your League using R

March 9, 2013

In prior posts, I have shown how to download fantasy football projections from ESPN, CBS, and NFL.com. In this post, I will demonstrate how to take the projected points from these sources and calculate the projected points for your custom league ...

The post Calculating Custom Fantasy Football Projections for Your League using R appeared first on Fantasy Football Analytics.

## Getting flexible with SAP HANA

Most of you might not be aware of a feature introduced in SAP HANA SPS5. This new feature is called "Flexible Tables", which means that you can define a table that will grow depending on your needs. Let's see an example... You define a table with ID, NA...

## MCMSki IV, Jan. 6-8, 2014, Chamonix (news #4)

March 9, 2013

More news about MCMSki IV! Remember, the call is still open for contributed sessions for a few more weeks, until March 20 to be precise (make sure to contact me at [email protected] if you are considering putting a session together). To all those who have already submitted a session, thanks a lot, please stay tuned, and

## Analyzing Monthly Expenses with a Pareto Chart

March 9, 2013

This month, ASQ CEO Paul Borawski encourages us to share stories about “quality solutions in unexpected places.” This is such a fun question, because now I’ll be noticing these unexpected gems all

## The Gambling Machine Puzzle

March 9, 2013

This puzzle came up in the New York Times Number Play blog. It goes like this: An entrepreneur has devised a gambling machine that chooses two independent random variables x and y that are uniformly and independently distributed between 0 and 100. He plans to tell any customer the value of x and to ask him

## GSOC 2013: IID Assumptions in Performance Measurement

March 9, 2013

Google Summer of Code for 2013 has been announced and organizations such as R are beginning to assemble ideas for student projects this summer. If you’re an interested student, there’s a list of project proposals on the R wiki. If you’re considering being a mentor, post a project idea on the site soon – project

## Visualizing Risky Words — Part 2

March 9, 2013

This is a follow-up to my Visualizing Risky Words post. You’ll need to read that for context if you’re just jumping in now. Full R code for the generated images (which are pretty large) is at the end. Aesthetics are the primary reason for using a word cloud, though one can pretty quickly recognize what

## Analyzing SimplyStatistics visits info

March 9, 2013

Recently we had to analyze data on the number of visits per day to SimplyStatistics.org. There were two goals:

1. Estimate the fraction of visitors retained after a spike in the number of visitors.
2. Identify any factors that influence the fraction estimated in 1.

For me it was a fun project in part because I like SimplyStatistics but also...

## A bit more on sample size

March 8, 2013

In our article What is a large enough random sample? we pointed out that if you wanted to measure a proportion to an accuracy “a” with a chance of being wrong of “d”, then an idea was to guarantee you had a sample size of at least: This is the central question in designing opinion polls.
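The formula itself did not survive in this excerpt. As an assumption, and not necessarily the bound the article used, one standard bound of this kind follows from Hoeffding's inequality: the chance that a sample proportion misses the true proportion by more than a is at most 2·exp(-2·n·a²), so any n of at least ln(2/d) / (2·a²) suffices. The function name `sample_size_hoeffding` is hypothetical.

```r
# Sketch: sample size from Hoeffding's inequality (an assumed bound,
# not necessarily the one in the article).
# a: desired accuracy, d: allowed probability of being wrong
sample_size_hoeffding <- function(a, d) {
  stopifnot(a > 0, d > 0, d < 1)
  ceiling(log(2 / d) / (2 * a^2))
}

# Example: within 3 percentage points, wrong at most 5% of the time
n_needed <- sample_size_hoeffding(a = 0.03, d = 0.05)
print(n_needed)  # 2050
```

The bound holds for any true proportion, so it is conservative compared with the normal-approximation sizes commonly quoted for opinion polls.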