More on JSM

August 5, 2011

While my time at the 2011 Joint Statistical Meetings was short--I unfortunately missed some presentations I would have liked to attend--it was a great experience. The collection of academics and professionals is very different from the other con...

Image Data from ImageJ to R and Vice Versa

August 5, 2011

In recent years many R packages have been developed to enable image analysis in R. As an alternative, combining R with powerful image analysis software such as ImageJ offers many advanced image analysis interfaces and algorithms not yet available in R. Bio7 integrates both applications in a Rich Client Platform based on Eclipse

Outlier Detection with DPM Slides from JSM 2011

August 5, 2011

Here are the 14 slides I used during my talk at the Joint Statistical Meetings 2011: shotwell-jsm-2011.pdf. I'm trying hard to minimize the text in my presentation slides. But this usually requires that I practice more. Hence, you will know which talks I have practiced thoroughly by the amount of text in the slides.

Friday Links: R, OpenHelix Bioinformatics Tips, 23andMe, Perl, Python, Next-Gen Sequencing

August 5, 2011

I haven't posted much here recently, but here is a roundup of a few of the links I've shared on Twitter (@genetics_blog) over the last two weeks. Here is a nice tutorial on accessing high-throughput public data (from NCBI) using R and Bioconductor. Cloud...

New Rcpp master classes scheduled for New York and San Francisco

August 4, 2011

Together with Revolution Analytics, I will be offering two more one-day classes on the Rcpp package for seamless integration of R and C++. The format will follow the workshop Romain and I gave during the tutorial day preceding this year's R/Financ...

Aug 4, 2011 "plunge" headlines are in the air tonight

August 4, 2011

Today's financial headlines are littered with the word 'plunge.' Considering today's (cl-cl) drop on the S&P500 was just about -5%, I don't know that I would exactly call that a plunge. ...

CHCN: Canadian Historical Climate Network

August 4, 2011

A reader asked a question about data from Environment Canada. He wanted to know if that data could somehow be integrated into the RGhcnV3 package. That turned out to be a bit more challenging than I expected. In short order I’d found a couple of other people who had done something similar. DrJ of course was

Statisticians at JSM consider themselves "Data Scientists"

August 4, 2011

At the JSM 2011 conference in Miami earlier this week, we conducted an informal poll of attendees on their attitudes with respect to Big Data, statistical software, and data science. JSM is the largest gathering of statisticians in North America, and attendees were invited to complete a survey after logging into the Wi-Fi network. Of the 190 respondents to...

Lattice-xyplot without Border/Box, with Axes at Bottom & Left Side Only, with Custom Ablines/Grid & Axis-Labelling

August 4, 2011

Here's how you do a lattice-xyplot without border/box, with axes at bottom & left side only, with custom ablines/grid & axis-labelling.

Does Jon Skeet have mental powers that make us upvote his answers? (The effect of reputation on upvotes)

August 4, 2011

Of course, since we all know Jon Skeet does have various powers, I will move on to an unanswered question: whether a user's reputation makes them receive more upvotes for answers. I’ve seen this theory mentioned in multiple places (see any of the comments to Jon Skeet’s answer that are along the lines of “If this was

Q-Q Plots for Multi-modal Performance Data

August 3, 2011

I'm in the process of putting together some slides on how to apply Quantile-Quantile plots to performance data. Q-Q plots are a handy tool for visually inspecting how well your data matches a known probability distribution (prob dsn). If the match is g...

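A minimal sketch of the idea behind a Q-Q comparison, written in Python with only the standard library. The function name `qq_pairs` and the latency values are made up for illustration and are not taken from the post. Each sorted observation is paired with the quantile a fitted normal distribution would predict at the same plotting position; pairs that fall near the line y = x indicate a good match.

```python
from statistics import NormalDist, mean, stdev


def qq_pairs(sample):
    """Return (theoretical, observed) quantile pairs against a fitted normal."""
    xs = sorted(sample)
    n = len(xs)
    # Fit a normal distribution using the sample mean and standard deviation.
    fitted = NormalDist(mean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n stay strictly inside (0, 1),
    # so inv_cdf is always defined.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    return [(fitted.inv_cdf(p), x) for p, x in zip(probs, xs)]


# Hypothetical bimodal response times in milliseconds (illustrative only).
latencies = [10, 11, 12, 12, 13, 14, 48, 50, 51, 53]

for theoretical, observed in qq_pairs(latencies):
    print(f"{theoretical:8.2f} {observed:8.2f}")
```

With data like this, which has two clusters, the observed values do not track the theoretical ones smoothly. That departure from a straight line is the kind of visual cue a Q-Q plot is meant to surface.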

Hotness

August 3, 2011

We have an internal image that floated around work several years ago that details network utilization of TCP over a wide variety of configurations. It is a heatmap created in MATLAB that is just sweet, sweet eye candy. We actually hung it on the outside of a cube for a short while and people couldn't help but stop and...

How Google uses R to make online advertising more effective

August 3, 2011

At JSM 2011 today, three Google employees (amongst the more than 20 Google delegates there) gave a little insight into how statistical analysis with R yields better results for companies using Google's various advertising products. Bill Heavlin from Google kicked off the session with a talk about conditional regression models, a statistical technique used at Google to evaluate the...

A Bayesian Guessing Game

August 3, 2011

You, the player, must think of some set, e.g. "odd numbers" or "perfect squares," and that'll be your little secret. Now think of some numbers that live in the intersection of your set and the integers {1, 2, ... , 100} -- for example, if you've chosen ...

Faster files in R

August 3, 2011

R is fairly slow in reading files. read.table() is slow, scan() a bit faster, and readLines() fastest. But all these are nowhere near as fast as other tools that scan through files. Let us look at an example. I have in front of me a 283M file. (Small update: the timings were off before. First because R hashes strings, one has to...

Tomboy Notes: Personal R Help File

August 3, 2011

When learning R it is helpful to have your own personal help file. One you create for yourself, with the notes, links, and language you understand (sometimes the help files are not very helpful). Let me introduce you to Tomboy Notes. Tomboy Notes is a l...

Data Driven Story Discovery: Working Up a Multi-Layered Chart

August 3, 2011

How many different dimensions (or “columns” in a dataset where each row represents a different sample and each column a different measurement taken as part of that sample) can you plot on a chart? Two are obvious: X and Y values, which are ideal for representing continuous numerical variables. If you’re plotting points, as in

WordPress WordCloud with R

August 3, 2011

These days one can frequently read about wordclouds created with R, initiated by the release of the wordcloud package by Ian Fellows on July 23rd. So here I am to put in my two cents. I thought about creating a wordcloud of a complete blog history, so I built a script that connects to a

RTextTools v1.1 Released

A major upgrade of RTextTools has been released, including many optimizations, UI changes, and features based on feedback from the 2011 CAP Conference in Catania. Changes include the addition of a new low-memory algorithm GLMNET, full user documentation, simplification of the user interface, bundled datasets, better analytics for both virgin a

Are students’ teaching evaluations influenced by instructors’ looks?

August 3, 2011

Are students' teaching evaluations influenced by instructors' looks? ggplot2 may help find the answer. The recent release of RcmdrPlugin.KMggplot2 has made ggplot2 available to those who prefer a GUI to the command line interface. With the new plugin for ...

RcppArmadillo 0.2.28

August 2, 2011

Armadillo 2.2.1 came out today (and it looks like 2.2.0 was skipped, tst, tst). It has now been wrapped into release 0.2.28 of RcppArmadillo which is already on CRAN. The NEWS entry is below; a number of these changes were already in the preceding 0...

Syntax highlighting of roxygen documentation in TextMate

August 2, 2011

With roxygen now on GitHub, the release of roxygen2, and Hadley’s might now behind the project, I expect roxygen to gain even more momentum. R development in TextMate is great with the R bundle. Unfortunately the R bundle does not support highlighting of roxygen documentation by default. That was always a sticking point for me

Using Emacs to work with R

August 2, 2011

A simple yet efficient way to work with R consists of writing R code with your favorite text editor and sending it to the R console. This allows you to build efficient R code in an incremental fashion. A good editor might even provide syntax highlighting, parenthesis matching, and a way to send a selected portion of code to R

With byte compiler, R 2.14 will be even faster

August 2, 2011

In a presentation at the JSM 2011 conference in Miami yesterday, R core member Luke Tierney revealed that the next major update to R, R 2.14, will feature improved speed when processing interpreted R code, thanks to standard use of the new byte compiler feature. The byte compiler was introduced with R 2.13, but while R developers could use...

Summarizing Returns with R

August 2, 2011

Often I like to see the performance of a trading strategy summarized annually, quarterly or by month. In R, we start off with the summary function: Given a series xx, usually a chunk of the original, this function returns the cumulative returns for the period. The leverage is useful to somewhat simulate leveraged ETFs. The

Merging Two Different Datasets Containing a Common Column With R and R-Studio

August 2, 2011

Another way for the database challenged (such as myself!) to merge two datasets that share at least one common column… This recipe uses the cross-platform stats analysis package, R. I use R via the R-Studio client, which provides an IDE wrapper around the R environment. So for example, here’s how to merge a couple of

JSM 2011 [3]

August 2, 2011

Monday August 01 was the first full day of JSM 2011 and full is the appropriate word to describe the day! It started for me at 7am with a round table run by Marc Suchard on parallel computing (or at 3am if I am considering the time I woke up!). I was rather out of

Dividend Quartiles with Kenneth French Data

August 1, 2011

Based on my perception of the last 3 years, I would have expected high dividend stocks to have substantially underperformed low and zero dividend stocks. Fortunately, just like with size and momentum in Beating Kenneth French Small – High, we c...

LaTeX Typesetting – Basic Mathematics

August 1, 2011

LaTeX is very strong for typesetting mathematical equations. Other useful resources are provided on the Supplementary Material page.