High-Powered Statistical Computing On the iPad

December 17, 2012

It's winter break… and as any academic knows, breaks are “a good time to get work done.” For the Christmas break, many of us have to travel home to see family members. One of the great privileges of being an academic is that you don't necessarily need to be in your office to get research

Fractional Logit Model with Python

December 16, 2012

Dark matter top 10, but an hour too late

December 16, 2012

Well, that’s embarrassing. A little tweak to my dark matter model resulted in a leaderboard score in the top 10. The only problem is that the contest closed about an hour ago. I ran this final prediction earlier today but then simply forgot to go back to it and submit!! On the bright side, I

Quick Start R

December 16, 2012

I outlined some personal suggestions for those interested in learning R. Refer to the new page titled “Quick Start R”.

Taxonomy with R: Exploring the Taxize-Package

December 16, 2012

First off, I'd really like to give a shout-out to the brave people who have created and maintain this great package - the fame is yours! So, while exploring the capabilities of the package, some issues with the ITIS-Server arose, and with large datasets things weren't working out quite well for me. I then...

The Eye of the World as word cloud

December 16, 2012

The Eye of the World is the first book of Robert Jordan's Wheel of Time books. As the last of these books will be published soon, I was wondering if natural language processing can be used to examine books like these. For this purpose I downloaded a co...

Matrix Algebra Useful for Statistics

December 16, 2012

I was having a conversation with an acquaintance about courses that were particularly useful in our work. My forestry degree involved completing 50 compulsory + 10 elective† courses; if I had to choose courses that were influential and/or really useful they would be Operations Research, Economic Evaluation of Projects, Ecology, 3 Calculus and 2 Algebras.

Possibly slightly better text analysis with lme4

December 16, 2012

lme4 and its cousin arm are extremely useful for a huge variety of modeling applications (see Gelman and Hill’s book), but today we’re going to do something a little frivolous with them. Namely, we’re going to extend our Denver Debat...

The R journal – Volume 4/2, December 2012

December 16, 2012

Download complete issue. Refereed articles may be downloaded individually using the links below.

Table of Contents

Editorial

Contributed Research Articles
What’s in a Name? (Paul Murrell)
It’s Not What You Draw, It’s What You Don’t Draw (Paul Murrell)
Debugging grid Graphics (Paul Murrell and Velvet Ly)
frailtyHL: A Package for Fitting...

Making Data Visually Appealing

December 16, 2012

I’ve recently been considering the graphical presentation of data. I get the feeling that we, ecologists/scientists, could be better at data presentation. Graphs must be informative, but they don’t have to be ugly. I think that making visually appealing charts …

Building R packages: missing path to pdflatex

December 15, 2012

Recently, while trying to build an R package for generalized estimating equation model selection (QICpack on GitHub), I was getting an error related to LaTeX creating the PDF package manuals. It seems like this is a relatively common problem on …

Data Science, Data Analysis, R and Python

The October 2012 issue of Harvard Business Review prominently features the words “Getting Control of Big Data” on the cover, and the magazine includes these three related articles: “Big Data: The Management Revolution,” by Andrew McAfee and Erik Brynjolfsson, pages 61–68; “Data Scientist: The Sexiest Job of the 21st Century,” by Thomas H. Davenport and D.J. Patil, pages...

Text analysis made too easy with the tm package

December 15, 2012

Today’s Gist takes the CNN transcript of the Denver Presidential Debate, converts paragraphs into a document-term matrix, and does the absolute most basic form of text analysis: a raw word count. There are actually quite a few steps in this proc...

Le Monde puzzle (#800)

December 14, 2012

Here is the mathematical puzzle of the weekend edition of Le Monde: Consider a sequence where the initial number is between 1 and 10³, and each term in the sequence is derived from the previous term as follows: if the last digit of the previous term is between 6 and 9, multiply it by 9;

Predictive models in R: a new book in Polish

December 14, 2012

Together with Mateusz Zawisza I have just published a new book in Polish on building predictive models in GNU R. It can be bought at Oficyna Wydawnicza SGH. The book presents complete examples of basic data mining processes. Although the book is in Poli...

d3, Shiny, and R Reporting Performance

December 14, 2012

I thought it would be interesting to offer a slightly different example of how we can use d3, R, and RStudio Shiny. This time we will offer a simple example to report portfolio or index performance. Just as a test of my progress, I also threw...

2D MODPATH particle tracking animations with R and ImageMagick

December 14, 2012

The PMPATH particle tracking output, with a file format similar to the pathline output mode of MODPATH (see above), can be transformed easily into a GIF animation using R and ImageMagick (see below for a simple example). First of all, you...

Revolution Newsletter: December 2012

December 14, 2012

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full December edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Tell us what you're looking for in R training. 2013 is the International Year...

R Journal Volume 4/2, December 2012

December 14, 2012

The 'Winter edition' of the R Journal is out! Get it from here.

What is Correctness for Statistical Software?

December 14, 2012

Introduction A few months ago, Drew Conway and I gave a webcast that tried to teach people about the basic principles behind linear and logistic regression. To illustrate logistic regression, we worked through a series of progressively more complex spam detection problems. The simplest data set we used was the following: This data set has

How I learned to stop worrying and really love lists

December 14, 2012

One of the first weird things to get used to in R is unlearning some of the things that you think you know. As often happens, this reminds me of a quote I once read about Zen, which went about like this (I’m paraphrasing), “When I knew nothing of Zen, mountains were mountains, rivers were

Let it snow!

December 14, 2012

A couple days ago I noticed a fun piece of R code by Allan Roberts, which lets you create a digital snowflake by cutting out virtual triangles. Go give it a try. Roberts inspired me to create a whole night sky of snowflakes. I tried to make the snowfall look as organic as possible. There

Computing for Data Analysis Returns

December 14, 2012

I'm happy to announce that my course Computing for Data Analysis will return to Coursera on January 2nd, 2013. While I had previously announced that the course would be presented again right here, it made more sense to do it …

When R, or any other language, is not enough

December 14, 2012

This post is tangential to R, although R has a fair share of the issues I mention here, which include research reproducibility, open source, paying for software, multiple languages, salt and pepper. There is an increasing interest in the reproducibility …

Sending commands from Notepad++ to a remote R session

December 14, 2012

If you have your working environment set up in a Windows operating system, it can be a bit of a hassle to work with R sessions on remote Linux servers. I use WinSCP + Notepad++ to handle my projects and PuTTY + screen to handle the R sessions. It become...

Everything is a Network, featuring the sna package

December 14, 2012

We’ve gotten some requests, through the Ask us anything page, to do some plotting of networks. We may come back to this later, but today’s Gist shows how you can plot pretty much literally anything as a network. First, we go back to our

R pitfalls #4: redefining the basics

December 13, 2012

I try to be economical when writing code; for example, I tend to use single quotes over double quotes for characters because it saves me one keystroke. One area where I don’t do that is when typing TRUE and FALSE …

Predictive Modeling using R and the OpenScoring-Engine – a PMML approach

December 13, 2012

On November 27th, a special post caught my interest. Scott Mutchler presented a small framework for predictive analytics based on PMML (Predictive Model Markup Language) and a Java-based REST interface. PMML is an XML-based standard for the description and exchange of analytical models. The idea is that every piece of software which supports the corresponding...

Multisite, multivariate genetic analysis: simulation and analysis

December 13, 2012

The email wasn’t a challenge but a simple question: Is it possible to run a multivariate analysis in multiple sites? I was going to answer yes, of course, and leave it there but it would be a cruel, non-satisfying answer. …
