Data Mining in R online course taught by Luis Torgo at statistics.com

June 8, 2011

An interesting PR piece I got from Janet Dobbins: Luis Torgo is teaching an online course, “Data Mining in R: Learning with Case Studies” at statistics.com. The course runs June 17 – July 15. Brief Description: The main goal of this course is to teach users how to perform data mining tasks using R. Instructor(s): Dr. Luis Torgo...

Making Simple Packages in R on Windows

June 8, 2011

There are any number of short tutorials on making add-on R packages on your Windows machine. This is yet another version of that process. I’ve explained what I did in 10 easy steps on the pages, but I’ll give a brief overview here. In the first step I spent some time updating my R

A Quantstrat to Build on Part 4

June 8, 2011

When we build a system, we are almost always trying to beat buy and hold by some metric or metrics.  I have not found a demo to compare a quantstrat system with a generic buy and hold system.  Here is the way I accomplish a basic comparison w...

A Quantstrat to Build on Part 3

June 8, 2011

This just does the same thing as A Quantstrat to Build on Part 2, but I use sigCrossover and sigComparison instead of sigThreshold as my signal.  Maybe it will help some struggling to understand implementation of the different signal types.  ...

Real-time Analytics for Capital Markets with Revolution R

June 8, 2011

In the 2011 edition of the Sybase Capital Markets Guide, Revolution Analytics CTO David Champagne talks about the need for up-to-date analytics in Finance, and how you can integrate Revolution R with quality real-time data sources. Here's an excerpt: R represents a radically different approach to the challenges posed by analyzing increasingly large and complex data sets. Because it...

David Banks on Reproducible Research

June 8, 2011

Just got an email linking to Reproducible Research: A Range of Response, in the new journal Statistics, Politics, and Policy 2(1) by David Banks, who is also the journal's editor. Interestingly, the commentary doesn't mention the journal's policy (if one exists) on the reproducibility of research submitted there. Banks' writing is easy to read, though

Stratigraphic diagrams using analogue

June 8, 2011

One of the routine tasks palaeoecologists do is plot data on species composition or geochemical proxies, say along a sediment core or stratigraphic sequence. These diagrams are the canonical way of displaying stratigraphic data in this field. An example of …

Generating unique random IDs

June 7, 2011

Recently I was asked to help create random IDs for someone. At first I thought, ‘Ah yup, 1:x (1,2,3, …,x), job done’. Then I thought that there had to be an R function or package to create better-looking IDs, but I didn’t find one; if there is one, please let me know. In the meantime
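
One way such IDs could be generated in base R is sketched below. This is not the approach from the post, whose code is not part of this excerpt: the helper name `make_ids`, the ID length, and the character set are all assumptions made for illustration. The function draws random alphanumeric strings and redraws until every ID is distinct.

```r
# Hypothetical helper (not from the original post): n unique random IDs.
make_ids <- function(n, len = 8, chars = c(LETTERS, 0:9)) {
  ids <- character(0)
  # Keep drawing until there are n distinct IDs; collisions are rare but possible.
  while (length(ids) < n) {
    need <- n - length(ids)
    draws <- vapply(
      seq_len(need),
      function(i) paste(sample(chars, len, replace = TRUE), collapse = ""),
      character(1)
    )
    ids <- unique(c(ids, draws))
  }
  ids
}

set.seed(1)
ids <- make_ids(5)
```

The loop only terminates if `n` is no larger than the number of possible strings, so `len` should be chosen generously relative to `n`.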

Drafting the Documentation for RTextTools

In preparation for The 4th Annual Conference of the Comparative Policy Agendas Project in Catania, Sicily, our development team has been busy drafting the documentation for RTextTools. In addition to standard documentation of functions, we want to provide quick-start guides, sample datasets, example scripts, and

How to fit power laws

June 7, 2011

A new paper out in Ecology by Xiao and colleagues (in press, here) compares the use of log-transformation to non-linear regression for analyzing power-laws. They suggest that the error distribution should determine which method performs better. When you...
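
The two approaches named above can be sketched in base R as follows. The data are simulated, and the true parameters, sample size, and the choice of multiplicative lognormal error are assumptions for illustration only; they are not taken from the paper or the post. Under that error structure the log-log fit would be expected to do well, and the non-linear fit is shown alongside for comparison.

```r
# Simulated data (an assumption for illustration): y = a * x^b with
# multiplicative lognormal error.
set.seed(42)
a_true <- 2
b_true <- 0.75
x <- runif(200, min = 1, max = 100)
y <- a_true * x^b_true * exp(rnorm(200, sd = 0.1))

# Method 1: linear regression on log-transformed data.
fit_log <- lm(log(y) ~ log(x))
a_log <- exp(coef(fit_log)[[1]])  # back-transform the intercept
b_log <- coef(fit_log)[[2]]

# Method 2: non-linear least squares on the original scale,
# started from the log-log estimates.
fit_nls <- nls(y ~ a * x^b, start = list(a = a_log, b = b_log))
a_nls <- coef(fit_nls)[["a"]]
b_nls <- coef(fit_nls)[["b"]]
```

Swapping the error term for an additive one (for example `y <- a_true * x^b_true + rnorm(200)`) is a quick way to see how the two fits respond to a different error distribution.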

A Quantstrat to Build on Part 2

June 7, 2011

As I explore additional functionality of quantstrat and make changes to my original post A Quantstrat to Build On, I will write multiple posts, and hopefully, the finished product will not be so overwhelming to comprehend.  Also, it might highligh...

The ‘Big Analytics’ Revolution Starts with R: Webinar June 14

June 7, 2011

On Tuesday next week I'll be teaming up with Revolution Analytics' Mike Minelli to give a 30-minute webinar to introduce executives to R, Big Data, and applications of advanced analytics. If there's someone in your company who needs to know about the impact of R on getting value out of data, they can register here. Here's the agenda: The...

R books are now showing up in the dollar bin. That’s a good…

June 7, 2011

R books are now showing up in the dollar bin. That’s a good sign!

K-Means Clustering on Big Data

June 7, 2011

In this post Joseph Rickert demonstrates how to build a classification model on a large data set with the RevoScaleR package. A script file for use with Revolution R Enterprise to recreate the analysis below is at the end of the post, and can also be downloaded here -- ed. The k-means (Lloyd) algorithm, an intuitive way to explore...

The pros and cons of robust data characterizations

Over the years, I have looked at a lot of data contaminated with outliers, the subject of Chapter 7 of Exploring Data in Engineering, the Sciences, and Medicine.  That chapter adopts the definition of an outlier presented by Barnett and Lewis in their book Outliers in Statistical Data 2nd Edition

Fittestmodel.com: A user-friendly way to conduct empirical research together

June 6, 2011

(A guest post by Camiel de Koning) When trying to replicate, verify or extend the empirical research of others, a researcher generally encounters many time-consuming barriers, and there are often many prerequisites. Fittestmodel aims to overcome many of these problems by presenting a web application that allows users to: use R without having to install it; quickly incorporate...

R for Data Mining

June 6, 2011

Statistics and data mining often get bundled together, but (in my opinion), they're generally different practices with different goals. As a language designed for statistics, much of R's core functionality is focused on exploring and understanding data: model design, inference, and visualization. But when your goal is simply to get the best predictions from a big data set (without...

In case you missed it: May Roundup

June 6, 2011

In case you missed them, here are some articles from May of particular interest to R users. A review of "R Cookbook", a new how-to book for R programmers. A detailed example of using the RevoScaleR package to analyze a large airline data set. A new guide for R beginners, "How to Learn R", provides links to R resources,...

Shared Ecological Modelling References

June 6, 2011

05.06.2011 Today I started to create a list of books and articles about ecological modelling. In this list you will find not only general books about modelling but also books about spatial analysis, image analysis and other (in my opinion) important techniques useful in the context of ecological modelling. For the collection I use “Zotero”

10 R One Liners to Impress Your Friends

June 5, 2011

Following the trend of one liners for various languages (Haskell, Scala, Python), here are some examples in R.

Multiply Each Item in a List by 2
# lists
lapply(list(1:4),function(n){n*2})
# otherwise
(1:4)*2

Sum a List of Numbers
# lists
lapply(list(1:4),sum)
# oth...

Conway’s Game of Life in R with ggplot2 and animation

June 5, 2011

In undergrad I had a computer science professor who piqued my interest in applied mathematics, beginning with Conway’s Game of Life. At first, the Game of Life (not the board game) appears to be quite simple, perhaps too simple, but it has been widely explored and is useful for modeling systems over time. It has been...

An application of aggregate() and merge()

June 5, 2011

Today, I encountered an interesting problem while processing a data set of mine. My data have observations on businesses that are repeated over time. My data set also contains information on longitude and latitude of the business location, but unfort...

Testing Different Methods for Merging a set of Files into a Dataframe

June 5, 2011

I previously posted a method I used for merging a set of files into a dataframe. It wasn’t long before …
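
The methods compared in the post are not part of this excerpt, so the block below is only a sketch of one common base R approach to the same task: read each file with `lapply()` and stack the results with `do.call(rbind, ...)`. The file names, columns, and values are made up for illustration.

```r
# Write three small CSV files to a temporary directory (made-up example data).
demo_dir <- tempfile("merge_demo")
dir.create(demo_dir)
for (i in 1:3) {
  write.csv(
    data.frame(id = i, value = c(10, 20) * i),
    file.path(demo_dir, sprintf("part%d.csv", i)),
    row.names = FALSE
  )
}

# Read every file and stack the rows into one data frame.
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
merged <- do.call(rbind, lapply(files, read.csv))
```

This assumes every file has the same columns; `rbind()` stops with an error if the column names differ between files.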

Environments in R

June 4, 2011

One interesting thing about R is that you can get down into the insides fairly easily. You're allowed to see more of how things are put together than in most languages. One of the ways R does this is by having first-class environments. At first glance, environments are simple enough. An environment...
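
To make "first-class environments" concrete, here is a minimal sketch using only base R functions (`new.env()`, `assign()`, `get()`, `ls()`). The variable and function names are illustrative and not taken from the post. It shows that an environment is an ordinary object that can be passed to a function and modified in place.

```r
# Create a new environment and store a value in it.
e <- new.env()
assign("x", 1, envir = e)

# Environments have reference semantics: a function can modify one in place,
# unlike a list or vector, which would be copied.
bump <- function(env) {
  env$x <- env$x + 1
  invisible(NULL)
}
bump(e)

value <- get("x", envir = e)  # 2
names_in_e <- ls(e)           # "x"
```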

Don Quijote — Word Statistics

June 4, 2011

Using the Gutenberg Project’s free text of Don Quijote + Unix for Poets, here are the most used (non-short) words in Miguel de Cervantes’ famous work: 2167 Quijote 2145 Sancho 1331 porque 1053 respondió 1027 había  900 merced  813 vuestra  79...

searching ITIS and fetching Phylomatic trees

June 3, 2011

I am writing a set of functions to search ITIS for taxonomic information (more databases to come) and functions to fetch plant phylogenetic trees from Phylomatic. Code at GitHub. Also, see the examples in the demos folder on the GitHub site above.
