Blend what?

March 16, 2013

Why? Over the years I have learned quite a few things about machine learning, but I have never thought of writing them down properly. Too often I can't figure out exactly what I did when I look at my old code. The time is NOW! More importantly, I have fa...

GNU R loop speed comparison

March 16, 2013

Recently I had several discussions about using for loops in GNU R and how they compare to the *apply family in terms of speed. I have not seen a direct benchmark comparing them, so I decided to run one (warning: some of the code presented today tak...
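
The following base-R sketch makes this kind of comparison concrete. It times an explicit for loop against sapply() on the same toy workload. The workload (squaring 100,000 random numbers) and the helper names loop_square() and apply_square() are hypothetical. This is not the benchmark from the original post, and timings will vary by machine and R version.

```r
# Minimal sketch: timing a for loop against sapply() on a toy workload.
# loop_square() and apply_square() are hypothetical helpers for illustration.
set.seed(42)
x <- runif(1e5)

# Explicit for loop writing into a preallocated vector
loop_square <- function(x) {
  out <- numeric(length(x))  # preallocate so the loop does not grow the vector
  for (i in seq_along(x)) {
    out[i] <- x[i]^2
  }
  out
}

# The same computation expressed with sapply()
apply_square <- function(x) {
  sapply(x, function(v) v^2)
}

t_loop  <- system.time(r_loop  <- loop_square(x))
t_apply <- system.time(r_apply <- apply_square(x))

# Both versions must return the same values; only the timings should differ
stopifnot(identical(r_loop, r_apply))

# Elapsed wall-clock seconds for each approach
print(c(loop = t_loop[["elapsed"]], sapply = t_apply[["elapsed"]]))
```

The fully vectorised form, x^2, would typically be faster than either version. The sketch only isolates the loop-versus-apply question.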

Scholarly metadata in R

March 16, 2013

Scholarly metadata - the meta-information surrounding articles - can be super useful. Although metadata does not contain the full content of articles, it contains a lot of useful information, including title, authors, abstract, URL to the article, etc. One of the largest sources of metadata is provided via the Open Archives Initiative Protocol for Metadata Harvesting or OAI-PMH....
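
OAI-PMH requests are plain HTTP GET requests. The query string carries a verb (such as Identify, ListRecords or GetRecord) plus verb-specific arguments such as metadataPrefix. The minimal sketch below uses only base R. The helper oai_request_url() is hypothetical and only builds such a URL. It does not fetch or parse anything, and the endpoint in the example is a placeholder, not a real repository.

```r
# Hypothetical helper: build an OAI-PMH request URL from a base URL, a verb
# and optional named arguments. It performs no network access.
oai_request_url <- function(base_url, verb, ...) {
  params <- c(verb = verb, ...)
  # Percent-encode each value so reserved characters such as ':' are safe
  values <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(base_url, "?", paste(names(params), values, sep = "=", collapse = "&"))
}

# Example: ask a placeholder repository for Dublin Core records since a date
oai_request_url("http://example.org/oai", "ListRecords",
                metadataPrefix = "oai_dc", from = "2013-03-01")
# returns "http://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc&from=2013-03-01"
```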

Changing Axis Values in R Plot

March 15, 2013

A colleague asked me how one can change axis attributes in a basic plot. Plotting anything in R is really, really easy: typing plot(x, y) is enough. In general, plot functions are nicely pre-cooked, so one hardly ever needs to change anything. But if changes to the default attributes are needed, it is possible
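
Here is a minimal sketch of the usual base-graphics recipe. First suppress the default axis with xaxt = "n", then draw your own with axis(). The monthly data below are invented for illustration. The plot goes to a temporary PDF file only so that the example also runs non-interactively.

```r
# Invented example data: one value per month
x <- 1:12
y <- c(5, 7, 6, 9, 12, 15, 14, 13, 10, 8, 6, 5)

# Throwaway PDF device so the sketch runs without a screen
pdf(file = tempfile(fileext = ".pdf"))

plot(x, y,
     xaxt = "n",                # suppress the default x-axis
     ylim = c(0, 20),           # change the y-axis range
     las = 1,                   # draw tick labels horizontally
     xlab = "Month", ylab = "Value",
     type = "b")                # points joined by lines

# Custom x-axis: a tick for every month, labelled with month.abb
axis(side = 1, at = x, labels = month.abb)

invisible(dev.off())
```

Other axis attributes follow the same pattern, for example tick-label size via cex.axis or an axis on the top of the plot via side = 3.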

Evaluation of Orthogonal Signal Correction for PLS modeling (OSC-PLS and OPLS)

March 15, 2013

Partial least squares projection to latent structures, or PLS, is one of my favorite modeling algorithms. PLS is an optimal algorithm for predictive modeling using wide data, or data with rows << variables. While there is a wealth of literature regarding the application of PLS to various tasks, I find it especially useful for biological

How Did I Miss “The Golden Dilemma”?

March 15, 2013

I am ashamed to admit that I am way behind (about 10,127 downloads) in discovering this wonderful paper:
The Golden Dilemma (January 8, 2013). Erb, Claude B. and Harvey, Campbell R. Available at SSRN: http://ssrn.com/abstract=2078535
Here are the authors presenting the concept in July 2012 if you prefer slideshow format (thanks...

How do I make my graphs?

March 15, 2013

Someone who wishes to remain anonymous writes: I’ve been following your blog a long time and enjoy your posts on visualization/statistical graphics matters. I don’t recall, however, you ever describing the details of your setup for plotting. I’m a new R user (converted from matplotlib) and would love to know your thoughts on the ideal...

Calendar Heatmap with Google Analytics Data

March 15, 2013

As a data analytics consulting firm, we think we are fortunate that we keep finding problems to solve. Recently my teammate found a glaring problem: there was no connector between R and Google. With inspiration from Michael, Ajay O, it soon became a problem worth solving. With the RGoogleAnalytics package, we now have

Veterinary Epidemiologic Research: GLM – Logistic Regression

March 14, 2013

We continue to explore the book Veterinary Epidemiologic Research and today we’ll have a look at generalized linear models (GLM), specifically the logistic regression (chapter 16). In veterinary epidemiology, often the outcome is dichotomous (yes/no), representing the presence or absence of disease or mortality. We code 1 for the presence of the outcome and 0
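
The following minimal sketch shows the 1/0 outcome coding in action. It fits a logistic regression with base R's glm() and family = binomial. The data are simulated, and the variable names (exposure, disease) are hypothetical. They are not taken from the book's data sets.

```r
set.seed(2013)
n <- 200

# Simulated binary exposure (1 = exposed, 0 = unexposed)
exposure <- rbinom(n, size = 1, prob = 0.4)

# True model on the log-odds scale: logit(p) = -1 + 1.2 * exposure
p <- plogis(-1 + 1.2 * exposure)

# Outcome coded 1 for presence of disease and 0 for absence
disease <- rbinom(n, size = 1, prob = p)

fit <- glm(disease ~ exposure, family = binomial(link = "logit"))
summary(fit)

# Exponentiated coefficients: baseline odds and the odds ratio for exposure
exp(coef(fit))
```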

Data Science Education gets personal

March 14, 2013

by Joseph B. Rickert

It is difficult to imagine that there is anyone on the planet with an internet connection and a desire to learn something new who has not at least looked into taking a massive open online course (MOOC). Last fall, in an 11/4/12 article, the New York Times declared the Year of the MOOC and quoted...

Upcoming events

March 14, 2013

Highlighted: LondonR is soon; see the “Previously Announced” section. New Events: Thirsty Quants, 2013 March 21, London. Some thirsty quants will be going for a drink on the 21st of March from 18:30 at the Lamb Tavern in Leadenhall Market. http://www.lambtavernleadenhall.com/ Rethinking the Economics of Pensions, 2013 March 21 & 22 in London. …

Apply-style commands in R

March 14, 2013

Here's a quick table of what I think are the most useful apply-style commands in R:
Function | Input | Output | Best for
apply | Rectangular | Rectangular or vector | Applying a function to rows or columns
lapply | Anything | List | Non-trivial operations on almost any data type
sap...
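
Here is a minimal sketch of the apply-style commands from the table, using small made-up inputs. sapply() is included as the usual simplifying counterpart of lapply(). The expected values in the comments follow from how matrix() fills its cells column by column.

```r
# A 2 x 3 matrix; matrix() fills column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# apply(): rectangular input, function applied over rows (1) or columns (2)
row_sums <- apply(m, 1, sum)   # c(9, 12)
col_max  <- apply(m, 2, max)   # c(2, 4, 6)

# lapply(): list-like input of any kind, always returns a list
inputs <- list(a = 1:3, b = c("x", "y"))
lengths_list <- lapply(inputs, length)   # list(a = 3, b = 2)

# sapply(): like lapply(), but simplifies the result to a vector when it can
lengths_vec <- sapply(inputs, length)    # c(a = 3, b = 2)

print(row_sums)
print(col_max)
print(lengths_list)
print(lengths_vec)
```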

Data Science in Business/Computational Social Science in Academia?

March 14, 2013

Nomen Est Omen? Lately, the terms "data science" and "data scientist" turn up at an increasing pace in the R-blog-sphere. Since its first occurrence (to my knowledge, "data scientist" was coined by DJ Patil and Jeff Hammerbacher in 2008), th...

Using bigmemory with Rcpp

March 14, 2013

The bigmemory package allows users to create matrices that are stored on disk, rather than in RAM. When an element is needed, it is read from the disk and cached in RAM. These objects can be much larger than native R matrices. Objects stored as such larger-than-RAM matrices are defined in the big.matrix class and they are designed...

On ENAR, or Statistical Meetings in General

March 14, 2013

Last year I accepted an invitation from Ben to go to ENAR 2013, my first ENAR. I used to go to JSM and useR!, and apparently I enjoy useR! most. The reason is not, or not only, that I'm more of a technical person. It is just hard to concentrate at large statistical conferences. I want...

qdap 0.2.1 Released

March 13, 2013

I’m very pleased to announce the release of qdap 0.2.1. This is the second installment of the qdap package available on CRAN. The qdap package automates many of the tasks associated with quantitative discourse analysis of transcripts containing discourse, including …

In case you missed it: February 2013 Roundup

March 13, 2013

In case you missed them, here are some articles from February of particular interest to R users. How to resample from a large data set with RHadoop, and a video introduction to the RHadoop packages. A 90-second video explains: What is Revolution R Enterprise? Jeffrey Stanton has published a free e-book "An Introduction to Data Science" using R. I...

John Snow’s Cholera data in more formats

March 13, 2013

In honour of the bicentenary of John Snow’s birth, and because I was asked to by someone via email, I have now released my digitisation of John Snow’s Cholera data in a few other formats: KML and Google Fusion Tables. To save you reading my previous blog posts on the subject, I’ll

Using maps and ggplot2 to visualize college hockey championships

March 13, 2013

Short: I plot the frequency of college hockey championships by state using the maps package and ggplot2.
Note: this example is based heavily on the example provided at http://www.dataincolour.com/2011/07/maps-with-ggplot2/
Data reference: http://en.wikipedia.org/wiki/NCAA_Men%27s_Ice_Hockey_Championship
Question of interest: As a good Minnesotan, I've believed for quite some time that the colder, Northern states enjoy a competitive advantage when it...

Webinar tomorrow: 100% R and More

March 13, 2013

A quick note that I'll be hosting our regularly-scheduled webinar, Revolution R Enterprise, 100% R and More, at 10AM Pacific tomorrow. If you're new to R, or want to learn about the power, scalability and productivity features of Revolution R Enterprise, this is a great place to start. Revolution Analytics webinars: Revolution R Enterprise, 100% R and More

New package for ensembling R models

March 13, 2013

I've written a new R package called caretEnsemble for creating ensembles of caret models in R. It currently works well for regression models, and I've written some preliminary support for binary classification models. At th...

R needs some bureaucracy

March 12, 2013

Writing a program in R is almost bureaucracy free: variables don’t need to be declared, the language does a reasonable job of guessing the type a value might need to be automatically converted to, there is no need to create a function having a special name that gets called at program startup, the commonly
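
A few lines of base R show the lack of ceremony described here: no declarations, no special entry-point function, and automatic type conversion when values of different types meet. This is a minimal sketch, and the values are arbitrary.

```r
# No declaration needed: the name x is simply bound to a value
x <- 42
class(x)           # "numeric"

# The same name can later be rebound to a value of a different type
x <- "forty-two"
class(x)           # "character"

# Implicit coercion: logicals become numbers in arithmetic
TRUE + TRUE        # 2

# Mixing types in c() converts everything to the most general type
v <- c(1, TRUE, "a")
v                  # "1" "TRUE" "a"
class(v)           # "character"

# There is no main(): a script simply runs from top to bottom
```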

Rcpp master class in New York last weekend

March 12, 2013

On Saturday I had the opportunity to teach another one-day master class on Rcpp. The class had been organized by Jared Lander, and organized very well I might add. The weekend started with a slight disappointment. I had taken Friday off, and hoped t...

RcppArmadillo 0.3.800.1

March 12, 2013

Conrad released a first bug-fix release 3.800.1 of Armadillo earlier today. This has been wrapped up in release 0.3.800.1 of RcppArmadillo as usual. This release also contains a very nice function sample() (contributed by Christian Gunning) which p...

A map of worldwide email traffic, created with R

March 12, 2013

The Washington Post reports that by analyzing more than 10 million emails sent through the Yahoo! Mail service in 2012, a team of researchers used the R language to create a map of countries whose citizens email each other most frequently: The chart above shows the top 1000 country-country pairs by email frequency, arranged in a clustered network using...

Distrust of R

March 12, 2013

I guess I've been living in a bubble for a bit, but apparently there are a lot of people who still mistrust R. I got asked this week why I used R (and, specifically, the package rpart) to generate classification and regression trees instead of SAS Ente...

R to Latex packages: Coverage

March 12, 2013

There are now quite a few R packages to turn cross-tables and fitted models into nicely formatted LaTeX. In a previous post I showed how to use one of them to display regression tables on the fly. In this post I summarise what types of R object each of the major packages can deal with.

AQP News and Updates

March 12, 2013

The AQP family of R packages has seen a lot of development over the last 3 months. Some of the highlights include: HTML manual pages with syntax highlighting and figures, c/o knitr; new vignettes: "dealing with bad data", gridded SSURGO (gSSURGO) demo,...

Job advert

March 12, 2013

We finally got around to preparing everything we needed to advertise the position that will be available in the MRC grant we were awarded last year. The project will run for 30 months and we're looking for a post-doctoral candidate to work on the Rese...
