Microarrays, scan dates and Bioconductor: it shouldn’t be this difficult

August 21, 2013

When dealing with data from high-throughput experimental platforms such as microarrays, it’s important to account for potential batch effects. A simple example: if you process all your normal tissue samples this week and your cancerous tissue samples next week, you’re in big trouble. Differences between cancer and normal are now confounded with processing time and

Prototyping Multinomial Logit with R

August 21, 2013

Recently, I have been working on a new modeling proposal based on competing risks and need to prototype multinomial logit models with R. There are R packages implementing multinomial logit models that I’ve tested, namely nnet and VGAM. Model outputs with iris data are shown below. However, in my view, the above methods are not flexible

Demand for R jobs on the rise, ctd

August 21, 2013

Earlier this month, we looked at the trends in the job prospects for data analysts with expertise in R and SAS, by looking at the number of job postings that mention each software package. Because R's single-letter name makes it hard to search for, and because SAS is used for many other things besides data analysis, I coupled the...

What are the hottest areas for CS Research? (Based on Google Research 2013)

August 21, 2013

What are some of the hottest areas of research in Computer Science at the moment (August 2013)? And at which universities is this research being carried out? The answers are subjective by definition, but looking at the numbers behind the Google Research...

R and PMML Support

August 21, 2013

A PMML package for R that exports all kinds of predictive models is available directly from CRAN. Traditionally, the pmml package offered support for the following data mining algorithms: ksvm (kernlab): Support Vector Machines; nnet: Neural Networks; rpart:...

Doodling in R!

August 21, 2013

I am working on creating some functions that will be capable of creating shapes and plots that look hand drawn. I have made some progress in this goal. In that process I have also discovered that I can make some doodles that look hand d...

influence.ME now supports new lme4 1.0

August 21, 2013

influence.ME is an R package for detecting influential data in multilevel regression models (or mixed effects models, as they are referred to in the R community). The application of multilevel models has become common practice, but the development of diagnostic ...

Groan – my first R package

August 20, 2013

Being one of two R experts at my current job I figured I should be familiar with package development. Frankly, I've been procrastinating on this topic since I started using R in 2007 - I was doing just fine with source() and the section of the R manua...

The financial meltdown, to a trance beat

August 20, 2013

With the FMS Symphony by csv soundsystem you can listen to the Global Financial Crisis as you watch interest rates plunge while the Treasury floods the market with emergency funds. The source data for the chart and music comes from daily emails (like this one) sent by the US Treasury summarizing the cash spending and borrowing of the Federal...

Downloading and Analyzing CD1025’s Playlist

August 20, 2013

CD1025 is an “alternative” radio station here in Columbus. They are one of the few remaining radio stations that are independently owned and they take great pride in it. For data nerds like me, they also put a real time list of recently played songs on their website. The page has the most recent 50 songs played,...

Time-series forecasting: Bike Accidents

August 20, 2013

About a year ago I posted this video visualization of all the reported accidents involving bicycles in Montreal between 2006 and 2010. In the process I also calculated and plotted the accident rate using a monthly moving average. The results followed a pattern that was for the most part to be expected. The rate shoots up

“[” and “[[” with the apply() functions

August 20, 2013

Did you know you can use "

Electronic lab notebook

August 20, 2013

I was interested to read C. Titus Brown’s recent post, “Is version control an electronic lab notebook?” I think version control is really important, and I think all computational scientists should have something equivalent to a lab notebook. But I think of version control as serving needs orthogonal to those served by a lab notebook.

Print glm-output to HTML table #rstats

August 20, 2013

We often use logistic regression models in our analyses, and we often need to publish the results in table format. We always use MS Word, since it is the standard office software in our department. So I thought about … Continue reading →

Step by step to build my first R Hadoop System

August 20, 2013

by Yanchang Zhao, RDataMining.com. After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about 2 weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience … Continue reading →

The PISA2003lite package is released. Let’s explore!

August 20, 2013

Today I’m going to show how to install PISA2003lite, what is inside and how to use this R package. Datasets from this package will be used to compare student performance in four math sub-areas across different countries. At the end of the day we will find out in which areas top performers from different countries

Weak Learners

August 20, 2013

Tonight's session of R bore just enough fruit that I finally am writing my sophomore entry. I was browsing the slides from a machine learning lecture given by Leo Breiman and came across a relatively simple example he used to introduce the notions...

ChainLadder 0.1.6 released with chain-ladder factor models

August 20, 2013

Version 0.1.6 of the ChainLadder package has been released and is already available from CRAN. The new version adds the function CLFMdelta. CLFMdelta finds consistent weighting parameters delta for a vector of selected age-to-age chain-ladder factors fo...

Sentiment Analysis using R

August 20, 2013

September 23, 2013. Movie rating using Twitter Data – Using R. Today I will explain how to create a basic movie review engine in R based on people's tweets. The implementation of the Review Engine will be as follows: Gets Tweets from Twitter ...

Warrior Zombies from Outer Space II: Mayhem Unleashed

August 20, 2013

Given the speed at which I consume them, it's only justified that the first post on this blog is about movies. (Although, by that logic, it could have equally well been about sandwiches, Nutella, or tissue paper. Note to self: Look for a Nutella consumption dataset) Anyway, this post is about movie taglines - specifically, the words that...

Downloading and Analyzing CD1025’s Playlist

August 19, 2013

CD1025 is an “alternative” radio station here in Columbus. They are one of the few remaining radio stations that are independently owned and they take great pride in it. For data nerds like me, they also put a real time list of recently played songs on their website. The page has the most recent 50 songs played,...

Correcting a pseudo-correlation matrix to be positive semidefinite

August 19, 2013

In a recent LinkedIn conversation, the topic of correlation between multiple financial indices was raised. While the actual details are not relevant, the discussion reminded me of one of the concerns I have whenever multivariate correlation is used—how to populate the correlation matrix. First, some background. Unfortunately, most financial random variables are not normally distributed—they Read the full...

Export R Results Tables to Excel – Please don’t kick me out of your club

August 19, 2013

This post is written as a result of finding the following exchange on one of the R mailing lists: Is-there-a-way-to-export-regression-output-to-an-excel-spreadsheet Question: Is there a way to export regression output to an excel spreadsheet? Translation:...

Exploratory Data Analysis: Useful R Functions for Exploring a Data Frame

Introduction: Data in R are often stored in data frames, because they can store multiple types of data. (In R, data frames are more general than matrices, because matrices can only store one type of data.) Today’s post highlights some common functions in R that I like to use to explore a data frame before

Gaussian Processes with RStan

August 19, 2013

Previously I looked at how to simulate Gaussian processes in R, following the methods in Rasmussen and Williams. But now that Andrew Gelman et al. (of

Question and Answer: Generating Binary and Discrete Response Data

August 19, 2013

I was recently contacted by a reader with two very specific questions, and I thought that this would be a good topic to respond to publicly. He would like to simulate his data: I have firm level data and the model is discrete choice with the main expla...

Text Mining with R – Comparing Word Counts in two Text Documents

August 19, 2013

Here's what I came up with to compare word counts in two pieces of text. If you have any ideas, I'd love to learn about alternatives! ## a function that compares word counts in two texts wordcount ...

Revolution Newsletter: August 2013

August 19, 2013

The most recent edition of the Revolution Newsletter is now available. In case you missed it, the news section is below, and you can read the full August edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. What is R? Has anyone ever asked you,...

R vs Python Speed Comparison for Bootstrapping

August 19, 2013

I’m interested in Python a lot, mostly because it appears to be wickedly fast. The downside is that I don’t know it nearly as well as R, so any speed gain in computation time is more than offset by Google … Continue reading →
