# Monthly Archives: October 2012

## Two ways that correlation and stepwise regression can give different results

October 8, 2012
In general, a correlation test is used to test the association between two variables (y and z). However, if there is a third variable (x) that might be related to z or y, it makes...
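The excerpt above is cut off, so the following is only a rough sketch of the general idea it raises, not the post's own method. It uses partial correlation (a swapped-in technique, assumed here for illustration) to show how the association between y and z can change once a third variable x is accounted for. The data and the helper names `pearson` and `partial_corr` are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(y, z, x):
    """First-order partial correlation of y and z, controlling for x."""
    r_yz = pearson(y, z)
    r_yx = pearson(y, x)
    r_zx = pearson(z, x)
    return (r_yz - r_yx * r_zx) / sqrt((1 - r_yx ** 2) * (1 - r_zx ** 2))


# Made-up data: y and z both follow x, plus small unrelated perturbations.
x = [1, 2, 3, 4, 5, 6, 7, 8]
noise_y = [1, -1, 1, -1, 1, -1, 1, -1]
noise_z = [1, 1, -1, -1, 1, 1, -1, -1]
y = [xi + e for xi, e in zip(x, noise_y)]
z = [xi + e for xi, e in zip(x, noise_z)]

raw = pearson(y, z)
adjusted = partial_corr(y, z, x)
print(f"raw correlation of y and z: {raw:.2f}")
print(f"partial correlation given x: {adjusted:.2f}")
```

In this toy data the raw correlation is strong because both variables track x, while the partial correlation is close to zero.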

## Summarizing Data

October 8, 2012
In this post, I'll go over four functions that you can use to nicely summarize your data. Before any regression analysis, a descriptive analysis is key to understanding your variables and the relationships between them. Next week, I'll have...

## Example 10.5: Convert a character-valued categorical variable to numeric

October 8, 2012
In some settings it may be necessary to recode a categorical variable with character values into a variable with numeric values. For example, the matching macro we discussed in example 7.35 will only match on numeric variables. One way to conve...
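As a minimal sketch of the kind of recoding described above (the excerpt is truncated, so this is not the post's own approach), one option is to assign each distinct character value an integer code. The function name `recode_to_numeric`, the alphabetical ordering of levels, and the sample values are all assumptions made for illustration.

```python
def recode_to_numeric(values):
    """Map each distinct string level to an integer code, starting at 1.

    Levels are ordered alphabetically, which is an arbitrary choice here;
    the codes are labels, not meaningful quantities.
    """
    levels = sorted(set(values))
    mapping = {level: code for code, level in enumerate(levels, start=1)}
    codes = [mapping[v] for v in values]
    return codes, mapping


# Hypothetical character-valued categorical variable.
group = ["low", "high", "low", "medium"]
codes, mapping = recode_to_numeric(group)
print(mapping)
print(codes)
```

Keeping the returned mapping alongside the codes makes it possible to translate the numeric values back to the original labels later.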

## DIY ZeroAccess GeoIP Analysis : So What?

October 8, 2012
NOTE: A great deal of this post comes from @jayjacobs as he took a conversation we were having about thoughts on ways to look at the data and just ran like the Flash with it. Did you know that – if you’re a US citizen – you have approximately a 1 in 5 chance of getting the

## CrowdANALYTIX – Ideation Contest – Warranty Pricing

October 8, 2012
I recently completed an ideation contest on CrowdANALYTIX where the participants had to build an approach towards warranty pricing and fraud detection. Ideation contests are quite different from the usual data mining contests where the objective is...

## Functions for plotting and getting Greek in labels

October 8, 2012
The problem: We often want to plot data and assign plot attributes based on characteristics of the data. For example, if we have a group of students with the following IQs, we might want to indicate who is an outlier in the statistical sense. I like...

## S&P 500 correlations up to date

October 8, 2012
I haven’t heard much about correlation lately. I was curious about what it’s been doing. Data: The dataset is daily log returns on 464 large cap US stocks from the start of 2006 to 2012 October 5. The sector data were taken from Wikipedia. The correlation calculated here is the mean correlation of stocks among …

## GBIF biodiversity data from R – more functions

October 8, 2012
We have been working on an R package to get GBIF data from R, with the stable version available through CRAN here, and the development version available on GitHub here. We had a Google Summer of Code student work on the package this summer - you can se...

## Presidential Candidate Sentiment Analysis

October 7, 2012
After watching the Presidential debates and hearing all the opinions on how the candidates performed, I got the harebrained idea of creating a simple function that would automate the pulling down of tweets for each candidate, analyze the positivity or negativity of tweets, and then graph them out. This project turned out to

## SPIDER makes the top 10 barcoding publications of 2012

October 7, 2012
In the recent Barcode Bulletin published by iBoL, our humble paper announcing the R package spider: Species identity and evolution made second on their list of the top 10 publications of 2012. Not bad for a side project! Spider is available for downl...