Monthly Archives: October 2012

Two ways that correlation and stepwise regression can give different results

October 8, 2012

In general, a correlation test is used to test the association between two variables (y and z). However, if there is a third variable (x) that might be related to z or y, it makes...

Summarizing Data

October 8, 2012

In this post, I'll go over four functions that you can use to nicely summarize your data.  Before any regression analysis, a descriptive analysis is key to understanding your variables and the relationships between them.  Next week, I'll have...

Example 10.5: Convert a character-valued categorical variable to numeric

October 8, 2012

In some settings it may be necessary to recode a categorical variable with character values into a variable with numeric values. For example, the matching macro we discussed in example 7.35 will only match on numeric variables. One way to conve...

DIY ZeroAccess GeoIP Analysis : So What?

October 8, 2012

NOTE: A great deal of this post comes from @jayjacobs as he took a conversation we were having about thoughts on ways to look at the data and just ran like the Flash with it. Did you know that – if you’re a US citizen – you have approximately a 1 in 5 chance of getting the

CrowdANALYTIX – Ideation Contest – Warranty Pricing

October 8, 2012

I recently completed an ideation contest on CrowdANALYTIX where the participants had to build an approach to warranty pricing and fraud detection. Ideation contests are quite different from the usual data mining contests where the objective is...

Functions for plotting and getting Greek in labels

October 8, 2012

The problem: We often want to plot data and assign plot attributes based on characteristics of the data. For example, if we have a group of students with the following IQs, we might want to indicate who is an outlier in the statistical sense. I like...

S&P 500 correlations up to date

October 8, 2012

I haven’t heard much about correlation lately, and I was curious about what it’s been doing. Data: The dataset is daily log returns on 464 large-cap US stocks from the start of 2006 to 2012 October 5. The sector data were taken from Wikipedia. The correlation calculated here is the mean correlation of stocks among …

GBIF biodiversity data from R – more functions

October 8, 2012

We have been working on an R package to get GBIF data from R, with the stable version available through CRAN here, and the development version available on GitHub here. We had a Google Summer of Code student work on the package this summer - you can se...

Presidential Candidate Sentiment Analysis

October 7, 2012

After watching the Presidential debates and hearing all the opinions on how the candidates performed, I got the harebrained idea of creating a simple function that would automate pulling down tweets for each candidate, analyze the positivity or negativity of the tweets, and then graph them out. This project turned out to

SPIDER makes the top 10 barcoding publications of 2012

October 7, 2012

In the recent Barcode Bulletin published by iBoL, our humble paper announcing the R package spider: Species identity and evolution placed second on their list of the top 10 publications of 2012. Not bad for a side project! Spider is available for downl...
