1740 search results for "gis"

The Register profiles Revolution Analytics

March 23, 2011

Tech news site The Register has just published an in-depth profile of Revolution Analytics. It was great meeting the author Dan Olds at Revolution HQ a couple of weeks ago, and sharing with him why we think the R language is the way forward for data science: modern, applied, large-scale statistical analysis. He captures that sentiment perfectly in the...

sab-R-metrics Sidetrack: Bubble Plots

March 22, 2011

While I had mentioned in my last post that I would cover logistic regression in my next post, I decided that a quick interlude in working with bubble plots would be fun. Bubble plots have become pretty popular recently, especially with all of the Visualization Challenges I've seen around the internet (by the way, I...
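As a rough illustration of what a bubble plot involves, here is a minimal sketch using base R's `symbols()`. The data frame `teams` and its columns are made up for this example and are not taken from the post. The radius is derived from the square root of the value so that bubble area, not radius, tracks the quantity being shown.

```r
# Hypothetical data: x/y positions plus a third variable shown as bubble size
teams <- data.frame(
  x    = c(1, 2, 3, 4),
  y    = c(2, 3, 1, 4),
  size = c(10, 40, 90, 160)
)

# Scale radius so that circle AREA is proportional to 'size'
radius <- sqrt(teams$size / pi)

# 'inches' sets the plotted size of the largest circle; the rest scale with it
symbols(teams$x, teams$y, circles = radius, inches = 0.35,
        bg = "lightblue", fg = "white", xlab = "x", ylab = "y")
```

Scaling by area is a design choice: mapping the value straight to the radius would exaggerate large values, since area grows with the square of the radius.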

Comparison of UAH and GISS Time Series with Common Baseline

March 22, 2011

In this post I set both UAH and GISS global temperature anomaly series to a common baseline period (1981-2010) and compare them. Even though the UAH series is satellite based and GISS series is station based, the series exhibit striking …
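Putting two anomaly series on a common baseline amounts to subtracting each series' own mean over the baseline years. The sketch below shows the idea on synthetic annual data. The helper `rebaseline()` and the two series are hypothetical stand-ins, not the post's code or the real UAH and GISS data.

```r
# Hypothetical helper: shift an anomaly series so that its mean over the
# baseline years is zero
rebaseline <- function(year, anomaly, start = 1981, end = 2010) {
  in_base <- year >= start & year <= end
  anomaly - mean(anomaly[in_base], na.rm = TRUE)
}

# Synthetic stand-ins: same trend, but published against different baselines
set.seed(1)
year     <- 1979:2010
trend    <- 0.015 * (year - 1979)
series_a <- trend + rnorm(length(year), sd = 0.1)
series_b <- trend + 0.35 + rnorm(length(year), sd = 0.1)  # constant offset

a_common <- rebaseline(year, series_a)
b_common <- rebaseline(year, series_b)

# After rebaselining, the constant offset between the two series is gone
mean(b_common - a_common)
```

With monthly data one would probably compute the baseline mean separately for each calendar month, which this annual sketch does not do.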

Looking at the "Curse of Dimensionality" with R, foreach, and lattice

March 20, 2011

Here are the results of a "Curse of Dimensionality" homework assignment for Terran Lane's Introduction to Machine Learning class. Pretty pictures, interesting results, and a good exercise in explicit parallelism with R.

It's neat to see distance scaling linearly with standard deviation, and linearly with the Lth-root...
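A small experiment along these lines can be sketched in base R. This is not the assignment's code: it uses `sapply()` rather than `foreach`, skips the lattice plots, and the function name `mean_pairwise_dist()` and all parameter values are assumptions chosen for illustration. It draws Gaussian points and measures the mean pairwise Euclidean distance as the dimension and the standard deviation change.

```r
set.seed(42)

# Mean pairwise Euclidean distance between n_points Gaussian points in n_dim dimensions
mean_pairwise_dist <- function(n_dim, sd = 1, n_points = 200) {
  pts <- matrix(rnorm(n_points * n_dim, sd = sd), ncol = n_dim)
  mean(dist(pts))
}

# Effect of dimension at fixed standard deviation
dims   <- c(1, 4, 16, 64)
by_dim <- sapply(dims, mean_pairwise_dist)

# Effect of standard deviation at fixed dimension
sds   <- c(1, 2, 4)
by_sd <- sapply(sds, function(s) mean_pairwise_dist(16, sd = s))

data.frame(dims, by_dim)
data.frame(sds, by_sd)
```

In this setup, doubling the standard deviation should roughly double the mean distance, and quadrupling the dimension should roughly double it as well once the dimension is reasonably large.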

Machine Learning Ex5.2 – Regularized Logistic Regression

March 20, 2011

Exercise 5.2 improves the Logistic Regression implementation done in Exercise 4 by adding a regularization parameter that reduces the problem of over-fitting. We will be using Newton's Method.
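For orientation, here is a minimal sketch of Newton's method for L2-regularized logistic regression. It is not the exercise's solution: the function name `newton_logistic()`, the synthetic data, the value of `lambda`, and the fixed iteration count are all assumptions made for this example. The intercept is left unpenalized, which is a common convention.

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical helper: Newton updates for L2-regularized logistic regression.
# X must include a leading column of ones for the intercept.
newton_logistic <- function(X, y, lambda = 1, n_iter = 15) {
  m <- nrow(X)
  theta <- rep(0, ncol(X))
  # Penalty matrix: identity with a zero for the (unpenalized) intercept
  penalty <- diag(c(0, rep(1, ncol(X) - 1)), nrow = ncol(X))
  for (i in seq_len(n_iter)) {
    h    <- as.vector(sigmoid(X %*% theta))
    grad <- (t(X) %*% (h - y)) / m + (lambda / m) * (penalty %*% theta)
    # Rows of X are weighted by h * (1 - h) to form the Hessian
    H    <- (t(X) %*% (X * (h * (1 - h)))) / m + (lambda / m) * penalty
    theta <- theta - as.vector(solve(H, grad))
  }
  theta
}

# Synthetic two-feature data, used here instead of the exercise's data set
set.seed(7)
u <- rnorm(100)
v <- rnorm(100)
y <- as.numeric(u + v + rnorm(100, sd = 0.5) > 0)
X <- cbind(1, u, v)

theta <- newton_logistic(X, y, lambda = 1)
theta
```

Larger values of `lambda` shrink the non-intercept coefficients toward zero, which is how the penalty counteracts over-fitting.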

Data

Here's the data we want to fit.

# regularized logistic regression
# load the data
mydata <- read.csv("http://spreadsheets.google.com/pub?key=0AnypY27pPCJydHZPN2pFbkZGd1RKeU81OFY3ZHJldWc&output=csv", header = TRUE)

# plot the data
plot(mydata$u, mydata$v, xlab = "u", ylab = "v")
points(mydata$u,...

How to: Binomial regression models in R

March 19, 2011

Ever wondered how to predict success or failure as a function of other variables? Here's a quick tutorial on binomial regression in R.
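As a taste of what such a model looks like, here is a minimal sketch with base R's `glm()` and `family = binomial`. The simulated data and the variable names `x` and `success` are assumptions for illustration and do not come from the tutorial.

```r
# Simulate a binary outcome whose success probability depends on x
set.seed(123)
n <- 200
x <- runif(n, -2, 2)
p <- plogis(-0.5 + 1.5 * x)            # true success probability
success <- rbinom(n, size = 1, prob = p)

# Fit a binomial (logistic) regression
fit <- glm(success ~ x, family = binomial(link = "logit"))
summary(fit)$coefficients

# Predicted success probabilities at a few new x values
pred <- predict(fit, newdata = data.frame(x = c(-1, 0, 1)), type = "response")
pred
```

`type = "response"` returns probabilities. Without it, `predict()` returns values on the log-odds scale.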

Applying functions on groups: sqldf, plyr, doBy, aggregate or data.table ?

March 17, 2011

Which of sqldf, plyr, doBy and aggregate is fastest for applying functions to groups of rows? I was wondering about this earlier in this post. It seems sqldf would be the fastest according to a post in manipulatr m...
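To make "applying functions on groups" concrete, here is a minimal base R sketch that computes a per-group mean two ways, with `aggregate()` and with `tapply()`. The toy data frame is made up for this example. The sketch makes no claim about which approach is faster, and it leaves out sqldf, plyr, doBy and data.table, which would need those packages installed.

```r
# Toy data: one grouping column, one numeric column
df <- data.frame(
  group = c("a", "a", "b", "b", "b", "c"),
  value = c(1, 3, 2, 4, 6, 10)
)

# Formula interface: returns a data frame with one row per group
by_aggregate <- aggregate(value ~ group, data = df, FUN = mean)

# tapply: returns a named array with one element per group
by_tapply <- tapply(df$value, df$group, mean)

by_aggregate
by_tapply
```

For a timing comparison, one option is to wrap each call in `system.time()` on a much larger data frame.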

$3.2M in prizes for predicting hospitalization

March 17, 2011

Heritage Health and Kaggle have teamed up to create the biggest data science competition thus far: the Heritage Health Prize, which challenges competitors to build a statistical model to predict the number of days a person is likely to spend in hospital over the next year, based on (anonymized) factors such as demographics, medical visits and treatments, and other...