1736 search results for "GIS"

Looking at the "Curse of Dimensionality" with R, foreach, and lattice

March 20, 2011

Here are the results of a "Curse of Dimensionality" homework assignment for Terran Lane's Introduction to Machine Learning class. Pretty pictures, interesting results, and a good exercise in explicit parallelism with R.




It's neat to see distance scaling linearly with standard deviation, and linearly with the Lth-root...
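The linear relationship between distance and standard deviation can be checked with a small simulation. The following is only a sketch under assumptions: it uses base R rather than foreach or lattice, it measures the mean Euclidean distance between pairs of Gaussian points, and the helper name `mean_pair_distance` and all parameter values are hypothetical rather than taken from the assignment.

```r
# Mean Euclidean distance between n random pairs of points drawn from
# a d-dimensional Gaussian with the given standard deviation.
# (Hypothetical helper, not from the original post.)
mean_pair_distance <- function(d, sd, n = 1000) {
  a <- matrix(rnorm(n * d, sd = sd), nrow = n)
  b <- matrix(rnorm(n * d, sd = sd), nrow = n)
  mean(sqrt(rowSums((a - b)^2)))
}

set.seed(2011)

# Vary the standard deviation at a fixed dimension
sds <- c(1, 2, 4)
dist_by_sd <- sapply(sds, function(s) mean_pair_distance(d = 10, sd = s))

# Vary the dimension at a fixed standard deviation
dims <- c(1, 10, 100)
dist_by_dim <- sapply(dims, function(d) mean_pair_distance(d = d, sd = 1))

print(data.frame(sd = sds, mean_distance = dist_by_sd))
print(data.frame(dimension = dims, mean_distance = dist_by_dim))
```

Doubling the standard deviation should roughly double the mean distance, while the mean distance grows with dimension even when the standard deviation is held fixed.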

Read more »

Machine Learning Ex5.2 – Regularized Logistic Regression

March 20, 2011

Exercise 5.2 improves the Logistic Regression implementation done in Exercise 4 by adding a regularization parameter that reduces the problem of over-fitting. We will be using Newton's Method.
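As a rough illustration of the idea, here is a minimal sketch of Newton's Method for L2-regularized logistic regression. It is not the post's own code: the function name `newton_logistic`, the `lambda` argument, the choice to leave the intercept unpenalized, and the simulated data are all assumptions made for the example.

```r
# Logistic (sigmoid) function
sigmoid <- function(z) 1 / (1 + exp(-z))

# Newton's Method for L2-regularized logistic regression (sketch).
# X: n x p design matrix whose first column is the intercept (all ones)
# y: vector of 0/1 responses
# lambda: regularization strength (assumed name); the intercept is not penalized
newton_logistic <- function(X, y, lambda = 1, iterations = 15) {
  n <- nrow(X)
  p <- ncol(X)
  theta <- rep(0, p)
  penalty <- diag(c(0, rep(1, p - 1)), nrow = p)
  for (i in seq_len(iterations)) {
    h <- as.vector(sigmoid(X %*% theta))
    # Gradient of the penalized negative log-likelihood
    grad <- as.vector(crossprod(X, h - y)) / n +
      (lambda / n) * as.vector(penalty %*% theta)
    # Hessian: X' W X / n plus the penalty term, with W = diag(h * (1 - h))
    H <- crossprod(X * (h * (1 - h)), X) / n + (lambda / n) * penalty
    theta <- theta - solve(H, grad)
  }
  theta
}

# Simulated data, for illustration only
set.seed(1)
n <- 200
u <- rnorm(n)
v <- rnorm(n)
X <- cbind(1, u, v)
y <- rbinom(n, 1, sigmoid(1 + 2 * u - v))

theta_unreg <- newton_logistic(X, y, lambda = 0)   # no regularization
theta_reg   <- newton_logistic(X, y, lambda = 10)  # coefficients shrink toward zero
```

With `lambda = 0` the update reduces to ordinary Newton's Method for logistic regression, so the result should agree with `glm(..., family = binomial)`; a larger `lambda` pulls the non-intercept coefficients toward zero.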

Data

Here's the data we want to fit.

# regularized logistic regression
# load the data
mydata = read.csv("http://spreadsheets.google.com/pub?key=0AnypY27pPCJydHZPN2pFbkZGd1RKeU81OFY3ZHJldWc&output=csv", header = TRUE)

# plot the data
plot(mydata$u, mydata$v, xlab="u", ylab="v")
points(mydata$u,...

Read more »

How to: Binomial regression models in R

March 19, 2011

Ever wondered how to predict success or failure as a function of other variables? Here's a quick tutorial on binomial regression in R.
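For a flavor of what such a model looks like, here is a minimal sketch using `glm()` with a binomial family. The data are simulated and the variable names (`x`, `success`, `dat`) are made up for the example; the tutorial itself may use a different dataset and approach.

```r
# Simulate a binary outcome whose success probability depends on x
set.seed(42)
n <- 500
x <- rnorm(n)
success <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))
dat <- data.frame(x = x, success = success)

# Fit a binomial (logistic) regression
fit <- glm(success ~ x, family = binomial(link = "logit"), data = dat)
print(coef(fit))

# Predicted probability of success at a few values of x
new_x <- data.frame(x = c(-1, 0, 1))
p_hat <- predict(fit, newdata = new_x, type = "response")
print(p_hat)
```

Using `type = "response"` returns predictions on the probability scale rather than the log-odds scale.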

Read more »

Applying functions on groups: sqldf, plyr, doBy, aggregate or data.table ?

March 17, 2011

Which one of the sqldf, plyr, doBy and aggregate functions/packages would be faster for applying functions on groups of rows? I was wondering about this earlier in this post.  It seems sqldf would be the fastest according to a post in manipulatr m...
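To make the task concrete, here is a small sketch of a per-group mean computed two ways. It sticks to base R (`aggregate` and `tapply`) so that it runs without extra packages, and the data frame is simulated; it makes no claim about which approach is faster, though either call can be wrapped in `system.time()` to compare on your own data.

```r
# Simulated data: 1000 rows in 5 groups (illustrative only)
set.seed(123)
df <- data.frame(
  group = sample(letters[1:5], 1000, replace = TRUE),
  value = rnorm(1000)
)

# Group means with aggregate(): returns a data frame
agg <- aggregate(value ~ group, data = df, FUN = mean)

# Group means with tapply(): returns a named array
tap <- tapply(df$value, df$group, mean)

print(agg)
print(tap)

# Timing one of the calls (results depend on the machine and data size)
print(system.time(aggregate(value ~ group, data = df, FUN = mean)))
```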

Read more »

$3.2M in prizes for predicting hospitalization

March 17, 2011

Heritage Health and Kaggle have teamed up to create the biggest data science competition thus far: the Heritage Health Prize, which challenges competitors to build a statistical model to predict the number of days a person is likely to spend in hospital over the next year, based on (anonymized) factors such as demographics, medical visits and treatments, and other...

Read more »

sab-R-metrics: Brief Sidetrack for Scatterplot Matrices

March 16, 2011

In my last two posts I talked about Ordinary Least Squares, then extended this discussion to the multiple predictor case and briefly talked about some of the problems that may arise. These problems can include omitted variable bias, heteroskedasticity, non-normality, and multicollinearity. Most of these problems are relatively minor in practice and have easy fixes,...

Read more »

Machine Learning Ex4 – Logistic Regression and Newton’s Method

March 16, 2011

Exercise 4 is all about using Newton's Method to implement logistic regression on a classification problem.

For all this to make sense, I suggest having a look at Andrew Ng's machine learning lectures on OpenClassroom.

We start with a dataset representing 40 students who were admitted to college and 40 students who were not admitted, and their corresponding...

Read more »

More pi plus 1 (or plus 0.01) day fun

March 15, 2011

Since I just didn’t get enough this morning, I spent some more time fooling around with estimating pi. Since I was basically counting the number of random x,y pairs inside a quarter circle and computing a sample average for more …
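The quarter-circle counting approach can be sketched in a few lines of R. This is an illustrative version, not the post's code: the sample size, seed, and variable names are assumptions.

```r
# Estimate pi by sampling points uniformly in the unit square and
# counting the fraction that land inside the quarter circle of radius 1.
set.seed(314)
n <- 100000
x <- runif(n)
y <- runif(n)
inside <- x^2 + y^2 <= 1

# The quarter circle has area pi / 4, so the fraction inside estimates pi / 4
pi_hat <- 4 * mean(inside)

# Standard error of the estimate, from the binomial variance of the fraction
se <- 4 * sqrt(mean(inside) * (1 - mean(inside)) / n)

print(pi_hat)
print(se)
```

Because the estimate is a sample average, its standard error shrinks in proportion to one over the square root of the number of points.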

Read more »