Articles by heuristicandrew

Text Data Mining with Twitter and R

April 8, 2011 | heuristicandrew

Twitter is a favorite source of text data for analysis: it’s popular (there is a huge volume and variety of content on all topics) and easily accessible through Twitter’s free, open APIs, which return data in easily consumable JSON and Atom formats. Some … [Read more...]

Compcache on Ubuntu on Amazon EC2

May 4, 2010 | heuristicandrew

The following fully automatic Bash script downloads, compiles, and initializes compcache version 0.6.2 on Ubuntu Karmic Koala (9.10). The script creates two swap devices, each with a maximum uncompressed size of 4 GB. Two swap devices are used to take advantage of two CPUs (or two cores in a multicore CPU). Compcache is a fascinating memory compression ... [Read more...]

Validating credit card numbers in SAS

March 16, 2010 | heuristicandrew

Major credit card issuing networks (including Visa, MasterCard, Discover, and American Express) allow simple credit card number validation using the Luhn algorithm (also called the “modulus 10” or “mod 10” algorithm). The following code demonstrates an implementation in SAS. The code also validates the credit card number by length and by checking ... [Read more...]
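
For readers who want to see the checksum itself, here is a minimal sketch of the Luhn (mod 10) check in Python. It is not the SAS code from the post, and the function name luhn_is_valid is a hypothetical one chosen for illustration. It covers only the checksum step, not the length check the post also mentions.

```python
def luhn_is_valid(number: str) -> bool:
    """Return True if `number` passes the Luhn (mod 10) checksum.

    Illustrative sketch only: spaces and hyphens are ignored, and any
    other non-digit character makes the number invalid.
    """
    cleaned = number.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False

    total = 0
    # Walk the digits from right to left; double every second digit.
    for position, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                # Same as adding the two digits of the product.
                digit -= 9
        total += digit

    # A valid number has a checksum divisible by 10.
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_is_valid("4111 1111 1111 1111"))  # True (common test number)
    print(luhn_is_valid("4111 1111 1111 1112"))  # False
```

Passing the checksum only shows that the digits are internally consistent. It says nothing about whether the account exists.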

Weighting model fit with ctree in party

March 15, 2010 | heuristicandrew

Conditional inference trees (ctree) in the party package allow weighting, which is useful when one classification outcome is more important than another. Useful examples are not difficult to imagine: in a marketing direct mailing, a false positive (non-res... [Read more...]

Plot ROC curve and lift chart in R

December 18, 2009 | heuristicandrew

This tutorial with real R code demonstrates how to create a predictive model using cforest (a variant of Breiman’s random forests built on conditional inference trees) from the package party, evaluate the predictive model on a separate set of data, and then plot the performance using ROC curves ... [Read more...]

Delete rows from R data frame

October 8, 2009 | heuristicandrew

Deleting rows from a data frame in R is easy when you combine a few simple operations. Let’s say you are working with the built-in data set airquality and need to remove rows where Ozone is NA (also called null, blank, or missing). The method is a conce... [Read more...]
