Monthly Archives: January 2013

Bayesian Classification with Gaussian Process

January 6, 2013

Despite the prowess of the support vector machine, it is not specifically designed to extract the features relevant to a prediction. For example, in network intrusion detection, we need to learn which network statistics are relevant for network defense. In consu...

Read more »

More Principal Components Fun

January 6, 2013

Today, I want to continue with the Principal Components theme and show how Principal Component Analysis can be used to build portfolios that are not correlated with the market. Most of the content for this post is based on the excellent article “Using PCA for spread trading” by Jev Kuznetsov. Let’s start by loading

Read more »
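For the PCA post above, the core idea can be sketched in base R. This is not the post's or Jev Kuznetsov's code: it uses simulated returns (a hypothetical common "market" factor plus noise, with assumed betas) instead of real prices. The second principal component's loadings are used as long/short portfolio weights. Because principal component scores are mutually uncorrelated, that portfolio's in-sample returns are uncorrelated with the first component, which in this simulation acts as the market factor.

```r
set.seed(42)
n_days <- 250
betas  <- c(0.8, 1.0, 1.2, 0.9, 1.1)           # assumed market sensitivities
market <- rnorm(n_days, mean = 0, sd = 0.01)    # simulated common factor

# Simulated daily returns: beta * market + idiosyncratic noise (not real data).
returns <- sapply(betas, function(b) b * market + rnorm(n_days, sd = 0.005))
colnames(returns) <- paste0("asset", seq_along(betas))

pca <- prcomp(returns, center = TRUE, scale. = FALSE)

# In this simulation PC1 mostly captures the common (market) factor.
pc1_scores <- pca$x[, 1]

# Long/short "spread" portfolio weights taken from the second component.
w2 <- pca$rotation[, 2]
centred <- sweep(returns, 2, colMeans(returns))
port2 <- drop(centred %*% w2)

# PCA scores are uncorrelated by construction, so this is ~0 (in-sample only).
cor(port2, pc1_scores)
```

Note that zero correlation with PC1 holds in-sample by construction; correlation with an actual market index, and out-of-sample behaviour, are not guaranteed.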

PLS Path Modeling with R: A Comprehensive Tutorial by Gaston Sanchez

January 6, 2013

Gaston Sanchez has just published an online PDF of his new book, PLS Path Modeling with R. I have been using Gaston's plspm R package for a couple of years to analyze marketing data. I started when I needed to test a path model in wh...

Read more »

Querying an SQLite database from R

January 6, 2013

You have an SQLite database, perhaps as part of some replication materials, and you want to query it from R. You might want to be able to say: results <- runsql("select * from mytable order by date") and get the results back as an R object. Here's a function to do it. In the following,

Read more »
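For the SQLite post above, a minimal sketch of a runsql() helper might look like the following. It assumes the DBI and RSQLite packages are installed; the `dbname` argument and its default file name are hypothetical, and this is not necessarily how the original post implements it. The connection is closed with `on.exit()` so it is released even if the query fails.

```r
# Hypothetical helper: run one SQL query against an SQLite file and
# return the result as a data frame. Assumes DBI and RSQLite are installed.
runsql <- function(sql, dbname = "mydata.sqlite") {
  con <- DBI::dbConnect(RSQLite::SQLite(), dbname = dbname)
  on.exit(DBI::dbDisconnect(con), add = TRUE)  # always close the connection
  DBI::dbGetQuery(con, sql)
}

# Usage with a throwaway database built from made-up example rows.
db  <- tempfile(fileext = ".sqlite")
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = db)
DBI::dbWriteTable(con, "mytable",
                  data.frame(date = c("2013-01-02", "2013-01-01"),
                             value = c(2, 1),
                             stringsAsFactors = FALSE))
DBI::dbDisconnect(con)

results <- runsql("select * from mytable order by date", dbname = db)
```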

What Are Your Favorite Methodology and Statistics Blogs?

January 6, 2013

I recently searched for a list of the "top statistics blogs" or the "top methodology blogs" and couldn't find a recent compilation. This contrasts with visualization blogs, which are relatively easy to find (e.g. top visualization blogs). I've decided to initiate the provision of this public good, but would like to draw on others'

Read more »

source_GitHubData: a simple function for downloading data from GitHub into R

January 6, 2013

Update 31 January: I've folded source_GitHubData into the repmis package. See this post. Update 7 January 2012: I updated the internal workings of source_GitHubData so that it now relies on httr rather than RCurl. Also it is more directly descended ...

Read more »

Sequential testing in a triangle test setting

January 6, 2013

It is well known that the binomial test never has an error of exactly 5%. You aim for at most 5%, calculate the number of correct answers needed to get there, and end up with an error of, e.g., 2%. This is a shame, but there is no solution. However, it is also an opportunity; the...

Read more »
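For the triangle test post above, the point about the attained error rate can be checked with base R's `pbinom()`. This sketch assumes a one-sided test against the triangle-test guessing rate of 1/3; `critical_correct()` is a hypothetical helper, and it covers only the fixed-sample test, not the sequential scheme the post goes on to describe.

```r
# Smallest number of correct answers k with P(X >= k) <= alpha,
# where X ~ Binomial(n, p0). In a triangle test a random guess is right
# with probability p0 = 1/3.
critical_correct <- function(n, alpha = 0.05, p0 = 1/3) {
  k <- 0:(n + 1)
  # Upper tail P(X >= k) = P(X > k - 1)
  upper <- pbinom(k - 1, size = n, prob = p0, lower.tail = FALSE)
  k_crit <- k[which(upper <= alpha)[1]]
  list(k = k_crit,
       actual_alpha = pbinom(k_crit - 1, size = n, prob = p0,
                             lower.tail = FALSE))
}

# The attained type I error is at most 5% and typically strictly below it.
sapply(c(12, 24, 36), function(n) critical_correct(n)$actual_alpha)
```

Because the binomial distribution is discrete, moving k down by one pushes the error above 5%, so the attained level usually sits somewhat below the nominal one.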

tolower() – error catching unmappable characters

January 6, 2013

The tolower() function throws an error when it can’t map a character to the Unicode character set of the input data, a common occurrence when analysing social media data containing emoticons. Emoticons are symbols commonly used on mobile phones that aren’t recognised on all platforms. For example, when converting tweets to @delta

Read more »
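For the tolower() post above, one way to keep a batch job running is to wrap each call in `tryCatch()`. This is a minimal sketch, not the post's own code: `safe_tolower()` is a hypothetical name, and returning `NA` for strings that fail is an assumed design choice (returning the original string unchanged would be another option). Which characters actually trigger the error depends on your locale and encoding.

```r
# Lower-case each element; if tolower() errors on an element (for example
# on characters it cannot map), return NA for that element instead of
# aborting the whole vector.
safe_tolower <- function(x) {
  vapply(x, function(s) {
    tryCatch(tolower(s), error = function(e) NA_character_)
  }, character(1), USE.NAMES = FALSE)
}

tweets <- c("Hello World", "GOOD MORNING")
safe_tolower(tweets)
```

Working element by element means one bad tweet only costs that tweet, at the price of being slower than a single vectorised `tolower()` call.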

Performance Benchmark of Running Sum Functions

January 6, 2013

First, let us consider a running sum function in pure R. To get started, I looked at the source code of the TTR package to see the algorithm used in runSum. The runSum function uses a Fortran routine to compute the running/rolling sum of a vector. The ...

Read more »
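For the running sum post above, here is a minimal pure-R sketch of two approaches one might benchmark. It is not TTR's runSum (which, per the post, calls Fortran), and the function names are hypothetical. The loop version recomputes every window; the cumsum version uses the identity that the sum of the window ending at i equals cs[i] - cs[i - n], with cs[0] taken as 0.

```r
# O(length(x) * n) reference: sum each window explicitly.
run_sum_loop <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  out <- rep(NA_real_, length(x))
  for (i in n:length(x)) out[i] <- sum(x[(i - n + 1):i])
  out
}

# O(length(x)) version via cumulative sums; the first n - 1 values are NA.
run_sum_cumsum <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  cs <- cumsum(x)
  out <- rep(NA_real_, length(x))
  idx <- n:length(x)
  out[idx] <- cs[idx] - c(0, cs[seq_len(length(x) - n)])
  out
}

x <- rnorm(1e5)
system.time(a <- run_sum_loop(x, 20))
system.time(b <- run_sum_cumsum(x, 20))
all.equal(a, b)
```

The cumsum trick is fast but subtracts two large running totals, so on very long series floating-point error can accumulate; `all.equal()` compares with a tolerance for that reason.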

Using the Rcpp Timer

January 6, 2013

Since the 0.10.2 release, Rcpp contains an internal class Timer that can be used for fine-grained benchmarking. Romain motivated Timer in a post to the mailing list where Timer is used to measure the different components of the costs of random number...

Read more »