Demonstrate your R code with an interactive, embeddable Javascript widget

January 6, 2013

Let visitors execute and play with simple R examples right on your web page, thanks to a web service and an embeddable widget provided by the Sage project.

2012 Summary and 2013 Plans

January 6, 2013

2012 was a very important year for me. It was my first full year of trading only pure quantitative strategies. It was a very successful year as well, despite the fact that the S&P 500 returned 16% (including dividends) – a tough-to-beat benchmark. The strategy I use on the SPY, for which I

Bayesian Classification with Gaussian Process

January 6, 2013

Despite the prowess of the support vector machine, it is not specifically designed to extract features relevant to the prediction. For example, in network intrusion detection, we need to learn relevant network statistics for the network defense. In consu...

More Principal Components Fun

January 6, 2013

Today, I want to continue with the Principal Components theme and show how Principal Component Analysis can be used to build portfolios that are not correlated to the market. Most of the content for this post is based on the excellent article, “Using PCA for spread trading” by Jev Kuznetsov. Let’s start by loading

PLS Path Modeling with R: A Comprehensive Tutorial by Gaston Sanchez

January 6, 2013

Gaston Sanchez has just published an online PDF of his new book PLS Path Modeling with R. I have been using Gaston's plspm R package for a couple of years to analyze marketing data. I started when I needed to test a path model in wh...

Querying an SQLite database from R

January 6, 2013

You have an SQLite database, perhaps as part of some replication materials, and you want to query it from R. You might want to be able to say: results <- runsql("select * from mytable order by date") and get the results back as an R object. Here's a function to do it. In the following,

What Are Your Favorite Methodology and Statistics Blogs?

January 6, 2013

I recently searched for a list of the "top statistics blogs" or the "top methodology blogs" and I couldn't find a recent compilation. This contrasts with visualization blogs, which are relatively easy to find (e.g. top visualization blogs). I've decided to initiate the provision of this public good, but would like to draw on others'

January 6, 2013

Update 31 January: I've folded source_GitHubData into the repmis package. See this post. Update 7 January 2012: I updated the internal workings of source_GitHubData so that it now relies on httr rather than RCurl. Also it is more directly descended ...

Sequential testing in a triangle test setting

January 6, 2013

It is well known that the binomial test never has an error of exactly 5%. You aim for at most 5%, calculate the number correct to get there, and end up with an error of e.g. 2%. This is a shame but there is no solution. However, it is also an opportunity; the...
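To see why the attained error level falls short of the nominal one, here is a minimal sketch in base R. It assumes a triangle test in which a correct answer has chance probability 1/3; the panel size `n = 30` is a made-up value for illustration and is not taken from the post.

```r
# Triangle test: under the null hypothesis a taster picks the odd sample
# by chance, so the number correct is Binomial(n, 1/3).
n     <- 30      # hypothetical number of tasters (assumption)
p0    <- 1 / 3   # chance probability of a correct answer
alpha <- 0.05    # nominal error level

# Upper-tail probability P(X >= k) for every possible cut-off k = 0..n
k          <- 0:n
upper_tail <- pbinom(k - 1, size = n, prob = p0, lower.tail = FALSE)

# Smallest cut-off whose tail probability does not exceed alpha
k_crit       <- min(k[upper_tail <= alpha])
actual_alpha <- upper_tail[k == k_crit]

cat("critical number correct:", k_crit, "\n")
cat("attained error level:   ", round(actual_alpha, 4), "\n")
```

Because the number correct is discrete, the tail probability jumps from above `alpha` to below it between two neighbouring cut-offs, so the attained level lands strictly under 5% rather than on it.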

tolower() – error catching unmappable characters

January 6, 2013

The tolower() function returns an error where it can’t map to the Unicode character set of the input data – a common occurrence when analysing social media data with emoticons. Emoticons are those symbols that are commonly used on mobile phones but aren’t always recognised on all platforms. For example, when converting tweets to @delta

Performance Benchmark of Running Sum Functions

January 6, 2013

First, let us consider a running sum function in pure R. To get started, I looked at the source code of the TTR package to see the algorithm used in runSum. The runSum function uses a Fortran routine to compute the running/rolling sum of a vector. The ...
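As a rough illustration of what a running sum in pure R can look like, here is a sketch built on `cumsum()`. The function name `run_sum` is hypothetical, and this is not the TTR implementation or necessarily the version benchmarked in the post.

```r
# Running (rolling) sum over a window of n observations, pure base R.
# The first n - 1 positions have no complete window and are set to NA.
run_sum <- function(x, n) {
  stopifnot(is.numeric(x), n >= 1, n <= length(x))
  cs <- cumsum(x)
  # Sum of the window ending at i is cs[i] - cs[i - n], taking cs[0] = 0
  window_sums <- cs[n:length(x)] - c(0, head(cs, -n))
  c(rep(NA_real_, n - 1), window_sums)
}

run_sum(1:6, 3)
# NA NA  6  9 12 15
```

The cumulative-sum trick avoids an explicit loop, at the cost of propagating any `NA` in `x` to every later window.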

Using the Rcpp Timer

January 6, 2013

Since the 0.10.2 release, Rcpp contains an internal class Timer which can be used for fine-grained benchmarking. Romain motivated Timer in a post to the mailing list where Timer is used to measure the different components of the costs of random number...

The statistics software signal

January 5, 2013

Tyler Cowen links to a post by Sean Taylor, who writes the following about users of R: "You are willing to invest in learning something difficult. You do not care about aesthetics, only availability of packages and getting results quickly." To me, R is easy and SAS is difficult. I once worked with some students ...

R/Finance 2013 Call for Papers

January 5, 2013

It’s that time of year again – we’ve just posted our Call for Papers for the R/Finance 2013 conference, which focuses on applied finance using R. This is our fifth annual conference, again organized by a group of R package authors and community contributors and hosted by the International Center for Futures and Derivatives (ICFD)

Monotonic deshrinking in weighted averaging models

January 5, 2013

Weighted averaging regression and calibration is the most widely used method for developing a palaeolimnological transfer function. Such models are used to reconstruct properties of the past lake environment such as pH, total phosphorus, and water temperature with, it has …

Infinite generators in R

January 5, 2013

This is the first in a series of posts about creating simulations in R. As a foundational discussion, I first look …

What’s that “pre- and post-multiply” stuff?

January 5, 2013

Often in SEM scripts you will see matrices being pre- and post-multiplied by some other matrix. For instance, this figures in scripts computing the genetic correlation between variables. How does pre- and post-multiplying a variance...
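One common case of pre- and post-multiplication is turning a covariance matrix into a correlation matrix. The sketch below assumes that is the kind of computation the post has in mind; the matrix `V` is invented for illustration.

```r
# Hypothetical 3 x 3 covariance matrix, used only for illustration
V <- matrix(c(4.0, 1.2, 0.6,
              1.2, 9.0, 2.1,
              0.6, 2.1, 1.0), nrow = 3, byrow = TRUE)

# Diagonal matrix holding 1 / standard deviation of each variable
D <- diag(1 / sqrt(diag(V)))

# Pre-multiplying by D rescales the rows of V,
# post-multiplying by D rescales the columns
R <- D %*% V %*% D

# Same result as base R's built-in conversion
all.equal(R, cov2cor(V))
```

Each element `V[i, j]` ends up divided by `sd_i * sd_j`, which is the definition of a correlation.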

National identification number: Finland part 3

January 5, 2013

Last part of our short series about the Finnish social security number (Fssn). You can check part 1 here, and part 2 here. In this last post we are interested in generating random Fssn's. This has no real-world applications. It is just a coding exercis...

National identification number: Finland part 2

January 5, 2013

Continuing our theme from last time. The Finnish social security number (FSSn) has the form xxxxxxyzzzq, where the check digit is q. To check whether an FSSn is real, the check digit is matched to the remainder of xxxxxxzzz / 31. The check di...

Calling R Functions from C++

January 5, 2013

At its very essence, Rcpp permits easy access to native R objects at the C++ level. R objects can be simple vectors, lists or matrices; compound data structures created from these; objects of S3, S4 or Reference Class vintage; or language objects a...

Find Out Available Usernames with R

January 5, 2013

Update on 2013/01/05: Xiao Nan in the comments pointed out that apply(combn(letters, 2), 2, paste0, collapse = '') was wrong for all two-letter usernames, and indeed it was. It is not a combination problem. Now I use his elegant outer() solution. One can also use expand.grid(letters, letters). GitHub decided to take down their downloads service, and I was very...
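The difference between the three approaches mentioned in the update can be checked directly. This is a small sketch using only base R; the variable names are made up here.

```r
# combn() yields unordered pairs of distinct letters, so it misses names
# such as "aa" and never produces "ba" once it has produced "ab"
pairs_combn <- apply(combn(letters, 2), 2, paste0, collapse = "")

# outer() pastes every letter with every letter: 26 * 26 ordered pairs
pairs_outer <- as.vector(outer(letters, letters, paste0))

# expand.grid() enumerates the same 26 * 26 grid as a data frame
grid       <- expand.grid(letters, letters, stringsAsFactors = FALSE)
pairs_grid <- paste0(grid$Var1, grid$Var2)

length(pairs_combn)  # 325, i.e. choose(26, 2)
length(pairs_outer)  # 676, i.e. 26^2
setequal(pairs_outer, pairs_grid)
```

Two-letter usernames are ordered and may repeat a letter, which is why the full grid, not the set of combinations, is the right candidate list.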

Monotonic deshrinking in weighted averaging models

January 5, 2013

Weighted averaging regression and calibration is the most widely used method for developing a palaeolimnological transfer function. Such models are used to reconstruct properties of the past lake environment such as pH, total phosphorus, and water temperature with, it has to be said, varying degrees of success and usefulness. In simple weighted averaging (WA) there is little to specify other...

IBS reversion edge with QuantShare

January 4, 2013

Happy New Year to readers; my resolution this year is to continue delivering thoughts and ideas to others in the hopes that we all might be able to benefit somewhat from sharing observations. I'll start by describing an edge using QuantShare as the b...

PLS Path Modeling with R

January 4, 2013

Today I’m very happy and excited to announce my new book PLS Path Modeling with R, freely available in PDF format at: www.gastonsanchez.com/PLS_Path_Modeling_with_R.pdf. After working and writing insanely for a couple of months in the last quarter of 2012, I’ve finally achieved this personal milestone that means so much to me. What started as a …

Solving 9-puzzle with GNU R

January 4, 2013

During the holiday break I decided to solve the 9-puzzle, which is a 3x3 variant of the well-known 15-puzzle. The solution has proven to be a nice application of the igraph package. Warning: this time the code takes a bit more time than usual in my...

New AQP Tutorials

January 4, 2013

Several new AQP-related tutorials have been posted to the R-Forge project page: SoilProfileCollection class/method documentation, and soil series extent mapping examples.

Shortening code in R: "with" and "within" are your friends

January 4, 2013

If you're anything like me, you use words to name both data frames and variables. I use metadata documentation to keep track of new variables I have constructed, and the logic I have used to construct them (and why I bothered!) but I find many variable...

A new version of analogue for a new year

January 4, 2013

Yesterday I rolled up a new version (0.10-0) of analogue, my R package for analysing palaeoecological data. It is now available from CRAN. There were lots of incremental changes to Stratiplot() to improve the quality of the stratigraphic diagrams produced …

What The Smeg? Some Text Analysis of the Red Dwarf Scripts

January 4, 2013

Introduction Just as Pocket fundamentally changed my reading behaviour, I am finding that now having Netflix (and even before that, other downloadable or streaming digital content) is really changing my behaviour as far as television is concerned. Where watching TV used to be an affair of browsing through 500 channels and complaining there was nothing on, now with the...