Let visitors execute and play with simple R examples right on your web page, thanks to a web service and an embeddable widget provided by the Sage project.

Today, I want to continue with the Principal Components theme and show how Principal Component Analysis can be used to build portfolios that are not correlated with the market. Most of the content for this post is based on the excellent article “Using PCA for spread trading” by Jev Kuznetsov. Let’s start by loading
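
As a rough illustration of the idea (not the code from the post or from Kuznetsov's article), the sketch below simulates three hypothetical assets driven by one common "market" factor and runs `prcomp` on their returns. The asset names, betas, and noise levels are all made-up assumptions. The first principal component tracks the market, and the scores of the second component are uncorrelated with it by construction.

```r
# Simulated data: every number here is an illustrative assumption.
set.seed(1)
n <- 500
market <- rnorm(n, sd = 0.01)                 # hypothetical market factor
betas <- c(A = 0.8, B = 1.0, C = 1.2)         # hypothetical market exposures
returns <- sapply(betas, function(b) b * market + rnorm(n, sd = 0.003))

# PCA on the (centered) return matrix
pca <- prcomp(returns, center = TRUE, scale. = FALSE)
scores <- pca$x

# PC1 acts as a market proxy; the PC2 portfolio is uncorrelated with it
cor_pc1_market <- cor(scores[, 1], market)
cor_pc1_pc2 <- cor(scores[, 1], scores[, 2])
weights_pc2 <- pca$rotation[, 2]              # weights of the second portfolio
```

The sign of a principal component is arbitrary, so `cor_pc1_market` may come out strongly negative rather than strongly positive.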

You have an SQLite database, perhaps as part of some replication materials, and you want to query it from R. You might want to be able to say: results <- runsql("select * from mytable order by date") and get the results back as an R object. Here's a function to do it. In the following,
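
One way such a helper might look is sketched below. This is not the function from the post: it assumes the DBI and RSQLite packages are installed, and the `dbname` argument, its default value, and the demo table are hypothetical.

```r
library(DBI)

# Open the database, run one query, and always close the connection again.
# `dbname` and its default are assumptions made for this sketch.
runsql <- function(sql, dbname = "mydatabase.sqlite") {
  con <- dbConnect(RSQLite::SQLite(), dbname = dbname)
  on.exit(dbDisconnect(con), add = TRUE)
  dbGetQuery(con, sql)
}

# Demo on a throwaway database file with a made-up table
db <- tempfile(fileext = ".sqlite")
con <- dbConnect(RSQLite::SQLite(), dbname = db)
dbWriteTable(con, "mytable",
             data.frame(date = c("2013-01-02", "2013-01-01"),
                        value = c(2, 1),
                        stringsAsFactors = FALSE))
dbDisconnect(con)

results <- runsql("select * from mytable order by date", dbname = db)
```

`results` comes back as an ordinary data frame, sorted by the `date` column.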

I recently searched for a list of the "top statistics blogs" or the "top methodology blogs" and I couldn't find a recent compilation. This contrasts with visualization blogs, which are relatively easy to find (e.g. top visualization blogs). I've decided to initiate the provision of this public good, but would like to draw on others'

Update 31 January: I've folded source_GitHubData into the repmis package. See this post. Update 7 January 2012: I updated the internal workings of source_GitHubData so that it now relies on httr rather than RCurl. Also it is more directly descended ...

It is well known that the binomial test never has an error of exactly 5%. You aim for at most 5%, calculate the number correct to get there and end up with an error of e.g. 2%. This is a shame but there is no solution. However, it is also an opportunity; the...
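
To make the point concrete, here is a small sketch under assumed settings that are not from the post: a one-sided test of p = 0.5 with n = 20 trials and a nominal level of 5%. It finds the smallest number correct whose upper tail probability stays at or below 5%, then reports the error actually attained.

```r
n <- 20          # assumed number of trials
p0 <- 0.5        # assumed null probability of a correct answer
alpha <- 0.05    # nominal level

k <- 0:n
# Upper tail P(X >= k) under the null, for every possible count k
tail_prob <- pbinom(k - 1, n, p0, lower.tail = FALSE)

# Smallest count that keeps the error at or below the nominal level
crit <- min(k[tail_prob <= alpha])
actual_alpha <- tail_prob[k == crit]

crit          # 15
actual_alpha  # about 0.0207, well below the nominal 0.05
```

Requiring only 14 correct would give an error of about 5.8%, so the discreteness forces the jump from above 5% straight down to about 2%.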

The tolower() function returns an error where it can’t map to the Unicode character set of the input data – a common occurrence when analysing social media data with emoticons. Emoticons are those symbols that are commonly used on mobile phones but aren’t always recognised on all platforms. For example, when converting tweets to @delta

First, let us consider a running sum function in pure R. To get started, I looked at the source code of the TTR package to see the algorithm used in runSum. The runSum function uses a Fortran routine to compute the running/rolling sum of a vector. The ...
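
A minimal pure-R running sum could be sketched as follows. It uses a cumulative-sum difference, which is not necessarily the algorithm in TTR's Fortran routine, and the name `run_sum` is hypothetical. Like `runSum`, it pads the first `n - 1` positions with `NA`.

```r
# Running sum over a window of n observations, written in pure R.
# Uses cumsum differences: sum(x[(i-n+1):i]) = cs[i] - cs[i-n].
run_sum <- function(x, n) {
  stopifnot(n >= 1, n <= length(x))
  len <- length(x)
  cs <- cumsum(x)
  out <- rep(NA_real_, len)
  out[n:len] <- cs[n:len] - c(0, cs[seq_len(len - n)])
  out
}

run_sum(1:5, 2)  # NA 3 5 7 9
```

The cumulative-sum trick avoids an explicit loop, at the cost of some floating-point drift on very long series.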

Since the 0.10.2 release, Rcpp contains an internal class Timer which can be used for fine-grained benchmarking. Romain motivated Timer in a post to the mailing list where Timer is used to measure the different components of the costs of random number...

Tyler Cowen links to a post by Sean Taylor, who writes the following about users of R: You are willing to invest in learning something difficult. You do not care about aesthetics, only availability of packages and getting results quickly. To me, R is easy and SAS is difficult. I once worked with some students...

It’s that time of year again – we’ve just posted our Call for Papers for the R/Finance 2013 conference, which focuses on applied finance using R. This is our fifth annual conference, again organized by a group of R package authors and community contributors and hosted by the International Center for Futures and Derivatives (ICFD)

Often in SEM scripts you will see matrices being pre- and post-multiplied by some other matrix. For instance, this figures in scripts computing the genetic correlation between variables. How does pre- and post-multiplying a variance...
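
One common case, shown here as a sketch with made-up numbers rather than as the post's own example, is standardising a covariance matrix into a correlation matrix: pre- and post-multiplying by a diagonal matrix of inverse standard deviations rescales every row and every column.

```r
# Made-up 2 x 2 covariance matrix: variances 4 and 9, covariance 2
S <- matrix(c(4, 2,
              2, 9), nrow = 2, byrow = TRUE)

# Diagonal matrix holding 1 / standard deviation of each variable
D <- diag(1 / sqrt(diag(S)))

# Pre-multiplying rescales the rows, post-multiplying rescales the columns
R <- D %*% S %*% D

R[1, 2]  # 2 / (2 * 3), i.e. about 0.333
```

The result agrees with base R's `cov2cor(S)`.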

Last part of our short series about the Finnish social security number (Fssn). You can check part 1 here, and part 2 here. In this last post we are interested in generating random Fssn's. This has no real-world applications. It is just a coding exercis...

Continuing our theme from last time. The Finnish social security number (FSSn) has the form xxxxxxyzzzq, where the check digit is q. If you want to check if the FSSn is real the check digit is matched to the remainder of xxxxxxzzz / 31. The check di...
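
The check described above can be sketched as follows. The function name is hypothetical, and the 31-character lookup table is not given in the excerpt: it is assumed here to be the standard table (the digits, then the letters with G, I, O, and Q left out).

```r
# Check character for a Finnish social security number.
# birth:      the six digits xxxxxx (ddmmyy), as a string
# individual: the three digits zzz, as a string
fssn_check_char <- function(birth, individual) {
  # Assumed lookup table: remainder 0..30 maps to these 31 characters
  chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  number <- as.numeric(paste0(birth, individual))
  chars[number %% 31 + 1]   # + 1 because R vectors are 1-indexed
}

fssn_check_char("131052", "308")  # "T"
```

Here 131052308 leaves a remainder of 25 when divided by 31, and position 25 (counting from zero) in the table is T.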

At its very essence, Rcpp permits easy access to native R objects at the C++ level. R objects can be simple vectors, lists or matrices; compound data structures created from these; objects of S3, S4 or Reference Class vintage; or language objects a...

Update on 2013/01/05: Xiao Nan in the comments pointed out that apply(combn(letters, 2), 2, paste0, collapse = '') was wrong for all two-letter usernames, and indeed it was. It is not a combination problem. Now I use his elegant outer() solution. One can also use expand.grid(letters, letters). GitHub decided to take off their downloads service, and I was very...
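
The difference between the two approaches is easy to see by counting, as in this short sketch (the variable names are illustrative). `combn()` yields unordered pairs of distinct letters, so it misses names such as "aa" and "ba", whereas `outer()` and `expand.grid()` enumerate every ordered pair.

```r
# Unordered pairs of distinct letters: choose(26, 2) = 325 names
combos <- apply(combn(letters, 2), 2, paste0, collapse = "")

# Every ordered pair, including doubled letters: 26^2 = 676 names
all_pairs <- as.vector(outer(letters, letters, paste0))

# The same set via expand.grid
grid <- expand.grid(letters, letters, stringsAsFactors = FALSE)
grid_pairs <- paste0(grid$Var1, grid$Var2)

length(combos)      # 325
length(all_pairs)   # 676
"ba" %in% combos    # FALSE
"ba" %in% all_pairs # TRUE
```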

Weighted averaging regression and calibration is the most widely used method for developing a palaeolimnological transfer function. Such models are used to reconstruct properties of the past lake environment such as pH, total phosphorus, and water temperature with, it has to be said, varying degrees of success and usefulness. In simple weighted averaging (WA) there is little to specify other...

Today I’m very happy and so excited to announce my new book PLS Path Modeling with R, freely available in pdf format at: www.gastonsanchez.com/PLS_Path_Modeling_with_R.pdf After working and writing insanely for a couple of months in the last quarter of 2012, I’ve finally achieved this personal milestone that means so much to me. What started as a …

If you're anything like me, you use words to name both data frames and variables. I use metadata documentation to keep track of new variables I have constructed, and the logic I have used to construct them (and why I bothered!) but I find many variable...

Introduction: Just as Pocket fundamentally changed my reading behaviour, I am finding that now having Netflix (and even before that, other downloadable or streaming digital content) is really changing my behaviour as far as television is concerned. Where watching TV used to be an affair of browsing through 500 channels and complaining there was nothing on, now with the...