Randomly deleting duplicate rows from a dataframe

January 22, 2013

I use R a lot in my day-to-day workflow, particularly for manipulating raw data files into a format that can be used for analysis. This is often a brain-taxing exercise and, sometimes, it would be totally quicker to …

Quick conversion of a list of lists into a data frame

January 22, 2013

Data frames are one of R’s distinguishing features. Exposing a list of lists as an array of cases, they make many formal operations such as regression or optimization easy to represent. The R data.frame operation for lists is quite slow, in large part because it exposes a vast amount of functionality. This sample shows one way to write a much...

A copper toned publication!

January 21, 2013

At long last (1.5 years since the first submission attempt, to be exact), the research I worked on as a post-doctoral fellow has been published! Click on the image above to head over to the article for some light reading. A lot of work went into this ...

Data fishing: R and XML part 2

January 21, 2013

I’m constantly amazed at what can be done using free software, such as R, and more importantly, what can be done with data that are available on the internet. In an earlier post, I confessed to my sedentary lifestyle immersed in code, so my opinion regarding the utility of open-source software is perhaps biased.

A strained Data Science analogy

January 21, 2013

In the sponsored article Data Science: Buyer Beware at Forbes, SAP's Ray Rivera takes a dim view of Data Science. Rivera calls Data Science a "management fad" in the mold of Business Process Reengineering and casts data scientists as self-ordained "gurus" whose mission is to stand between the "ignorant masses" that need access to data and a...

Montreal R User group meetup at Wajam

January 21, 2013

This Thursday (Jan 24th), 5:30pm, the good folks at Wajam are hosting a meetup of the Montreal R User Group. The event will be at Bolidea at 4115 St Laurent, Montréal, QC. Be sure to RSVP. From Benjamin Rollert: This is an opportunity for people interested in R to hang out at our office, eat

digest 0.6.1

January 21, 2013

digest version 0.6.1 is now on CRAN, and I will push the corresponding version into Debian shortly. Duncan Murdoch added AES support, and helped me fix two issues which (annoyingly) made the Rout.save output differ on another platform. CRANberries...

El Nino and ggplot2

January 21, 2013

Some days ago I got stuck on what appeared to be a simple task: read some El Nino data, and then plot it. The problem was the data file format, which you can find here: El Nino Data. There's a mix of 'white space' and '.' characters, so read....

Americans Live Longer and Work Less

January 21, 2013

Today I saw an article on Hacker News entitled, “America’s CEOs Want You to Work Until You’re 70”. I was particularly surprised by this article appearing out of the blue because I take it for granted that America will eventually have to raise the retirement age to avoid bankruptcy. After reading the article, I wasn’t

Improved evolution of correlations

January 21, 2013

Update June 2013: A systematic analysis of the topic has been published: Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47, 609-612. doi:10.1016/j.jrp.2013.05.009. Check also the supplementary website, where you can find the PDF of the paper. As an update of this post: here’s an

Clustering and sector strength

January 21, 2013

An exploration of the usefulness of sectors. Previously: this subject was discussed in “S&P 500 sector strengths”. Idea: stocks are put into groups based on the sector that the company is considered to be in. Cluster analysis is a statistical technique that finds groups. If sectors really move together, then clustering should recover sectors. Will …

“Introduction to R” Course February 21-22, 2013

January 21, 2013

Milano R net, in collaboration with Quantide, organizes the "Introduction to R" course on February 21-22, 2013. Course description: This two-day course aims to provide an overview of the basic R environment and its applications. This course is intended as a …

Passing user-supplied C++ functions

January 21, 2013

Baptiste asked on StackOverflow about letting users supply C++ functions for use with Armadillo / RcppArmadillo. This post helps with an extended answer. There is nothing specific about Armadillo here; this would work the same way with Eigen, the GSL, or any other library a user wants to support (and provides his or her own as<>() and wrap() converters...

Translating Weird R Errors

January 20, 2013

I love R. I think it's intuitive and clever and overall a great language. But I do get really annoyed sometimes at the completely ridiculous, cryptic error messages it often gives me.  This post will go over some of those seemingly nonsensical err...

Looking to boxplots (Shootout 2012)

January 20, 2013

Boxplots are a nice way to compare the three sample sets of the Shoot-out 2012 data files. There is a category variable (Set) in the data frame with the labels (Cal = Training Set, Test = Test Set and Val = Validation Set). # IMPORTING THE SAMPLE SETS #...

Unemployment

January 20, 2013

I want to practice a bit more with ggplot2, and there is always interesting data to be had from Eurostat. In the Netherlands, the statistics agency (CBS) brought these headlines (translated with http://translate.google.nl/): Unemployment...

texreg: A package for beautiful and easily customizable LaTeX regression tables from R

January 20, 2013

There was a very informative post last week showing how the R package stargazer is used to generate nice LaTeX tables from a number of R objects. This package looks very useful. However, I would like to extol the virtues of another R package that converts model objects in R into LaTeX code: texreg. For

Plotting Tick Data with ggplot2

January 20, 2013

Here are some examples of using ggplot2 and kdb+ together to produce some simple graphs of data stored in kdb+. I am using the qserver extension for R (http://code.kx.com/wsvn/code/cookbook_code/r/) to connect to a running kdb+ instance from within R. First, …

Robust Estimators of Location and Scale

January 20, 2013

First, the median_Rcpp function is defined to compute the median of the given input vector. It is assumed that the input vector is unsorted, so a copy of the input vector is made using clone and then std::nth_element is used to access the nth sorted element. Since we only care about accessing one sorted element of the vector...
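
The copy-then-std::nth_element idea described in this excerpt can be sketched in plain standard C++. This is only an illustration under stated assumptions, not the post's median_Rcpp: it uses std::vector<double> instead of an Rcpp vector, the pass-by-value parameter stands in for clone, the name median_copy is hypothetical, and the handling of even-length input (averaging the two middle values) is an assumed convention because the excerpt is cut off before that point.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the post's median_Rcpp): median of an unsorted vector.
// The argument is taken by value, so the caller's data is never reordered;
// this plays the role of the clone() copy mentioned in the excerpt.
double median_copy(std::vector<double> x) {
    if (x.empty()) {
        throw std::invalid_argument("median of an empty vector is undefined");
    }
    const std::size_t n = x.size();
    const std::size_t mid = n / 2;

    // Partial sort: afterwards x[mid] holds the value a full sort would put
    // there, and every element before it is less than or equal to x[mid].
    std::nth_element(x.begin(), x.begin() + mid, x.end());
    const double upper = x[mid];
    if (n % 2 == 1) {
        return upper;
    }

    // Even length (assumed convention): average the two middle values.
    // The lower middle value is the largest element of the left partition.
    const double lower = *std::max_element(x.begin(), x.begin() + mid);
    return (lower + upper) / 2.0;
}
```

Because only one position needs to end up in sorted order, std::nth_element does less work than a full sort while still leaving the caller's vector untouched.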

Custom as and wrap converters example

January 20, 2013

The RcppBDT package interfaces Boost.Date_Time with R. Both systems have their own date representations, and this provides a nice example of custom as<>() and wrap() converters. Here, we show a simplified example. We start with the forward declarations:

    #include <RcppCommon.h>
    #include <boost/date_time/gregorian/gregorian_types.hpp> // Gregorian calendar types, no I/O

    namespace Rcpp {
        // 'date' class boost::gregorian::date
        // ...

Coercion of matrix to sparse matrix (dgCMatrix) and maintaining dimnames.

January 20, 2013

Consider the following matrix:

    nr <- nc <- 6
    set.seed(123)  # call set.seed() rather than assigning to its name
    m <- matrix(sample(c(rep(0, 9), 1), nr * nc, replace = TRUE), nrow = nr, ncol = nc)
    sum(m) / length(m)
    0.1667
    dimnames(m) <- list(letters[1:nr], letters[1:nc])  # one name per row and column
    m
      a b c d e f
    a 0 0 0 0 0 1
    b 0 0 0 1 0 1
    c 0 0 0 0 0 0
    d 0 0 0 0 0 0
    e 1 1 0 0 0 0
    f 0...

Weekend Reading – S&P 500 Visual History

January 19, 2013

Michael Johnston at the ETF Database shared a very interesting post with me over the holidays. The S&P 500 Visual History is an interactive post that shows the top 10 components in the S&P 500 each year, going back to 1980. On a different note, Judson Bishop contributed a plota.recession() function to add recession

A slightly different introduction to R, part I

January 19, 2013

Note (originally in Swedish): I hope the reader will excuse me for writing in English now and then. This will be a brief introduction to using the statistics software R for biologists who want to do some of their data analysis in R. There are plenty of introductions to R (see here and here, for example; these are
