Just another R blog

January 2, 2013

New year, new resolutions. This year, as a personal challenge, I decided to create a blog where I could share (and also receive) some tips and tricks about the R programming language. The main motivation behind this blog is to learn how to use Knitr (http://yihui.name/knitr/). While I'm very concerned about the importance of...

Read more »

The (near) Future of Data Analysis – A Review

January 2, 2013

Sean Murphy co-organizes Data Business DC, among many other things. Hadley Wickham, having just taught workshops in DC for RStudio, shared with the DC R Meetup his view on the future, or at least the near future of Data Analysis. …

Read more »

The Unravelling of Structured Investment Vehicles or Birthdays

January 2, 2013

The best way for me to achieve a deep understanding of a theorem is not through lengthy proofs alone, but through practical application/implementation or, as they said in the Marine Corps, Pract-App. One of the many reasons I love R is how easy it is to write functions and test results. The 2008 financial crisis was the topic of a recent dinner

Read more »

You can’t spell loss reserving without R

January 2, 2013

Last year, I spent a morning trying to return to first principles when modeling loss reserves. (Brief aside to non-actuaries: a loss reserve is the financial provision set aside to pay for claims which have either not yet settled, or have not yet been reported. If that doesn’t sound fascinating, this will likely be a

Read more »

Computing for Data Analysis, and Other Free Courses

January 2, 2013

Coursera's free Computing for Data Analysis course starts today. It's a four-week course, requiring about 3-5 hours/week. A bit about the course: In this course you will learn how to program in R and how to use R for effective data analysis. Y...

Read more »

R code and data for book “R and Data Mining: Examples and Case Studies”

January 2, 2013

R code and data for book “R and Data Mining: Examples and Case Studies” are now available at http://www.rdatamining.com/books/rdm/code. An online PDF version of the book (the first 11 chapters only) can also be downloaded at http://www.rdatamining.com/docs. Below are its …

Read more »

NFL Code on Github

January 2, 2013

I’ve made some revisions and simplifications to the code to compile NFL data. It’s now all out on Github for anyone to play with in advance of the Superbowl. In the meantime, here’s a lovely picture comparing every team’s offense, as measured by total offensive yards, against their defenders. Note the anemic Chicago offense. https://github.com/PirateGrunt/NFL

Read more »

Packages v. Libraries in R

January 2, 2013

In the past I've used the terms "R library" and "R package" synonymously (e.g. this blog post and this paper), but a careful reader has called me out. Mark Sharp notes that there are differences between libraries and packages. Chapter one of the R manual "Writing R Extensions" gives the details: A package is a directory of files which...

Read more »
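The distinction mentioned in the excerpt above can be inspected from the R console. The following is a minimal sketch using only base R and utils functions; the choice of `stats` as the example package and the variable names are ours, not the post's.

```r
# A library is a directory on disk that holds installed packages.
lib_dirs <- .libPaths()

# A package is one installed unit inside such a directory.
pkgs <- rownames(installed.packages())

# Despite its name, library() attaches a *package* found in one of the libraries.
library(stats)

# Location of the installed 'stats' package on this machine.
stats_dir <- find.package("stats")
```

Printing `lib_dirs` and `stats_dir` side by side shows the package directory sitting inside one of the library directories.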

The (near) Future of Data Analysis – A Review

January 2, 2013

Sean Murphy co-organizes Data Business DC, among many other things. Hadley Wickham, having just taught workshops in DC for RStudio, shared with the DC R Meetup his view on the future, or at least the near future of Data Analysis. Herein lie my notes for this talk, spiffed up into semi-comprehensible language. Please...

Read more »

Producing animated GIFs and Videos

January 2, 2013

It took me a while to figure out how to use the animation package on my Windows OS. In making an animated GIF, the problem seems to have been quite simple in the end (and I should have been more patient in reading the instructions!) - Following installation of the program ImageMagick, one has...

Read more »

Clone all your gists locally with R

January 2, 2013

I really like gists as a quick way to include lengthier code snippets in my blog posts. However, I am not a git user as such, and so I was quite concerned when I noticed that all my gists on this blog had vanished after Christmas. I suppose this was a result of Github's downtime...

Read more »

Armadillo subsetting

January 2, 2013

A StackOverflow question asked how to convert from arma::umat to arma::mat. The former is the format used for find and other logical indexing. For the particular example at hand, a call to the conv_to converter provided the solution. We rewrite the answer he...

Read more »

Happy International Year of Statistics

January 2, 2013

2013 promises to be a great year for all statistics aficionados, as today is the first day of the International Year of Statistics. More than 1400 organizations from 108 countries – professional

Read more »

Multiple Classification and Authorship of the Hebrew Bible

January 1, 2013

Sitting in my synagogue this past Saturday, I started thinking about the authorship analysis that I did using function word counts from texts authored by Shakespeare, Austen, etc. I started to wonder if I could do something similar with the …

Read more »

Efficiency of Extracting Rows from A Data Frame in R

January 1, 2013

In the example below, 552 rows are extracted from a data frame with 10 million rows using six different methods. Results show a significant disparity between the least and the most efficient methods in terms of CPU time. Similar to the finding in my previous post, the method using the data.table package is the most efficient

Read more »
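The six methods compared in the post are not visible in the excerpt above. As a rough illustration of the kind of comparison it describes, here is a minimal sketch with three base R ways to pull matching rows and a timing of each. The data, column names, and size (far smaller than the post's 10 million rows) are assumptions made for this example, and the data.table method the post favours is not reproduced here.

```r
set.seed(1)
n <- 1e5  # assumed size, kept small so the sketch runs quickly
df <- data.frame(id = seq_len(n),
                 grp = sample(letters, n, replace = TRUE))

# Method 1: logical indexing
t1 <- system.time(r1 <- df[df$grp == "a", ])

# Method 2: integer positions from which()
t2 <- system.time(r2 <- df[which(df$grp == "a"), ])

# Method 3: subset()
t3 <- system.time(r3 <- subset(df, grp == "a"))

# All three should select the same rows; timings are collected for comparison.
same_rows <- identical(r1$id, r2$id) && identical(r1$id, r3$id)
timings <- rbind(logical = t1, which = t2, subset = t3)[, "elapsed"]
```

Timings on a data frame this small will be noisy, so any ranking should be checked at realistic sizes before drawing conclusions.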

Polarisation and Mobilisation indicators

January 1, 2013

This blog post makes available a set of indicators discussed in a forthcoming edition of Digital Icons. In brief, the script takes a text input and calculates polarisation and mobilisation indexes based on the number of pronouns featured. The hypothesised relationship between pronouns and polarisation is one discussed extensively by critical discourse analysts, social...

Read more »

Standard Normal Variate (SNV: Other way)

January 1, 2013

This is another way to pre-treat a spectra set with the SNV math-treatment (Standard Normal Variate). You can see the other one in the post: "Standard Normal Variate (SNV)". In this post, I use the R function "sweep". library(ChemometricsWithR) #in a...

Read more »
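The post's own code is cut off in the excerpt above, so the following is only a sketch of how SNV can be written with `sweep`: each spectrum (row) is centred on its own mean and scaled by its own standard deviation. The function name `snv` and the synthetic matrix are assumptions for illustration; the post itself works with data from the ChemometricsWithR package.

```r
# Standard Normal Variate: row-wise centring and scaling of a spectra matrix.
# Rows are spectra, columns are wavelengths.
snv <- function(X) {
  centered <- sweep(X, 1, rowMeans(X), "-")   # subtract each row's mean
  sweep(centered, 1, apply(X, 1, sd), "/")    # divide by each row's sd
}

# Synthetic stand-in for a spectra set: 5 spectra, 20 wavelengths.
set.seed(42)
spectra <- matrix(rnorm(5 * 20, mean = 10), nrow = 5)
treated <- snv(spectra)
```

After the treatment every row of `treated` has mean 0 and standard deviation 1, which is a quick way to check the result.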

Unicode in R packages (not)

January 1, 2013

Perhaps you are trying to add your nice new object as data for an R package. But wait. It has foreign letters in its dimnames, so ’R CMD check’ will certainly complain. What you need is something to turn R’s natural Unicode-processing goodness into a relic from the early days of computing without inadvertently aliasing any words

Read more »

Sugar Functions head and tail

January 1, 2013

The R functions head and tail return the first (last) n elements of the input vector. With Rcpp sugar, the functions head and tail work the same way as they do in R. Here we use std::sort from the STL and then tail to return the top n items (items wit...

Read more »

STL for_each and generalized iteration

January 1, 2013

The STL contains a very general looping or sweeping construct in the for_each algorithm. It can be used with function objects (such as the simple square function used here) but also with a custom class, which can be used to keep state. #include <...

Read more »

Anaerobic Stress in Seeds – A Chemical Similarity Network Story

December 31, 2012

The chemical similarity network or CSN is a great tool for organizing biological data based on known biochemistry or chemical structural similarity. Here is an example CSN for visualizing metabolomic changes (measured via GC/TOF) due to anaerobic stress in germinating seeds. In this network edges are formed for chemical similarity scores > 75. Node color describes

Read more »

Getting Genetics Done 2012 In Review

December 31, 2012

Here are links to all of this year's posts (excluding seminar/webinar announcements), with the most visited posts in bold italic. As always, you can follow me on Twitter for more frequent updates. Happy new year! New Year's Resolution: Learn How to Code...

Read more »

The forward (explicit) Euler method

December 31, 2012

The forward (explicit) Euler method is a first-order numerical procedure for solving ODEs with a given initial value. The forward Euler method is said to be the simplest and most obvious numerical ODE integrator. In fact, the simulation using the forward Euler only …

Read more »
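For readers who want to see the method from the excerpt above in action, here is a minimal sketch of the forward Euler update y[i+1] = y[i] + h * f(t[i], y[i]) in R. The function name `euler`, its arguments, and the test problem dy/dt = -y are assumptions chosen for illustration and are not taken from the post.

```r
# Forward (explicit) Euler for dy/dt = f(t, y) with y(t0) = y0,
# using n equal steps of size h on the interval [t0, t1].
euler <- function(f, y0, t0, t1, n) {
  h <- (t1 - t0) / n
  t <- t0 + h * (0:n)
  y <- numeric(n + 1)
  y[1] <- y0
  for (i in seq_len(n)) {
    y[i + 1] <- y[i] + h * f(t[i], y[i])  # one Euler step
  }
  data.frame(t = t, y = y)
}

# Test problem with a known solution: dy/dt = -y, y(0) = 1, so y(t) = exp(-t).
sol <- euler(function(t, y) -y, y0 = 1, t0 = 0, t1 = 1, n = 100)
err <- abs(sol$y[nrow(sol)] - exp(-1))  # global error at t = 1
```

Because the method is first order, halving the step size should roughly halve `err`, which makes a simple sanity check.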

Nested loops with mapply

December 31, 2012

So as I sink deeper into the second level of R enlightenment, one thing troubled me. “lapply” is fine for looping over a single vector of elements, but it doesn’t do a nested loop structure. These tend to be pretty ubiquitous for me. I’m forever doing the same thing to a set of two or three

Read more »
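The post's own solution is cut off in the excerpt above. One common way to get nested-loop behaviour from `mapply`, shown here as a sketch and not necessarily the author's approach, is to build every combination of the loop variables with `expand.grid` and then let `mapply` walk the columns in parallel. The variables `x` and `y` and the product function are placeholders.

```r
# All combinations of the two loop variables (x varies fastest).
grid <- expand.grid(x = 1:3, y = c(10, 20), KEEP.OUT.ATTRS = FALSE)

# mapply calls the function once per row of the grid.
grid$z <- mapply(function(x, y) x * y, grid$x, grid$y)

# The equivalent explicit nested loop, for comparison.
z_loop <- numeric(0)
for (y in c(10, 20)) {
  for (x in 1:3) {
    z_loop <- c(z_loop, x * y)
  }
}
```

The same pattern extends to three or more variables by adding columns to the grid and arguments to the function.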

Top Posts of 2012

December 31, 2012

This has been a great year for my blog. I've seen tremendous growth in my subscribers. I look forward to engaging with and learning from my followers in 2013 and I plan to offer valuable content in return. If you're interested in following along, you can quickly subscribe via RSS or e-mail. I use Google...

Read more »

How odd was the UEFA draw?

December 31, 2012

I've been away for some time without closely following the media, and without significant internet access. When such a period is over it takes some time to regain momentum. Thus my short exit poll series will be continued in 2013. For now I'm still sor...

Read more »

Learning RStudio for R Statistical Computing

December 31, 2012

I am happy to announce that our book on RStudio was released last week.

Read more »

STL random_sample

December 31, 2012

An earlier post looked at random shuffle for permutations. The STL also supports creation of random samples. Alas, it seems that this functionality has not been promoted to the C++ standard yet, so we will have to make do with what is an extension ...

Read more »

Software engineer’s guide to getting started with data science

December 30, 2012

Many of my software engineer friends ask me about learning data science. There are many articles on this subject from renowned data scientists (Dataspora, Gigaom, Quora, Hilary Mason). This post captures my journey (a software engin...

Read more »
