Monthly Archives: January 2016

Data Cleaning Part 1 – NYC Taxi Trip Data, Looking For Stories Behind Errors

January 31, 2016

Summary: Data cleaning is a cumbersome but important task in real-world data science projects. This is a discussion of my data-cleaning practice for the NYC Taxi Trip data. A lot of domain knowledge, common sense, and business thinking is involved.

Using RcppNT2 to Compute the Sum

January 31, 2016

Introduction: The Numerical Template Toolbox (NT2) is a collection of header-only C++ libraries that makes it possible to explicitly request the use of SIMD instructions when possible, while falling back to regular scalar operations when not. NT2 itself is powered by Boost, alongside two proposed Boost libraries: Boost.Dispatch, which provides a mechanism for efficient tag-based dispatch for functions, and Boost.SIMD, which provides a framework for the implementation of...

Connecting Religion and Demographics

January 31, 2016

I have my second guest post up today at Ari Lamstein’s blog where I conclude my exploration of the Religious Congregations and Membership Study at the ARDA. In this post I show how we can look at the relationships between a data set like the religion census and demographic data to gain context and understanding. Go over...

Using RcppNT2 to Compute the Variance

January 31, 2016

Introduction: The Numerical Template Toolbox (NT2) is a collection of header-only C++ libraries that makes it possible to explicitly request the use of SIMD instructions when possible, while falling back to regular scalar operations when not. NT2 itself is powered by Boost, alongside two proposed Boost libraries: Boost.Dispatch, which provides a mechanism for efficient tag-based dispatch for functions, and Boost.SIMD, which provides a framework for the implementation of...

Introduction to RcppNT2

January 31, 2016

Modern CPUs are built with new, extended instruction sets that optimize for certain operations. One class of these, called Single Instruction / Multiple Data (SIMD) instructions, allows for vectorized operations. Although modern compilers will use these instructions when possible, they are often unable to reason about whether or not a particular block of code can be executed using SIMD instructions. The Numerical Template Toolbox...

Shiny Developer Conference

January 31, 2016

Really enjoying RStudio's Shiny Developer Conference | Stanford University | January 2016. Winston Chang just demonstrated profvis, which is really slick. You can profile code just by wrapping it in a profvis({}) block, and the results are exported as interactive HTML widgets. For example, running the R code below: if(!('profvis' %in% rownames(installed.packages()))) { devtools::install_github('rstudio/profvis') } library('profvis') nrow …

R Tagosphere!

January 31, 2016

This post explores the inter-relationships of StackOverflow Tags for R-related questions. So I grabbed all the questions tagged with “r”, took the other tags in each question and made some network charts that show how often each tag is seen with the other tags. The point is to see the empirical relationships …

Pitfall of XML package: issues specific to cp932 locale, Japanese Shift-JIS, on Windows

January 31, 2016

The CRAN package XML has problems parsing HTML pages encoded in cp932 (Shift-JIS). In this report, I will show these issues and also their solutions, which are workable at …

Hillary Clinton’s Biggest 2016 Rival: Herself

January 30, 2016

In a recent post I noted that despite Bernie Sanders doing better in many important indicators, Obama 2008 received 3x more media coverage than Sanders 2016. Reasonably, a reader of my blog noted that not all coverage was equal, that a presidential hope...

Strategies to Speedup R Code

January 30, 2016

The for-loop in R can be very slow in its raw, un-optimised form, especially when dealing with larger data sets. There are a number of ways you can make your logic run fast, and you may be surprised how fast you can actually go. This post shows a number of approaches including simple tweaks
