Blog Archives

Growing List vs Growing Queue

November 17, 2018

### GROWING LIST ### base_lst1

Read more »

Convert Data Frame to Dictionary List in R

November 16, 2018

In R, there are a couple of ways to convert a column-oriented data frame to a row-oriented dictionary list or the like, e.g. a list of lists. In the code snippet below, I show each approach and how to extract keys and values from the dictionary. As shown in the benchmark, it appears that the generic

Read more »
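As a rough illustration of the conversion described in the excerpt above, here is a minimal sketch in base R. The data frame `df` and the use of `lapply()` are assumptions made for this example and are not necessarily one of the approaches benchmarked in the full post.

```r
# Small made-up data frame, used only for this sketch
df <- data.frame(id = 1:3, name = c("a", "b", "c"), stringsAsFactors = FALSE)

# Build one named list ("dictionary") per row
rows <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, , drop = FALSE]))

# Extract keys and values from the first dictionary
keys <- names(rows[[1]])
vals <- unname(rows[[1]])
```

Each element of `rows` is a named list, so `rows[[2]]$name` looks up a value by key.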

Monotonic Binning with Equal-Sized Bads for Scorecard Development

October 14, 2018

In previous posts (https://statcompute.wordpress.com/2017/01/22/monotonic-binning-with-smbinning-package) and (https://statcompute.wordpress.com/2017/06/15/finer-monotonic-binning-based-on-isotonic-regression), I’ve developed two different algorithms for monotonic binning. While the first tends to generate bins with equal densities, the second defines finer bins based on isotonic regression. In the code snippet below, a third approach is illustrated that generates bins with roughly equal-sized

Read more »

By-Group Summary with SparkR – Follow-up for A Reader Comment

September 23, 2018

A reader of my previous post (https://statcompute.wordpress.com/2018/09/03/playing-map-and-reduce-in-r-by-group-calculation), Mr. Wayne Zhang, made a good comment: “Why not use directly either Spark or H2O to derive such computations without involving detailed map/reduce”. Although Spark is not as flexible as R in statistical computation (in my opinion), it does have advantages for munging large-size data

Read more »

Union Multiple Data.Frames with Different Column Names

September 22, 2018

On Friday, while working on a project in which I needed to union multiple data.frames with different column names, I realized that the base::rbind() function doesn’t accept data.frames with different column names, so I quickly drafted an rbind2() function on the fly to get the job done, based on the idea of MapReduce that I

Read more »
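The excerpt above names an rbind2() function but does not show it. The following is only a sketch of how such a helper might look under the Map/Reduce idea; the body, the argument name `lst`, and the choice to fill absent columns with NA are assumptions for illustration, not the author's code.

```r
# Hypothetical rbind2(): row-bind data frames whose column names differ,
# filling columns that a data frame lacks with NA (an assumed design choice)
rbind2 <- function(lst) {
  # Reduce: collect every column name seen across the inputs
  all_cols <- Reduce(union, Map(names, lst))
  # Map: pad each data frame to the full column set, in a common order
  padded <- Map(function(d) {
    for (m in setdiff(all_cols, names(d))) d[[m]] <- NA
    d[all_cols]
  }, lst)
  # Reduce: stack the padded data frames
  Reduce(rbind, padded)
}

a <- data.frame(x = 1:2, y = c("p", "q"), stringsAsFactors = FALSE)
b <- data.frame(y = "r", z = 3.5, stringsAsFactors = FALSE)
out <- rbind2(list(a, b))
```

Here `out` has the columns x, y and z, with NA wherever an input did not carry that column.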

Why Vectorize?

September 16, 2018

In the post (https://statcompute.wordpress.com/2018/09/15/how-to-avoid-for-loop-in-r), I briefly introduced the idea of vectorization and potential use cases. One might wonder why we even need the Vectorize() function, given that it is just a wrapper, and whether there is any material efficiency gain from vectorizing a function. It is true that the Vectorize() function is

Read more »
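To make the role of Vectorize() in the excerpt above concrete, here is a small sketch. The function `sign_label()` is a made-up example; it only shows the convenience of the wrapper and makes no claim about efficiency, which is the question the full post addresses.

```r
# A scalar-only function: `if` expects a single TRUE/FALSE condition
sign_label <- function(x) if (x >= 0) "non-negative" else "negative"

# Vectorize() wraps the function so it is applied element by element
v_sign_label <- Vectorize(sign_label)

labels <- v_sign_label(c(-1, 0, 2))
# labels is c("negative", "non-negative", "non-negative")
```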

How to Avoid For Loop in R

September 15, 2018

A FOR loop is the most intuitive way to apply an operation to a series by looping through each item one by one, which makes perfect sense logically but should be avoided by useRs given its low efficiency. In R, there are two ways to implement the functionality of a FOR loop. The first

Read more »

Modeling Frequency Outcomes with Ordinal Models

September 10, 2018

When modeling frequency outcomes, we often need to go beyond the standard Poisson regression due to its strict distributional assumption and consider more flexible alternatives. In general, there are two broad categories of modeling approaches in light of practical concerns about frequency outcomes. The first category of models is mainly intended to address the

Read more »

Playing Map() and Reduce() in R – Subsetting

September 8, 2018

In the previous post (https://statcompute.wordpress.com/2018/09/03/playing-map-and-reduce-in-r-by-group-calculation), I’ve shown how to employ MapReduce when calculating by-group statistics. Actually, the same Divide-n-Conquer strategy applies to other use cases, one of which is the subsetting operation. In the example below, let’s still use the same iris data for demonstration. In R, the most convenient

Read more »
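The Divide-n-Conquer idea in the excerpt above can be sketched for subsetting as follows. The filter condition (Sepal.Length above 6.5) and the variable names are arbitrary choices for illustration and are not taken from the full post.

```r
# Divide: split iris into one data frame per species
chunks <- split(iris, iris$Species)

# Map: apply the same row filter to each chunk
kept <- Map(function(d) d[d$Sepal.Length > 6.5, ], chunks)

# Reduce: stack the filtered chunks back into one data frame
subset_mr <- Reduce(rbind, kept)

# Direct subsetting for comparison; both select the same rows
direct <- iris[iris$Sepal.Length > 6.5, ]
```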

Playing Map() and Reduce() in R – By-Group Calculation

September 3, 2018

Clojure is such an interesting programming language that it can not only enhance our skill set but also change the way we write programs. After learning Clojure, I can’t help thinking about how to employ functional programming and the MapReduce paradigm to improve our experience with other programming languages, e.g. R in

Read more »
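A by-group calculation in the Map/Reduce style described in the excerpt above might look like the sketch below. It assumes the iris data and a mean of Sepal.Length per species as the statistic; the column name `mean_sl` is made up for this example.

```r
# Divide: one data frame per species
groups <- split(iris, iris$Species)

# Map: compute a one-row summary for each group
parts <- Map(function(d, g) {
  data.frame(Species = g, mean_sl = mean(d$Sepal.Length))
}, groups, names(groups))

# Reduce: stack the per-group summaries into one data frame
result <- Reduce(rbind, parts)
```

`result` holds one row per species with its mean sepal length.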
