Articles by statcompute

Convert Data Frame to Dictionary List in R

November 16, 2018 | statcompute

In R, there are a couple of ways to convert the column-oriented data frame to a row-oriented dictionary list or the like, e.g. a list of lists. In the code snippet below, I will show each approach and how to extract keys and values from the dictionary. As shown in the ... [Read more...]
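The following is a minimal sketch of two base-R approaches that may differ from the ones in the post. It uses a hypothetical toy data frame: one approach indexes rows and coerces each one to a list, and the other zips the columns together with Map(). It then reads keys and values back out.

```r
# Hypothetical toy data frame used only for illustration
df <- data.frame(id = c(1, 2), name = c("name1", "name2"),
                 stringsAsFactors = FALSE)

# Approach 1: take each row as a one-row data frame and coerce it to a list
dict1 <- lapply(seq_len(nrow(df)), function(i) as.list(df[i, ]))

# Approach 2: zip the columns row by row with Map(); list(...) keeps the column names
dict2 <- do.call(Map, c(list(f = function(...) list(...)), as.list(df)))

# Keys of one "dictionary" and all values stored under one key
keys        <- names(dict2[[1]])
name_values <- vapply(dict2, function(d) d$name, character(1))
```

Both results are unnamed lists in which each element is a named list such as `list(id = 1, name = "name1")`.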

Monotonic Binning with Equal-Sized Bads for Scorecard Development

October 14, 2018 | statcompute

In previous posts (https://statcompute.wordpress.com/2017/01/22/monotonic-binning-with-smbinning-package) and (https://statcompute.wordpress.com/2017/06/15/finer-monotonic-binning-based-on-isotonic-regression), I’ve developed two different algorithms for monotonic binning. While the first tends to generate bins with equal densities, the second defines finer bins based on isotonic regression. In the code snippet below, a third ... [Read more...]

By-Group Summary with SparkR – Follow-up for A Reader Comment

September 23, 2018 | statcompute

A reader of my previous post (https://statcompute.wordpress.com/2018/09/03/playing-map-and-reduce-in-r-by-group-calculation), Mr. Wayne Zhang, made a good comment: “Why not use directly either Spark or H2O to derive such computations without involving detailed map/reduce”. Although Spark is not as flexible as R in the statistical ... [Read more...]

Union Multiple Data.Frames with Different Column Names

September 22, 2018 | statcompute

On Friday, while working on a project in which I needed to union multiple data.frames with different column names, I realized that the base::rbind() function doesn’t take data.frames with different column names, so I quickly drafted an rbind2() function on the fly to get the job ... [Read more...]
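The sketch below illustrates the general idea. It uses a hypothetical helper named `rbind_fill()` and is not the author's rbind2(). The helper pads every data frame with NA for the columns it lacks, puts the columns in a common order, and then calls base rbind().

```r
# Hypothetical helper (not the post's rbind2()): union data frames whose
# column sets differ by adding NA-filled columns before calling rbind()
rbind_fill <- function(...) {
  dfs <- list(...)
  # Union of all column names, in order of first appearance
  all_cols <- unique(unlist(lapply(dfs, names)))
  padded <- lapply(dfs, function(d) {
    for (col in setdiff(all_cols, names(d))) d[[col]] <- rep(NA, nrow(d))
    d[all_cols]  # same column order for every piece
  })
  do.call(rbind, padded)
}

df1 <- data.frame(a = 1:2, b = c("x", "y"), stringsAsFactors = FALSE)
df2 <- data.frame(b = "z", c = TRUE, stringsAsFactors = FALSE)
out <- rbind_fill(df1, df2)
```

Here `out` has three rows and columns `a`, `b`, `c`. Each cell that had no source value is filled with NA.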

Why Vectorize?

September 16, 2018 | statcompute

In the post (https://statcompute.wordpress.com/2018/09/15/how-to-avoid-for-loop-in-r), I briefly introduced the idea of vectorization and potential use cases. One might wonder why we even need the Vectorize() function, given that it is just a wrapper, and whether there is any material efficiency gain from vectorizing a ... [Read more...]
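Here is a small illustration of what the wrapper does, using a hypothetical scalar-only function `grade()`. Vectorize() builds its wrapper on mapply(), so the wrapped function is still called once per element. By contrast, a natively vectorized expression such as ifelse() handles the whole vector in a single call.

```r
# A scalar-only function: if() needs a single TRUE/FALSE, so a vector input fails
# (an error in R >= 4.2, a warning in older versions)
grade <- function(score) if (score >= 60) "pass" else "fail"

# Vectorize() returns a wrapper that calls grade() element by element via mapply()
grade_v <- Vectorize(grade)
grade_v(c(45, 75, 90))

# Equivalent explicit functional, without the wrapper
vapply(c(45, 75, 90), grade, character(1))

# Natively vectorized alternative: one call over the whole vector
ifelse(c(45, 75, 90) >= 60, "pass", "fail")
```

All three calls return the same character vector. Vectorize() mainly buys convenience when the function body cannot easily be rewritten in vectorized form.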

How to Avoid For Loop in R

September 15, 2018 | statcompute

A FOR loop is the most intuitive way to apply an operation to a series, looping through each item one by one. This makes perfect sense logically, but useRs should avoid it given its low efficiency. In R, there are two ways to implement the same functionality of ... [Read more...]
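The following hypothetical illustration may not match the two approaches covered in the post. It writes one element-wise operation three ways: as a preallocated FOR loop, as a functional with vapply(), and as plain vectorized arithmetic.

```r
x <- 1:10

# FOR loop: preallocate the result, then fill it item by item
sq_loop <- numeric(length(x))
for (i in seq_along(x)) sq_loop[i] <- x[i]^2

# Functional style: vapply() applies the function to each item with a declared return type
sq_apply <- vapply(x, function(v) v^2, numeric(1))

# Vectorized arithmetic: the element-wise loop runs inside R's compiled internals
sq_vec <- x^2
```

All three produce the same double vector. The vectorized form is usually the shortest and fastest of the three.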

Modeling Frequency Outcomes with Ordinal Models

September 10, 2018 | statcompute

When modeling frequency outcomes, we often need to go beyond the standard Poisson regression because of its strict distributional assumption and consider more flexible alternatives. In general, there are two broad categories of modeling approaches in light of practical concerns about frequency outcomes. The first category of models is ... [Read more...]

Playing Map() and Reduce() in R – Subsetting

September 8, 2018 | statcompute

In the previous post (https://statcompute.wordpress.com/2018/09/03/playing-map-and-reduce-in-r-by-group-calculation), I showed how to employ MapReduce when calculating by-group statistics. The same divide-and-conquer strategy also applies to other use cases, one of which is the subsetting operation. In the example below, let’s still use the same ... [Read more...]
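The next example is a minimal sketch of divide-and-conquer subsetting. It uses the built-in mtcars data and a hypothetical filter `mpg > 25` in place of whatever data and condition the post uses. The data are split into chunks, Map() filters each chunk independently, and Reduce() stacks the partial results.

```r
# Divide: split the rows into 4 chunks (a stand-in for partitions of a larger data set)
chunks <- split(mtcars, rep(1:4, length.out = nrow(mtcars)))

# Map: apply the same filter to every chunk independently
filtered <- Map(function(d) d[d$mpg > 25, ], chunks)

# Reduce: combine the partial results back into one data frame
result <- Reduce(rbind, filtered)

# Reference: subsetting the whole data at once (row order may differ)
direct <- mtcars[mtcars$mpg > 25, ]
```

Because each chunk is filtered without looking at the others, the Map() step could be handed to a parallel backend without changing the logic.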

Playing Map() and Reduce() in R – By-Group Calculation

September 3, 2018 | statcompute

Clojure is such an interesting programming language that it can not only enhance our skill set but also change the way we write programs. After learning Clojure, I can’t help thinking about how to employ functional programming and the MapReduce paradigm to improve our experience with ... [Read more...]
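The following is a minimal sketch of a by-group calculation in the Map/Reduce style. It uses mtcars with `cyl` as a hypothetical grouping key and is not the post's own code. Map() computes partial aggregates (count and sum) per group, Reduce() stacks them, and the group mean is derived from the partials.

```r
# Divide: split one numeric column by group
groups <- split(mtcars$mpg, mtcars$cyl)

# Map: compute per-group partial statistics (count and sum)
partials <- Map(function(g, key) data.frame(cyl = key, n = length(g), total = sum(g),
                                            stringsAsFactors = FALSE),
                groups, names(groups))

# Reduce: stack the per-group rows, then derive the group mean from the partials
by_group <- Reduce(rbind, partials)
by_group$mean_mpg <- by_group$total / by_group$n
```

Carrying counts and sums instead of means keeps the partial results combinable. With this design, chunks of the same group could also be merged by adding their `n` and `total`.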

More Flexible Ordinal Outcome Models

August 28, 2018 | statcompute

In the previous post (https://statcompute.wordpress.com/2018/08/26/adjacent-categories-and-continuation-ratio-logit-models-for-ordinal-outcomes), we’ve shown alternative models for ordinal outcomes in addition to the commonly used Cumulative Logit model under the proportional odds assumption, also known as the Proportional Odds model. A potential drawback of the Proportional Odds model is the lack of flexibility ... [Read more...]

Ordered Probit Model and Price Movements of High-Frequency Trades

August 19, 2018 | statcompute

The analysis of high-frequency stock transactions has played an important role in algorithmic trading, and the results can be used to monitor stock movements and to develop trading strategies. In the paper “An Ordered Probit Analysis of Transaction Stock Prices” (1992), Hausman, Lo, and MacKinlay discussed estimating trade-by-trade stock ... [Read more...]

Co-integration and Pairs Trading

July 29, 2018 | statcompute

Co-integration is an important statistical concept behind the statistical arbitrage strategy known as “Pairs Trading”. While projecting a stock price with time series models is difficult by any measure, it is technically feasible to find a pair (or even a portfolio) of stocks sharing a common trend such that ... [Read more...]

Mimicking SQLDF with MonetDBLite

May 9, 2018 | statcompute

Like many useRs, I am also a big fan of the sqldf package developed by Grothendieck, which uses SQL statements for data frame manipulation with the embedded SQLite database as the default back-end. In the examples below, I drafted a couple of R utility functions with the MonetDBLite back-end by mimicking the sqldf ... [Read more...]

MLE with General Optimization Functions in R

May 3, 2018 | statcompute

In my previous post (https://statcompute.wordpress.com/2018/02/25/mle-in-r/), I showed how to estimate the MLE based on the log likelihood function with a general-purpose optimization algorithm, e.g. optim(), and that the optimizer is more flexible and efficient than wrappers in statistical packages. A benchmark comparison is given ... [Read more...]
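The sketch below shows the mechanics of MLE with optim() and does not reproduce the post's benchmark. It assumes simulated Poisson data with hypothetical coefficients 0.5 and 0.3. The negative log likelihood is written by hand, minimized with BFGS, and the estimates are compared against glm().

```r
set.seed(2018)
# Simulated data (hypothetical): Poisson counts with a log-linear mean
n <- 500
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.5 + 0.3 * x))

# Negative log likelihood of the Poisson regression; optim() minimizes by default
nll <- function(beta) {
  mu <- exp(beta[1] + beta[2] * x)
  -sum(dpois(y, lambda = mu, log = TRUE))
}

fit_optim <- optim(par = c(0, 0), fn = nll, method = "BFGS")
fit_glm   <- glm(y ~ x, family = poisson)

# Side-by-side comparison of the two sets of estimates
rbind(optim = fit_optim$par, glm = unname(coef(fit_glm)))
```

The two rows should agree closely. The benefit of the optim() route is that swapping in a different likelihood only means rewriting `nll()`.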

Read Random Rows from A Huge CSV File

April 28, 2018 | statcompute

Because R data frames are stored in memory, it is sometimes beneficial to sample and examine the data in a large CSV file before importing it into a data frame. To the best of my knowledge, there is no off-the-shelf R function performing such data sampling with a relatively low computing ... [Read more...]
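One possible approach is reservoir sampling (Algorithm R), which may not be the technique used in the post. It keeps a uniform random sample of `k` data lines while streaming the file in chunks, so the whole file never has to fit in memory. The helper `sample_csv()` is hypothetical. It assumes one record per line, meaning no quoted fields that contain newlines.

```r
# Hypothetical helper: reservoir sampling over the data lines of a CSV file,
# reading in chunks; assumes one record per line (no embedded newlines)
sample_csv <- function(path, k, chunk_size = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  header <- readLines(con, n = 1L)
  reservoir <- character(0)
  seen <- 0  # double, so very large files do not overflow an integer counter
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0L) break
    for (line in lines) {
      seen <- seen + 1
      if (seen <= k) {
        reservoir[seen] <- line          # fill the reservoir first
      } else {
        j <- sample.int(seen, 1L)        # uniform draw in 1..seen
        if (j <= k) reservoir[j] <- line # keep the new line with probability k/seen
      }
    }
  }
  # Parse only the sampled lines, reusing the original header
  read.csv(text = c(header, reservoir), stringsAsFactors = FALSE)
}

# Usage on a small temporary file
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:1000, v = rnorm(1000)), tmp, row.names = FALSE)
set.seed(1)
smp <- sample_csv(tmp, k = 5)
```

Only `k` lines plus one chunk are held at a time. The per-line loop is slow in R, so a production version would likely vectorize the acceptance step within each chunk.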

Clojure Integration with R

April 4, 2018 | statcompute

(require '[tnoda.rashinban :as rr] '[tnoda.rashinban.core :as rc] '[clojure.core.matrix.dataset :as dt] '[clojure.core.matrix.impl.dataset :as id]) ;; CREATE A TOY DATA (def ds [{:id 1.0 :name "name1"} {:id 2.0 :n... [Read more...]

MLE in R

February 25, 2018 | statcompute

When I learn and experiment with a new model, I always like to start with its likelihood function in order to gain a better understanding of its statistical nature. That’s why I extensively used the SAS/NLMIXED procedure, which gives me more flexibility. Today, I spent a couple of hours playing ... [Read more...]

R Interfaces to Python Keras Package

February 11, 2018 | statcompute

Keras is a popular Python package for prototyping deep neural networks with multiple backends, including TensorFlow, CNTK, and Theano. Currently, there are two R interfaces that allow us to use Keras from R through the reticulate package. While the keras R package is able to provide a ... [Read more...]
