Blog Archives

Mimicking SQLDF with MonetDBLite

May 9, 2018

Like many useRs, I am also a big fan of the sqldf package developed by Grothendieck, which uses SQL statements to manipulate data frames, with the embedded SQLite database as the default back-end. In the examples below, I drafted a couple of R utility functions with the MonetDBLite back-end by mimicking the sqldf function. There are several interesting
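
As a rough illustration of the idea, here is a minimal sketch of such a utility, assuming the DBI and MonetDBLite packages are installed; the helper name monet_query() and its interface are hypothetical and not taken from the post.

library(DBI)
library(MonetDBLite)

# load a data frame into an embedded MonetDB instance and run a SQL query against it
monet_query <- function(df, sql, tbl = "df") {
  con <- dbConnect(MonetDBLite::MonetDBLite(), tempfile())
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)
  dbWriteTable(con, tbl, df, overwrite = TRUE)
  dbGetQuery(con, sql)
}

# usage: monet_query(mtcars, "select cyl, avg(mpg) as avg_mpg from df group by cyl")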

Read more »

MLE with General Optimization Functions in R

May 3, 2018

In my previous post (https://statcompute.wordpress.com/2018/02/25/mle-in-r/), I showed how to estimate the MLE based on the log-likelihood function with a general-purpose optimization algorithm, e.g. optim(), and that the optimizer is more flexible and efficient than wrappers in statistical packages. A benchmark comparison is given below showing the use case of other general optimizers commonly
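
For context, here is a minimal sketch of the general idea, fitting a Poisson regression by minimizing the negative log-likelihood with optim(); the simulated data and starting values are purely illustrative.

set.seed(1)
x <- rnorm(1000)
y <- rpois(1000, exp(0.5 + 0.2 * x))

# negative log-likelihood of a Poisson regression with a log link
negll <- function(par) {
  mu <- exp(par[1] + par[2] * x)
  -sum(dpois(y, mu, log = TRUE))
}

fit <- optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)
fit$par                         # point estimates of the two coefficients
sqrt(diag(solve(fit$hessian)))  # standard errors from the observed information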

Read more »

Read Random Rows from A Huge CSV File

April 28, 2018

Since R data frames are stored in memory, it is sometimes beneficial to sample and examine the data in a large CSV file before importing it into a data frame. To the best of my knowledge, there is no off-the-shelf R function that performs such data sampling at a relatively low computing cost. Therefore, I drafted two
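
One possible approach along these lines is sketched below, reading the file in chunks and keeping a reservoir sample of rows; the helper name read_sample() and the chunk size are illustrative assumptions rather than the functions from the post.

# sample n rows from a large csv without loading the whole file into memory
read_sample <- function(path, n = 1000, chunk = 10000) {
  con <- file(path, "r")
  on.exit(close(con))
  header <- readLines(con, n = 1)
  reservoir <- character(0)
  i <- 0
  repeat {
    lines <- readLines(con, n = chunk)
    if (length(lines) == 0) break
    for (ln in lines) {
      i <- i + 1
      if (length(reservoir) < n) {
        reservoir <- c(reservoir, ln)
      } else if (runif(1) < n / i) {
        reservoir[sample.int(n, 1)] <- ln   # reservoir sampling: replace a random slot
      }
    }
  }
  read.csv(text = paste(c(header, reservoir), collapse = "\n"))
}

# usage: smp <- read_sample("big_file.csv", n = 500)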

Read more »

LogRatio Regression – A Simple Way to Model Compositional Data

April 15, 2018

Compositional data are proportions of mutually exclusive groups that sum to unity. Statistical models for compositional data are applicable in a number of areas, e.g. the product or channel mix in marketing research and the asset allocation of an investment portfolio. In the example below, I will show how
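
As a flavor of the approach, below is a minimal sketch of an additive log-ratio (alr) transform followed by an ordinary regression on simulated shares; the data and the choice of reference component are illustrative assumptions.

set.seed(1)
p <- matrix(runif(300), ncol = 3)
p <- p / rowSums(p)              # three compositional shares summing to one
x <- rnorm(100)

alr <- log(p[, 1:2] / p[, 3])    # log-ratios against the third (reference) component
fit <- lm(alr ~ x)               # multivariate linear model on the log-ratio scale

pred <- cbind(exp(fitted(fit)), 1)
pred <- pred / rowSums(pred)     # map fitted values back onto the simplex
head(pred)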

Read more »

Clojure Integration with R

April 4, 2018

(require '… '… '… '…) ;; CREATE A TOY DATA (def ds [{:id 1.0 :name "name1"} {:id 2.0 :n…

Read more »

MLE in R

February 25, 2018

When I learn and experiment with a new model, I always like to start with its likelihood function in order to gain a better understanding of its statistical nature. That's why I have extensively used the SAS/NLMIXED procedure, which gives me more flexibility. Today, I spent a couple of hours playing with the optim() function and its wrappers, e.g.
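
As a small example of the kind of wrapper mentioned here, the sketch below uses stats4::mle(), which calls optim() under the hood, to fit the mean and standard deviation of a normal sample; the simulated data and the log-scale parameterization are illustrative.

library(stats4)

set.seed(2)
z <- rnorm(500, mean = 3, sd = 2)

# negative log-likelihood with the standard deviation on the log scale
nll <- function(mu, log_sigma) {
  -sum(dnorm(z, mean = mu, sd = exp(log_sigma), log = TRUE))
}

fit <- mle(nll, start = list(mu = 0, log_sigma = 0))
coef(fit)                      # estimates on the working scale
exp(coef(fit)["log_sigma"])    # back-transformed standard deviation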

Read more »

R Interfaces to Python Keras Package

February 11, 2018

Keras is a popular Python package for prototyping deep neural networks with multiple back-ends, including TensorFlow, CNTK, and Theano. Currently, there are two R interfaces that allow us to use Keras from R through the reticulate package. While the keras R package is able to provide a flexible and feature-rich API, the
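
For reference, here is a minimal sketch with the keras R package, one of the two interfaces mentioned; the toy architecture and input shape are illustrative and not from the post.

library(keras)

# a tiny binary classifier: a 10-column numeric input and one sigmoid output
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(
  optimizer = "adam",
  loss = "binary_crossentropy",
  metrics = "accuracy"
)

# with x a numeric matrix of 10 columns and y a 0/1 vector:
# model %>% fit(x, y, epochs = 10, batch_size = 32)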

Read more »

Additional Thoughts on Estimating LGD with Proportional Odds Model

February 6, 2018

In my previous post (https://statcompute.wordpress.com/2018/01/28/modeling-lgd-with-proportional-odds-model), I discussed how to use Proportional Odds Models in LGD model development. In particular, I mentioned that we would estimate a sub-model, which can be a Gamma or Simplex regression, to project the conditional mean for LGD values in the (0, 1) range. However, it is worth pointing out
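
To make the sub-model idea concrete, below is a minimal sketch of a Gamma regression with a log link fitted to simulated LGD values inside (0, 1); the data-generating step is purely illustrative.

set.seed(3)
n <- 1000
ltv <- runif(n, 0.5, 1.2)
lgd <- plogis(-1 + 2 * ltv + rnorm(n))   # simulated LGD strictly inside (0, 1)

# conditional mean of the in-between LGD segment via a Gamma GLM with a log link
gamma_fit <- glm(lgd ~ ltv, family = Gamma(link = "log"))
summary(gamma_fit)
head(predict(gamma_fit, type = "response"))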

Read more »

Modeling LGD with Proportional Odds Model

January 28, 2018

The LGD model is an important component in the expected loss calculation. In https://statcompute.wordpress.com/2015/11/01/quasi-binomial-model-in-sas, I discussed how to model LGD with quasi-binomial regression, which is simple and makes no distributional assumption. In real-world LGD data, we usually observe three ordered categories of values: 0, 1, and the in-betweens. In cases with a
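
A minimal sketch of the proportional odds setup is shown below, fitting MASS::polr() to the three ordered categories of a simulated LGD outcome; the simulated data and cut points are illustrative assumptions.

library(MASS)

set.seed(4)
n <- 1000
ltv <- runif(n)
lgd <- pmin(pmax(rnorm(n, ltv, 0.4), 0), 1)   # LGD censored at 0 and 1

# three ordered categories: exactly 0, in-between, exactly 1
cat3 <- cut(lgd, breaks = c(-Inf, 0, 1 - 1e-10, Inf),
            labels = c("zero", "between", "one"), ordered_result = TRUE)

pom <- polr(cat3 ~ ltv, method = "logistic", Hess = TRUE)
summary(pom)
head(predict(pom, type = "probs"))            # probability of each ordered category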

Read more »

Model Non-Negative Numeric Outcomes with Zeros

September 17, 2017

As mentioned in the previous post (https://statcompute.wordpress.com/2017/06/29/model-operational-loss-directly-with-tweedie-glm/), we often need to model non-negative numeric outcomes with zeros in operational loss model development. The Tweedie GLM provides a convenient way to model non-negative losses directly by assuming that aggregated losses are the Poisson sum of Gamma outcomes, which, however, might not be well supported empirically from
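
As a point of reference, here is a minimal sketch of a Tweedie GLM fitted to a simulated non-negative outcome with exact zeros, using the tweedie and statmod packages; the variance power of 1.5 and the simulated data are illustrative assumptions.

library(statmod)   # tweedie() family for glm()
library(tweedie)   # rtweedie() to simulate a non-negative outcome with exact zeros

set.seed(5)
n <- 1000
x <- rnorm(n)
mu <- exp(0.5 + 0.3 * x)
y <- rtweedie(n, mu = mu, phi = 1, power = 1.5)   # Poisson sum of Gamma outcomes

fit <- glm(y ~ x, family = tweedie(var.power = 1.5, link.power = 0))   # log link
summary(fit)
mean(y == 0)   # share of exact zeros in the simulated outcome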

Read more »
