[This article was first published on HOXO-M - anonymous data analyst group in Japan - , and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## 1. Overview

Density ratio estimation is described as follows: for given two data samples \$x\$ and \$y\$ from unknown distributions \$p(x)\$ and \$q(y)\$ respectively, estimate \$\$ w(x) = frac{p(x)}{q(x)} \$\$ where \$x\$ and \$y\$ are \$d\$-dimensional real numbers.

The estimated density ratio function \$w(x)\$ can be used in many applications such as the inlier-based outlier detection  and covariate shift adaptation . Other useful applications about density ratio estimation were summarized by Sugiyama et al. (2012) .

The package densratio provides a function `densratio()` that returns a result has the function to estimate density ratio `compute_density_ratio()`.

For example,

`set.seed(3)<br />x <- rnorm(200, mean = 1, sd = 1/8)<br />y <- rnorm(200, mean = 1, sd = 1/2)<br /><br />library(densratio)<br />result <- densratio(x, y)<br />result`
`## <br />## Call:<br />## densratio(x = x, y = y, method = "uLSIF")<br />## <br />## Kernel Information:<br />##   Kernel type:  Gaussian RBF <br />##   Number of kernels:  100 <br />##   Bandwidth(sigma):  0.1 <br />##   Centers:  num [1:100, 1] 1.007 0.752 0.917 0.824 0.7 ...<br />## <br />## Kernel Weights(alpha):<br />##   num [1:100] 0.4044 0.0479 0.1736 0.125 0.0597 ...<br />## <br />## The Function to Estimate Density Ratio:<br />##   compute_density_ratio()`

In this case, the true density ratio \$w(x)\$ is known, so we can compare \$w(x)\$ with the estimated density ratio \$hat{w}(x)\$.

`true_density_ratio <- function(x) dnorm(x, 1, 1/8) / dnorm(x, 1, 1/2)<br />estimated_density_ratio <- result\$compute_density_ratio<br /><br />plot(true_density_ratio, xlim=c(-1, 3), lwd=2, col="red", xlab = "x", ylab = "Density Ratio")<br />plot(estimated_density_ratio, xlim=c(-1, 3), lwd=2, col="green", add=TRUE)<br />legend("topright", legend=c(expression(w(x)), expression(hat(w)(x))), col=2:3, lty=1, lwd=2, pch=NA)` ## 2. How to Install

You can install the densratio package from CRAN.

`install.packages("densratio")`

You can also install the package from GitHub.

`install.packages("devtools") # if you have not installed "devtools" package<br />devtools::install_github("hoxo-m/densratio")`

The source code for densratio package is available on GitHub at

## 3. Details

### 3.1. Basics

The package provides `densratio()` that the result has the function to estimate density ratio.

For data samples `x` and `y`,

`library(densratio)<br /><br />result <- densratio(x, y)`

In this case, `result\$compute_density_ratio()` can compute estimated density ratio.

### 3.2. Methods

`densratio()` has `method` parameter that you can pass `"uLSIF"` or `"KLIEP"`.

• uLSIF (unconstrained Least-Squares Importance Fitting) is the default method. This algorithm estimates density ratio by minimizing the squared loss. You can find more information in Hido et al. (2011) .

• KLIEP (Kullback-Leibler Importance Estimation Procedure) is the anothor method. This algorithm estimates density ratio by minimizing Kullback-Leibler divergence. You can find more information in Sugiyama et al. (2007) .

The both methods assume that the denity ratio is represented by linear model: \$\$ w(x) = alpha_1 K(x, c_1) + alpha_2 K(x, c_2) + … + alpha_b K(x, c_b) \$\$ where \$\$ K(x, c) = expleft(frac{-|x – c|^2}{2 sigma ^ 2}right) \$\$ is the Gaussian RBF.

`densratio()` performs the two main jobs:

• First, deciding kernel parameter \$sigma\$ by cross validation,
• Second, optimizing kernel weights \$alpha\$.

As the result, you can obtain `compute_density_ratio()`.

### 3.3. Result and Paremeter Settings

`densratio()` outputs the result like as follows:

`## <br />## Call:<br />## densratio(x = x, y = y, method = "uLSIF")<br />## <br />## Kernel Information:<br />##   Kernel type:  Gaussian RBF <br />##   Number of kernels:  100 <br />##   Bandwidth(sigma):  0.1 <br />##   Centers:  num [1:100, 1] 1.007 0.752 0.917 0.824 0.7 ...<br />## <br />## Kernel Weights(alpha):<br />##   num [1:100] 0.4044 0.0479 0.1736 0.125 0.0597 ...<br />## <br />## Regularization Parameter(lambda):  <br />## <br />## The Function to Estimate Density Ratio:<br />##   compute_density_ratio()`

• Kernel type is fixed by Gaussian RBF.
• The number of kernels is the number of kernels in the linear model. You can change by setting `kernel_num` parameter. In default, `kernel_num = 100`.
• Bandwidth(sigma) is the Gaussian kernel bandwidth. In default, `sigma = "auto"`, the algorithms automatically select the optimal value by cross validation. If you set `sigma` a number, that will be used. If you set a numeric vector, the algorithms select the optimal value in them by cross validation.
• Centers are centers of Gaussian kernels in the linear model. These are selected at random from the data sample `x` underlying a numerator distribution `p_nu(x)`. You can find the whole values in `result\$kernel_info\$centers`.
• Kernel weights are alpha parameters in the linear model. It is optimaized by the algorithms. You can find the whole values in `result\$alpha`.
• The funtion to estimate density ratio is named `compute_density_ratio()`.

## 4. Multi Dimensional Data Samples

In the above, the input data samples `x` and `y` were one dimensional. `densratio()` allows to input multidimensional data samples as `matrix`.

For example,

`library(densratio)<br />library(mvtnorm)<br /><br />set.seed(71)<br />x <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/8, 2))<br />y <- rmvnorm(300, mean = c(1, 1), sigma = diag(1/2, 2))<br /><br />result <- densratio(x, y)<br />result`
`## <br />## Call:<br />## densratio(x = x, y = y, method = "uLSIF")<br />## <br />## Kernel Information:<br />##   Kernel type:  Gaussian RBF <br />##   Number of kernels:  100 <br />##   Bandwidth(sigma):  0.316 <br />##   Centers:  num [1:100, 1:2] 1.178 0.863 1.453 0.961 0.831 ...<br />## <br />## Kernel Weights(alpha):<br />##   num [1:100] 0.145 0.128 0.138 0.187 0.303 ...<br />## <br />## Regularization Parameter(lambda):  0.1 <br />## <br />## The Function to Estimate Density Ratio:<br />##   compute_density_ratio()`

Also in this case, we can compare the true density ratio with the estimated density ratio.

`true_density_ratio <- function(x) {<br />  dmvnorm(x, mean = c(1, 1), sigma = diag(1/8, 2)) /<br />    dmvnorm(x, mean = c(1, 1), sigma = diag(1/2, 2))<br />}<br />estimated_density_ratio <- result\$compute_density_ratio<br /><br />N <- 20<br />range <- seq(0, 2, length.out = N)<br />input <- expand.grid(range, range)<br />z_true <- matrix(true_density_ratio(input), nrow = N)<br />z_hat <- matrix(estimated_density_ratio(input), nrow = N)<br /><br />par(mfrow = c(1, 2))<br />contour(range, range, z_true, main = "True Density Ratio")<br />contour(range, range, z_hat, main = "Estimated Density Ratio")` The dimensions of `x` and `y` must be same.