
The post Box Cox transformation in R appeared first on Data Science Tutorials


The Box-Cox transformation is a power transformation that can reduce nonlinearity between variables, unequal variances, and asymmetry (skewness) in a variable.

This makes it very helpful for turning a variable into a new variable whose distribution is approximately normal.

## Box-Cox family

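For a strictly positive variable y, the Box-Cox transformation for a given value of lambda is:

$$
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[6pt]
\log(y), & \lambda = 0,
\end{cases}
$$

where y is the variable to be transformed and lambda (λ) is the transformation parameter. The λ = 0 case is the limit of the first expression as λ approaches 0. The following table lists the most typical transformations. Each one is equivalent to the formula above up to a linear rescaling:

| λ | Transformation |
|---|---|
| -2 | 1/y² |
| -1 | 1/y |
| -0.5 | 1/√y |
| 0 | log(y) |
| 0.5 | √y |
| 1 | y (no transformation) |
| 2 | y² |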

In practice, if the estimated transformation parameter is close to one of the values in the previous table, use the table value rather than the exact estimate. The resulting transformation is easier to interpret.
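The formula can be written as a small R function. This is a minimal sketch and is not part of MASS. The name `box_cox()` is hypothetical, and the function assumes a strictly positive numeric vector and a single value of `lambda`. The two comparisons at the end illustrate points made above:

- values of lambda very close to 0 give almost the same result as the log;
- λ = 0.5 is a linear rescaling of the square root from the table.

```r
# Hypothetical helper (not from MASS): Box-Cox transformation of a positive vector
box_cox <- function(y, lambda) {
  # The transformation is only defined for strictly positive values
  stopifnot(is.numeric(y), all(y > 0))
  if (lambda == 0) {
    log(y)                    # limiting case, lambda = 0
  } else {
    (y^lambda - 1) / lambda   # power case, lambda != 0
  }
}

y <- c(0.5, 1, 2, 4)

# As lambda approaches 0, the power formula approaches log(y)
all.equal(box_cox(y, 1e-8), box_cox(y, 0), tolerance = 1e-6)

# lambda = 0.5 is a linear rescaling of the square root: 2 * (sqrt(y) - 1)
all.equal(box_cox(y, 0.5), 2 * (sqrt(y) - 1))
```

Because the table values differ from the exact formula only by a shift and a change of scale (or sign), they change the shape of the distribution in the same way.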


### The boxcox function in R

The `boxcox` function from the MASS package estimates the transformation parameter by maximum likelihood. It computes the profile log-likelihood over a grid of lambda values and, by default, plots it together with an approximate 95% confidence interval for the parameter.

Its usage, with the default values of the arguments, is:

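```r
boxcox(object,                     # a fitted lm/aov object or a model formula
       lambda = seq(-2, 2, 1/10),  # grid of lambda values to evaluate
       plotit = TRUE,              # plot the profile log-likelihood?
       interp,                     # use spline interpolation for the curve?
       eps = 1/50,                 # tolerance for treating lambda as 0
       xlab = expression(lambda),  # x-axis label
       ylab = "log-Likelihood",    # y-axis label
       ...)                        # further arguments for the model fitting
```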
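These arguments can also be used without producing a plot. The call below is a sketch that uses R's built-in `trees` dataset, chosen only for illustration and not part of the original example. It does three things:

- passes a model formula instead of a fitted `lm` object;
- evaluates the log-likelihood on a custom `lambda` grid;
- sets `plotit = FALSE` so that `boxcox` only returns its result.

The result is a list with the lambda grid in `x` and the corresponding log-likelihood values in `y`.

```r
library(MASS)

# Formula interface: extra arguments such as `data` are passed on to lm()
bc <- boxcox(Volume ~ log(Height) + log(Girth), data = trees,
             lambda = seq(-1, 1, by = 0.01),  # finer grid than the default
             plotit = FALSE)                   # no plot, only the values

str(bc)                 # list with components x (lambda) and y (log-likelihood)
bc$x[which.max(bc$y)]   # grid value of lambda with the highest log-likelihood
```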

#### Example

Consider the following sample vector `x`, which does not follow a normal distribution:

```r
x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

hist(x)
```

To estimate the appropriate lambda, fit an intercept-only linear model (`x ~ 1`) with the `lm` function and pass it to `boxcox`:

```r
library(MASS)

# Plot the profile log-likelihood of lambda for the intercept-only model
boxcox(lm(x ~ 1))
```
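MASS also provides a formula method for `boxcox`, which fits the linear model internally. Fitting the model first with `lm` is therefore a matter of style rather than a requirement.

The check below assumes the vector `x` from the example and uses `plotit = FALSE` to skip the plots. It compares the log-likelihood profiles returned by the two interfaces, which should be identical.

```r
library(MASS)

x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# lm method: fit the intercept-only model first
b_lm <- boxcox(lm(x ~ 1), plotit = FALSE)

# Formula method: boxcox() fits the same lm() model internally
b_formula <- boxcox(x ~ 1, plotit = FALSE)

# Compare the lambda grids and log-likelihood values of both results
all.equal(b_lm, b_formula)
```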

In the resulting plot, the dashed vertical line in the middle marks the estimated parameter λ̂, the value that maximizes the log-likelihood. The two outer dashed vertical lines mark the limits of its 95% confidence interval.

In this example the estimate is very close to 0, and the plot shows that 0 lies inside the 95% confidence interval. The best choice is therefore the logarithmic transformation of the data (λ = 0 in the table of the first section).
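The same reasoning can be carried out numerically. The interval drawn by `boxcox` is a likelihood-ratio interval: it contains the lambda values whose log-likelihood is within `qchisq(0.95, 1) / 2` of the maximum.

The sketch below assumes the vector `x` from the example and a fine grid of lambda values chosen for illustration. It computes that interval, keeps the table values that fall inside it, and picks the one closest to the estimate. The variable names are illustrative.

```r
library(MASS)

x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# Log-likelihood on a fine grid of lambda values, without plotting
b <- boxcox(lm(x ~ 1), lambda = seq(-2, 2, by = 0.001), plotit = FALSE)

lambda_hat <- b$x[which.max(b$y)]

# Likelihood-ratio 95% interval: lambdas whose log-likelihood is within
# qchisq(0.95, 1) / 2 of the maximum (assumes a unimodal profile)
cutoff <- max(b$y) - qchisq(0.95, df = 1) / 2
ci <- range(b$x[b$y >= cutoff])

# Common values from the table that fall inside the interval
table_lambdas <- c(-2, -1, -0.5, 0, 0.5, 1, 2)
candidates <- table_lambdas[table_lambdas >= ci[1] & table_lambdas <= ci[2]]

# Prefer the table value closest to the estimate (empty if none qualifies)
chosen <- candidates[which.min(abs(candidates - lambda_hat))]

lambda_hat
ci
chosen
```

If the interval reaches the edge of the grid, widen the `lambda` sequence, because the interval is truncated at the grid limits.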


```r
# Transformed data
new <- log(x)
# Histogram
hist(new)
```

The transformed data now look much closer to a normal distribution. To check this more formally, you can run a normality test such as the Shapiro-Wilk test:

```r
shapiro.test(new)
#>
#>  Shapiro-Wilk normality test
#>
#> data:  new
#> W = 0.94531, p-value = 0.4538
```

The p-value (0.4538) is higher than the usual significance levels (1%, 5%, and 10%). There is therefore not enough evidence to reject the null hypothesis of normality.
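`shapiro.test()` returns an `htest` object, so the p-value can also be extracted and compared with several significance levels in code. The sketch below assumes the vector `x` from the example. It tests both the raw and the log-transformed data, so you can see how the transformation changes the outcome.

```r
x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# shapiro.test() returns an "htest" object; p.value holds the p-value
p_raw <- shapiro.test(x)$p.value
p_log <- shapiro.test(log(x))$p.value

# TRUE means normality is rejected at that significance level
alpha <- c(0.01, 0.05, 0.10)
data.frame(alpha,
           reject_raw = p_raw < alpha,
           reject_log = p_log < alpha)
```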


##### Extracting the exact lambda

If the confidence interval of the estimated parameter does not include any of the values in the table, you can extract the lambda that maximizes the log-likelihood on the grid evaluated by `boxcox` with the following code:

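```r
library(MASS)

# boxcox() returns a list: x = lambda values, y = log-likelihood values
b <- boxcox(lm(x ~ 1))

# Exact lambda: the grid value with the highest log-likelihood
lambda <- b$x[which.max(b$y)]
lambda
#> [1] 0.02020202
```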
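The value returned by `which.max()` is only a point on the grid that `boxcox` evaluated. When plotting, that grid is by default a spline interpolation at 100 points between -2 and 2, so the result is approximate.

As a sketch, you can maximize the same profile log-likelihood directly with `optimize()`. The helper `boxcox_loglik()` is hypothetical. It is meant to reproduce the log-likelihood that `MASS::boxcox` reports for an intercept-only model: it scales the data by their geometric mean, so that the Jacobian term drops out. It also assumes the maximum lies inside the search interval.

```r
x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)

# Hypothetical helper: profile log-likelihood of lambda for the model y ~ 1
boxcox_loglik <- function(lambda, y) {
  y <- y / exp(mean(log(y)))              # scale by the geometric mean
  z <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
  # Residuals of an intercept-only fit are the deviations from the mean
  -length(y) / 2 * log(sum((z - mean(z))^2))
}

# Continuous search for the lambda with the highest log-likelihood
opt <- optimize(boxcox_loglik, interval = c(-2, 2), y = x, maximum = TRUE)
opt$maximum
```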


Using the expression from the first part, you can now transform the variable:

`new_x_exact <- (x ^ lambda - 1) / lambda`
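Because the estimated lambda is close to 0, the exact transformation and the plain log transformation should give almost the same values. The sketch below checks this. It uses `lambda = 0.02020202` as printed above, so substitute your own estimate.

```r
x <- c(0.2, 0.528, 0.11, 0.260, 0.091,
       1.314, 1.52, 0.244, 1.981, 0.273,
       0.461, 0.366, 1.407, 0.79, 2.266)
lambda <- 0.02020202   # value printed above; replace with your own estimate

new_x_exact <- (x ^ lambda - 1) / lambda
new_log <- log(x)

# A correlation close to 1 means the two transformations differ only slightly
cor(new_x_exact, new_log)

# Largest absolute difference between the two transformed vectors
max(abs(new_x_exact - new_log))
```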

