Welcome to my data science blog, datistics, where I will gradually post all the vignettes and programming POCs that I have written over the past two years. Most of them can already be found in my GitHub repository.

I am using blogdown with R and RStudio to create this blog. However, I have recently taken up Python programming for work again, so my first challenge will be to also add posts in the form of Jupyter notebooks.

For my first post I will share the code that I use to generate my page logo in R.

# Tweedie distributions

We often encounter distributions that are not normal. I regularly run into Poisson and gamma distributions, as well as distributions with an inflated zero value, all of which belong to the family of Tweedie distributions. The family is indexed by a power parameter $$p$$ ( p == 0 gaussian, p == 1 poisson, p == 2 gamma ). Values with 1 < p < 2 give compound Poisson-gamma distributions, which have a point mass at zero, while no Tweedie distribution exists for 0 < p < 1. By changing $$p$$ we can sample the different Tweedie distributions.

The tweedie package only supports values for 1 <= p <= 2, so we sample across that range.
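
To make the "inflated zero value" concrete, here is a minimal base R sketch of how a Tweedie variable with 1 < p < 2 can be built as a Poisson number of gamma draws. The helper name `r_cpg` is hypothetical and not part of the tweedie package; the parameterisation is the standard compound Poisson-gamma one, chosen so that the mean is mu and the variance is phi * mu^p.

```r
# Sketch: sample a Tweedie variable with 1 < p < 2 as a compound Poisson-gamma sum.
# r_cpg is a hypothetical helper for illustration, not a function of the tweedie package.
r_cpg = function(n, mu, phi, power){
  stopifnot(power > 1, power < 2)
  lambda = mu^(2 - power) / (phi * (2 - power))  # Poisson rate: number of gamma terms
  shape  = (2 - power) / (power - 1)             # gamma shape of a single term
  scale  = phi * (power - 1) * mu^(power - 1)    # gamma scale of a single term
  counts = rpois(n, lambda)
  # The sum of k gamma(shape, scale) draws is gamma(k * shape, scale);
  # k == 0 means no terms at all, which gives an exact zero.
  ifelse( counts == 0
          , 0
          , rgamma(n, shape = pmax(counts, 1) * shape, scale = scale) )
}

set.seed(1)
y = r_cpg(1e5, mu = 1, phi = 1, power = 1.5)

mean(y == 0)  # share of exact zeros, should be close to exp(-lambda) = exp(-2)
mean(y)       # should be close to mu = 1
var(y)        # should be close to phi * mu^power = 1
```

The zeros come from the draws where the Poisson count is 0, which is why the spike at zero shrinks as p moves towards 2 and lambda grows.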

suppressWarnings({
  suppressPackageStartupMessages({
    require(tidyverse)
    require(tweedie)
    require(ggridges)
  })
})

# draw 500 samples for each value of p and store them in a list column
df = tibble( p = seq(1,2,0.1) ) %>%
  mutate( data = map(p, function(p) rtweedie(n = 500
                                             , mu = 1
                                             , phi = 1
                                             , power = p )  ) ) %>%
  unnest(data)

# one histogram per value of p
df %>%
  ggplot( aes(x = data) )+
    geom_histogram(bins = 100, fill = '#77773c') +
    facet_wrap(~p, scales = 'free_y')

# Joyplot

We will now transform these distributions into a joyplot in the style of the cover art of Joy Division's album Unknown Pleasures.

We will use ggridges, formerly known as ggjoy.

joyplot = function(df){

  p = df %>%
    ggplot(aes(x = data, y = as.factor(p), fill = ..x.. ) ) +
      # the gradient version of the ridgeline geom is needed for fill = ..x..
      geom_density_ridges_gradient( size = 0.5
                                    , scale = 3) +
      # strip everything except the ridges themselves
      theme( panel.background = element_rect(fill = 'white')
             , panel.grid = element_blank()
             , aspect.ratio = 1
             , axis.title = element_blank()
             , axis.text = element_blank()
             , axis.ticks = element_blank()
             , legend.position = 'none') +
      xlim(-1,5) +
      scale_fill_viridis_c(option = "inferno")

  return(p)

}

joyplot(df)
## Picking joint bandwidth of 0.24

In order to distribute them a bit better over the x-axis, we will shift each distribution using a sine wave pattern.

df = tibble( p = seq(1,2,0.05)
             , rwn = row_number(p)   # row index per value of p
             , sin = sin(rwn) ) %>%
  mutate( data = map(p, function(p) rtweedie(500
                                             , mu = 1
                                             , phi = 1
                                             , power = p)  ) ) %>%
  unnest(data) %>%
  # drop the long right tail
  filter( data <= 4) %>%
  # mirror each sample and shift it by 4 * |sin(rwn)|
  mutate( data = ( 4 * abs( sin(rwn) ) ) - data )

joyplot(df)
## Picking joint bandwidth of 0.206
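
To see what the sine transformation does to each ridge, the offsets can be tabulated on their own. This is a small base R sketch rather than part of the logo code; the name `shifts` is made up for illustration, and `seq_along(p)` stands in for `row_number(p)`, which gives the same result because p is strictly increasing.

```r
# Sketch of the per-row offsets used by the sine transformation (base R only).
p      = seq(1, 2, 0.05)     # same grid of power values as in the transformed data
rwn    = seq_along(p)        # 1, 2, ..., one index per value of p
offset = 4 * abs(sin(rwn))   # shift applied to each ridge, between 0 and 4

# After filter(data <= 4) a sample x in [0, 4] is mapped to offset - x,
# so each ridge spans [offset - 4, offset] and exact zeros land on offset.
shifts = data.frame(p, rwn, offset, lower = offset - 4, upper = offset)
head(shifts)
```

Because the samples are subtracted from the offset, each distribution is mirrored: its peak near zero ends up at the right-hand edge of its ridge and the tail points to the left. Values that land below -1 fall outside the `xlim(-1,5)` set in `joyplot` and are dropped from the plot.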