**Memo's Island**, and kindly contributed to R-bloggers)

Figure 1: Synthetic data and fitted curves. |

S-shaped distributed data can be found in many applications. Such data can be approximated with logistic distribution function [1]. Cumulative distribution function of logistic distribution function is a logistic function, i.e., logit.

To demonstrate this, in this short example, after generating a synthetic data, we will fit quasibinomial regression model to different observations.

ggplot [2], an implementation of grammar of graphics [3], provides capability to apply regression or customised smoothing onto a raw data during plotting.

**Generating Synthetic Data**

Let generate set of $n$ observation over time $t$, denoted, $X_{1}, X_{2}, …, X_{n}$ for $k$ observation $X=(x_{1}, x_{2}, …, x_{k})$. We will use cumulative function for logistic distribution [4],

$$F(x;mu,s) = frac{1}{2} + frac{1}{2} tanh((x-mu)/2s)$$, adding some random noise to make

it realistic.

Let’s say there are $k=6$ observations with the following parameter sets, $mu = {9,2,3,5,7,5}$ and $s={2,2,4,3,4,2}$, we will utilise `mapply` [5] in generating a syntetic data frame.

generate_logit_cdf <- function(mu, s,

sigma_y=0.1,

x=seq(-5,20,0.1)) {

x_ms <- (x-mu)/s

y <- 0.5 + 0.5 * tanh(x_ms)

y <- abs(y + rnorm(length(x), 0, sigma_y))

ix <- which(y>=1.0)

if(length(ix)>=1) {

y[ix] <- 1.0

}

return(y)

}

set.seed(424242)

x <- seq(-5,20,0.025) # 1001 observation

mu_vec <- c(1,2,3,5,7,8) # 6 variables

s_vec <- c(2,2,4,3,4,2)

# Syntetic variables

observations_df<- mapply(generate_logit_cdf,

mu_vec,

s_vec,

MoreArgs = list(x=x))

# Give them names

colnames(observations_df) <- c("Var1", "Var2", "Var3", "Var4", "Var5", "Var6")

head(observations_df)

**Smoothing of observations**

Using the syntetic data we have generated, `observations_df`,

we can noq use `ggplot` and `quasibinomial` `glm` to visualise

and smooth the variables.

library(ggplot2)

library(reshape2)

df_all <- reshape2:::melt(observations_df)

colnames(df_all) <- c("x", "observation", "y")

df_all$observation <- as.factor(df_all$observation)

p1<-ggplot(df_all, aes(x=x, y=y, colour=observation)) + geom_point() +

scale_color_brewer(palette = "Reds") +

theme(

panel.background = element_blank(),

axis.text.x = element_text(face="bold", color="#000000", size=11),

axis.text.y = element_text(face="bold", color="#000000", size=11),

axis.title.x = element_text(face="bold", color="#000000", size=11),

axis.title.y = element_text(face="bold", color="#000000", size=11)

# legend.position = "none"

)

l1<-ggplot(df_all, aes(x=x, y=y, colour=observation)) +

geom_point(size=3) + scale_color_brewer(palette = "Reds") +

scale_color_brewer(palette = "Reds") +

#geom_smooth(method="loess", se = FALSE, size=1.5) +

geom_smooth(aes(group=observation),method="glm", family=quasibinomial(), formula="y~x",

se = FALSE, size=1.5) +

xlab("x") +

ylab("y") +

#scale_y_continuous(breaks=seq(0.0,1,0.1)) +

#scale_x_continuous(breaks=seq(0.0,230,20)) +

#ggtitle("") +

theme(

panel.background = element_blank(),

axis.text.x = element_text(face="bold", color="#000000", size=11),

axis.text.y = element_text(face="bold", color="#000000", size=11),

axis.title.x = element_text(face="bold", color="#000000", size=11),

axis.title.y = element_text(face="bold", color="#000000", size=11)

)

library(gridExtra)

gridExtra:::grid.arrange(p1,l1)

**References**

[1] https://en.wikipedia.org/wiki/Logistic_distribution#Applications

[2] http://www.ggplot.org

[3] The Grammar of Graphics, L. Wilkinson, http://www.amzn.com/038724544

[4] http://en.wikipedia.org/wiki/Logistic_distribution#Cumulative_distribution_function.

[5] https://stat.ethz.ch/R-manual/R-devel/library/base/html/mapply.html

**leave a comment**for the author, please follow the link and comment on their blog:

**Memo's Island**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...