
In my previous post, I introduced the concept of smoothing using Fourier basis functions and applied it to temperature data. It is important to note that a similar kind of analysis can be replicated using B-splines (see this page). In this post, I extend the concept to another type of basis function: Gaussian radial basis functions. Radial basis functions are part of a class of single-hidden-layer feedforward networks which can be expressed as a linear combination of radially symmetric nonlinear basis functions. Each basis function forms a localized receptive field in the input space. The most commonly used choice is the Gaussian basis function.

## Modeling Gaussian Basis functions

Algebraically, Gaussian basis functions are defined as follows:
$$ \phi_k(t; \mu_k, \sigma^2_k) = \exp\left(-\dfrac{\|t-\mu_k\|^2}{2\sigma^2_k}\right), \quad k=1,\dots,K $$ where \(\mu_k\) is a parameter determining the center of the basis function, \(\sigma^2_k\) is a parameter that determines its width, and \(\|\cdot\|\) is the Euclidean norm. The basis functions overlap with each other to capture the information about \(t\), and the width parameter plays an essential role in capturing the structure in the data over the region of input data. The parameters of each basis function are often determined heuristically based on the structure of the observed data.
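As a quick sanity check, the formula above translates directly into a small R function (the names `phi`, `mu`, and `sigma2` below are my own, not from any package):

```r
# Gaussian basis function: phi(t) = exp(-(t - mu)^2 / (2 * sigma2))
phi <- function(t, mu, sigma2) {
  exp(-(t - mu)^2 / (2 * sigma2))
}

phi(0, mu = 0, sigma2 = 1)  # peak at the center: 1
phi(3, mu = 0, sigma2 = 1)  # decays as t moves away from mu
```

A larger `sigma2` widens the bump, which is exactly the role the width parameter plays in the smoothing below.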

### K-means Clustering

Using K-means clustering, the observational space \(\mathcal{T}\) is partitioned into K clusters \(\{C_1, C_2, \dots, C_K\}\), one per basis function. The centers and widths are then determined by:

$$\hat{\mu}_k = \frac{1}{N_k}\sum_{t_j \in C_k} t_j, \text{ and } \hat{\sigma}^2_k = \dfrac{1}{N_k}\sum_{t_j \in C_k} \|t_j - \hat{\mu}_k\|^2$$ where \(N_k\) is the number of observations that belong to the \(k^{th}\) cluster. However, this method does not produce unique parameters for a given set of observations, because of the stochastic starting values in the clustering algorithm. As a result, K-means clustering can fail to capture all the information in the data; this is especially noticeable when the functional data are observed at equidistant points. To control the amount of overlap among the basis functions, we may introduce a hyperparameter \(\nu\). Hence, the \(\phi_k(t)\)'s are written:
$$\phi_k(t; \mu_k, \sigma^2_k) = \exp\left(-\dfrac{\|t-\mu_k\|^2}{2\nu\sigma^2_k}\right), \quad k=1,\dots,K$$
In R, these basis functions are generated as below:

```r
data(motorcycledata, package = "adlift")
mtt = motorcycledata$time # the timestamps
n = 8   # the number of basis functions over T
nyu = 5 # hyperparameter

k = kmeans(mtt, centers = n, algorithm = "Hartigan-Wong")
myu = as.vector(k$centers)  # cluster centers -> basis centers
h = k$withinss / k$size     # within-cluster variances -> basis widths

B <- matrix(0, length(mtt), n)
for (j in 1:n) {
  B[, j] = exp(-0.5 * (mtt - myu[j])^2 / (h[j] * nyu))
}
```
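The original figure is not reproduced here, but any basis matrix can be visualised by plotting its columns with `matplot`. A minimal self-contained sketch on a synthetic grid (the grid, centers, and width below are made up for illustration; for the real data, substitute `mtt` and `B` from the block above):

```r
# Each column of the basis matrix is one phi_k, drawn as a curve over t.
tt_grid <- seq(0, 10, length.out = 200)
centers <- c(2, 5, 8)  # illustrative centers (stand-ins for k-means output)
width   <- 1.5         # illustrative common width
Bdemo <- sapply(centers, function(mu) exp(-0.5 * (tt_grid - mu)^2 / width))
matplot(tt_grid, Bdemo, type = "l", lty = 1,
        xlab = "t", ylab = "basis value")
```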

Here is the plot (note the irregularities of the basis functions):

### B-Splines

Another approach is to use the idea of B-splines to stabilize the estimation of the Gaussian basis function parameters. The parameters of the basis functions are determined by preassigned knots, similar to B-spline basis functions. Let \(\{x_j;\ j=1,\dots,n\}\) be the set of observations arranged by magnitude, and let the equally spaced knots \(t_k\ (k = 1,\dots,K+4)\) be set up as:
$$t_1 < t_2 < t_3 < t_4 = \min_j x_j < t_5 < \dots < t_K < t_{K+1} = \max_j x_j < t_{K+2} < t_{K+3} < t_{K+4}$$
The \(n\) observations are divided into \((K-3)\) intervals \([t_4,t_5], [t_5,t_6], \dots, [t_K,t_{K+1}]\). The basis functions are defined with center \(t_k\) and width \(h = (t_k - t_{k-2})/3\) for \(k = 3,\dots,K+2\), as follows:
$$\phi_k(x; t_k, h^2) = \exp\left(-\dfrac{\|x-t_k\|^2}{2h^2}\right)$$
In R, these basis functions are generated as below:

```r
data(motorcycledata, package = "adlift")
mtt = motorcycledata$time # the timestamps
m = 8 # the number of basis functions over T

rng = diff(range(mtt))
kn = seq(min(mtt) - (rng/(m-3))*3, max(mtt) + (rng/(m-3))*3, by = rng/(m-3))
myu = kn[3:(m+2)]         # centers t_k, k = 3, ..., m+2
h = diff(kn, lag = 2)/3   # widths (t_k - t_(k-2))/3

B <- matrix(0, length(mtt), m)
for (j in 1:m) {
  B[, j] = exp(-0.5 * (mtt - myu[j])^2 / h[j]^2)
}
```

Here is the plot (notice the difference with the K-means approach):

## Practical Application

At this point, we have devised and implemented Gaussian basis functions. Now it is time to do some smoothing, yaaay.
The data I will be using is the motorcycledata set from the adlift package. The data give the results of 133 simulations showing the effects of motorcycle crashes on victims' heads. I will use 20 basis functions, and the hyperparameter will be set to 2 (i.e. \(\nu = 2\)).
In R, this is how it is done.

```r
set.seed(59)
data(motorcycledata, package = "adlift")
tt = motorcycledata$time # the timestamps
m = 20 # number of basis functions
nyu = 2 # hyperparameter

##### B-Splines Gaussian basis
rng = diff(range(tt))
kn = seq(min(tt) - (rng/(m-3))*3, max(tt) + (rng/(m-3))*3, by = rng/(m-3))
myu = kn[3:(m+2)]
h = diff(kn, lag = 2)/3

BsMotor <- matrix(0, length(tt), m)
for (j in 1:m) {
  BsMotor[, j] = exp(-0.5 * (tt - myu[j])^2 / h[j]^2)
}

##### K-Means clustering Gaussian basis
k = kmeans(tt, centers = m, algorithm = "Hartigan-Wong")
myu = as.vector(k$centers)
h = k$withinss / k$size

KMotor <- matrix(0, length(tt), m)
for (j in 1:m) {
  KMotor[, j] = exp(-0.5 * (tt - myu[j])^2 / (h[j] * nyu))
}

##### Smoothing Out (using LS)
f1 = function(data, m, B) {
  # least-squares smoother: S = B (B'B)^(-1) B' is the hat matrix
  Binv = solve(t(B) %*% B, diag(m))
  S = B %*% Binv %*% t(B)
  return(S %*% data)
}

accel1 = f1(data = motorcycledata$accel, m = m, B = BsMotor)
accel2 = f1(data = motorcycledata$accel, m = m, B = KMotor)

gaussian_motor = setNames(data.frame(motorcycledata[, 2], accel1, accel2),
                          c("Acceleration", "GaussBsplines", "GaussKmeans"))
```
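The least-squares step in `f1` is simply a projection onto the column space of the basis matrix. A toy check of that property, on synthetic data rather than the motorcycle set: if the response already lies in the span of the basis, the smoother returns it unchanged.

```r
# Toy check of the LS smoother S = B (B'B)^(-1) B'
t0 <- seq(0, 1, length.out = 50)
B0 <- cbind(1, t0)        # basis: intercept and slope
y0 <- 2 + 3 * t0          # y0 lies exactly in the span of B0
S0 <- B0 %*% solve(t(B0) %*% B0) %*% t(B0)
max(abs(S0 %*% y0 - y0))  # essentially zero: the projection leaves y0 fixed
```

With the Gaussian bases above, the data are not exactly in the span, so `S %*% data` returns the closest (least-squares) curve that is, which is the smoothed fit.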