# Weighted residual empirical processes in semi-parametric copula adjusted for regression

**YoungStatS**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Overview

In this post we first review the concept of semi-parametric copula and the accompanying estimation procedure of pseudo-likelihood estimation (PLE). We then generalize the estimation problem to the setting where the copula signal is hidden in a semi- or non-parametric regression model. Under this setting we have to base the PLE on the residuals. The particular challenge of the diverging score function is handled via the technique of the weighted residual empirical processes.

# The semi-parametric copula model

Copula has been a popular method to model multivariate dependence
structure since its introduction in Sklar (1959). Consider a random vector
\(\mathbf{E}=(E_1,\dots,E_p)^\top\in\mathbb{R}^p\) with joint distribution function
\(H\); we assume throughout that \(E_k\), \(k\in\{1,\dots,p\}\) has absolutely
continuous marginal distribution function \(F_k\). Then the *copula* \(C\)
associated with \(\mathbf{E}\) is the joint distribution function of the
marginally transformed random vector \((F_1(E_1),\dots,F_p(E_p))^\top\).
It is clear from this definition that \(C\) itself is always a
distribution supported on the unit hypercube \([0,1]^p\), and \(C\) always
has *uniform marginals* supported on \([0,1]\) whatever the marginals of
\(\mathbf{E}\) may be. (The explicit form of \(C\) follows from the Sklar’s
theorem, for instance Corollary 2.10.10 in Nelsen (2006):
\(C(\mathbf{u}) = H(F_1^{\leftarrow}(u_1),\dots,F_p^{\leftarrow}(u_p))\) for
\(F_k^{\leftarrow}\) the inverse of \(F_k\) and
\(\mathbf{u} = (u_1,\dots,u_p)^\top\in[0,1]^p\).) Furthermore, by the *invariance
property*, if \(g_1,\dots,g_p\) are univariate strictly increasing
functions, then \(\mathbf{E}\) and its marginally transformed version
\((g_1(E_1),\dots,g_p(E_p))^\top\) will admit the copula.

Thus, copula is a *margin-free* measure of multivariate dependence.
Applied in the opposite direction, one could also start from a copula
and couple the copula with arbitrary marginals to create multivariate
distributions in a flexible manner. For instance, beyond the usual
applications in finance and economy, copulas could be used in the latter
manner to model the dependence among the repeated observations in
longitudinal data (Sun, Frees, and Rosenberg (2008)).

In this post we will focus on the *semi-parametric* copula model that
serves as a middle ground between a totally non-parametric approach to
copula modelling (via the so called *empirical copula*, see for instance
Fermanian, Radulović, and Wegkamp (2004) and Berghaus, Bücher, and Volgushev (2017)) and a
totally parametric modelling of the random vector \(\mathbf{E}\). In the
semi-parametric copula model, we consider a collection of possible
distributions of \(\mathbf{E}\) where the copulas \(C=C(\mathbf{\cdot};\mathbf{\theta})\) are
constrained to be parametrized by an Euclidean *copula parameter*
\(\mathbf{\theta}=(\theta_1,\dots,\theta_d)^\top\), but where the marginals
\(F_1,\dots,F_p\) of \(\mathbf{E}\) could range over all \(p\)-tuples of absolutely
continuous univariate distribution functions.

# The pseudo-likelihood method

In the semi-parametric copula model, the primary interest is often the true value \(\mathbf{\theta}^*\) of the copula parameter that determines the multivariate dependence. An obvious challenge in estimating \(\mathbf{\theta}^*\) in the copula setting is how to handle the unknown marginals \(F_1,\dots,F_p\). The canonical solution is the pseudo-likelihood estimation (PLE) introduced in Oakes (1994) and Genest, Ghoudi, and Rivest (1995) that we now describe.

Let \(g_1(\mathbf{\cdot};\mathbf{\theta}),\dots,g_d(\mathbf{\cdot};\mathbf{\theta})\) be a collection of
appropriate *score functions* such that the population *estimating
equation* \(\mathbb{E} g_m(F_1(E_1),\dots,F_p(E_p);\mathbf{\theta})=0\) holds only when
\(\mathbf{\theta}=\mathbf{\theta}^*\), for all \(m\in\{1,\dots,d\}\). In principle one can
always choose the score functions to be the ones in the maximum
likelihood estimation, namely
\(g_m(\cdot;\mathbf{\theta})=\frac{\partial}{\partial \theta_m} \log c(\mathbf{\cdot};\mathbf{\theta})\)
where \(c(\mathbf{\cdot};\mathbf{\theta})\) is the density of the copula
\(C(\mathbf{\cdot};\mathbf{\theta})\). Thus, if \(F_1,\dots,F_p\) were known, to estimate
\(\mathbf{\theta}^*\) empirically based on a sample
\(\mathbf{E}_i=(E_{i,1},\dots,E_{i,p})^\top\), \(i\in\{1,\dots,n\}\) of \(\mathbf{E}\), one
could simply “find the zero” of the empirical version of the estimating
equation, that is to estimate \(\mathbf{\theta}^*\) by
\(\mathbf{\hat{\theta}}^{\text{parametric}}\) that solves, for all
\(m\in\{1,\dots,d\}\), \[\begin{align*}
\frac{1}{n} \sum_{i=1}^n g_m(F_1(E_{i,1}),\dots,F_p(E_{i,p});\hat{\theta}^{\text{parametric}}) =0 .
\end{align*}\] (The superscript “parametric” points to the fact that when
\(F_1,\dots,F_p\) are known, we are basically solving a parametric
problem.) However, in semi-parametric copula modelling we commonly avoid
setting \(F_1,\dots,F_p\) to some particular form. The PLE method solves
this problem by replacing the unknown \(F_k\), \(k\in\{1,\dots,p\}\) by its
empirical counterpart, namely the empirical distribution function
\(F_{n,k}(t) = \frac{1}{n+1} \sum_{i=1}^n \mathbf{1}\{E_{i,k} \le t \}\). The
*oracle* PLE estimator \(\mathbf{\hat{\theta}}^{\text{oracle}}\) of \(\mathbf{\theta}^*\) is
then the one that solves the following, revised estimating equation: for
all \(m\in\{1,\dots,d\}\), \[\begin{align}
$\frac{1}{n} \sum_{i=1}^n g_m(F_{n,1}(E_{i,1}),\dots,F_{n,p}(E_{i,p});\mathbf{\hat{\theta}}^{\text{oracle}}) = \int g_m(\mathbf{u};\mathbf{\hat{\theta}}^{\text{oracle}})\mathrm{d}C_n(\mathbf{u}) = 0.
\label{eq:EE_oracle}
\end{align}\] All integrals in this post are over \([0,1]^p\). To simplify
our expression above and later, we have introduced the empirical copula
\(C_n\) that is a multivariate distribution function on \([0,1]^p\) with a
mass of \(1/n\) at each of
\((F_{n,1}(E_{i,1}),\dots,F_{n,p}(E_{i,p}))^\top\), \(i\in\{1,\dots,n\}\)
(precisely,
\(C_n(\mathbf{u}) = \frac{1}{n} \sum_{i=1}^n \mathbf{1}\left\{ F_{n,1}(E_{i,1})\le u_1, \dots, F_{n,p}(E_{i,p})\le u_p \right\}\)).
The qualifier “oracle” in \(\mathbf{\hat{\theta}}^{\text{oracle}}\) is used to
distinguish the current case when we can still directly observe the
copula sample \(\mathbf{E}_i\), \(i\in\{1,\dots,n\}\) (albeit without knowing
\(F_1,\dots,F_p\)), from the case when even that sample will be subject to
perturbation which we now turn to.

# Residual-based pseudo-likelihood for semi-parametric copula adjusted for regression

From now on we suppose that the copula signal \(\mathbf{E}=\mathbf{E}_{\mathbf{\theta}^*}\) is
“hidden” in a multivariate response semi- or non-parametric regression
model, a setting considered in our recent work
(Zhao, Gijbels, and Van Keilegom (2022)): for a covariate \(\mathbf{X}\in\mathbb{R}^q\) (independent
of \(\mathbf{E}\)) and a response \(\mathbf{Y}=(Y_1,\dots,Y_p)^\top\in\mathbb{R}^p\),
\[\begin{align*}
Y_1 &= m_1(\mathbf{X}) + E_1 , \\
&\quad \vdots \\
Y_p &= m_p(\mathbf{X}) + E_p .
\end{align*}\] In its raw form, the model above is a purely non-parametric
regression model; by specifying particular forms of the regression
function \(m_k\), the above will accommodate a wide range of popular non-
and semi-parametric regression variants such as the partly linear
regression model and the additive model. (It’s not much more difficult
to consider a more flexible, heteroscedastic model
\(Y_k = m_k(\mathbf{X}) + \sigma_k(\mathbf{X})E_k\), though we refrain from doing so in this
post.) Gijbels, Omelka, and Veraverbeke (2015) considered a similar model and
studied the resulting *empirical copula process*.

Under this regression model, we can observe an i.i.d. sample
\((\mathbf{Y}_1,\mathbf{X}_1),\dots,(\mathbf{Y}_n,\mathbf{X}_n)\) of \((\mathbf{Y},\mathbf{X})\), but crucially *not*
the copula sample \(\mathbf{E}_1, \dots, \mathbf{E}_n\). To eventually arrive at our
estimator for \(\mathbf{\theta}^*\) in this setting, we will first form our
empirical copula \(\hat{C}_n\) based on the residuals of the regression as
follows. Let \(\hat{m}_k\) be some estimator for \(m_k\). Let’s estimate the
\(k\)th component of \(E_i=(E_{i,1},\dots,E_{i,p})^\top\) by the residual
\[\begin{align*}
\hat{E}_{i,k} = Y_{i,k} – \hat{m}_k(X_i) .
\end{align*}\] Then, we form the residual-based empirical distribution and
copula: \[\begin{align*}
\hat{F}_{n,k}(t) = \dfrac{1}{n+1} \sum_{i=1}^n \mathbf{1}\{ \hat{E}_{i,k} \le t \},\quad \hat{C}_n(\mathbf{u}) = \frac{1}{n} \sum_{i=1}^n \prod_{k=1}^p \mathbf{1}\{\hat{F}_{n,k}(\hat{E}_{i,k})\le u_k \}.
\end{align*}\] Finally, to estimate \(\mathbf{\theta}^*\), we settle for the
estimator \(\hat{\theta}^{\text{residual}}\) that solves \[\begin{align}
\frac{1}{n} \sum_{i=1}^n g_m( \hat{F}_{n,1}(\hat{E}_{i,1}),\dots,\hat{F}_{n,p}(\hat{E}_{i,p}) ;\hat{\theta}^{\text{residual}}) = \int g_m(\mathbf{u};\hat{\theta}^{\text{residual}})\mathrm{d}\hat{C}_n(\mathbf{u}) = 0 .
\label{eq:EE_residual}
\end{align}\]

Comparing above equations, we would expect that when the residual-based empirical copula \(\hat{C}_n\) is asymptotically indistinguishable from the oracle empirical copula \(C_n\), the residual-based copula parameter estimator \(\hat{\theta}^{\text{residual}}\) should be asymptotically indistinguishable from \(\hat{\theta}^{\text{oracle}}\) as well. To formally reach this conclusion, standard estimating equation theory requires (among other conditions that we will ignore in this post) that the estimating equations at the truth should become indistinguishable, namely \[\begin{align} \int g_m(\mathbf{u};\mathbf{\theta}^*)\mathrm{d}\hat{C}_n(\mathbf{u}) – \int g_m(\mathbf{u};\mathbf{\theta}^*)\mathrm{d}C_n(\mathbf{u}) = o_p(n^{-1/2}) . \label{eq:EE_diff} \end{align}\]

One typical, although ultimately restrictive, approach to establish above equation is to invoke integration by parts
(Neumeyer, Omelka, and Hudecová (2019), Chen, Huang, and Yi (2021)): ideally, this would
yield \[\begin{align*}
\int g_m(\mathbf{u};\mathbf{\theta}^*)\mathrm{d}\hat{C}_n(\mathbf{u}) \sim &\int \hat{C}_n(\mathbf{u})\mathrm{d}g_m(\mathbf{u};\mathbf{\theta}^*) \\
\stackrel{\text{up to}~o_p(n^{-1/2})}{\approx} & C_n(\mathbf{u})\mathrm{d}g_m(\mathbf{u};\mathbf{\theta}^*) \sim \int g_m(\mathbf{u};\mathbf{\theta}^*)\mathrm{d}C_n(\mathbf{u}) .
\end{align*}\] In the above “\(\sim\)” is only meant to give a drastically
simplified and hence not-quite-correct representation of integration by
parts (we refer the readers to Appendix A in Radulović, Wegkamp, and Zhao (2017) for a precise
multivariate integration by parts formula particularly useful for
copulas), but it already conveys the underlying idea: the aim is to
convert \(\hat{C}_n\) and \(C_n\) in the integrals from measures to integrands
so that proving the closeness between the two integrals is clearly
reduced to proving the closeness between \(\hat{C}_n\) and \(C_n\). However,
the integration by parts trick, although popular, often requires
*bounded* \(g_m\) to properly define the measure \(\mathrm{d}g_m\), but this is
often *not satisfied for even the most common copulas*. For instance, in
Gaussian copula, the score functions are quadratic forms of the
\(\Phi^{\leftarrow}(\mathbf{u}_k)\)’s where \(\Phi^{\leftarrow}\) is the standard normal quantile function
(see Eq. (2.2) in Segers, Akker, and Werker (2014)), so are clearly divergent as \(u_k\)
approaches 0 or 1.

In Zhao, Gijbels, and Van Keilegom (2022) we instead adopted a more direct approach.
Let
\(g_m^{[k]}(\mathbf{u};\mathbf{\theta}^*)=\frac{\partial}{\partial u_k} g_m(\mathbf{u};\mathbf{\theta}^*)\).
Then Taylor-expanding above equation we see that we will need
(among other ingredients) an \(o_p(n^{-1/2})\) rate for the terms on the
right-hand side of \[\begin{align}
\int g_m(\mathbf{u};\mathbf{\theta}^*)\mathrm{d}({\hat{C}_n-C_n})(\mathbf{u}) \approx \sum_{k=1}^p \left[ \frac{1}{n} \sum_{i=1}^n g_m^{[k]}(F_k(E_{i,k});\mathbf{\theta}^*) \left\{ \hat{F}_{n,k}(\hat{E}_{i,k})-F_{n,k}(E_{i,k}) \right\} \right] .
\label{eq:EE_diff_2}
\end{align}\] It is *not* enough that the terms
\(\hat{F}_{n,k}(\hat{E}_{i,k})-F_{n,k}(E_{i,k})\) on the right-hand side are \(o_p(n^{-1/2})\)
(in fact they are not), due to the divergence of \(g_m^{[k]}\). We need to
take a more careful look at \(\hat{F}_{n,k}(\hat{E}_{i,k})-F_{n,k}(E_{i,k})\), whose
analysis belongs to *residual empirical processes*. To demonstrate the
benefits of considering the *weighted* version of such processes, we
first review some basic literature on the weighted (non-residual)
empirical processes in the simplified setting of the real line.

# Weighted empirical processes on the real line

In this section we consider estimating a distribution function \(F=F_U\)
of a random variable \(U\). We rely on the empirical distribution function
\(F_n\) constructed from the i.i.d. observations \(U_1, \dots, U_n\) of \(U\):
\(F_n(t) = \frac{1}{n+1} \sum_{i=1}^n \mathbf{1}\left\{U_i\le t\right\}\). The
resulting classical empirical process on the real line
\(\sqrt{n} (F_n -F)(t)\), \(t\in \mathbb{R}\) must be one of the most extensively
studied objects in all of probability; for illustration we will just
quote a form of the associated law of the iterated logarithm (LIL):
\[\begin{align*}
\limsup_{n\rightarrow\infty} \sup_{t} \left|\dfrac{1}{\sqrt{\log\log(n)}} \sqrt{n} (F_n -F)(t) \right| = \dfrac{1}{\sqrt{2}} .
\end{align*}\] Clearly, the LIL treats all points \(t\in \mathbb{R}\) *equally*.
However, in reality the \(F(t)\) at some \(t\) is easier to estimate than
others. This is essentially because the variability
\(\text{var}\left\{F_n(t)\right\}=F(t)\left\{1-F(t)\right\}/n\) approaches
\(0\) when \(F(t)\) approaches \(0\) or \(1\), for any \(n\). We can clearly
observe this feature in the small simulation study represented by the
following figure, where the “band” enclosing the deviations (based on
100 Monte-Carlo simulations) gets narrower toward the boundaries of the
support of \(U\).

Above: Plot of the deviation \(F_n-F\), for sample size \(n=50\), based on 100 Monte-Carlo simulations. For simplicity we assumed \(U\sim{Unif}(0,1)\), so \(F(t)=t\) for \(t\in[0,1]\). The deviation from each simulation is represented by a single red dashed line. The 10% and 90% quantiles of the deviations at each \(t\) value are indicated by the two blue lines. Clearly, the “band” enclosing the deviations gets narrower toward the boundaries of the support of \(U\).]

This feature can also be characterized theoretically. For instance, we
can find the LIL for the weighted process \(\sqrt{n}(F_n -F)/\sqrt{F}\),
that is the classical process but now scaled by an additional standard
deviation factor \(1/\sqrt{F}\), from Csáki (1977): for some \(0

Such results on the weighted empirical processes can be generalized to settings beyond the real line, for instance to sets in \(\mathbb{R}^p\) (Alexander (1987)) and sets of functions (Giné, Koltchinskii, and Wellner (2003)).

# Results for residual-based estimators

For us, the idea of the weighted empirical processes will be applied to the residual empirical processes, which will further culminate in our eventual result on the residual-based estimator \(\hat{\theta}^{\text{residual}}\) for the copula parameter. We will first consider the weighted residual empirical processes.

## Results on weighted residual empirical process

Let \(f_k\) be the density of \(E_k\), and \(\cal{T}_n\) be the \(\sigma\)-field generated by the \((\mathbf{X}_i,\mathbf{Y}_i)\)’s, \(i\in\{1,\dots,n\}\). The usual decomposition of a residual empirical process is \[\begin{align*} \hat{F}_{n,k}(t) – F_{n,k}(t) = f_k(t) \cdot \mathbb{E}\left[ (\hat{m}_k-m_k)(\mathbf{X}) | \cal{T}_n \right] + r_{1,k}(t) \end{align*}\] where \(r_{1,k}\) is a remainder term that could be controlled as follows:

Theorem: Suppose that we can embed \(\hat{m}_k-m_k\) into a function class \(\cal{D}\) with bracketing number \(N_{[]}(\tau,\cal{D}) \lesssim (1/\tau)^\beta \exp(K (1/\tau)^{1/\nu})\) where \(\beta\), \(K\) and \(\nu\) are constants. Suppose that \(\|\hat{m}_k-m_k\|_\infty=O_p(a_n)\) (where \(\|\cdot\|_\infty\) is the supremum norm over the support of \(\mathbf{X}\)). Then under mild regularity conditions, \[\begin{gather*} \sup_{t\in\ \mathbb{R}} \dfrac{|r_{1,k}(t)|}{ n^{-\frac{1}{2}} \left\{ f_k(t) \cdot a_n + a_n^2\right\}^{\frac{1}{2}(1-1/\nu)} + n^{-\frac{1}{1+1/\nu}} + a_n^2 } = O_p( 1 ) . \end{gather*}\]

Clearly, the convergence rate of the remainder term \(r_{1,k}\) is improved by both the rate of \(\|\hat{m}_k-m_k\|_\infty\) and the weight \(f_k\). The latter point is especially beneficial when \(f_k\) is a density “of the usual shape” that decays at its tails, which will allow the rate of \(r_{1,k}\) to be tightened accordingly (exactly similar to how the rate of \(\sqrt{n}(F_n -F)(t)\) is tightened by \(\sqrt{F(t)}\) in the weighted empirical processes on the real line that we reviewed earlier). These features will tame the divergence of \(g_m^{[k]}\) in above equation.

Because what eventually form the ingredients of the residual-based
copula \(\hat{C}_n\) are the *residual ranks* \(\hat{F}_{n,k}( \hat{E}_{i,k} )\)’s, we need
to go one step further and consider the analogous results for them:

Theorem: For all \(n\ge 1\), \(k\in\{1,\dots,p\}\) and \(i\in\{1,\dots,n\}\), \[\begin{align} \label{eq:res_rank_decomp} \hat{F}_{n,k}( \hat{E}_{i,k} ) – F_{n,k}( E_{i,k} ) &= -f_k(E_{i,k}) \left\{ (\hat{m}_k-m_k)(\mathbf{X}_i) – \mathbb{E}\left[ (\hat{m}_k-m_k)(\mathbf{X}) | \cal{T}_n \right] \right\} + r_{1,k}( \hat{E}_{i,k} ) + r_{2,k,i} \end{align}\] where \(r_{1,k}\) is as in the last theorem, and \(r_{2,k,i}\) is another remainder term that satisfies \[\begin{align*} \max_{i\in\{1,\dots,n\}} \dfrac{ |r_{2,k,i}| }{ \log^{\frac{1}{2}}(n) n^{-\frac{1}{2}} \left\{ f_k(E_{i,k})\cdot a_n \right\}^{\frac{1}{2}} + a_n^2 } = O_p(1) . \end{align*}\]

Note that simliar to \(r_{1,k}\) earlier, the rate of \(r_{2,k,i}\) also enjoys the dependence on \(\|\hat{m}_k-m_k\|_\infty\) and the weighing by \(f_k\).

## Results on residual-based estimator for the copula parameter

We are now ready to plug in the decomposition of \(\hat{F}_{n,k}( \hat{E}_{i,k} ) – F_{n,k}( E_{i,k} )\) in the equation in the last theorem into above equation. The leading term (the one proportional to \(f_k\)), which is centered, is now summed over \(i\in\{1,\dots,n\}\) in and so enjoys an additional \(n^{-1/2}\)-scaling. The remainder terms are weighted and so tame the divergence of the scores \(g_m^{[k]}\). Eventually, we arrive at the asymptotic equivalence between the residual-based PLE and the oracle PLE:

Theorem: Under the conditions of the last two theorems, and some additional regularity conditions which in particular require, for \(m\in\{1,\dots,d\}\) and \(k\in\{1,\dots,p\}\), \[\begin{align} \int \left\{g_m^{[k]}(\mathbf{u};\mathbf{\theta}^*) f_k(F_k^{\leftarrow}(u_k)) \right\}^2\mathrm{d}C(\mathbf{u}) < \infty, \label{eq:g_m^k_divergence} \end{align}\] the asymptotic equivalence holds. Furthermore, \(\hat{\theta}^{\text{residual}} – \hat{\theta}^{\text{oracle}} = o_p(n^{-1/2})\).

Condition above in fact allows for quite non-trivial divergence of the score functions \(g_m\) (it certainly accommodates the Gaussian and the \(t\)-copulas). To apply the theorem above, one still needs to verify the correct upper bound on the bracketing number for embedding \(hat{m}_k-m_k\), which again turns out to be non-restrictive. For instance, for partly linear regression \(Y_k = \tilde{m}_k(\mathbf{x}) + \mathbf{\theta}_k^\top \mathbf{w} + E_k\) with \(\mathbf{w}\in\mathbb{R}^{q_{\mathbf{L},n}}\), we can allow the dimension \(q_{\mathbf{L},n}\) of the linear covariate to grow up to \(q_{\mathbf{L},n}=o(n^{1/4})\).

# Bibliography

*Probability Theory and Related Fields*75 (3): 379–423.

*Bernoulli*23 (1): 743–72.

*Journal of Econometrics*222 (1): 484–501.

*Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete*238 (2): 147–67.

*Bernoulli*10 (5): 847–60.

*Biometrika*82: 543–52.

*Scandinavian Journal of Statistics*42 (4): 1109–26.

*Stochastic Inequalities and Applications*, edited by Evarist Giné, Christian Houdré, and David Nualart, 249–78. Birkhäuser.

*An Introduction to Copulas*. 2nd ed. New York: Springer.

*Journal of Multivariate Analysis*171: 139–62.

*Journal of Nonparametric Statistics*3 (3–4): 343–54.

*Bernoulli*23 (4): 3346–84.

*The Annals of Statistics*42 (5): 1911–40.

*Publications de l’Institut de Statistique de L’Université de Paris*8: 229–31.

*Insurance: Mathematics and Economics*42 (2): 817–30.

*The Annals of Statistics*50 (2): 754–80.

**leave a comment**for the author, please follow the link and comment on their blog:

**YoungStatS**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.