Articles by statcompute

Improve SVM Tuning through Parallelism

March 19, 2016 | statcompute

As pointed out in Chapter 10 of “The Elements of Statistical Learning”, ANN and SVM (support vector machines) share similar pros and cons, e.g. good predictive power but a lack of interpretability. However, in contrast to ANN, which often suffers from local minima, SVM is guaranteed to converge to a global solution. ... [Read more...]
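The original post parallelizes SVM tuning in R; the same idea can be sketched in Python with scikit-learn, where `n_jobs=-1` spreads the cross-validated grid search across all cores. The data and parameter grid below are illustrative, not from the post:

```python
# Sketch: parallel grid search over SVM hyperparameters with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

# n_jobs=-1 evaluates the (C, gamma) grid on all available cores,
# which is where the parallelism speed-up comes from.
grid = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    cv=3,
    n_jobs=-1,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```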

Where Bagging Might Work Better Than Boosting

January 2, 2016 | statcompute

In the previous post (https://statcompute.wordpress.com/2016/01/01/the-power-of-decision-stumps), it was shown that the boosting algorithm performs extremely well even with a simple 1-level stump as the base learner and provides a better performance lift than the bagging algorithm does. However, this observation shouldn’t be generalized, which would be ... [Read more...]
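A minimal Python sketch of the comparison described above, pairing a 1-level stump with both ensembles on synthetic data (illustrative only, not the post's dataset or result):

```python
# Boosting vs. bagging with a decision stump as the common base learner.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=1)
stump = DecisionTreeClassifier(max_depth=1)

boost = AdaBoostClassifier(stump, n_estimators=100, random_state=1)
bag = BaggingClassifier(stump, n_estimators=100, random_state=1)

# Cross-validated AUC for each ensemble; which one wins depends on the data.
boost_auc = cross_val_score(boost, X, y, cv=3, scoring="roc_auc").mean()
bag_auc = cross_val_score(bag, X, y, cv=3, scoring="roc_auc").mean()
print(round(boost_auc, 3), round(bag_auc, 3))
```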

The Power of Decision Stumps

January 1, 2016 | statcompute

A decision stump is a weak classification model with a simple tree structure consisting of a single split, which can also be considered a one-level decision tree. Due to its simplicity, a stump often demonstrates low predictive performance. As shown in the example below, the AUC measure of a stump ... [Read more...]
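A one-split stump and its AUC can be sketched in a few lines with scikit-learn (synthetic data for illustration; the post uses its own example):

```python
# A decision stump is just a depth-1 decision tree.
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=1)

stump = DecisionTreeClassifier(max_depth=1).fit(x_train, y_train)  # one split only
auc = roc_auc_score(y_test, stump.predict_proba(x_test)[:, 1])
print(round(auc, 3))
```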

Prediction Intervals for Poisson Regression

December 20, 2015 | statcompute

Unlike the confidence interval, which addresses the uncertainty of the conditional mean, the prediction interval accommodates the additional uncertainty associated with prediction errors. As a result, the prediction interval is always wider than the confidence interval in a regression model. In the context of ... [Read more...]

Calculate Leave-One-Out Prediction for GLM

December 13, 2015 | statcompute

In model development, the “leave-one-out” prediction is a form of cross-validation, calculated as below: 1. After a model is developed, each observation used in the model development is removed in turn and the model is refitted with the remaining observations. 2. The out-of-sample prediction for the refitted ... [Read more...]

Fitting Generalized Regression Neural Network with Python

December 9, 2015 | statcompute

[This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers.]

In [1]: # LOAD PACKAGES
In [2]: import pandas as pd
In [3]: import numpy as np
In [4]: from sklearn import preprocessing as pp
In [5]: from sklearn import cross_validation as cv
In [6]: from neupy.algorithms import GRNN as grnn
In [7]: from neupy.functions import mse
In [8]: # DATA PROCESSING
In [9]: df = pd.read_table("csdata.txt")
In [10]: y = df.ix[:, 0]
In [11]: y.describe()
Out[11]:
count    4421.000000
mean        0.090832
std         0.193872
min         0.000000
25%         0.000000
50%         0.000000
75%         0.011689
max         0.998372
Name: LEV_LT3, dtype: float64
In [12]: x = df.ix[:, 1:df.shape[1]]
In [13]: st_x = pp.scale(x)
In [14]: st_x.mean(axis = 0)
Out[14]:
array([  1.88343648e-17,   5.76080438e-17,  -1.76540780e-16,
        -7.71455583e-17,  -3.80705294e-17,   3.79409490e-15,
         4.99487355e-17,  -2.97100804e-15,   3.93261537e-15,
        -8.70310886e-16,  -1.30728071e-15])
In [15]: st_x.std(axis = 0)
Out[15]: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])
In [16]: x_train, x_test, y_train, y_test = cv.train_test_split(st_x, y, train_size = 0.7, random_state = 2015)
In [17]: [...] [Read more...]

Modeling Frequency in Operational Losses with Python

December 8, 2015 | statcompute

Poisson and Negative Binomial regressions are two popular approaches to modeling frequency measures in operational losses and can be implemented in Python with the statsmodels package as below. Although Quasi-Poisson regression is not currently supported by the statsmodels package, we are still able to estimate the model with the ... [Read more...]

Modeling Severity in Operational Losses with Python

December 6, 2015 | statcompute

When modeling severity measures in operational losses with Generalized Linear Models, we have a couple of choices based on different distributional assumptions, including Gamma, Inverse Gaussian, and Lognormal. However, based on my observations from empirical work, the differences in parameter estimates among these three popular candidates are rather ... [Read more...]

Estimating Quasi-Poisson Regression with GLIMMIX in SAS

October 14, 2015 | statcompute

When modeling the frequency measure in operational risk with regressions, most modelers prefer Poisson or Negative Binomial regressions as best practices in the industry. However, as an alternative approach, Quasi-Poisson regression provides a more flexible model estimation routine with at least two benefits. First of all, Quasi-Poisson regression ... [Read more...]

Some Considerations of Modeling Severity in Operational Losses

August 16, 2015 | statcompute

In the Loss Distributional Approach (LDA) for Operational Risk models, multiple distributions, including Log Normal, Gamma, Burr, Pareto, and so on, can be considered candidates for the distribution of severity measures. However, the challenge remains in the stress testing exercise, e.g. CCAR, to relate operational losses to macro-economic scenarios ... [Read more...]
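Comparing candidate severity distributions typically starts with fitting each to the loss sample and comparing likelihoods. A scipy sketch with two of the candidates named above, on simulated losses (illustrative only):

```python
# Fit lognormal and gamma to a loss sample and compare log-likelihoods.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
losses = rng.lognormal(mean=10, sigma=2, size=1000)  # heavy-tailed severity sample

ln_params = stats.lognorm.fit(losses, floc=0)
g_params = stats.gamma.fit(losses, floc=0)
ln_ll = stats.lognorm.logpdf(losses, *ln_params).sum()
g_ll = stats.gamma.logpdf(losses, *g_params).sum()
print(round(ln_ll, 1), round(g_ll, 1))  # higher log-likelihood = better in-sample fit
```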

Are These Losses from The Same Distribution?

June 14, 2015 | statcompute

In Advanced Measurement Approaches (AMA) for Operational Risk models, the bank needs to segment operational losses into homogeneous segments known as “Units of Measure (UoM)”, which are often defined by the combination of line of business (LOB) and Basel II event type. However, how do we verify whether the losses ... [Read more...]
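One standard way to check whether two loss segments could come from the same distribution is a two-sample Kolmogorov-Smirnov test; whether this is the post's exact method is not shown in the excerpt, but the idea can be sketched with scipy on simulated segments:

```python
# Two-sample KS test on two simulated loss segments.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
losses_a = rng.lognormal(mean=10, sigma=1.5, size=500)
losses_b = rng.lognormal(mean=11, sigma=1.5, size=500)  # shifted: a different UoM

ks = stats.ks_2samp(losses_a, losses_b)
print(round(ks.statistic, 3), ks.pvalue < 0.05)  # small p-value: reject "same distribution"
```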

Granger Causality Test

May 25, 2015 | statcompute

# READ QUARTERLY DATA FROM CSV
library(zoo)
ts1 <- read.zoo('Documents/data/macros.csv', header = T, sep = ",", FUN = as.yearqtr)

# CONVERT THE DATA TO STATIONARY TIME SERIES
ts1$hpi_rate <- log(ts1$hpi / lag(ts1$hpi))
ts1$unemp_rate <- log(ts1$unemp / lag(ts1$unemp))
ts2 <- ts1[1:nrow(ts1) - 1, c(3, 4)]

# METHOD 1: LMTEST PACKAGE
library(lmtest)
grangertest(unemp_rate ~ hpi_rate, order = 1, data = ts2)
# Granger causality test
#
# Model 1: unemp_rate ~ Lags(unemp_rate, 1:1) + Lags(hpi_rate, 1:1)
# Model 2: unemp_rate ~ Lags(unemp_rate, 1:1)
#   Res.Df Df      F  Pr(>F)
# 1     55
# 2     56 -1 4.5419 0.03756 *
# ---
# Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# METHOD 2: VARS PACKAGE
library(vars)
var <- VAR(ts2, p = 1, type = "const")
causality(var, cause = "hpi_rate")$Granger
# Granger causality H0: hpi_rate do not Granger-cause unemp_rate
#
# data: VAR object var
# F-Test = 4.5419, [...] [Read more...]

Read A Block of Spreadsheet with R

May 10, 2015 | statcompute

In R, there are two ways to read a block of a spreadsheet, e.g. an xlsx file, such as the one shown below. The xlsx package provides the most intuitive interface with the readColumns() function by explicitly defining the starting and ending columns and rows. However, if we can define a ...
[Read more...]

To Difference or Not To Difference?

May 9, 2015 | statcompute

In textbooks on time series analysis, we’ve been taught to difference a time series in order to obtain a stationary series, which can be justified by various plots and statistical tests. In real-world time series analysis, however, things are not always as clear-cut as in the textbook. ...
[Read more...]

Modeling Count Time Series with tscount Package

March 31, 2015 | statcompute

The example below shows how to estimate a simple univariate Poisson time series model with the tscount package. While the model estimation is straightforward and yields very similar parameter estimates to the ones generated with the acp package (https://statcompute.wordpress.com/2015/03/29/autoregressive-conditional-poisson-model-i), the prediction mechanism is a bit tricky. 1) ... [Read more...]

rPithon vs. rPython

March 30, 2015 | statcompute

Similar to rPython, the rPithon package (http://rpithon.r-forge.r-project.org) allows users to execute Python code from R and exchange data between Python and R. However, the underlying mechanisms of these two packages are fundamentally different. While rPithon communicates with Python from R through pipes, rPython accomplishes the ... [Read more...]

Autoregressive Conditional Poisson Model – I

March 29, 2015 | statcompute

Modeling a time series of count outcomes is of interest in operational risk when forecasting the frequency of losses. Below is an example showing how to estimate a simple ACP(1, 1) model, i.e. Autoregressive Conditional Poisson, without covariates using the acp package. [Read more...]
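The ACP(1, 1) recursion is lambda_t = omega + alpha * y_{t-1} + beta * lambda_{t-1} with Poisson-distributed counts. A Python sketch of the model, simulated and then re-estimated by maximum likelihood (parameter names follow the usual ACP notation, not the acp package's output):

```python
# Simulate an ACP(1, 1) process and recover its parameters by MLE.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
omega, alpha, beta = 1.0, 0.3, 0.4
y = np.zeros(500, dtype=int)
lam = omega / (1 - alpha - beta)  # start at the unconditional mean
for t in range(1, 500):
    lam = omega + alpha * y[t - 1] + beta * lam
    y[t] = rng.poisson(lam)

def neg_loglik(theta):
    w, a, b = theta
    if w <= 0 or a < 0 or b < 0:     # keep the intensity positive
        return np.inf
    lam, ll = y.mean(), 0.0
    for t in range(1, len(y)):
        lam = w + a * y[t - 1] + b * lam
        ll += y[t] * np.log(lam) - lam  # Poisson log-likelihood up to a constant
    return -ll

fit = minimize(neg_loglik, x0=[1.0, 0.2, 0.2], method="Nelder-Mead")
print(fit.x.round(2))
```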

Ensemble Learning with Cubist Model

March 20, 2015 | statcompute

The tree-based Cubist model can be easily used to develop an ensemble classifier with a scheme called “committees”. The concept of “committees” is similar to that of “boosting”: a series of trees is developed sequentially with adjusted weights. However, the final prediction is the simple average of predictions from ... [Read more...]

Model Segmentation with Cubist

March 18, 2015 | statcompute

Cubist is a tree-based model with an OLS regression attached to each terminal node and is somewhat similar to the mob() function in the party package (https://statcompute.wordpress.com/2014/10/26/model-segmentation-with-recursive-partitioning). Below is a demonstration of the cubist() model with the classic Boston housing data. [Read more...]

Download Federal Reserve Economic Data (FRED) with Python

December 10, 2014 | statcompute

In operational loss calculations, it is important to use the CPI (Consumer Price Index) to adjust historical losses. Below is an example showing how to download CPI data directly from the Federal Reserve Bank of St. Louis and then calculate monthly and quarterly CPI adjustment factors with Python. [Read more...]
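The adjustment-factor step itself is simple: factor = CPI_base / CPI_t, so a historical loss times its factor is restated in base-period dollars. A pandas sketch with made-up CPI values (the post downloads the actual series, e.g. CPIAUCSL, from FRED):

```python
# Monthly CPI adjustment factors relative to the latest period.
import pandas as pd

cpi = pd.Series(
    [233.9, 234.8, 236.3, 237.1],  # illustrative CPI levels, not real data
    index=pd.period_range("2014-01", periods=4, freq="M"),
    name="cpi",
)
base = cpi.iloc[-1]        # restate everything in the latest month's dollars
factors = base / cpi       # multiply a historical loss by its month's factor
print(factors.round(4))
```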
