Articles by matloff

The Generalized Method of Moments and the gmm package

December 20, 2015 | matloff

An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a ... [Read more...]

The Method of Boosting

December 8, 2015 | matloff

One of the techniques that has caused the most excitement in the machine learning community is boosting, which in essence is a process of iterative refinement, e.g. by reweighting, of estimated regression and classification functions (though it has primarily been applied to the latter), in order to improve predictive ... [Read more...]

OVA vs. AVA in Classification Problems, via regtools

December 2, 2015 | matloff

OVA and AVA? Huh? These stand for One vs. All and All vs. All, in classification problems with more than 2 classes. To illustrate the idea, I’ll use the UCI Vertebral Column data and Letter Recognition Data, and analyze them using my regtools package. As some of you know, I’... [Read more...]

Back to the BLAS Issue

November 21, 2015 | matloff

A few days ago, I wrote here about how some researchers, such as Art Owen and Katelyn Gao at Stanford and Patrick Perry at NYU, have been using an old, old statistical technique — random effects models — for a new, new application — recommender systems. In addition to describing their approach to that ... [Read more...]

Partools, Recommender Systems and More

November 15, 2015 | matloff

Recently I attended a talk by Stanford’s Art Owen, presenting work done with his student, Katelyn Gao. This talk touched on a number of my interests, both mathematical and computational. What particularly struck me was that Art and Katelyn are applying a very old — many would say very boring — ... [Read more...]

A New Method for Statistical Disclosure Limitation, I

October 15, 2015 | matloff

The Statistical Disclosure Limitation (SDL) problem involves modifying a data set in such a manner that statistical analysis on the modified data is reasonably close to that performed on the original data, while preserving the privacy of individuals in the data set. For instance, we might have a medical data ... [Read more...]

Unbalanced Data Is a Problem? No, BALANCED Data Is Worse

September 29, 2015 | matloff

Say we are doing classification analysis with classes labeled 0 through m-1. Let Ni be the number of observations in class i. There is much handwringing in the machine learning literature over situations in which there is a wide variation among the Ni. I will argue here, though, that the problem ... [Read more...]

More on the Heteroscedasticity Issue

September 22, 2015 | matloff

In my last post, I discussed R software, including mine, that handles heteroscedastic settings for linear and nonlinear regression models. Several readers had interesting comments and questions, which I will address here. To review: Though most books and software assume homoscedasticity, i.e. constancy of the variance of the response ... [Read more...]

Can You Say “Heteroscedasticity” 3 Times Fast?

September 18, 2015 | matloff

Most books on regression analysis assume homoscedasticity, the situation in which Var(Y | X = t), for a response variable Y and vector of predictor variables X, is the same for all t. Yet, needless to say, almost all data in real life is heteroscedastic. For Y = human weight and X = ... [Read more...]

New R Software/Methodology for Handling Missing Data

September 16, 2015 | matloff

I’ve added some missing-data software to my regtools package on GitHub. In this post, I’ll give an overview of missing-data methodology, and explain what the software does. For details, see my JSM paper, jointly authored with my student Xiao (Max) Gu. There is a long history of development ... [Read more...]

Exciting useR! 2016 Conference

September 12, 2015 | matloff

The 2016 meeting of the annual useR! conference will be held in June at Stanford University. This is a fantastic venue, and we believe it may be the largest useR! meeting to date. See the above link for details! [Read more...]

Partools 1.1.4

August 21, 2015 | matloff

Partools 1.1.4 is now on GitHub. The main change this time is enhancement of the debugging facilities (which work not only for partools but also the cluster-based portion of R’s parallel package in general). As some of you know, I place huge importance on debugging, so much so that I ... [Read more...]

partools: a Sensible R Package for Large Data Sets

August 5, 2015 | matloff

As I mentioned recently, the new, greatly extended version of my partools package is now on CRAN. (The current version on CRAN is 1.1.3, whereas at the time of my previous announcement it was only 1.1.1. Note that Unix is NOT required.) It is my contention that for most R users who ... [Read more...]

CACM Highlights R

July 23, 2015 | matloff

The Association for Computing Machinery is the main professional organization for computer science, largely for academia but still with a broad membership. ACM publishes a number of journals, most of them for research but its flagship publication is a magazine, the Communications of the ACM. The current issue of the ... [Read more...]

Heteroscedasticity in Regression — It Matters!

June 7, 2015 | matloff

R’s main linear and nonlinear regression functions, lm() and nls(), report standard errors for parameter estimates under the assumption of homoscedasticity, a fancy word for a situation that rarely occurs in practice. The assumption is that the (conditional) variance of the response variable is the same at any set ... [Read more...]

Macros in R

June 5, 2015 | matloff

In programming, sometimes it’s useful to write a macro rather than a function. (Don’t worry if you’ve never heard the term before.) In this post, I’ll give an example of the use of macros in R, using the gtools package on CRAN. I wanted to write some ... [Read more...]

Discovered Two Great Web Sites Today

June 3, 2015 | matloff

Today is my lucky day. I learned of two very interesting Web pages, both of them quite informative and the first of them rather provocative (yay!). I have some comments on both, in some cases consisting of mild disagreement, which I may post later, but in any event, I highly ... [Read more...]

Update on Snowdoop, a MapReduce Alternative

May 29, 2015 | matloff

In blog posts a few months ago, I proposed an alternative to MapReduce, e.g. to Hadoop, which I called “Snowdoop.” I pointed out that systems like Hadoop and Spark are very difficult to install and configure, are either too primitive (Hadoop) or too abstract (Spark) to program, and above ... [Read more...]

My New Book and Other Matters

May 22, 2015 | matloff

I haven’t posted for a while, so here are some news items: My new book, Parallel Computation for Data Science, will be out in June or July. I believe it will be useful to anyone doing computationally intensive work. After a few months being busy with the book and ... [Read more...]

Tutorial on High-Performance Computing in R

February 3, 2015 | matloff

I wanted to call your attention to what promises to be an outstanding tutorial on High-Performance Computing (HPC) in R, presented in Web streaming format. My Rth package coauthor Drew Schmidt, who is also one of the authors of the pbdR package, will be one of the presenters. Should very ... [Read more...]