Articles by matloff

Innumeracy, Statistics and R

March 1, 2016 | matloff

A couple of years ago, when an NPR journalist was interviewing me, the conversation turned to quantitative matters. The reporter said, only half jokingly, “We journalists are innumerate and proud.” :-) Sometimes it shows, badly. This morning a radio reporter stated, “Hillary Clinton beat Bernie Sanders among South Carolina ... [Read more...]

50% Draft of Forthcoming Book Available

March 1, 2016 | matloff

As I’ve mentioned here a couple of times, I am in the midst of writing a book, From Linear Models to Machine Learning: Regression and Classification, with Examples in R. As has been my practice with past books, I have now placed a 50% rough draft of the book on ... [Read more...]

Some Comments on Donoho’s “50 Years of Data Science”

January 23, 2016 | matloff

An old friend recently called my attention to a thoughtful essay by Stanford statistics professor David Donoho, titled “50 Years of Data Science.” Given the keen interest these days in data science, the essay is quite timely. The work clearly shows that Donoho is not only a grandmaster theoretician, but also ... [Read more...]

The Generalized Method of Moments and the gmm package

December 20, 2015 | matloff

An almost-as-famous alternative to the famous Maximum Likelihood Estimation is the Method of Moments. MM has always been a favorite of mine because it often requires fewer distributional assumptions than MLE, and also because MM is much easier to explain than MLE to students and consulting clients. CRAN has a ... [Read more...]

The Method of Boosting

December 8, 2015 | matloff

One of the techniques that has caused the most excitement in the machine learning community is boosting, which in essence is a process of iterative refinement, e.g. by reweighting, of estimated regression and classification functions (though it has primarily been applied to the latter), in order to improve predictive ... [Read more...]

OVA vs. AVA in Classification Problems, via regtools

December 2, 2015 | matloff

OVA and AVA? Huh? These stand for One vs. All and All vs. All, in classification problems with more than 2 classes. To illustrate the idea, I’ll use the UCI Vertebral Column data and Letter Recognition Data, and analyze them using my regtools package. As some of you know, I’... [Read more...]

Back to the BLAS Issue

November 21, 2015 | matloff

A few days ago, I wrote here about how some researchers, such as Art Owen and Katelyn Gao at Stanford and Patrick Perry at NYU, have been using an old, old statistical technique — random effects models — for a new, new application — recommender systems. In addition to describing their approach to that ... [Read more...]

Partools, Recommender Systems and More

November 15, 2015 | matloff

Recently I attended a talk by Stanford’s Art Owen, presenting work done with his student, Katelyn Gao. This talk touched on a number of my interests, both mathematical and computational. What particularly struck me was that Art and Katelyn are applying a very old — many would say very boring — ... [Read more...]

A New Method for Statistical Disclosure Limitation, I

October 15, 2015 | matloff

The Statistical Disclosure Limitation (SDL) problem involves modifying a data set in such a manner that statistical analysis on the modified data is reasonably close to that performed on the original data, while preserving the privacy of individuals in the data set. For instance, we might have a medical data ... [Read more...]

Unbalanced Data Is a Problem? No, BALANCED Data Is Worse

September 29, 2015 | matloff

Say we are doing classification analysis with classes labeled 0 through m-1. Let Ni be the number of observations in class i. There is much handwringing in the machine learning literature over situations in which there is a wide variation among the Ni. I will argue here, though, that the problem ... [Read more...]

More on the Heteroscedasticity Issue

September 22, 2015 | matloff

In my last post, I discussed R software, including mine, that handles heteroscedastic settings for linear and nonlinear regression models. Several readers had interesting comments and questions, which I will address here. To review: Though most books and software assume homoscedasticity, i.e. constancy of the variance of the response ... [Read more...]

Can You Say “Heteroscedasticity” 3 Times Fast?

September 18, 2015 | matloff

Most books on regression analysis assume homoscedasticity, the situation in which Var(Y | X = t), for a response variable Y and vector of predictor variables X, is the same for all t. Yet, needless to say, almost all data in real life is heteroscedastic. For Y = human weight and X = ... [Read more...]

New R Software/Methodology for Handling Missing Data

September 16, 2015 | matloff

I’ve added some missing-data software to my regtools package on GitHub. In this post, I’ll give an overview of missing-data methodology, and explain what the software does. For details, see my JSM paper, jointly authored with my student Xiao (Max) Gu. There is a long history of development ... [Read more...]

Exciting useR! 2016 Conference

September 12, 2015 | matloff

The 2016 meeting of the annual useR! conference will be held in June at Stanford University. This is a fantastic venue, and we believe it may be the largest useR! meeting to date. See the above link for details! [Read more...]

Partools 1.1.4

August 21, 2015 | matloff

Partools 1.1.4 is now on GitHub. The main change this time is enhancement of the debugging facilities (which work not only for partools but also the cluster-based portion of R’s parallel package in general). As some of you know, I place huge importance on debugging, so much so that I ... [Read more...]

partools: a Sensible R Package for Large Data Sets

August 5, 2015 | matloff

As I mentioned recently, the new, greatly extended version of my partools package is now on CRAN. (The current version on CRAN is 1.1.3, whereas at the time of my previous announcement it was only 1.1.1. Note that Unix is NOT required.) It is my contention that for most R users who ... [Read more...]

CACM Highlights R

July 23, 2015 | matloff

The Association for Computing Machinery is the main professional organization for computer science, largely for academia but still with a broad membership. ACM publishes a number of journals, most of them for research but its flagship publication is a magazine, the Communications of the ACM. The current issue of the ... [Read more...]

Heteroscedasticity in Regression — It Matters!

June 7, 2015 | matloff

R’s main linear and nonlinear regression functions, lm() and nls(), report standard errors for parameter estimates under the assumption of homoscedasticity, a fancy word for a situation that rarely occurs in practice. The assumption is that the (conditional) variance of the response variable is the same at any set ... [Read more...]

Macros in R

June 5, 2015 | matloff

In programming, sometimes it’s useful to write a macro rather than a function. (Don’t worry if you’ve never heard the term before.) In this post, I’ll give an example of the use of macros in R, using the gtools package on CRAN. I wanted to write some ... [Read more...]

Discovered Two Great Web Sites Today

June 3, 2015 | matloff

Today is my lucky day. I learned of two very interesting Web pages, both of them quite informative and the first of them rather provocative (yay!). I have some comments on both, in some cases consisting of mild disagreement, which I may post later, but in any event, I highly ... [Read more...]
