Unbalanced Data Is a Problem? No, BALANCED Data Is Worse
Say we are doing classification analysis with classes labeled 0 through m-1. Let N_i be the number of observations in class i. There is much hand-wringing in the machine learning literature over situations in which there is wide variation among the N_i. I will argue here, though, that the problem ...