Articles by John Myles White

DataGotham

August 21, 2012 | John Myles White

As some of you may know already, I’m co-organizing an upcoming conference called DataGotham that’s taking place in September. To help spread the word about DataGotham, I’m cross-posting the most recent announcement below: We’d like to let you know about DataGotham: a celebration of New York ... [Read more...]

The Social Dynamics of the R Core Team

August 12, 2012 | John Myles White

Recently a few members of R Core have indicated that part of what slows down the development of R as a language is that it has become increasingly difficult over the years to achieve consensus among the core developers of the language. Inspired by these claims, I decided to look ... [Read more...]

My New Book: Developing, Deploying and Debugging Multi-Armed Bandit Algorithms

July 28, 2012 | John Myles White

I’m happy to announce that I’ve started writing a new book for O’Reilly, which will focus on teaching readers how to use Multi-Armed Bandit Algorithms to build better websites. My hope is that the book can help web developers build up an intuition for the core conundrum ... [Read more...]

Automatic Hyperparameter Tuning Methods

July 20, 2012 | John Myles White

At MSR this week, we had two very good talks on algorithmic methods for tuning the hyperparameters of machine learning models. Selecting appropriate settings for hyperparameters is a constant problem in machine learning, which is somewhat surprising given how much expertise the machine learning community has in optimization theory. I ... [Read more...]

Criticism 5 of NHST: p-Values Measure Effort, Not Truth

July 17, 2012 | John Myles White

In the third installment of my series of criticisms of NHST, I focused on the notion that a p-value is nothing more than a one-dimensional representation of a two-dimensional space in which (1) the measured size of an effect and (2) the precision of this measurement have been combined in such ... [Read more...]

Optimization Functions in Julia

July 9, 2012 | John Myles White

Over the last few weeks, I’ve made a concerted effort to develop a basic suite of optimization algorithms for Julia so that Matlab programmers used to using fminunc() and R programmers used to using optim() can start to transition code over to Julia that requires access to simple optimization ... [Read more...]

Bayesian Nonparametrics in R

June 25, 2012 | John Myles White

On July 25th, I’ll be presenting at the Seattle R Meetup about implementing Bayesian nonparametrics in R. If you’re not sure what Bayesian nonparametric methods are, they’re a family of methods that allow you to fit traditional statistical models, such as mixture models or latent factor models, ... [Read more...]

The Great Julia RNG Refactor

June 21, 2012 | John Myles White

Many readers of this blog will know that I’m a big fan of Bayesian methods, in large part because automated inference tools like JAGS allow modelers to focus on the types of structure they want to extract from data rather than worry about the algorithmic details of how they ... [Read more...]

Criticism 4 of NHST: No Mechanism for Producing Substantive Cumulative Knowledge

May 18, 2012 | John Myles White

[Note to the Reader: This is a much rougher piece than the previous pieces because the argument is more complex. I ask that you please point out places where things are unclear and where claims are not rigorous.] In this fourth part of my series of criticisms of NHST, I’... [Read more...]

Criticism 3 of NHST: Essential Information is Lost When Transforming 2D Data into a 1D Measure

May 14, 2012 | John Myles White

Continuing on with my series on the weaknesses of NHST, I’d like to focus on an issue that’s not specific to NHST, but rather one that’s relevant to all quantitative analysis: the destruction caused by an inappropriate reduction of dimensionality. In our case, we’ll be ... [Read more...]

Criticism 2 of NHST: NHST Conflates Rare Events with Evidence Against the Null Hypothesis

May 12, 2012 | John Myles White

This is my second post in a series describing the weaknesses of the NHST paradigm. In the first post, I argued that NHST is a dangerous tool for a community of researchers because p-values cannot be interpreted properly without perfect knowledge of the research practices of other scientists — knowledge ... [Read more...]

Criticism 1 of NHST: Good Tools for Individual Researchers are not Good Tools for Research Communities

May 10, 2012 | John Myles White

Over my years as a graduate student, I have built up a long list of complaints about the use of Null Hypothesis Significance Testing (NHST) in the empirical sciences. In the next few weeks, I’m planning to publish a series of blog posts, each of which will articulate ... [Read more...]

cumplyr: Extending the plyr Package to Handle Cross-Dependencies

May 3, 2012 | John Myles White

For me, Hadley Wickham’s reshape and plyr packages are invaluable because they encapsulate omnipresent design patterns in statistical computing: reshape handles switching between the different possible representations of the same underlying data, while plyr automates what Hadley calls the Split-Apply-Combine strategy, in which you split up your data ... [Read more...]

Implementing the Exact Binomial Test in Julia

April 14, 2012 | John Myles White

One major benefit of spending my time recently adding statistical functionality to Julia is that I’ve learned a lot about the inner guts of algorithmic null hypothesis significance testing. Implementing Welch’s two-sample t-test last week was a trivial task because of the symmetry of the null hypothesis, but ... [Read more...]

Floating Point Arithmetic and The Descent into Madness

April 13, 2012 | John Myles White

While I should confess upfront that I’ve always had a weaker command of the details of floating point arithmetic than I feel I ought to have, this sort of thing still blows my mind when I stumble upon it. These moments invariably make me realize that floating point math ... [Read more...]

Comparing Julia and R’s Vocabularies

April 9, 2012 | John Myles White

While exploring the Julia manual recently, I realized that it might be helpful to put the basic vocabularies of Julia and R side-by-side for easy comparison. So I took Hadley Wickham’s R Vocabulary section from the book he’s putting together on the devtools wiki, put all of the ... [Read more...]

Simulated Annealing in Julia

April 4, 2012 | John Myles White

In hopes of adding enough statistical functionality to Julia to make it usable for my day-to-day modeling projects, I’ve written a very basic implementation of the simulated annealing (SA) algorithm, which I’ve placed in the same JuliaVsR GitHub repository that I used for ... [Read more...]

Julia, I Love You

March 31, 2012 | John Myles White

Julia is a new language for scientific computing that is winning praise from a slew of very smart people, including Harlan Harris, Chris Fonnesbeck, Douglas Bates, Vince Buffalo and Shane Conway. As a language, it has lofty design goals, which, if attained, will make it noticeably superior to Matlab, R ... [Read more...]

Back to Blogging

March 31, 2012 | John Myles White

If you’re subscribed to this blog, you’ve surely noticed the very long hiatus I’ve taken from writing over the last six months. I wish I’d kept up with blogging more faithfully this year, but, in my defense, I’ve been busy doing a few big things: ... [Read more...]

Using Sparse Matrices in R

October 31, 2011 | John Myles White

I’ve recently been working with a couple of large, extremely sparse data sets in R. This has pushed me to spend some time trying to master the CRAN packages that support sparse matrices. This post describes three of them: the Matrix, slam and glmnet packages. The first two ... [Read more...]

R-bloggers

R news and tutorials contributed by hundreds of R bloggers
