Articles by John Myles White

Data corruption in R 3.0.2 when using read.csv

January 29, 2014 | John Myles White

Introduction It may be old news to some, but I just recently discovered that the automatic type inference system that R uses when parsing CSV files assumes that data sets will never contain 64-bit integer values. Specifically, if an integer value read from a CSV file is too large to ... [Read more...]

The Relationship between Vectorized and Devectorized Code

December 22, 2013 | John Myles White

Introduction Some people have come to believe that Julia’s vectorized code is unusably slow. To correct this misconception, I outline a naive benchmark below that suggests that Julia’s vectorized code is, in fact, noticeably faster than R’s vectorized code. When experienced Julia programmers suggest that newcomers should ... [Read more...]

Writing Type-Stable Code in Julia

December 6, 2013 | John Myles White

For many of the people I talk to, Julia’s main appeal is speed. But achieving peak performance in Julia requires that programmers absorb a few subtle concepts that are generally unfamiliar to users of weakly typed languages. One particularly subtle performance pitfall is the need to write type-stable code. ... [Read more...]

September Talks

September 5, 2013 | John Myles White

To celebrate my last full month on the East Coast, I’m doing a bunch of talks. If you’re interested in hearing more about Julia or statistics in general, you might want to come out to one of the events I’ll be at: Julia Tutorial at DataGotham: On 9/12, ... [Read more...]

Hopfield Networks in Julia

July 28, 2013 | John Myles White

As a fun side project last night, I decided to implement a basic package for working with Hopfield networks in Julia. Since I suspect many of the readers of this blog have never seen a Hopfield net before, let me explain what they are and what they can be used ... [Read more...]

What’s Next

May 9, 2013 | John Myles White

The last two weeks have been full of changes for me. For those who’ve been asking about what’s next, I thought I’d write up a quick summary of all the news. (1) I successfully defended my thesis this past Monday. Completing a Ph.D. has been a massive ... [Read more...]

Using Norms to Understand Linear Regression

March 22, 2013 | John Myles White

Introduction In my last post, I described how we can derive modes, medians and means as three natural solutions to the problem of summarizing a list of numbers, \((x_1, x_2, \ldots, x_n)\), using a single number, \(s\). In particular, we measured the quality of different potential summaries in three ... [Read more...]

Modes, Medians and Means: A Unifying Perspective

March 22, 2013 | John Myles White

Introduction / Warning Any traditional introductory statistics course will teach students the definitions of modes, medians and means. But, because introductory courses can’t assume that students have much mathematical maturity, the close relationship between these three summary statistics can’t be made clear. This post tries to remedy that situation ... [Read more...]

Writing Better Statistical Programs in R

January 24, 2013 | John Myles White

A while back a friend asked me for advice about speeding up some R code that they’d written. Because they were running an extensive Monte Carlo simulation of a model they’d been developing, the poor performance of their code had become an impediment to their work. After I ... [Read more...]

Americans Live Longer and Work Less

January 21, 2013 | John Myles White

Today I saw an article on Hacker News entitled, “America’s CEOs Want You to Work Until You’re 70”. I was particularly surprised by this article appearing out of the blue because I take it for granted that America will eventually have to raise the retirement age to avoid bankruptcy. ... [Read more...]

Symbolic Differentiation in Julia

January 7, 2013 | John Myles White

A Brief Introduction to Metaprogramming in Julia In contrast to my previous post, which described one way in which Julia allows (and expects) the programmer to write code that directly employs the atomic operations offered by computers, this post is meant to introduce newcomers to some of Julia’s higher ... [Read more...]

Computers are Machines

January 3, 2013 | John Myles White

When people try out Julia for the first time, many of them are worried by the following example:

    julia> factorial(n) = n == 0 ? 1 : n * factorial(n - 1)

    julia> factorial(20)
    2432902008176640000

    julia> factorial(21)
    -4249290049419214848

If you’re not familiar with computer architecture, this ... [Read more...]

What is Correctness for Statistical Software?

December 14, 2012 | John Myles White

Introduction A few months ago, Drew Conway and I gave a webcast that tried to teach people about the basic principles behind linear and logistic regression. To illustrate logistic regression, we worked through a series of progressively more complex spam detection problems. The simplest data set we used was the ... [Read more...]

A Cheap Criticism of p-Values

December 6, 2012 | John Myles White

One of these days I am going to finish my series on problems with how NHST is used in the social sciences. Until then, I came up with a cheap criticism of p-values today. To make sense of my complaint, you’ll want to head over to Andy Gelman’s ... [Read more...]

The State of Statistics in Julia

December 2, 2012 | John Myles White

Updated 12.2.2012: Added sample output based on a suggestion from Stefan Karpinski. Introduction Over the last few weeks, the Julia core team has rolled out a demo version of Julia’s package management system. While the Julia package system is still very much in beta, it nevertheless provides the first plausible ... [Read more...]

The Shape of Floating Point Random Numbers

October 15, 2012 | John Myles White

[Updated 10/18/2012: Fixed a typo in which mantissa was replaced with exponent.] Over the weekend, Viral Shah updated Julia’s implementation of randn() to give a 20% speed boost. Because we all wanted to test that this speed-up had not come at the expense of the validity of Julia’s RNG system, ... [Read more...]

Overfitting

October 13, 2012 | John Myles White

What do you think when you see a model like the one below? Does this strike you as a good model? Or as a bad model? There’s no right or wrong answer to this question, but I’d like to argue that models that are able to match white ... [Read more...]

EDA Before CDA

October 6, 2012 | John Myles White

One Paragraph Summary Always explore your data visually. Whatever specific hypothesis you have when you go out to collect data is likely to be worse than any of the hypotheses you’ll form after looking at just a few simple visualizations of that data. The most effective hypothesis testing framework ... [Read more...]

Playing with The Circular Law in Julia

September 25, 2012 | John Myles White

Introduction Statistically-trained readers of this blog will be very familiar with the Central Limit Theorem, which describes the asymptotic sampling distribution of the mean of a random vector composed of IID variables. Some of the most interesting recent work in mathematics has been focused on the development of increasingly powerful ... [Read more...]

Will Data Scientists Be Replaced by Tools?

August 28, 2012 | John Myles White

The Quick-and-Dirty Summary I was recently asked to participate in a proposed SXSW panel that will debate the question, “Will Data Scientists Be Replaced by Tools?” This post describes my current thinking on that question as a way of (1) convincing you to go vote for the panel’s inclusion in ... [Read more...]