Blog Archives

Hack, a template for improving code reliability

March 24, 2014

My sole prediction for 2014 has come true: Facebook have announced the Hack language (if you don’t know that HHVM is the HipHop Virtual Machine you are obviously not a trendy developer). This language does not follow the usual trend in that it looks useful, rather than being fashion fluff for corporate developers to

Read more »

By now I ought to feel more knowledgeable about R

March 18, 2014

I was surprised to find recently that there are now over 15,000 lines of R code in the book I am working on. If I had written that much code in another ‘newly’ acquired language I would probably feel a lot more knowledgeable about it than I currently feel about R. Why don’t I feel

Read more »

Performing a non-local return in R

February 23, 2014

In most languages return is a statement, but in R it is a function (in fact R does not really have statements, it only has expressions). This function-like behavior of return is useful for figuring out the order in which operations are performed, e.g., the value returned by return(1)+return(2) tells us that binary operators are

Read more »

Converting graphs in pdf files to csv format

December 19, 2013

Looking at a graph displayed as part of a pdf document is so tantalizing; I want that data as a csv! One way to get the data is to email the author(s) and ask for it. I do this regularly and sometimes get the apologetic reply that the data is confidential. But I can see

Read more »

Ordinary Least Squares is dead to me

November 28, 2013

Most books that discuss regression modeling start out and often finish with Ordinary Least Squares (OLS) as the technique to use; Generalized Least Squares (GLS) sometimes gets a mention near the back. This is all well and good if the readers’ data has the characteristics required for OLS to be an applicable technique. A lot

Read more »

R now has its own shelf in Dillons

November 24, 2013

I was in Dillons, the one opposite University College London, at the start of the week and what did I spy there? There is now a bookshelf devoted to R (right, second from top) in the programming languages section. The shelf would be a lot fuller if O’Reilly did not have a complete section devoted

Read more »

I made a mistake, please don’t shoot me

July 31, 2013

The major difference between commercially and academically written software is the handling of user mistakes, or to be more exact, what is considered to be a user mistake. In the commercial world the emphasis is on keeping the customer happy, which translates into trying hard to gracefully handle any ‘mistake’ the user makes. Academic software is generally

Read more »

Amount of end-user usage of code in Firefox

July 25, 2013

How much end-user usage does the code in Firefox receive over time? Short answer: the available data is very sparse and lots of hand waving is needed to concoct something. The longer answer is below as another draft section from my book Empirical software engineering with R. As always, comments and pointers to more data

Read more »

Unique bytes in a sliding window as a file content signature

July 21, 2013

I was at a workshop a few months ago where a speaker pointed out a useful technique for spotting whether a file contains compressed data, e.g., a virus hidden in a script by compressing it to look like a jumble of numbers. Compressed data contains a uniform distribution of byte values (after all, compression is

Read more »

Preferential attachment applied to frequency of accessing a variable

May 17, 2013

If, when writing code for a function, up to the current point in the code distinct local variables have been accessed for reading times (), will the next read access be from a previously unread local variable and if not what is the likelihood of choosing each of the distinct variables (global variables are ignored

Read more »