Blog Archives

I made a mistake, please don’t shoot me

July 31, 2013

The major difference between commercially and academically written software is the handling of user mistakes or, to be more exact, what is considered to be a user mistake. In the commercial world the emphasis is on keeping the customer happy, which translates into trying hard to gracefully handle any ‘mistake’ the user makes. Academic software is generally

Read more »

Amount of end-user usage of code in Firefox

July 25, 2013

How much end-user usage does the code in Firefox receive over time? Short answer: The available data is very sparse and lots of hand waving is needed to concoct something. The longer answer is below as another draft section from my book Empirical software engineering with R. As always comments and pointers to more data

Read more »

Unique bytes in a sliding window as a file content signature

July 21, 2013

I was at a workshop a few months ago where a speaker pointed out a useful technique for spotting whether a file contains compressed data, e.g., a virus hidden in a script by compressing it to look like a jumble of numbers. Compressed data contains a uniform distribution of byte values (after all, compression is

Read more »
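
The sliding-window idea in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the post: the function name `unique_bytes_per_window` and the default window size of 256 are hypothetical choices made here. The sketch counts how many distinct byte values occur in each window as it slides one byte at a time; data whose byte values are close to uniformly distributed would be expected to give counts near the top of the possible range.

```python
from typing import List


def unique_bytes_per_window(data: bytes, window: int = 256) -> List[int]:
    """Return the number of distinct byte values in every window-sized
    slice of data, sliding forward one byte at a time.

    Both the name and the default window size are assumptions made for
    this sketch; the post excerpt does not specify either.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if len(data) < window:
        return []  # not enough data for even one full window

    counts = [0] * 256  # occurrences of each byte value in the current window
    distinct = 0        # how many byte values currently have a nonzero count
    result = []

    for i, b in enumerate(data):
        # Bring the new byte into the window.
        if counts[b] == 0:
            distinct += 1
        counts[b] += 1

        # Drop the byte that has just slid out of the window.
        if i >= window:
            old = data[i - window]
            counts[old] -= 1
            if counts[old] == 0:
                distinct -= 1

        # Record a value once the first full window has been seen.
        if i >= window - 1:
            result.append(distinct)

    return result
```

Keeping a running count per byte value means each step costs constant time, rather than rebuilding a set for every window. Calling `unique_bytes_per_window(open(path, "rb").read())` on a file (with `path` supplied by the caller) gives one number per window position, which can then be plotted or summarized.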

Preferential attachment applied to frequency of accessing a variable

May 17, 2013

If, when writing code for a function, up to the current point in the code distinct local variables have been accessed for reading times (), will the next read access be from a previously unread local variable and if not what is the likelihood of choosing each of the distinct variables (global variables are ignored

Read more »

Prioritizing project stakeholders using social network metrics

April 20, 2013

Identifying project stakeholders and their requirements is a very important factor in the success of any project. Existing techniques tend to be very ad hoc. In her PhD thesis Soo Ling Lim came up with a very interesting solution using social network analysis and, what is more, made her raw data available for download. I have

Read more »

Never too experienced to make a basic mistake

April 15, 2013

I was one of the 170 or so people at the Data Science hackathon in London over the weekend. As always this was well run by Carlos and his team who kept us fed, watered and connected to the Internet. One of the three challenges involved a dataset containing pairs of Twitter users, A and

Read more »

Push hard on a problem here and it might just pop up over there

April 2, 2013

One thing I have noticed when reading other people’s R code is that their functions are often a lot longer than mine. Writing overly long functions is a common novice programmer mistake, but the code I am reading does not look like it is written by novices (based on the wide variety of base functions

Read more »

R needs some bureaucracy

March 12, 2013

Writing a program in R is almost bureaucracy free: variables don’t need to be declared, the language does a reasonable job of guessing the type a value might need to be automatically converted to, there is no need to create a function having a special name that gets called at program startup, the commonly

Read more »

The most worthwhile R coding guidelines I know

March 2, 2013

Since my post questioning whether native R usage exists (e.g., a common set of R coding patterns), several people have asked about coding/style guidelines for R. My approach to style/coding guidelines is economic: adhering to a guideline involves paying a cost now for some future benefit. Obviously, to be worthwhile the benefit must be greater

Read more »

Does native R usage exist?

February 22, 2013

Note to R users: Users of other languages enjoy spending lots of time discussing the minutiae of the language they use, something R users don’t appear to do (perhaps you spend your minutiae time on statistics, which I don’t yet know well enough to spot when it occurs). There follows a minutiae post that may

Read more »