I make using OpenBUGS fun (and easier)! I've been a BUGS, WinBUGS and OpenBUGS user for some time now (20 years and counting!). The combination of R and OpenBUGS using the R2OpenBUGS package allows the user to bring together data preparation...
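A minimal sketch of what that R-plus-OpenBUGS workflow looks like, assuming OpenBUGS is installed and a BUGS model definition already exists in `model.txt` (the data, initial values, and parameter names below are illustrative, not from any particular post):

```r
# Sketch of an R2OpenBUGS workflow: prepare data in R, fit in OpenBUGS,
# get posterior summaries back in R. Assumes OpenBUGS is installed and
# "model.txt" contains a BUGS model for a normal mean (mu) and precision (tau).
library(R2OpenBUGS)

data  <- list(y = c(1.2, 0.8, 1.5, 1.1), n = 4)
inits <- function() list(mu = 0, tau = 1)

fit <- bugs(data = data,
            inits = inits,
            parameters.to.save = c("mu", "tau"),
            model.file = "model.txt",
            n.chains = 2, n.iter = 2000)
print(fit)  # posterior means, sds, and convergence diagnostics
```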

In a previous post, I discussed different approaches to speeding up some loops in data frames. In particular, R data frames provide a simple framework for representing large cohorts of agents in stochastic epidemiological models, such as those representing disease … Continue reading →
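As a contrived illustration of the kind of speed-up involved (the cohort data here are made up), replacing a row-by-row loop over a data frame with a single vectorised assignment produces identical results far faster:

```r
# Toy cohort of agents: a loop over rows versus one vectorised update
set.seed(1)
cohort <- data.frame(age = sample(20:80, 1e4, replace = TRUE),
                     infected = FALSE)

# Slow: explicit loop over every row
for (i in seq_len(nrow(cohort))) {
  if (cohort$age[i] > 60) cohort$infected[i] <- TRUE
}

# Fast: one vectorised comparison does the same work in a single pass
cohort$infected2 <- cohort$age > 60

identical(cohort$infected, cohort$infected2)  # same result either way
```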

In case you missed them, here are some articles from June of particular interest to R users. The FDA goes on the record that it's OK to use R for drug trials. A review of talks at the useR! 2012 conference. Using the negative binomial distribution to convert monthly fecundity into the chances of having a baby in a...

A couple of days ago, I had posted a short Python script to convert numpy files into a simple binary format which R can read quickly. Nice, but still needing an extra file. Shortly thereafter, I found Carl Rogers cnpy library which makes reading and writing numpy files from C++ a breeze, and I quickly wrapped this up into a new package...
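With that package (RcppCNPy), the round trip through the numpy `.npy` format is a couple of calls — `npySave()` and `npyLoad()` are its documented entry points:

```r
# Round-trip an R matrix through the numpy .npy format with RcppCNPy
library(RcppCNPy)

m <- matrix(as.numeric(1:6), nrow = 2)
f <- tempfile(fileext = ".npy")

npySave(f, m)       # write the matrix as a numpy file
m2 <- npyLoad(f)    # read it straight back into R
```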

http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html Q: How do you make a hairless primate? Answer 1: Take a hairy primate, wait a few million years and see if Darwin was right. Answer 2: Make them work i...

Many public agencies release data in a fixed-format ASCII (FWF) format. But with the data all packed together without separators, you need a "data dictionary" defining the column widths (and metadata about the variables) to make sense of them. Unfortunately, many agencies make such information available only as a SAS script, with the column information embedded in a PROC...
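Once the column widths have been recovered from such a data dictionary, base R's `read.fwf()` does the parsing — the widths and variable names below are made up for illustration:

```r
# Reading fixed-width (FWF) data: each field occupies a fixed number of
# characters, with no separators between them.
tmp <- tempfile()
writeLines(c("00123M1975",
             "00456F1982"), tmp)

d <- read.fwf(tmp,
              widths    = c(5, 1, 4),              # id, sex, birth year
              col.names = c("id", "sex", "yob"))
d
```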

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers) After struggling for quite a while on that AMSI public lecture talk, and dreading losing it with the problematic Macbook, I managed to complete a first draft last night in Adelaide, downloading a final set of images from the Web...

The TIOBE Programming Community Index (http://www.tiobe.com) ranks the popularity of programming languages, but from a programming-language perspective rather than as analytical software. It extracts measurements from blogs, entries in Wikipedia, books on Amazon, search engine results, etc., and combines them into a single index. … Continue reading →

The Tenth Australasian Data Mining Conference (AusDM 2012) Sydney, Australia, 5-7 December 2012 http://ausdm12.togaware.com/ The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. This year’s conference, AusDM’12, co-hosted … Continue reading →

by Yanchang Zhao, RDataMining.com It is a 270-page book on data mining with Excel. It can be downloaded as a PDF file at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.1393&rep=rep1&type=pdf. Below is its table of contents. - Overview of the Data Mining Process - Data Exploration … Continue reading →

In previous posts I described how to input data stored on GitHub directly into R. You can do the same thing with source code stored on GitHub. Hadley Wickham has actually made the whole process easier by combining the getURL, textConnection, and source commands into one function: source_url. This is in his devtools...
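In use it collapses to a single call — the URL below is a placeholder; substitute the raw link to the file you actually want to source:

```r
# devtools::source_url() downloads a remote script and source()s it in one
# step, replacing the getURL/textConnection/source combination.
# The URL here is a placeholder, not a real file.
library(devtools)

source_url("https://raw.github.com/user/repo/master/script.R")
```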

(by Dimitris Rizopoulos) Dear R-users, I’d like to announce the release of version 1.0-0 of package JM (already available from CRAN) for the joint modeling of longitudinal and time-to-event data using shared parameter models. These models are applicable mainly in two settings. First, when focus is on the survival outcome and we wish to account for the effect of an...
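A sketch of fitting such a shared-parameter model, assuming the `aids`/`aids.id` data that ship with JM and illustrative model formulas (a longitudinal CD4 trajectory linked to survival):

```r
# Joint model: an lme() fit for the longitudinal outcome plus a coxph() fit
# for the event time, combined by JM's jointModel().
library(JM)        # also loads nlme and survival
library(nlme)
library(survival)

fitLME <- lme(CD4 ~ obstime, random = ~ obstime | patient, data = aids)
fitCox <- coxph(Surv(Time, death) ~ drug, data = aids.id, x = TRUE)

fitJM <- jointModel(fitLME, fitCox, timeVar = "obstime")
summary(fitJM)     # association parameter links the two submodels
```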

The 13th problem in Project Euler is a big-number problem: work out the first ten digits of the sum of the following one-hundred 50-digit numbers. Obviously, there are some limits in the machine representation of numbers. In R, 2^(-1074) is the smallest … Continue reading →
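Since doubles are exact only up to 2^53, summing 50-digit integers needs arbitrary precision — for instance via the gmp package. The 50-digit number below is made up for illustration, not taken from the problem:

```r
# Exact big-integer arithmetic with gmp: sum 50-digit numbers, then take
# the first ten digits of the result as a character string.
library(gmp)

x <- as.bigz("12345678901234567890123456789012345678901234567890")
s <- x + x                                # exact, no floating-point rounding
first10 <- substr(as.character(s), 1, 10)
first10
```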

Over the last few weeks, I’ve made a concerted effort to develop a basic suite of optimization algorithms for Julia so that Matlab programmers used to using fminunc() and R programmers used to using optim() can start to transition code over to Julia that requires access to simple optimization algorithms like L-BFGS and the Nelder-Mead
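For reference, the R side of that comparison: `optim()` already covers both methods named above. The Rosenbrock function here is just a standard optimisation test problem, not anything from the Julia port:

```r
# Minimise the Rosenbrock function with Nelder-Mead and L-BFGS-B via optim()
rosenbrock <- function(p) (1 - p[1])^2 + 100 * (p[2] - p[1]^2)^2

fit_nm    <- optim(c(-1.2, 1), rosenbrock, method = "Nelder-Mead")
fit_lbfgs <- optim(c(-1.2, 1), rosenbrock, method = "L-BFGS-B")

fit_lbfgs$par   # both runs should land near the global minimum at (1, 1)
```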

If you haven’t worked with the gWidgets package it’s worth some time exploring it which is what I’ve been doing for a little paleo project I’ve been working on. After struggling with the few demos and tutorials I could find I went ahead and bought the book: Programming Graphical User Interfaces in R. Luckily the

Geography is often about statistics, since statistics are the basis for fast exchange of information: giving the audience a mean and a standard deviation is often much easier than showing raw data. Learning a scripting language for this purpose can be hard work, but I think it is more often a matter of practice.

The code is horrible and the visualisations quite possibly misleading, but I’m dead tired and there are a couple of tricks in the following R code that I want to remember, so here’s a contrived bit of fumbling with some data of the form: enjoyCompany tooMuchFamily 1 strongly agree strongly disagree 2 strongly agree strongly

R has had a maps package available since the very early days. It's great for simple geographic maps, but the political boundaries can be out of date. For more detailed maps, you can also download shape files and use the sp package to draw borders directly. But for accurate and attractive maps of countries, roads and satellite imagery, nothing...
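For the simple case, the maps package draws those political boundaries directly, and `map()` can also hand back the polygon data without plotting (the point marked below is just an example coordinate):

```r
# Simple geographic maps with the maps package
library(maps)

map("world")                      # world outline (boundaries may be dated)
points(151.2, -33.9, pch = 19)    # mark a point by longitude/latitude (Sydney)

# The polygon data can also be retrieved without drawing anything:
m <- map("world", plot = FALSE)
str(m$names[1:3])                 # region names backing the polygons
```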

The Global Biodiversity Information Facility (GBIF) is an international consortium working towards making biodiversity information available to everyone through a single portal. GBIF and its partners are working on mobilizing data, developing data and metadata standards, building a distributed database system, and making the data accessible through APIs. At this point it is the largest single-window data source covering a wide spectrum of taxa and
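Those APIs are wrapped for R by the rgbif package; a sketch, assuming a current rgbif version where `occ_search()` is the documented search function (network access required, species name chosen arbitrarily):

```r
# Querying GBIF occurrence records from R via rgbif (needs internet access)
library(rgbif)

res <- occ_search(scientificName = "Puma concolor", limit = 10)
head(res$data)   # records with taxon names, coordinates, source dataset, ...
```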

While traditional statistics courses teach students to calculate intervals and test for binomial proportions using a normal or t approximation, this method does not always work well. Agresti and Coull ("Approximate is better than 'exact' for interval estimation of binomial proportions", The American Statistician, 1998; 52:119-126) demonstrated this and reintroduced an...
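The Agresti-Coull adjustment is simple enough to code by hand: add z²/2 pseudo-successes and z²/2 pseudo-failures, then apply the usual normal approximation to the adjusted proportion. A sketch (the binom package also implements this as one of its `binom.confint()` methods):

```r
# Agresti-Coull confidence interval for a binomial proportion
agresti_coull <- function(x, n, conf = 0.95) {
  z     <- qnorm(1 - (1 - conf) / 2)     # 1.96 for a 95% interval
  n_adj <- n + z^2                       # add z^2 pseudo-observations
  p_adj <- (x + z^2 / 2) / n_adj         # half successes, half failures
  half  <- z * sqrt(p_adj * (1 - p_adj) / n_adj)
  c(lower = p_adj - half, upper = p_adj + half)
}

ci <- agresti_coull(x = 8, n = 10)       # 8 successes out of 10 trials
ci
```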

Besides the microscope images in our routine work, ordinary photos are frequently taken to measure quantitative plant features, such as leaf area, root length, branch numbers, etc. Scientific software is available for manual processing. For example, to measure root length, one needs to use the … Continue reading →