Getting numpy data into R — Take Two

July 10, 2012

A couple of days ago, I posted a short Python script that converts numpy files into a simple binary format which R can read quickly. Nice, but it still needs an extra file. Shortly thereafter, I found Carl Rogers' cnpy library, which makes reading and writing numpy files from C++ a breeze, and I quickly wrapped this up into a new package...

This is *huge*: SAScii package

July 10, 2012

http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html Q: How do you make a hairless primate? Answer 1: Take a hairy primate, wait a few million years and see if Darwin was right. Answer 2: Make them work i...

Importing public data with SAS instructions into R

July 10, 2012

Many public agencies release data in fixed-width ASCII (FWF) format. But with the data all packed together without separators, you need a "data dictionary" defining the column widths (and metadata about the variables) to make sense of it. Unfortunately, many agencies make such information available only as a SAS script, with the column information embedded in a PROC...
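
As a rough illustration of why the column widths matter, here is a minimal Python sketch that cuts one fixed-width record into named fields. The layout, the field names and the sample record are made up for the example; a real layout would come from the agency's data dictionary, and this is not how the SAScii package itself works.

```python
from typing import Dict, List, Tuple

# Hypothetical data dictionary: (column name, width in characters).
# A real layout would come from the agency's documentation.
LAYOUT: List[Tuple[str, int]] = [("state", 2), ("age", 3), ("income", 7)]


def parse_fixed_width(line: str, layout: List[Tuple[str, int]]) -> Dict[str, str]:
    """Cut one fixed-width record into named fields using the column widths."""
    record = {}
    start = 0
    for name, width in layout:
        # Each field occupies the next `width` characters; padding is stripped.
        record[name] = line[start:start + width].strip()
        start += width
    return record


# Made-up sample record: "NY" + " 42" + "  55000"
row = parse_fixed_width("NY 42  55000", LAYOUT)
```

Without the widths, the same twelve characters could be split in many different ways, which is why the data dictionary is needed before the file can be read at all.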

introduction to R: learning by doing (part 2: plots)

July 10, 2012

Let's go on with the second part of learning R by doing R (you will find the first part here). As we used vectors, matrices and loops in the first part, we will concentrate on graphics in this one. But first we will need data to plot: Sometimes you will need several plots in

simulation, an ubiquitous tool

July 10, 2012

(This article was first published on Xi'an's Og » R, and kindly contributed to R-bloggers) After struggling for quite a while on that AMSI public lecture talk, and dreading its loss with the problematic Macbook, I managed to complete a first draft last night in Adelaide, downloading a final set of images from the Web...

Visualizing Graphical Models

July 10, 2012

I'm anticipating presenting research of mine based on Bayesian graphical models to an audience that might not be familiar with them. When presenting ordinary regression results, there's already the sort of statistical sniper questions along the lines o...

SAS Beats R on July 2012 TIOBE Rankings

July 10, 2012

The TIOBE Community Programming Index ranks the popularity of programming languages, but from a programming language perspective rather than as analytical software (http://www.tiobe.com). It extracts measurements from blogs, entries in Wikipedia, books on Amazon, search engine results, etc. and combines them into a single index. …

2nd CFP: the 10th Australasian Data Mining Conference (AusDM 2012)

July 10, 2012

The Tenth Australasian Data Mining Conference (AusDM 2012) Sydney, Australia, 5-7 December 2012 http://ausdm12.togaware.com/ The Australasian Data Mining Conference has established itself as the premier Australasian meeting for both practitioners and researchers in data mining. This year’s conference, AusDM’12, co-hosted …

Data Mining In Excel: Lecture Notes and Cases

July 10, 2012

by Yanchang Zhao, RDataMining.com It is a 270-page book on data mining with Excel. It can be downloaded as a PDF file at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.83.1393&rep=rep1&type=pdf. Below is its table of contents. - Overview of the Data Mining Process - Data Exploration …

Sourcing Code from GitHub

July 10, 2012

In previous posts I described how to input data stored on GitHub directly into R. You can do the same thing with source code stored on GitHub. Hadley Wickham has actually made the whole process easier by combining the getURL, textConnection, and source commands into one function: source_url. This is in his devtools...

Package JM — version 1.0-0

July 10, 2012

(by Dimitris Rizopoulos) Dear R-users, I’d like to announce the release of version 1.0-0 of package JM (already available from CRAN) for the joint modeling of longitudinal and time-to-event data using shared parameter models. These models are applicable mainly in two settings. First, when the focus is on the survival outcome and we wish to account for the effect of an...

Review: Kölner R Meeting 6 July 2012

July 10, 2012

The second Cologne R user meeting took place last Friday, 6 July 2012, at the Institute of Sociology. Thanks to Bernd Weiß, who provided the meeting room, we didn't have to worry about the infrastructure, like we did at our first gathering. Again, we ...

Project Euler — problem 13

July 9, 2012

The 13th problem in Project Euler is a big-number problem: Work out the first ten digits of the sum of the following one-hundred 50-digit numbers. Obviously, there are some limits in machine representation of numbers. In R, 2^(-1074) is the smallest …
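
The limits mentioned here come from IEEE 754 double precision, which can be checked in any language that uses doubles. The Python sketch below is only an illustration, not the post's R solution: it shows that a 50-digit value loses its low digits as a float, that 2^(-1074) is the smallest positive double, and that exact integer arithmetic sidesteps the problem. The helper name `first_digits_of_sum` and the two input strings are made up; they stand in for the problem's one hundred numbers, which are not reproduced here.

```python
from typing import Iterable


def first_digits_of_sum(numbers: Iterable[str], k: int = 10) -> str:
    """Return the first k digits of the exact sum of decimal strings."""
    total = sum(int(n) for n in numbers)  # Python ints have arbitrary precision
    return str(total)[:k]


# Doubles carry roughly 15 to 17 significant decimal digits, so adding 1 to a
# 50-digit value is lost once the value is stored as a float.
precision_lost = float(10**49 + 1) == float(10**49)

# The smallest positive double is the subnormal 2^-1074; halving it gives zero.
smallest = 2.0 ** -1074
underflows = smallest / 2 == 0.0

# Made-up inputs standing in for the problem's one hundred 50-digit numbers.
digits = first_digits_of_sum(["1" * 50, "2" * 50])
```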

Optimization Functions in Julia

July 9, 2012

Over the last few weeks, I’ve made a concerted effort to develop a basic suite of optimization algorithms for Julia, so that Matlab programmers used to fminunc() and R programmers used to optim() can start moving over to Julia any code that relies on simple optimization algorithms like L-BFGS and the Nelder-Mead

Preview of functional programming syntax for futile.paradigm 2.1

July 9, 2012

I’m developing a streamlined syntax for the next release of futile.paradigm. While this version is backwards compatible, it introduces a …

A gWidgets GUI for climate data

July 9, 2012

If you haven’t worked with the gWidgets package, it’s worth spending some time exploring it, which is what I’ve been doing for a little paleo project I’ve been working on. After struggling with the few demos and tutorials I could find, I went ahead and bought the book Programming Graphical User Interfaces in R. Luckily the

introduction to R: learning by doing (part 1)

July 9, 2012

Geography is often about statistics, as statistics are the basis for a fast exchange of information: giving the audience a mean and standard deviation is often much easier than showing raw data. Learning a scripting language for this purpose can be hard work, but I think it is mostly a matter of practice.

Fumblings with Ranked Likert Scale Data in R

July 9, 2012

The code is horrible and the visualisations quite possibly misleading, but I’m dead tired and there are a couple of tricks in the following R code that I want to remember, so here’s a contrived bit of fumbling with some data of the form: enjoyCompany tooMuchFamily 1 strongly agree strongly disagree 2 strongly agree strongly

A Napa Valley wine tasting map, made with R and ggmap

July 9, 2012

R has had a maps package available since the very early days. It's great for simple geographic maps, but the political boundaries can be out of date. For more detailed maps, you can also download shape files and use the sp package to draw borders directly. But for accurate and attractive maps of countries, roads and satellite imagery, nothing...

Map biodiversity records with rgbif, maps and ggplot2 packages in R

July 9, 2012

The Global Biodiversity Information Facility (GBIF) is an international consortium working towards making biodiversity information available to everyone through a single portal. GBIF and its partners are working towards mobilizing data, developing data and metadata standards, developing a distributed database system and making the data accessible through APIs. At this point this is the largest single-window data source covering a wide spectrum of taxa and

Example 9.37: (Mis)behavior of binomial confidence intervals

July 9, 2012

While traditional statistics courses teach students to calculate intervals and tests for binomial proportions using a normal or t approximation, this method does not always work well. Agresti and Coull ("Approximate is better than 'exact' for interval estimation of binomial proportions", The American Statistician, 1998; 52:119-126) demonstrated this and reintroduced an...
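
To make the contrast concrete, the Python sketch below compares the textbook normal-approximation (Wald) interval with the adjusted interval commonly called the Agresti-Coull interval. The excerpt is cut off before naming the method the post goes on to use, so treat this as an illustration of the general idea rather than a reproduction of Example 9.37; the function names and the 0-out-of-10 sample are assumptions.

```python
import math
from typing import Tuple

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wald_interval(x: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Normal-approximation interval centred on the observed proportion."""
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def agresti_coull_interval(x: int, n: int, z: float = Z_95) -> Tuple[float, float]:
    """Adjusted interval: add z^2/2 successes and z^2/2 failures, then proceed as Wald."""
    n_adj = n + z * z
    p_adj = (x + z * z / 2) / n_adj
    half = z * math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half


# Illustrative case: 0 successes in 10 trials.
wald = wald_interval(0, 10)               # collapses to a single point at zero
adjusted = agresti_coull_interval(0, 10)  # still has a usable width
```

With no observed successes the Wald interval has zero width, which is one of the failure modes that motivates the adjustment. The adjusted bounds can fall slightly outside [0, 1] and are often truncated in practice.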

Trend and Spatial Pattern of Poverty in the Philippines

July 9, 2012

In a teaching demo that I conducted, I discussed how R can be used to analyze trends and spatial patterns of poverty incidence in the Philippines. Playing with the data I got from the National Statistical Coordination Board, below is what I got...

leaf area measuring — R package “EBImage”

July 9, 2012

Besides microscopic images in our routine, common photos are frequently taken to measure quantitative plant features, such as leaf area, root length, branch numbers, etc. Scientific software is available for manual processing. For example, to measure the root length, one needs to use the …

Network Visualization of Key Driver Analysis

July 8, 2012

Whatever happened to those evaluations that your airline asked you to complete after taking a flight? They ask you for a number of ratings about buying your ticket, attributes of the plane, the service you received, and if you were satisfied, if you wo...

Bubble Plots (ggplot2)

July 8, 2012

1 Introduction. Rarely have I seen a three-dimensional graph including time, value, and volatility. It is essenti

New package RcppCNPy with release 0.1.0 (and 0.0.1 earlier last week)

A few days ago I blogged about getting NumPy data into R by using a simple converter script. That works fine, but it is a little annoying to have to write an entire file only to read from it again. So I kept looking around for a better solution and soon found the cnpy library by Carl Rogers, which provides simple C++...

Representation of numerical NA’s in R and the 1954 enigma

July 8, 2012

I've always wondered how exactly the missing value (NA) in R is represented under the hood. Last weekend I was working on a little project that gave me enough excuse to spend some time on finding this out. So, I …
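
For readers who want to poke at the bit pattern themselves, here is a small Python sketch. It assumes that R's numeric NA is an IEEE 754 NaN whose lower 32-bit word holds the value 1954, which is what the post title alludes to; the constant `NA_BITS` is written out by hand for illustration and is not read from R.

```python
import math
import struct

# Assumption: R's NA_real_ is an IEEE 754 NaN whose low 32-bit word is 1954.
# The constant below is built by hand under that assumption.
NA_BITS = 0x7FF00000000007A2

low_word = NA_BITS & 0xFFFFFFFF   # payload stored in the lower word
high_word = NA_BITS >> 32         # all exponent bits set, upper mantissa bits zero

# Reinterpret the 64 bits as a double: any such pattern is a NaN.
na_value = struct.unpack("<d", struct.pack("<Q", NA_BITS))[0]
is_nan = math.isnan(na_value)
```

Because the value is an ordinary NaN as far as the hardware is concerned, only the payload in the low word distinguishes it from other NaNs.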

Fitting a dynamic model, and determining the number of parameters that can be fitted.

July 8, 2012

Let's suppose that we have the same dynamic model we presented before - that is, the Lorenz system of differential equations. Remember? In order to perform a fit we need to define an objective function of some sort: this will then be minimised. Now,...
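
One common choice of objective function is the sum of squared differences between a simulated trajectory and the observed one. The Python sketch below illustrates that idea under stated assumptions: a fixed-step fourth-order Runge-Kutta integrator, the classic parameter values (10, 28, 8/3), and synthetic "observations" generated from the model itself. The post's own objective function is not shown in the excerpt and may differ.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, float, float]


def lorenz_rhs(s: State, sigma: float, rho: float, beta: float) -> State:
    """Right-hand side of the Lorenz equations."""
    x, y, z = s
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def _step(s: State, k: State, h: float) -> State:
    """Return s + h * k, component by component."""
    return tuple(si + h * ki for si, ki in zip(s, k))


def simulate(params: Sequence[float], start: State, dt: float, steps: int) -> List[State]:
    """Integrate with classical fourth-order Runge-Kutta at a fixed step size."""
    path = [start]
    s = start
    for _ in range(steps):
        k1 = lorenz_rhs(s, *params)
        k2 = lorenz_rhs(_step(s, k1, dt / 2), *params)
        k3 = lorenz_rhs(_step(s, k2, dt / 2), *params)
        k4 = lorenz_rhs(_step(s, k3, dt), *params)
        s = tuple(si + dt / 6 * (a + 2 * b + 2 * c + d)
                  for si, a, b, c, d in zip(s, k1, k2, k3, k4))
        path.append(s)
    return path


def objective(params: Sequence[float], observed: List[State], dt: float) -> float:
    """Sum of squared errors between the simulated and observed trajectories."""
    simulated = simulate(params, observed[0], dt, len(observed) - 1)
    return sum((a - b) ** 2
               for sim, obs in zip(simulated, observed)
               for a, b in zip(sim, obs))


# Synthetic data generated from illustrative "true" parameters.
true_params = (10.0, 28.0, 8.0 / 3.0)
observed = simulate(true_params, (1.0, 1.0, 1.0), 0.01, 50)
sse_true = objective(true_params, observed, 0.01)
sse_off = objective((10.0, 27.0, 8.0 / 3.0), observed, 0.01)
```

A minimiser would then search over `params` for the smallest value of `objective`; at the parameters that generated the data the error is exactly zero, and it grows as the parameters move away.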

Universal portfolio, part 7

July 7, 2012

After reproducing all original figures and tables from Universal Portfolios, R coupled with modern processors allows us to perform some more analysis. First we calculate the final wealth of the universal portfolio for all possible pairs of stocks, and...
