
Ack! Duplicates in the Data!

May 3, 2012 | inkhorn82

As I mentioned in a previous post, I compiled the data set that I’m currently working on in PostgreSQL. To get this massive data set, I had to write a query that was massive by dint of the number of …
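A generic way to investigate duplicates like these is to separate two cases: rows that are exact copies, and distinct rows that share a key that ought to be unique. The post doesn't show how the duplicates were found or removed, so the following Python is only a sketch; the row layout (tuples) and the `customer_id` key are hypothetical.

```python
from collections import Counter


def find_duplicate_keys(rows, key_index=0):
    """Return {key: count} for keys that appear in more than one row."""
    counts = Counter(row[key_index] for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


def drop_exact_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


if __name__ == "__main__":
    # Hypothetical rows: (customer_id, region)
    rows = [(1, "east"), (2, "west"), (1, "east"), (3, "north"), (2, "south")]
    print(find_duplicate_keys(rows))    # {1: 2, 2: 2}
    print(drop_exact_duplicates(rows))  # [(1, 'east'), (2, 'west'), (3, 'north'), (2, 'south')]
```

Exact-row deduplication leaves key 2 duplicated because its two rows differ; cases like that need a decision about which row to keep rather than a blanket removal.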

EU rules that computer languages cannot be copyrighted

May 2, 2012 | Derek-Jones

The European Court of Justice has published its decision in SAS v WPL; the title of the press release says it all: “The functionality of a computer program and the programming language cannot be protected by copyright”. To summarise the background, World Programming Ltd developed a system that was capable ...

Mining for relations between nominal variables

May 1, 2012 | inkhorn82

The task today was to find which variables had significant relationships with an important grouping variable in the big dataset I’ve been working with lately. The grouping variable has 3 levels and represents different behaviours of interest. At first I …
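A standard way to screen nominal variables against a 3-level grouping variable is Pearson's chi-square test of independence on each variable's contingency table (in R, the built-in chisq.test() does this). The excerpt doesn't say which test the author used, so treat this as a generic, stdlib-only Python sketch; the p-value comes from a series expansion of the regularized incomplete gamma function, and the example counts are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the series for the regularized lower incomplete gamma function P(df/2, x/2).
    Adequate for moderate statistics (roughly x < 1000); prefer a library routine otherwise.
    """
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-17:  # stop once terms no longer change the sum
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(-z + a * math.log(z) - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Assumes no row or column sums to zero. Returns (statistic, degrees_of_freedom, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, chi2_sf(stat, df)
```

For a hypothetical table with one row per group, [[30, 10], [20, 20], [10, 30]], every expected count is 20, so the statistic is 400 / 20 = 20 on (3 - 1)(2 - 1) = 2 degrees of freedom, and the p-value is e^(-10), about 4.5e-5. When many candidate variables are screened this way, some correction for multiple testing is worth considering.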

Incompetence borne of excessive cleverness

April 29, 2012 | Derek-Jones

I have just got back from the 24-hour Data Science Global Hackathon; I was an on-site participant at Hub Westminster in London (thanks to Carlos and his team for doing such a great job looking after us all {around 50 turned up from the 100 who registered; the percentage was similar in ...

An academic programming language paper about R

April 27, 2012 | Derek-Jones

The R language has passed another milestone: a paper aimed at the academic programming language community (or at least one section of this community) has been written about it, “Evaluating the Design of the R Language” by Morandat, Hill, Osvald and Vitek. Hardly earth-shattering news, but it may have ...

R Tips: lots of tips for R programming

April 26, 2012 | Yanchang Zhao

By Yanchang Zhao, RDataMining.com. There are more than 100 R tips at http://pj.freefaculty.org/R/Rtips.html, which provide quick examples for small challenges in everyday R programming, especially for users switching to R from other languages. There is also a PDF version for …

Projects in RStudio

April 24, 2012 | inkhorn82

Now that I have one enormous project on the go and one smaller one, I find it helps considerably to keep each one in its own RStudio project. That way, each project has its own scripting that I’ve been working …

User Input in R vs Python

April 18, 2012 | Abraham Mathew

Both R and Python let you write a script that asks the user to enter some information. In Python 2.6, the main function for this task is raw_input() (in Python 3.0, it’s input()). In R, there is a series of functions that can be used to ...
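In Python 3, input() displays a prompt and returns whatever the user typed as a string, which is the behaviour raw_input() had in Python 2 (Python 2's own input() evaluated the reply as an expression). Numbers therefore need an explicit conversion. A minimal sketch follows; the function name and its `read` parameter are hypothetical, the latter added only so the function can be exercised without a keyboard.

```python
def ask_number(prompt, read=input):
    """Prompt repeatedly until the reply parses as a number; return it as a float."""
    while True:
        reply = read(prompt)  # input() always returns a str in Python 3
        try:
            return float(reply)
        except ValueError:
            print("Please enter a number.")
```

Called as ask_number("How many rows? "), it keeps asking until the user types something float() accepts, such as 42 or 3.5.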

Fun Editing R Graphs in Inkscape

April 12, 2012 | inkhorn82

Last week, I read a chapter of Visualize This by Nathan Yau. I was, of course, delighted to see that he was championing the use of R. One really cool thing that I learned from his book, and was very …

Nick Stokes Distance code, now with Big Memory

April 12, 2012 | Steven Mosher

In my last post I was struggling to get a bigmemory version of the distance matrix to run fast. Nick and other readers had some suggestions, and after puttering around with Nick’s code I’ve adapted it to bigmemory without hurting the run time very much. For ...

Sampling and the Analysis of Big Data

April 8, 2012 | inkhorn82

After my last post, I came across a few articles supporting the opinion that if you have a good reason to take random samples from a “big” dataset, you’re not committing some kind of sin: Big Data Blasphemy: Why Sample? …
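The excerpt argues that sampling a big dataset can be legitimate but doesn't say how the sample is drawn. One standard option when the data is too large to load at once is reservoir sampling (Algorithm R), which keeps a uniform random sample of k items in a single pass without knowing the total count in advance. This is a swapped-in technique shown as a Python sketch, not necessarily what the post or the articles it cites recommend.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Uniform random sample of k items from an iterable of unknown length (Algorithm R).

    If the stream has fewer than k items, all of them are returned in order.
    """
    rng = rng or random.Random()
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive on both ends, so P(j < k) = k / (i + 1)
            if j < k:
                sample[j] = item  # replace a random slot with the new item
    return sample
```

Because it only iterates, something like reservoir_sample(open("big.csv"), 10000) would sample 10,000 lines from a hypothetical file without reading it all into memory; note that a header line is treated like any other line.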

Using bigmemory for a distance matrix

April 7, 2012 | Steven Mosher

The process of working on metadata and temperature series gives rise to several situations where I need to calculate the distance from every station to every other station. With a small number of stations this can be done easily on the fly, with the result stored in a matrix. The ...
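Computing every station-to-station distance fills an n x n matrix, which is why memory becomes the bottleneck as n grows. The post doesn't give its distance formula or station format; the Python sketch below assumes (latitude, longitude) pairs in degrees, the haversine great-circle formula and a mean Earth radius of 6371 km. It computes only the upper triangle and mirrors it, since distance is symmetric.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, not from the post


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))  # clamp guards rounding


def distance_matrix(stations):
    """Full symmetric n x n matrix of distances; only pairs with i < j are computed."""
    n = len(stations)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = haversine_km(*stations[i], *stations[j])
    return d
```

Halving the work doesn't halve the storage: the full matrix still holds n² doubles. For a hypothetical 40,000 stations that is 1.6 billion entries, or about 12.8 GB at 8 bytes each, the kind of size that pushes R users towards file-backed matrices such as those provided by bigmemory.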

Go faster R for Google’s summer of code 2012

March 28, 2012 | Derek-Jones

The R Foundation has been accepted into Google’s Summer of Code, and I thought I would suggest a few ideas for projects. My interests are in optimization and source code analysis, so naturally the suggestions involve these topics. There is an infinite number of possible optimizations that can be ...

Social Network Analysis with R

March 27, 2012 | Yanchang Zhao

By Yanchang Zhao, RDataMining.com. If you have tried social network analysis or graph mining with R, you may already have come across the igraph package. The package is designed for graph and network analysis in R. It can handle large …

Metadata Dubuque and UHI

March 21, 2012 | Steven Mosher

I’m in the process of remaking all the metadata from scratch and looking once again at the question of UHI (the urban heat island effect). There are no global conclusions we can draw from the data yet; I’m just checking out everything available that could be ...