Posts Tagged ‘Uncategorized’

Visualising the Path of a Genetic Algorithm

April 23, 2012

We quite regularly use genetic algorithms to optimise over the ad-hoc functions we develop when trying to solve problems in applied mathematics. However, it’s a bit disconcerting to have your algorithm roam through a high-dimensional solution space while not being able to picture what it’s doing or how close one solution is to another. …

118 years of US State Weather Data

April 22, 2012

A recent post on the Junkcharts blog looked at US weather data and the importance of explaining scales (which in this case went up to 118). Ultimately, it turns out that 118 is the rank of the data compared to the previous 117 years of data (in ascending order, so that 118 is the highest). At …
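
The ranking described in this excerpt can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the original chart: the function name `rank_of_latest` and the tie-handling rule (ties take the lowest rank) are assumptions made for this sketch.

```python
def rank_of_latest(values):
    """Return the 1-based ascending rank of the last value in `values`.

    With 118 yearly values, a result of 118 means the latest year is the
    highest on record and a result of 1 means it is the lowest.
    Ties take the lowest rank (an assumption of this sketch).
    """
    latest = values[-1]
    # Count how many earlier values are strictly smaller than the latest one.
    smaller = sum(1 for v in values[:-1] if v < latest)
    return 1 + smaller
```

For a hypothetical series of 118 steadily rising yearly values, `rank_of_latest` returns 118, which is the top of the scale the excerpt mentions.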

PostgreSQL, Excel, R, and a Really Big Data Set!

April 19, 2012

At work I’ve started to work with the biggest data set I’ve ever seen! First, let me qualify my use of the term “Big Data”. The number of rows in the resultant data set (after much transformation and manipulation in …

User Input in R vs Python

April 18, 2012

Both R and Python have facilities where the coder can write a script which requests a user to input some information. In Python 2.6, the main function for this task is raw_input (in Python 3.0, it’s input()). In R, there are a series of functions that can be used to request an input from the user,
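
As a minimal sketch of the Python side, the snippet below wraps the Python 3 built-in `input()` in a loop that keeps asking until the reply parses as a number. The helper name `ask_for_number` and its `read` parameter are hypothetical, introduced here only so the prompt function can be swapped out; under Python 2 the default would be `raw_input` instead.

```python
def ask_for_number(prompt="Enter a number: ", read=input):
    """Ask the user for a number, repeating the prompt until the reply parses.

    `read` defaults to the built-in input(); passing another callable makes
    the function usable without a keyboard (for example, in a test).
    """
    while True:
        reply = read(prompt)
        try:
            return float(reply)
        except ValueError:
            # float() rejects text such as "abc"; tell the user and ask again.
            print(f"{reply!r} is not a number, please try again.")
```

Calling `ask_for_number()` in an interactive session shows the prompt and returns the typed value as a float.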

Fun Editing R Graphs in Inkscape

April 12, 2012

Last week, I read a chapter out of Visualize This by Nathan Yau. I was, of course, delighted to see that he was championing the use of R. One really cool thing that I learned from his book, and was very …

Nick Stokes Distance code, now with Big Memory

April 12, 2012

In my last post I was struggling to get a big memory version of the distance matrix to work fast. Nick and other readers had some suggestions, and after puttering around with Nick’s code I’ve adapted it to big memory without impacting the run time very much. For comparison, writing a 10K by 10K

Eclipse + Rcpp + RInside = Magic

April 8, 2012

I've been doing R/Java development for some time, creating packages both large and small with this tool chain. I have used Eclipse as my package development environment almost exclusively, but I didn't realize how much I relied on the IDE before I had to do some serious R/C++ package development. My first Rcpp package (wordcloud)

Sampling and the Analysis of Big Data

April 8, 2012

After my last post, I came across a few articles supporting the opinion that if you have a good reason to take random samples from a “big” dataset, you’re not committing some kind of sin: Big Data Blasphemy: Why Sample? …
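
One way to draw a uniform random sample from a dataset too large to hold in memory is reservoir sampling, sketched below. This technique is not named in the excerpt; it is swapped in here purely as an illustration, and the function name `reservoir_sample` and its parameters are assumptions of the sketch.

```python
import random


def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable of rows.

    The iterable is read once and only k rows are kept in memory, so it
    can be a file handle or database cursor of unknown length.
    """
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1) by overwriting
            # a uniformly chosen slot in the reservoir.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

If the source has fewer than `k` rows, every row is returned.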

Using bigmemory for a distance matrix

April 7, 2012

The process of working on metadata and temperature series gives rise to several situations where I need to calculate the distance from every station to every other station. With a small number of stations this can be done easily on the fly with the result stored in a matrix. The matrix has rows and columns
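
For a small number of stations, the in-memory approach the excerpt describes might look like the sketch below. It assumes each station is a (latitude, longitude) pair in degrees and uses the haversine great-circle formula with a mean Earth radius of 6371 km. Both choices are assumptions of this sketch and not necessarily what the original post uses.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push sqrt(a) outside asin's domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def distance_matrix(stations):
    """Return an n-by-n list of lists of pairwise distances in km."""
    n = len(stations)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # The matrix is symmetric, so compute each pair once and mirror it.
        for j in range(i + 1, n):
            d = haversine_km(*stations[i], *stations[j])
            dist[i][j] = d
            dist[j][i] = d
    return dist
```

Storage grows with the square of the station count: a 10,000-station matrix of 8-byte doubles needs roughly 800 MB, which is the kind of size where an on-disk structure starts to look attractive.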

Obama administration unveiled a Big Data Research and Development Initiative with $200 million

April 4, 2012

Yanchang Zhao, RDataMining.com. The Obama administration unveiled a $200 million Big Data Research and Development Initiative on March 29, 2012, to improve the ability to extract knowledge and insights from large and complex collections of digital data. Six Federal departments …
