Like many people who crunch numbers frequently, I have increasingly been integrating Amazon’s cloud computing services into my daily workflow. In particular, I have been using their Elastic Compute Cloud (EC2) on a regular basis. The service is an excellent way to offload computationally intensive work from your laptop at very low cost.
One drawback I have found, however, is that there are no obvious pre-configured images, called AMIs (Amazon Machine Images), designed for scientific computing in the languages I use most: Python and R. The best public AMI I could find was an Ubuntu 10 image provided by the good people at MIT’s STARDEV Project, which comes with several useful libraries pre-installed, as well as optimized versions of the core scientific Python libraries. This AMI is great, but it was still missing several Python packages I use on a regular basis (NetworkX, scikits.learn, sympy, etc.), and it had an old version of R with only the base packages installed. This would simply not do.
Thus began the odyssey of modifying the StarCluster AMI to more fully support scientific computing in Python and R. I have now uploaded and made public the resulting image, which includes several hundred Python and R packages for scientific computing, statistics, machine learning, data mining and visualization. To access the AMI you can either search for the source name:
Or, access it directly with the AMI ID:
This will only interest those who have AWS accounts and do scientific computing in these languages, but I hope that for those of you in that niche it is a useful convenience. For those unfamiliar with EC2, I highly recommend this tutorial, and this more detailed set of instructions for working with EC2 on the command line. Also, Amazon is very generous with research grants for teachers and students at all levels, so if cost is a barrier you should consider applying for an educational grant.
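For those who would rather script the lookup than click through the console, here is a minimal sketch of the two access routes mentioned above (by name or by AMI ID) using the boto3 library. It is an illustration, not the method used to build or publish the image. The function names `build_image_query` and `find_images` are hypothetical, the example values are placeholders rather than the real name or ID, and matching on the image's `name` field is an assumption about what the "source name" search corresponds to.

```python
def build_image_query(source_name=None, image_id=None):
    """Build keyword arguments for EC2's DescribeImages call.

    Hypothetical helper: pass either a name pattern or an AMI ID.
    """
    if image_id is not None:
        # Direct lookup by AMI ID.
        return {"ImageIds": [image_id]}
    if source_name is not None:
        # Assumption: search on the image "name" field; * acts as a wildcard.
        return {"Filters": [{"Name": "name", "Values": [source_name]}]}
    raise ValueError("provide a source name or an AMI ID")


def find_images(region, **query):
    """Return the public image records matching the query in one region."""
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    return ec2.describe_images(**query)["Images"]


if __name__ == "__main__":
    # Placeholder values only; substitute the real name or ID and your region.
    query = build_image_query(source_name="*placeholder-name*")
    for image in find_images("us-east-1", **query):
        print(image["ImageId"], image.get("Name"))
```

Running the script requires AWS credentials configured for boto3. Keep in mind that AMI IDs are specific to a region, so a lookup by ID only succeeds in the region where the image was published.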