High-Performance in Cloud Computing

August 11, 2011

(This article was first published on cloudnumbers.com » R-project, and kindly contributed to R-bloggers)

Scientists are often concerned about performance and security in cloud computing. For High-Performance Computing (HPC) in the cloud in particular, it is important to prove that calculations run efficiently in the cloud.

Cloud computing describes a new delivery model for IT services based on Internet protocols. It typically involves the provisioning of dynamically scalable and virtualized resources. Virtualization raises concerns about the performance of applications running on a virtualized machine. Several good scientific publications show that today's virtualization technologies do not significantly influence the performance of virtualized environments (e.g., High-performance aspects in virtualized infrastructures; Danciu, V.A.; Felde, N.G.; Kranzlmüller, D.; Lindinger, T.). Furthermore, the world-wide distribution of hardware resources raises concerns about the performance of parallel computing applications and about the communication between machines.

cloudnumbers.com provides researchers and companies with access to resources for high-performance calculations in the cloud. We have started several benchmarks to demonstrate the usability of high-performance computing in the cloud.

For our first benchmark we used the statistical software R (www.r-project.org) to solve 200 sudokus. We used the R package sudoku to create and solve the puzzles, and the R packages multicore and snow (with MPI) to distribute the sudokus across several cores or machines. As hardware we used Amazon's High-CPU Extra Large Instances, each with 8 virtual cores of 2.5 EC2 Compute Units each. The computer cluster was a small configuration of 4 instances, provided and configured by cloudnumbers.com. To reduce the influence of network traffic and other external factors, we replicated each computation ten times and measured the average calculation times.
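The replication scheme described above (ten runs per configuration, averaged) can be sketched as a small timing harness. This is a minimal Python illustration, not the authors' R code: `time_replicates` and `dummy_workload` are hypothetical names, and `run` is assumed to wrap one complete benchmark run, for example solving all 200 sudokus on a given number of cores. The harness also returns the sample standard deviation, which summarizes the spread that the boxplots in Figure 1 visualize.

```python
import statistics
import time
from typing import Callable, List, Tuple


def time_replicates(run: Callable[[], object], replicates: int = 10) -> Tuple[List[float], float, float]:
    """Call `run` `replicates` times; return (wall-clock times, mean, sample std dev)."""
    if replicates < 2:
        # statistics.stdev needs at least two data points
        raise ValueError("need at least two replicates to estimate a standard deviation")
    times: List[float] = []
    for _ in range(replicates):
        start = time.perf_counter()
        run()  # one complete benchmark run (hypothetical: all puzzles on p cores)
        times.append(time.perf_counter() - start)
    return times, statistics.mean(times), statistics.stdev(times)


if __name__ == "__main__":
    # Hypothetical CPU-bound stand-in; the original benchmark solved sudokus in R.
    def dummy_workload() -> int:
        return sum(i * i for i in range(200_000))

    times, mean, sd = time_replicates(dummy_workload, replicates=10)
    print(f"mean = {mean:.4f} s, sd = {sd:.4f} s over {len(times)} runs")
```

A small standard deviation relative to the mean is what "stable computing resources" looks like in numbers.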

Figure 1 shows the computation time for solving 200 sudokus on 1 to 8 cores. On a single core, solving 200 sudokus takes less than 200 seconds; with 8 cores, the computation time drops to less than 29 seconds (a speedup of roughly 7). The boxplots show the spread of the computation times over the 10 replicates. There is almost no variance in the computation time, which is a good indicator of stable computing resources.
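Using the standard definitions of speedup $S_p$ and parallel efficiency $E_p$ on $p$ cores (general textbook definitions, not taken from the original benchmark report), the figures quoted above work out as follows. Because 200 s and 29 s are upper bounds rather than exact averages, the result is only approximate.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}

S_8 \approx \frac{200\,\mathrm{s}}{29\,\mathrm{s}} \approx 6.9,
\qquad
E_8 \approx \frac{6.9}{8} \approx 0.86
```

An efficiency of about 0.86 means each of the 8 cores spends roughly 86% of the serial work rate on useful computation.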

Figure 1: Boxplots for computation times on different numbers of cores (on one multicore machine)


The second figure plots the speedup (serial computation time divided by parallel computation time) for the multicore calculation and for the snow calculation using MPI. As expected from the design of the experiment, the speedup curves are linear: each sudoku is solved independently, so the workers barely need to communicate. The blue line ends at 8 cores because the multicore machine has only 8 cores.
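Why the design leads to (nearly) linear curves can be illustrated with a simple model. This is an assumption for illustration, not the authors' analysis: every sudoku is an independent task of equal duration, no data is exchanged between workers, and the tasks are divided as evenly as possible, so the parallel run lasts ceil(n/p) task-lengths. `ideal_speedup` is a hypothetical helper name.

```python
import math


def ideal_speedup(n_tasks: int, n_workers: int) -> float:
    """Ideal speedup for n_tasks independent, equally long tasks on n_workers.

    Assumes identical task durations, no communication and no scheduling
    overhead, so the slowest worker processes ceil(n_tasks / n_workers) tasks.
    """
    if n_tasks < 1 or n_workers < 1:
        raise ValueError("n_tasks and n_workers must be positive")
    rounds = math.ceil(n_tasks / n_workers)  # tasks handled by the busiest worker
    return n_tasks / rounds


if __name__ == "__main__":
    # 200 puzzles; up to 4 instances x 8 cores = 32 workers, as in the cluster above.
    for p in (1, 2, 4, 8, 16, 32):
        print(p, round(ideal_speedup(200, p), 2))
```

Under these assumptions, 200 tasks give exactly linear speedup on 1, 2, 4 and 8 cores, while 32 cores are capped at 200/7 ≈ 28.6: 200 = 32 × 6 + 8, so 8 workers must solve seven sudokus while the other 24 solve six.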

Figure 2: Speedup for solving 200 sudokus. The dotted line represents the theoretical maximum speedup.


A similar benchmark was run on the supercomputer HLRB2 (LRZ, Munich, Germany) with very similar results (Parallel Computing with the R Language in a Supercomputing Environment; Markus Schmidberger and Ulrich Mansmann).

With this first performance benchmark we demonstrate that the calculation times of applications are very stable on virtualized cloud resources and that network traffic for communication between our computing nodes causes no noticeable delays. Therefore, for this example with a limited number of resources, a computer cluster in the cloud can be expected to deliver the same speed improvements as a local computer cluster. This conclusion holds for the computer cluster environments provided by cloudnumbers.com.

Please keep following cloudnumbers.com’s blog for further benchmarks in HPC cloud computing, or register and test for free now.
