We have dabbled with RevoScaleR before. In this exercise set we will work with H2O, another high-performance R library that handles big data very effectively. The exercises increase in difficulty, so please do them in sequence.
H2O requires Java to be installed on your system, so please install Java before trying H2O. As always, check the documentation before attempting this exercise set.
Answers to the exercises are available here.
If you want to install the latest release of H2O, follow these instructions.
Download the latest stable release of H2O and initialize the cluster.
Check the cluster information via h2o.clusterInfo().
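As a rough sketch of the start-up steps, assuming the h2o package is already installed and Java is available on the path, a local cluster can be started and inspected as follows. The `nthreads` and `max_mem_size` values are illustrative choices, not requirements.

```r
library(h2o)

# Start (or connect to) a local H2O cluster.
# nthreads = -1 uses all available cores; max_mem_size is an illustrative value.
h2o.init(nthreads = -1, max_mem_size = "2G")

# Print version, node count, memory, and health of the running cluster.
h2o.clusterInfo()
```

If a cluster is already running on the default port, `h2o.init()` connects to it instead of starting a new one.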
You can see how H2O works via the demo function. Check H2O’s GLM via the demo method.
Download loan.csv from H2O’s GitHub repo and import it using H2O.
Check the type of the imported loan data and notice that it is not a data frame. Then check the summary of the loan data.
Hint: use h2o.summary()
One might want to transfer a data frame from the R environment to H2O. Use as.h2o to convert the mtcars data frame to an H2OFrame.
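A minimal sketch of moving data between R and H2O, assuming a local cluster can be started with the default settings. The variable names `mtcars_hf` and `mtcars_df` are made up for illustration.

```r
library(h2o)
h2o.init()

# Copy the built-in mtcars data frame into the H2O cluster.
mtcars_hf <- as.h2o(mtcars)

# The result lives in the cluster and is an H2OFrame, not a data.frame.
class(mtcars_hf)
h2o.dim(mtcars_hf)

# as.data.frame() pulls the data back into the R session.
mtcars_df <- as.data.frame(mtcars_hf)
```

The H2OFrame is only a handle in R; the data itself sits in the H2O cluster, which is why functions such as `h2o.dim` and `h2o.summary` are used instead of working on a local copy.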
Check the dimensions of the loan H2OFrame via h2o.dim.
Find the column names of the loan H2OFrame.
Plot a histogram of the loan amount in the loan H2OFrame.
Find the mean loan amount for each home ownership group in the loan H2OFrame.
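The grouped-mean pattern can be tried on mtcars first, so that it runs without the loan file. This is a sketch, not the official answer: it assumes a local cluster and uses `h2o.group_by` with the `mean()` aggregate. The loan column names in the final comment (`home_ownership`, `loan_amnt`) are assumptions about loan.csv and should be checked against `h2o.colnames`.

```r
library(h2o)
h2o.init()

# Use mtcars as a stand-in dataset inside the cluster.
cars_hf <- as.h2o(mtcars)

# Treat the grouping column as categorical.
cars_hf$cyl <- as.factor(cars_hf$cyl)

# Mean of mpg within each cyl group; the result is itself an H2OFrame.
by_cyl <- h2o.group_by(data = cars_hf, by = "cyl", mean("mpg"))
print(by_cyl)

# The same shape of call for the loan data (object and column names assumed):
# h2o.group_by(data = loan, by = "home_ownership", mean("loan_amnt"))
```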