Big Data Analytics with H20 in R Exercises -Part 1

September 22, 2017
By

(This article was first published on R-exercises, and kindly contributed to R-bloggers)


We have dabbled with RevoScaleR before , In this exercise we will work with H2O , another high performance R library which can handle big data very effectively .It will be a series of exercises with increasing degree of difficulty . So Please do this in sequence .
H2O requires you to have Java installed in your system .So please install Java before trying with H20 .As always check the documentation before trying these exercise set .
Answers to the exercises are available here.
If you want to install the latest release from H20 , install it via this instructions .

Exercise 1
Download the latest stable release from h20 and initialize the cluster

Exercise 2
Check the cluster information via clusterinfo

Exercise 3
You can see how h2o works via the demo function , Check H2O’s glm via demo method .

Exercise 4

down load the loan.csv from H2O’s github repo and import it using H2O .
Exercise 5
Check the type of imported loan data and notice that its not a dataframe , check the summary of the loan data .
Hint -use h2o.summary()

Exercise 6
One might want to transfer a dataframe from R environment to H2O , use as.h2o to conver the mtcars dataframe as a H2OFrame

Learn more about importing big data in the online course Data Mining with R: Go from Beginner to Advanced. In this course you will learn how to

  • work with different data import techniques,
  • know how to import data and transform it for a specific moddeling or analysis goal,
  • and much more.

Exercise 7

Check the dimension of the loan H2Oframe via h2o.dim

Exercise 8
Find the colnames from the H2OFrame of loan data.

Exercise 9

Check the histogram of the loan amount of loan H2Oframe .

Exercise 10
Find the mean of loan amount by each home ownership group from the loan H2OFrame

To leave a comment for the author, please follow the link and comment on their blog: R-exercises.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)