Blog Archives

sparklyr: a test drive on YARN

sparklyr is a new R front-end for Apache Spark, developed by the good people at RStudio. It offers much more functionality than the existing SparkR interface by Databricks, allowing dplyr-based data transformations as well as access to the machine learning libraries of both Spark and H2O Sparkling Water. Moreover, the latest RStudio IDE v1.0 now offers native...

Read more »

Classification in Spark 2.0: “Input validation failed” and other wondrous tales

September 6, 2016

Spark 2.0 was released last July but, despite the numerous improvements and new features, several annoyances still remain and can cause headaches, especially in the Spark machine learning APIs. Today we’ll have a look at some of them, inspired by a recent answer of mine to a Stack Overflow question (the question was about Spark 1.6 but,...

Read more »

Installing the additional R packages in Oracle Big Data Lite VM 4.5.0

Oracle has just released version 4.5.0 of the Big Data Lite VM which, when it comes to R, still suffers from the issues we had pinpointed for the previous version 4.4.0 (and then some). The first attempt to install the additional packages fails with a ‘cannot open URL’ error: Fortunately, the warning about the proxy helps to locate the...

Read more »

How to use SparkR in Cloudera Hadoop

Suppose you are an avid R user, and you would like to use SparkR in Cloudera Hadoop; unfortunately, as of the latest CDH version (5.7), SparkR is still not supported (and, according to a recent discussion in the Cloudera forums, we shouldn’t expect this to happen anytime soon). Is there anything you can do? Well, indeed there is. In...

Read more »

Installing the additional R packages in Oracle Big Data Lite VM 4.4.0

February 23, 2016

In the just-released version 4.4.0 of Oracle Big Data Lite VM, as in the previous one (4.3.0.1), there is a rather large number of additional R packages to be installed by the provided script install_additional_packages.sh, i.e. 28 packages without counting their dependencies (the respective number in version 4.2.1 was only 10). Unfortunately, what has also changed is the form...

Read more »

Using ROracle with Oracle Instant Client 12c

February 18, 2016

The other day, while setting up the new Oracle R Enterprise (ORE) 1.5 client packages on a Linux server, we installed the Oracle DB Instant Client v. 12.1, as advised in the relevant documentation. Problem was, ORE failed to load, in fact due to an ROracle failure: Truth is, the file libclntsh.so.11.1 did not exist, but this was expected, simply...

Read more »

Querying Big Data SQL tables with Oracle R Enterprise

February 15, 2016

I was wondering recently if I could use Oracle R Enterprise (ORE) to query Big Data SQL tables (i.e. Oracle Database external tables based on HDFS or Hive data), since I have never seen such a combination mentioned in the relevant Oracle documentation and white papers. I am happy to announce that the answer is an unconditional yes. In...

Read more »

Manipulating Hive tables with Oracle R connectors for Hadoop

November 12, 2015

In this post, we’ll have a look at how easy it is to manipulate Hive tables using Oracle R connectors for Hadoop (ORCH, presently known as Oracle R Advanced Analytics for Hadoop – ORAAH). We will use the weblog data from Athens Datathon 2015, which we have already loaded in a Hive table named weblogs, as described in more...

Read more »

Log files exploration with Oracle Big Data Discovery 1.1

In a previous post, we described how we performed exploratory data analysis (EDA) on real-world log files, as provided by Skroutz.gr, the leading online price comparison company in Greece, in the context of Athens Datathon 2015. In the present post we will have a look at the same job as performed with Oracle Big Data Discovery (v....

Read more »

Installing RStudio & additional R packages in Oracle Big Data Lite VM 4.2.1

I was very happy to find out that, in the latest version (4.2.1) of Oracle Big Data Lite VM, all the R-related issues I had located and reported in the past (see here and here) have been resolved. Nevertheless, some new issues have emerged. Below are my findings and workarounds (if you are in a hurry, feel free to...

Read more »
