Blog Archives

Using Spark from R for performance with arbitrary code – Part 3 – Using R to construct SQL queries and let Spark execute them

October 12, 2019

Introduction In the previous part of this series, we looked at writing R functions that can be executed directly by Spark without serialization overhead, focusing on functions written as combinations of dplyr verbs, and investigated how the SQL is generated and how the Spark plans are created. In this third part, we will look at how to write R functions that generate...

Using Spark from R for performance with arbitrary code – Part 2 – Constructing functions by piping dplyr verbs

September 21, 2019

Introduction In the first part of this series, we looked at how the sparklyr interface communicates with the Spark instance and what this means for the performance of arbitrarily defined R functions. We also examined how Apache Arrow can increase the performance of data transfers between the R session and the Spark instance. In this second part, we will look...

Using Spark from R for performance with arbitrary code – Part 1 – Spark SQL translation, custom functions, and Arrow

August 31, 2019

Introduction Apache Spark is a popular open-source analytics engine for big data processing, and thanks to the sparklyr and SparkR packages, the power of Spark is also available to R users. This series of articles will attempt to provide practical insights into using the sparklyr interface to gain the benefits of Apache Spark while still retaining the ability to use R...

Using parallelization, multiple git repositories and setting permissions when automating R applications with Jenkins

August 10, 2019

Introduction In the previous post, we focused on setting up declarative Jenkins pipelines, with an emphasis on parametrizing builds and using environment variables across pipeline stages. In this post, we look at various tips that can be useful when automating R application testing and continuous integration: orchestrating parallelization, combining sources from multiple git repositories and ensuring proper access right...

Using environment variables and parametrized builds for automating R applications with Jenkins

July 27, 2019

Introduction Jenkins is a popular open-source tool that helps teams automate and implement continuous integration and deployment pipelines, comparable to, for example, Atlassian’s Bamboo, GitLab CI or, to some extent, Travis. In this post, we share some practical lessons learned when integrating R applications via Jenkins for the purpose of continuous integration and regression testing on runner nodes configured...

How data.table’s fread can save you a lot of time and memory, and take input from shell commands

June 22, 2019

Introduction Recently I was involved in a task that included reading and writing quite large amounts of data, totaling more than 1 TB of CSV files, without the standard big data infrastructure. After trying multiple approaches, the one that made this possible was using data.table’s reading and writing facilities, fread() and fwrite(). This motivated me to look at benchmarking data.table’s...

How to interactively examine any R code – 4 ways to not just read the code, but delve into it step-by-step

May 25, 2019

Introduction As pointed out by a recent "read the R source" post on the R hub’s website, reading the actual code, not just the documentation, is a great way to learn more about programming and implementation details. But there is one more activity that gives even more hands-on experience and understanding of the code in practice. In this post, we provide...

Porting and redirecting a Hugo-based blogdown website to an HTTPS-enabled custom domain and how to do it the easy way

May 11, 2019

Introduction As we wrote in "Should you start your R blog now?", blogging has probably never been more accessible to the general population, R users included. Usually, the simplest solution is to host your blog via a service that provides it for free, such as Netlify, GitHub or GitLab Pages. But what if you want to host that awesome blog...

Setting up continuous multi-platform R package building, checking and testing with R-Hub, Docker and GitLab CI/CD for free, with a working example

April 27, 2019

Introduction In the previous post, we looked at how to easily automate R analysis, modeling, and development work for free using GitLab’s CI/CD. Together with the fantastic R-hub project, we can use GitLab CI/CD to do much more. In this post, we will take it to the next level by using R-hub to test our development work on many different platforms...
