Blog Archives

R is turning 20 years old next Saturday. Here is how much bigger, stronger and faster it got over the years

February 22, 2020

Introduction It is almost the 29th of February 2020! A very interesting day for R, because it marks 20 years since the release of R v1.0.0, the first official public release of the R programming language. In this post, we will look back on 20 years of R with a bit of history and 3 interesting perspectives...

Read more »

Releasing and open-sourcing the Using Spark from R for performance with arbitrary code series

January 4, 2020

Introduction Over the past months, we published and refined a series of posts on Using Spark from R for performance with arbitrary code. Since the posts had grown in size and scope, blog posts were no longer the best medium for sharing the content...

Read more »

4 great free tools that can make your R work more efficient, reproducible and robust

December 21, 2019

Introduction It is Christmas time again! And just like last year, what better time than this to write about the great tools available to everyone interested in working with R. This post is meant as praise for a few selected tools and packages that helped me be more efficient and productive with R in 2019. In this...

Read more »

Using Spark from R for performance with arbitrary code – Part 5 – Exploring the invoke API from R with Java reflection and examining invokes with logs

November 23, 2019

Introduction In the previous parts of this series, we have shown how to write functions as combinations of dplyr verbs and as SQL query generators that can be executed by Spark, and how to use the lower-level API to invoke methods on Java object references from R. In this fifth part, we will look in more detail at sparklyr’s invoke() API, investigate...
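
For a taste of what exploring the invoke() interface can look like, here is a minimal sketch (not code from the post itself) that uses sparklyr's invoke() together with Java reflection to list the methods a Spark DataFrame object exposes; it assumes a local Spark installation.

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")        # assumes a local Spark installation
tbl_mtcars <- copy_to(sc, mtcars, overwrite = TRUE)

# Get a reference to the underlying Java/Scala DataFrame object
sdf <- spark_dataframe(tbl_mtcars)

# Invoke a method on the Java object directly
invoke(sdf, "count")

# Use Java reflection to list the names of the methods the object exposes
sdf %>%
  invoke("getClass") %>%
  invoke("getMethods") %>%
  lapply(invoke, "getName") %>%
  head()

spark_disconnect(sc)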

Read more »

Using Spark from R for performance with arbitrary code – Part 4 – Using the lower-level invoke API to manipulate Spark’s Java objects from R

November 9, 2019

Introduction In the previous parts of this series, we have shown how to write functions as both combinations of dplyr verbs and SQL query generators that can be executed by Spark, how to execute them with DBI and how to achieve lazy SQL statements that only get executed when needed. In this fourth part, we will look at how to write...
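
As a rough illustration of that lower-level approach (a sketch under assumptions, not the post's own code), the following calls DataFrame methods directly on the Java object reference and registers the result back as a table usable from R; it assumes a local Spark installation.

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")        # assumes a local Spark installation
sdf <- spark_dataframe(copy_to(sc, mtcars, overwrite = TRUE))

# Manipulate the Spark DataFrame by invoking methods on the Java object ...
limited <- invoke(sdf, "limit", 10L)
invoke(invoke(sdf, "schema"), "simpleString")

# ... and register the resulting Java object as a table to use it from R again
sdf_register(limited, "mtcars_limited") %>% collect()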

Read more »

Using Spark from R for performance with arbitrary code – Part 3 – Using R to construct SQL queries and let Spark execute them

October 12, 2019

Introduction In the previous part of this series, we looked at writing R functions that can be executed directly by Spark without serialization overhead, focusing on functions written as combinations of dplyr verbs, and investigated how the SQL is generated and the Spark plans are created. In this third part, we will look at how to write R functions that generate...
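
As a small sketch of that idea (not the post's code, and assuming a local Spark installation with an example table copied in), an R function can generate a SQL string that Spark then executes via DBI:

library(sparklyr)
library(DBI)

sc <- spark_connect(master = "local")        # assumes a local Spark installation
copy_to(sc, mtcars, name = "mtcars_spark", overwrite = TRUE)

# An R function that only generates a SQL query string
avg_by_group_query <- function(tbl_name, group_col, value_col) {
  sprintf(
    "SELECT %s, AVG(%s) AS avg_value FROM %s GROUP BY %s",
    group_col, value_col, tbl_name, group_col
  )
}

# Spark executes the generated query and returns the (small) result to R
dbGetQuery(sc, avg_by_group_query("mtcars_spark", "cyl", "mpg"))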

Read more »

Using Spark from R for performance with arbitrary code – Part 2 – Constructing functions by piping dplyr verbs

September 21, 2019

Introduction In the first part of this series, we looked at how the sparklyr interface communicates with the Spark instance and what this means for performance with regard to arbitrarily defined R functions. We also examined how Apache Arrow can increase the performance of data transfers between the R session and the Spark instance. In this second part, we will look...
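
To give a feel for the approach, here is a minimal sketch (not the post's own example, and assuming a local Spark installation) of a function built from piped dplyr verbs; sparklyr translates it to Spark SQL, so no R code has to be serialized to the cluster.

library(sparklyr)
library(dplyr)

sc <- spark_connect(master = "local")        # assumes a local Spark installation
tbl_mtcars <- copy_to(sc, mtcars, overwrite = TRUE)

# A function composed of piped dplyr verbs, translated to Spark SQL by dbplyr
mean_by_group <- function(data, group_col, value_col) {
  data %>%
    group_by(!!rlang::sym(group_col)) %>%
    summarise(mean_value = mean(!!rlang::sym(value_col), na.rm = TRUE))
}

mean_by_group(tbl_mtcars, "cyl", "mpg") %>% collect()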

Read more »

Using Spark from R for performance with arbitrary code – Part 1 – Spark SQL translation, custom functions, and Arrow

August 31, 2019

Introduction Apache Spark is a popular open-source analytics engine for big data processing, and thanks to the sparklyr and SparkR packages, the power of Spark is also available to R users. This series of articles will attempt to provide practical insights into using the sparklyr interface to gain the benefits of Apache Spark while still retaining the ability to use R...
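
As a minimal, hedged sketch of the setup the series discusses (assuming a local Spark installation and the arrow package installed), attaching arrow lets sparklyr use Apache Arrow when moving data between the R session and Spark, which matters most for spark_apply() and copy_to()/collect():

library(sparklyr)
library(dplyr)
library(arrow)   # attaching arrow enables Arrow-based R <-> Spark data transfer

sc <- spark_connect(master = "local")        # assumes a local Spark installation
tbl_mtcars <- copy_to(sc, mtcars, overwrite = TRUE)

# An arbitrary R function shipped to the cluster with spark_apply(); this is
# where serialization cost is largest and where Arrow helps the most
spark_apply(tbl_mtcars, function(df) df[df$mpg > 25, , drop = FALSE]) %>%
  collect()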

Read more »

Using parallelization, multiple git repositories and setting permissions when automating R applications with Jenkins

August 10, 2019

Introduction In the previous post, we focused on setting up declarative Jenkins pipelines with emphasis on parametrizing builds and using environment variables across pipeline stages. In this post, we look at various tips that can be useful when automating R application testing and continuous integration, with regard to orchestrating parallelization, combining sources from multiple git repositories and ensuring proper access right...

Read more »
