R⁶ — Reticulating Parquet Files

August 1, 2017
By

The reticulate package provides a very clean & concise interface bridge between R and Python, which makes it handy for working with modules that have yet to be ported to R (going native is always better when you can do it). This post shows how to use reticulate to create parquet files directly from R... Continue reading →

Read more »

Let’s Talk Drawdowns (And Affiliates)

August 1, 2017
By

This post is aimed at those newer to investing, with an explanation of drawdowns–in my opinion, a simple and … Continue reading →

Read more »
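For readers newer to investing, a quick numeric illustration may help alongside the post above. A drawdown is commonly measured as the percentage decline from the highest value reached so far, and the maximum drawdown is the largest such decline. The minimal sketch below assumes that common definition (the post is truncated here, so its exact definition is not visible); it is written in Python only as a language-neutral illustration, and `max_drawdown` is a hypothetical helper, not code from the post.

```python
from typing import Sequence


def max_drawdown(prices: Sequence[float]) -> float:
    """Return the maximum drawdown as a fraction (0.25 means a 25% decline).

    Assumes the common definition: the largest percentage drop from a
    running peak to any later value. Prices must be positive.
    """
    if not prices:
        raise ValueError("prices must be non-empty")
    peak = prices[0]
    worst = 0.0
    for p in prices:
        peak = max(peak, p)           # highest value seen so far
        drawdown = (peak - p) / peak  # decline from that running peak
        worst = max(worst, drawdown)
    return worst


# Peak of 120 followed by a trough of 90: (120 - 90) / 120 = 0.25
print(max_drawdown([100, 120, 110, 90, 130]))  # prints 0.25
```

Note that the later rise to 130 does not erase the earlier 25% decline; the maximum drawdown records the worst peak-to-trough loss along the way.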

Showing Some Respect for Data Munging

August 1, 2017
By

In this post, I'd like to focus on data munging, i.e. the process of acquiring and arranging data (typically in a tidy manner) prior to data analysis. It's common knowledge that data scientists spend an enormous amount of time munging data, but data analysis, modeling, and visualization get most of the attention at presentations, on blogs and in the...

Read more »

Hacking statistics or: How I Learned to Stop Worrying About Calculus and Love Stats Exercises (Part-5)

August 1, 2017
By

Statistics are often taught in school by and for people who like mathematics. As a consequence, in those classes emphasis is put on learning equations, solving calculus problems and creating mathematical models instead of building an intuition for probabilistic problems. But, if you read this, you know a bit of R programming and have access...

Read more »

Building a website with pkgdown: a short guide

August 1, 2017
By

As promised in my last post, here is a short guide with some tips and tricks for building a documentation website for an R package using pkgdown. In the end, this guide turned out way longer than I expected, but I hope you'll find it useful, although it often replicates information already available in the pkgdown documentation! Prerequisites: To build a website using pkgdown, all you need...

Read more »

What analysis programs drive conservation science?

July 31, 2017
By

With the International Congress for Conservation Biology on at the end of July, I was wondering: what analysis programs are supporting conservation science? And what programs support spatial analysis ...

Read more »

How to use H2O with R on HDInsight

July 31, 2017
By

H2O.ai is an open-source AI platform that provides a number of machine-learning algorithms that run on the Spark distributed computing framework. Azure HDInsight is Microsoft's fully-managed Apache Hadoop platform in the cloud, which makes it easy to spin up and manage Azure clusters of any size. It's also easy to run H2O on HDInsight: H2O AI Platform is...

Read more »

Counterfactual estimation on nonstationary data, be careful!!!

July 31, 2017
By

By Gabriel Vasconcelos

In a recent paper that can be downloaded here, Carvalho, Masini and Medeiros show that estimating counterfactuals in a non-stationary framework (by non-stationary I mean integrated) is a tricky task. It is intuitive that the … Continue reading →

Read more »

15 Jobs for R users (2017-07-31) – from all over the world

July 31, 2017
By

To post your R job on the next post, just visit this link and post a new R job to the R community. You can post a job for free (and there are also “featured job” options available for extra exposure). Current R jobs: Job seekers, please follow the links below to learn more and apply for your R job of interest. Featured Jobs: Freelance Data Scientists...

Read more »

Machine Learning Explained: Dimensionality Reduction

July 31, 2017
By

Dealing with a lot of dimensions can be painful for machine learning algorithms. High dimensionality increases the computational complexity, increases the risk of overfitting (as your algorithm has more degrees of freedom) and makes the data sparser. Hence, dimensionality reduction projects the data into a space with fewer dimensions to...

Read more »
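Principal component analysis (PCA) is one common way to project data into a space with fewer dimensions; since the excerpt above is cut off before naming any method, PCA is an assumed example here, not necessarily what the post covers. The Python sketch below is deliberately limited to reducing 2-D points to 1-D, so it can use the closed-form leading eigenvector of a 2×2 covariance matrix with only the standard library; `pca_2d_to_1d` is a hypothetical name.

```python
import math
from typing import List, Tuple


def pca_2d_to_1d(points: List[Tuple[float, float]]) -> List[float]:
    """Project 2-D points onto their first principal component (a sketch)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[a, b], [b, c]] of the centered data
    a = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    c = sum((y - my) ** 2 for _, y in points) / (n - 1)
    b = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Largest eigenvalue of a symmetric 2x2 matrix, in closed form
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    # Matching eigenvector; fall back to a coordinate axis when b == 0
    if b != 0:
        vx, vy = b, lam - a
    elif a >= c:
        vx, vy = 1.0, 0.0
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    vx, vy = vx / norm, vy / norm
    # 1-D coordinate of each centered point along the principal direction
    return [(x - mx) * vx + (y - my) * vy for x, y in points]
```

For points lying exactly on a line, the 1-D coordinates preserve the distances between the points, which makes a quick sanity check. For real data with many dimensions, a linear-algebra library would normally replace this closed form.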

Google Vision API in R – RoogleVision

July 31, 2017
By

Using the Google Vision API in R: Utilizing RoogleVision. After doing my post last month on OpenCV and face detection, I started looking into other algorithms used for pattern detection in images. As it turns out, Google has done a phenomenal job with their Vision API. It’s absolutely incredible the amount of information it can

Read more »

Upcoming Talk at the Bay Area R Users Group (BARUG)

July 31, 2017
By

Next Tuesday (August 8) I will be giving a talk at the Bay Area R Users Group (BARUG). The talk is titled Beyond Popularity: Monetizing R... The post Upcoming Talk at the Bay Area R Users Group (BARUG) appeared first on AriLamstein.com.

Read more »

sparklyr 0.6

July 30, 2017
By

We’re excited to announce a new release of the sparklyr package, available on CRAN today! sparklyr 0.6 introduces new features to: Distribute R computations using spark_apply() to execute arbitrary R code across your Spark cluster; you can now use all of your favorite R packages and functions in a distributed context. Connect to External Data Sources using spark_read_source(), spark_write_source(), spark_read_jdbc() and...

Read more »

Data visualization with googleVis exercises part 9

July 30, 2017
By

Histogram & Calendar chart. This is part 9 of our series, and we are going to explore the features of two interesting types of charts that googleVis provides: histogram and calendar charts. Read the examples below to understand the logic of what we are going to do, and then test your skills with the...

Read more »

Matching, Optimal Transport and Statistical Tests

July 30, 2017
By

To explain the “optimal transport” problem, we usually start with Gaspard Monge’s “Mémoire sur la théorie des déblais et des remblais”, which considers the problem of transporting a given distribution of matter (a pile of sand, for instance) into another (an excavation, for instance). This problem is usually formulated using distributions, and we seek the “optimal” transport from one...

Read more »

Scripting for data analysis (with R)

July 30, 2017
By

Course materials (GitHub). This was a PhD course given in the spring of 2017 at Linköping University. The course was organised by the graduate school Forum scientium and was aimed at people who might be interested in using R for data analysis. The materials developed from part of a previous PhD course from a

Read more »

Understanding Overhead Issues in Parallel Computation

July 29, 2017
By

In my talk at useR! earlier this month, I emphasized that a major impediment to obtaining good speed from parallelizing an algorithm is systems overhead of various kinds, including: contention for memory/network; bandwidth limits (CPU/memory, CPU/network, CPU/GPU); cache coherency problems; contention for I/O ports; OS and/or R limits on number of sockets … Continue reading Understanding...

Read more »

Memorable dataviz with the R program, talk awarded people’s choice prize

July 29, 2017
By

For the past two years, Dr Nick Hamilton has invited me to give a talk on creating data visuals with the R program at the wonderful UQ Winterschool in Bioinformatics. This year...

Read more »

Tidy Time Series Analysis, Part 3: The Rolling Correlation

In the third part of a series on Tidy Time Series Analysis, we’ll use the runCor function from TTR to investigate rolling (dynamic) correlations. We’ll again use tidyquant to investigate CRAN downloads. This time we’ll also get some help from the...

Read more »
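The idea behind a rolling (dynamic) correlation can be sketched without any particular package: compute the Pearson correlation of two series over a trailing window of fixed width, sliding the window forward one observation at a time. The Python sketch below illustrates that idea only; it does not reproduce TTR's `runCor`, whose defaults and edge-case handling may differ. `rolling_corr` is a hypothetical name, and returning `None` for incomplete or constant windows is a design assumption.

```python
import math
from typing import List, Optional, Sequence


def rolling_corr(
    x: Sequence[float], y: Sequence[float], window: int
) -> List[Optional[float]]:
    """Pearson correlation over a trailing window; None until the window fills."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if window < 2:
        raise ValueError("window must be at least 2")
    out: List[Optional[float]] = []
    for end in range(len(x)):
        if end + 1 < window:
            out.append(None)  # not enough observations yet
            continue
        xs = x[end + 1 - window:end + 1]
        ys = y[end + 1 - window:end + 1]
        mx = sum(xs) / window
        my = sum(ys) / window
        sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
        sxx = sum((a - mx) ** 2 for a in xs)
        syy = sum((b - my) ** 2 for b in ys)
        denom = math.sqrt(sxx * syy)
        # Correlation is undefined when either window is constant
        out.append(sxy / denom if denom > 0 else None)
    return out
```

Each output value uses only the most recent `window` observations, so the series of correlations can drift over time, which is what makes it "dynamic".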

Forecasting workshop in Perth

July 29, 2017
By

On 26-28 September 2017, I will be running my 3-day workshop in Perth on “Forecasting: principles and practice” based on my book of the same name. Topics to be covered include seasonality and trends, exponential smoothing, ARIMA modelling, dynamic regression and state space models, as well as forecast accuracy methods and forecast evaluation techniques such as cross-validation. Workshop participants are expected...

Read more »

More documentation for Win-Vector R packages

July 29, 2017
By

The Win-Vector public R packages now all have new pkgdown documentation sites! (And a thank-you to Hadley Wickham for developing the pkgdown tool.) Please check them out (hint: vtreat is our favorite). The package sites: cdata, replyr, seplyr, sigr, vtre...

Read more »

Updated overbought/oversold plot function

July 29, 2017
By

A good six years ago I blogged about plotOBOS(), which charts a moving average (from one of several available variants) along with shaded standard deviation bands. That post has a bit more background on the why/how and motivation, but as a teaser, here is the resulting chart of the SP500 index (with ticker ^GSPC): [chart omitted]. The code uses a few standard...

Read more »
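As a rough sketch of the numbers behind such a chart, the Python code below computes a simple moving average with bands at plus or minus `k` rolling standard deviations. This is an assumption-laden illustration, not the plotOBOS() code: the post says plotOBOS() supports several moving-average variants, whereas this sketch uses only a simple average, and `ma_with_bands` and its default `k=1.0` are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence, Tuple

# (lower band, moving average, upper band), or None before the window fills
Band = Optional[Tuple[float, float, float]]


def ma_with_bands(prices: Sequence[float], window: int, k: float = 1.0) -> List[Band]:
    """Simple moving average with +/- k rolling sample standard deviation bands."""
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    out: List[Band] = []
    for end in range(len(prices)):
        if end + 1 < window:
            out.append(None)  # window not yet full
            continue
        win = prices[end + 1 - window:end + 1]
        mean = statistics.mean(win)
        sd = statistics.stdev(win)  # sample standard deviation of the window
        out.append((mean - k * sd, mean, mean + k * sd))
    return out
```

Plotting the middle value as a line and shading between the lower and upper values gives the general shape of a chart with shaded standard deviation bands.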

R Markdown exercises part 1

July 29, 2017
By

Introduction. R Markdown is one of the most popular data science tools; it is used to save and execute code and to create exceptional reports which are easily shareable. The documents that R Markdown provides are fully reproducible and support a wide variety of static and dynamic output formats. Using markdown syntax, which provides an easy way...

Read more »

Stan Weekly Roundup, 28 July 2017

July 28, 2017
By

Here’s the roundup for this past week. Michael Betancourt added case studies for methodology in both Python and R, based on the work he did getting the ML meetup together: RStan workflow and PyStan workflow. Michael Betancourt, along with Mitzi Morris, Sean Talts, and Jonah Gabry, taught the women in ML workshop at Viacom in NYC...

Read more »

Learn parallel programming in R with these exercises for "foreach"

July 28, 2017
By

The foreach package provides a simple looping construct for R: the foreach function, which you may be familiar with from other languages like JavaScript or C#. It's basically a function-based version of a "for" loop. But what makes foreach useful isn't iteration: it's the way it makes it easy to run those iterations in parallel, and save time on...

Read more »

Hacking Strings with stringi

July 28, 2017
By

In the last set of exercises, we worked on the basic concepts of string manipulation with stringr. In this one, we will go further into the string-hacking universe and learn how to use the stringi package. Note that stringi acts as a backend of stringr but has many more useful string manipulation functions compared to stringr and...

Read more »

Analyzing “Wait-Delay” Settings in Common Crawl robots.txt Data with R

July 28, 2017
By

One of my tweets that referenced an excellent post about the ethics of web scraping garnered some interest: “Apologies for a Medium link but if you do ANY web scraping, you need to read this #rstats // Ethics in Web Scraping https://t.co/y5YxvzB8Fd” (boB Rudis, @hrbrmstr, July 26, 2017). If you load up that tweet... Continue reading →

Read more »

simmer 3.6.3

July 28, 2017
By

The third update of the 3.6.x release of simmer, the Discrete-Event Simulator for R, is on CRAN. First of all and once again, I must thank Duncan Garmonsway (@nacnudus) for writing a new vignette: “The Bank Tutorial: Part II”. Among various fixes and performance improvements, this release provides a way of knowing the progress of a simulation.… Continue reading simmer 3.6.3...

Read more »

Joy Division, Population Surfaces and Pioneering Electronic Cartography

July 28, 2017
By

There has been a resurgence of interest in data visualizations inspired by Joy Division’s Unknown Pleasures album cover. These so-called “Joy Plots” are easier to create thanks to the development of the “ggjoy” R package and also some nice code posted using D3. I produced a global population map (details here) using a similar technique in 2013 and since

Read more »
