Analyzing GitHub pull requests with Neural Embeddings, in R

July 24, 2017

At the useR!2017 conference earlier this month, my colleague Ali Zaidi gave a presentation on using Neural Embeddings to analyze GitHub pull request comments (processed using the tidy text framework). The data analysis was done using R and distributed on Spark, and the resulting neural network trained using the Microsoft Cognitive Toolkit. You can see the slides here, and...

Are computers needed to teach Data Science?

July 24, 2017

One of the many nice things about summer is the time and space it allows for blogging. And, after a very stimulating SRTL conference (Statistics Reasoning, Teaching and Learning)...

Hacking Highcharter: observations per group in boxplots

July 24, 2017

Highcharts has long been a favourite visualisation library of mine, and I’ve written before about Highcharter, my preferred way to use Highcharts in R. Highcharter has a nice simple...

Random Forests in R

July 24, 2017

Ensemble learning is a type of supervised learning technique in which the basic idea is to generate multiple models on a training dataset and then simply combine (average) their output...

Stippling and TSP art in R: emulating StippleGen

July 24, 2017

Stippling is the creation of a pattern simulating varying degrees of solidity or shading by using small dots (Wikipedia). StippleGen is a piece of software that renders images using stipple...

Beneath the canvas

July 23, 2017

Recently a blog post made its rounds on the internet describing how it is possible to speed up plot creation in ggplot2 by first creating a blank canvas and then later adding...

Runtime vs. Success (Using IMDB)

July 23, 2017

The content in this blog comes from a Shiny application proof of concept using IMDB movie data. To view the application: IMDB Movie Data App on...

Programming with dplyr by using dplyr

July 23, 2017

The title may seem tautological, but since the arrival of dplyr 0.7.x, there have been some efforts at using dplyr without actually using it that I can’t quite understand. The tidyverse has raised passions, for and against...

Data Visualization with googleVis exercises part 8

July 23, 2017

Annotation & Sankey Charts: In the eighth part of our series we are going to learn about the features of some interesting types of charts. More specifically, we will...

ggtern: Version 2.2.1 Released

July 22, 2017

It has been a while since any kind of significant update has been released for the ggtern library, and the other day a minor update was submitted to CRAN,...

RSiteCatalyst Version 1.4.13 Release Notes

July 22, 2017

This blog post will be fairly short, given the minor nature of the update. Several users complained about OAUTH2 authentication not working, which I didn’t know because I usually use...

Tidy Time Series Analysis, Part 2: Rolling Functions

In the second part in a series on Tidy Time Series Analysis, we’ll again use tidyquant to investigate CRAN downloads, this time focusing on Rolling Functions. If you haven’t...

Tutorial: Using seplyr to Program Over dplyr

July 22, 2017

seplyr is an R package that makes it easy to program over dplyr 0.7.*. To illustrate this we will work an example. Suppose you had worked out a dplyr...

What’s in our internal chaimagic package at work

July 21, 2017

At my day job I’m a data manager and statistician for an epidemiology project called CHAI led by Cathryn Tonne. CHAI means “Cardio-vascular health effects of air pollution in...

Stan Weekly Roundup, 21 July 2017

July 21, 2017

It was another productive week in Stan land. The big news is that Jonathan Auerbach reports that A team of Columbia students (mostly Andrew’s, including myself) recently won first...

IEEE Spectrum 2017 Top Programming Languages

July 21, 2017

IEEE Spectrum has published its fourth annual ranking of top programming languages, and the R language is again featured in the Top 10. This year R ranks at...

How to create reports with R Markdown in RStudio

July 21, 2017

Introduction: R Markdown is one of the most popular data science tools and is used to save and execute code to create exceptional reports that are easily shareable. The...

Power analysis and sample size calculation for Agriculture

July 21, 2017

Power analysis is extremely important in statistics since it allows us to calculate how many chances we have of obtaining realistic results. Sometimes researchers tend to underestimate this aspect...

Inter-country inequality and the World Development Indicators by @ellis2013nz

July 21, 2017

I recently read the high-quality book Global Inequality by Branko Milanovic. When reading this sort of thing, I often find I can increase my engagement with a...

Chart golf: the “demographic tsunami”

July 20, 2017

“‘Demographic tsunami’ will keep Sydney, Melbourne property prices high” screams the headline. While the census showed Australia overall is aging, there’s been a noticeable lift in the number of...

Surprising result when exploring Rcpp gallery

July 20, 2017

I’m starting to incorporate more Rcpp in my R work, and so decided to spend some time exploring the Rcpp Gallery. One example by John Merrill caught my eye. He provides a...

How to make interactive maps with Census and local data in R

July 20, 2017

So the goal here is to focus back on Greenville County and have even more granularity. I look at median house prices near Greenville and then overlay the park...

Package bigstatsr: Statistics with matrices on disk (useR 2017)

July 20, 2017

In this post, I will talk about my package bigstatsr, which I’ve just presented in a lightning talk of 5 minutes at useR!2017. You can listen to me in...

Visualizing Portfolio Volatility

July 20, 2017

This is the third post in our series on portfolio volatility, variance and standard deviation. If you want to start at the beginning with calculating portfolio volatility, have a...

How to make maps with Census data in R

July 20, 2017

US Census Data: The US Census collects a number of demographic measures and publishes aggregate data through its website. There are several ways to use Census data in R,...

What analysis program do conservation scientists use?

July 20, 2017

With the International Congress for Conservation Biology starting 23rd July, I was wondering, what analysis programs are most...

Quirks about running Rcpp on Windows through RStudio

July 20, 2017

This is a quick note about some tribulations I had running Rcpp (v. 0.12.12) code through RStudio (v. 1.0.143) on a...

Quickly Check your id Variables

July 20, 2017

Virtually every dataset has them: id variables that link a record to a subject and/or time point. Often one column, or a combination of columns, forms the unique id...

Data Analysis for Life Sciences

July 20, 2017

Rafael Irizarry from the Harvard T.H. Chan School of Public Health has presented a number of courses on R and Biostatistics on EdX, and he recently also provided an...
