Coding algorithms in R for models written in Stan

September 27, 2019
By

Hi all. On top of recommending the excellent autobiography of Stanislaw Ulam, this post is about using the software Stan not directly to perform inference, but to obtain R functions that evaluate a target’s probability density function and its gradient. With these, one can implement custom methods while still benefiting from the great work...

Mapping the Underlying Social Structure of Reddit

September 27, 2019
By

Reddit is a popular website for opinion sharing and news aggregation. The site consists of thousands of user-made forums, called subreddits, which cover a broad range of subjects, including politics, sports, technology, personal hobbies, and self-improvement. Given that most Reddit users contribute to multiple subreddits, one might think of Reddit as being organized into many overlapping communities. Moreover, one...

100% Stacked Chicklets

September 27, 2019
By

I posted a visualization of the email safety status (a.k.a. DMARC) of the Fortune 500 (2017 list) on Twitter the other day and received this spiffy request from @MarkAltosaar: Would you be willing to add the R code used to produce this to your vignette for ggchicklet? I would love to see how you arranged the...

Handling dates and times in R: a free online course

September 27, 2019
By

If you ever need to work with data involving dates, times or durations in R, then take a look at this free course on LinkedIn Learning presented by Mark Niemann-Ross: R Programming in Data Science: Dates and Times. Here's the course overview from the introductory video: When did Mount St Helens last erupt? How many birds migrated south this...

Gold-Mining Week 4 (2019)

September 27, 2019
By

Welcome to the 2019 Fantasy Football Season! The Week 4 Gold Mining and Fantasy Football Projection Roundup is now available. The post Gold-Mining Week 4 (2019) appeared first on Fantasy Football Analytics.

101 Data Science Interview Questions, Answers, and Key Concepts

September 27, 2019
By

Interviews are difficult for most people. You don't want to mess up, but you always think of a better response after the fact. Here are 101 data science interview questions from large tech companies like Amazon, Google, and Microsoft, along with responses and suggestions.

Why Do We Plot Predictions on the x-axis?

September 27, 2019
By

When studying regression models, one of the first diagnostic plots most students learn is residuals versus the model’s predictions (that is, with the predictions on the x-axis). Here’s a basic example:

# build an "ideal" linear process.
set.seed(34524)
N = 100
x1 = runif(N)
x2 = runif(N)
noise = 0.25*rnorm(N)
y = x1 …
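A minimal sketch of that diagnostic in base R, under our own assumptions: the post’s formula for y is elided, so the outcome below (2*x1 - x2 plus noise) and the seed are our choices, not the post’s. It fits an ordinary least squares model and plots residuals against predictions.

```r
# Sketch only: the data-generating formula below is an assumption,
# not the (elided) formula from the post.
set.seed(1)
N <- 100
x1 <- runif(N)
x2 <- runif(N)
noise <- 0.25 * rnorm(N)
y <- 2 * x1 - x2 + noise   # assumed linear process

model <- lm(y ~ x1 + x2)

# Predictions on the x-axis, residuals on the y-axis.
pred <- fitted(model)
res <- residuals(model)

plot(pred, res,
     xlab = "model prediction", ylab = "residual",
     main = "Residuals vs. predictions")
abline(h = 0, lty = 2)  # a well-specified model scatters evenly around 0
```

One property that motivates the convention (not necessarily the post’s full argument): with an intercept, least-squares residuals are exactly uncorrelated with the predictions, so a well-specified model shows no trend in this plot. Residuals plotted against the observed y, by contrast, always trend upward, because their covariance with y equals their own variance.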

Conference abstract bi-grams – FOSS4GUK

September 27, 2019
By

I helped run a conference last week. As part of this, I produced a word cloud from the conference abstracts; although pretty, it could have been more informative about the conference content. This blog post shows you how to make a network of conference bi-grams.
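The post’s own code isn’t part of this excerpt. As a rough, package-free sketch of the idea, the base R below counts bi-grams (adjacent word pairs) across abstracts and turns them into a weighted edge list, the usual input for a network plot. The two abstracts are invented placeholders, and the helper names `tokenize` and `bigrams_of` are hypothetical.

```r
# Invented placeholder abstracts; not the conference's real data.
abstracts <- c(
  "open source geospatial tools for open data",
  "open source routing with open data"
)

# Lower-case, split on non-letters, and drop empty tokens.
tokenize <- function(text) {
  words <- strsplit(tolower(text), "[^a-z]+")[[1]]
  words[words != ""]
}

# Pair each word with the word that follows it within the same abstract.
bigrams_of <- function(words) {
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

all_bigrams <- unlist(lapply(abstracts, function(a) bigrams_of(tokenize(a))))
counts <- sort(table(all_bigrams), decreasing = TRUE)

# Edge list for a network: one row per distinct bi-gram, weighted by count.
edges <- data.frame(
  from = sub(" .*", "", names(counts)),
  to = sub(".* ", "", names(counts)),
  weight = as.integer(counts),
  stringsAsFactors = FALSE
)
print(edges)
```

An edge list like this can be handed to a graph package for plotting; in practice you would also drop stop words such as "for" and "with" before counting.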

#FunDataFriday – Watson Studio

September 26, 2019
By

Watson Studio: an easy, free, hosted data science platform for R, Python, and a wide range of AI, BI, and ETL-style activities.

More exploratory plots with ggplot2 and purrr: Adding conditional elements

This summer I was asked to collaborate on an analysis project with many response variables. As usual, I planned on automating my initial graphical data exploration through the use of functions and purrr::map() as I’ve written about previously. However, this particular project was a follow-up to a previous analysis. In the original analysis, different variables were analyzed on different scales....

Updates to the rOpenSci image suite: magick, tesseract, and av

Image processing is one of the core focus areas of rOpenSci. Over the last few months we have released several major upgrades to core packages in our imaging suite, including magick, tesseract, and av. This post highlights a few cool new features.

Magick 2.2

The magick package is one of the most powerful packages for image processing in R. It interfaces...

Four Reasons to Apply Early to Data Science Bootcamps

September 26, 2019
By

There are some important benefits to applying early. Here are just a few reasons why you might want to reconsider procrastinating on your application.

Illuminating the Illuminated: A First Look at the Voynich Manuscript

September 26, 2019
By

The Voynich Manuscript

While the world abounds with strange phenomena ripe for analysis in their raw state, there is a peculiar pleasure in scrutinising arcane information curated and obscured by the human mind. The Voynich Manuscript is one of the best-known and most-studied volumes of occult knowledge. The book’s...

How to Prepare Data

September 26, 2019
By

Real-world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in its own column) can present problems, such as missing values and high-cardinality categorical variables. In this note we describe some great tools for working with such data. For example, consider the …
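The tools the note describes aren’t shown in this excerpt. As a rough illustration of the two problems it names, here is a hand-rolled base R sketch, not vtreat’s or any package’s API: numeric NAs get a 0/1 "was missing" indicator plus a mean fill, and rare categorical levels are pooled into one level. The helpers `treat_numeric` and `pool_rare_levels`, the `min_count` threshold, and the toy data are all hypothetical.

```r
# Hand-rolled sketch of two common fixes; this is NOT vtreat's API.

# Numeric NAs: add a 0/1 "was missing" indicator, then fill with the mean.
treat_numeric <- function(x) {
  missing <- is.na(x)
  fill <- mean(x, na.rm = TRUE)
  x[missing] <- fill
  data.frame(value = x, is_missing = as.integer(missing))
}

# High-cardinality categoricals: pool levels seen fewer than min_count times.
pool_rare_levels <- function(x, min_count = 2, rare_label = "rare") {
  x <- as.character(x)
  counts <- table(x)
  rare <- names(counts)[counts < min_count]
  x[x %in% rare] <- rare_label
  x
}

# Toy data (invented).
d <- data.frame(
  income = c(50, NA, 70, 60, NA),
  zip = c("10001", "10001", "94105", "60601", "10001"),
  stringsAsFactors = FALSE
)

num <- treat_numeric(d$income)
prepared <- data.frame(
  income = num$value,
  income_is_missing = num$is_missing,
  zip = pool_rare_levels(d$zip),
  stringsAsFactors = FALSE
)
print(prepared)
```

In a real workflow the fill value and the list of rare levels should be learned on training data only and then applied to new data; computing them on the full dataset, as this toy does, leaks information.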

Multi-Armed Bandits as an A/B Testing Solution

September 26, 2019
By

These days, most people are familiar with the concept of A/B testing. This is one of the most common ways to make advertising decisions, particularly in online marketing. In an A/B test, the customer base is divided into two or more groups, each of which is...
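The excerpt stops before describing the bandit approach itself. As an illustration of one common bandit algorithm, Thompson sampling (our choice; the post may use a different one), the base R sketch below shifts traffic toward the better-converting variant while the test runs, instead of keeping a fixed split. The three conversion rates are invented, and each arm uses a Beta(1, 1) prior on its rate.

```r
# Thompson sampling sketch for a Bernoulli bandit (convert / don't convert).
set.seed(42)
true_rates <- c(A = 0.02, B = 0.05, C = 0.10)  # invented, unknown to the algorithm
n_rounds <- 5000

# Beta(1, 1) prior per arm: track observed successes and failures.
successes <- setNames(rep(0, length(true_rates)), names(true_rates))
failures <- successes

for (i in seq_len(n_rounds)) {
  # Draw one plausible rate per arm from its posterior; show the best draw.
  draws <- rbeta(length(true_rates), 1 + successes, 1 + failures)
  arm <- which.max(draws)
  # Simulate the customer's response to the chosen variant.
  converted <- runif(1) < true_rates[arm]
  if (converted) {
    successes[arm] <- successes[arm] + 1
  } else {
    failures[arm] <- failures[arm] + 1
  }
}

pulls <- successes + failures
print(pulls)  # traffic allocated to each variant
```

Unlike a fixed even split, most of the traffic ends up on the best variant. That is the trade-off bandits make: fewer customers see weak variants during the test, at the cost of less data about those variants.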

Debuting in a VFL/AFL Grand Final is rare

September 26, 2019
By

When Marlion Pickett runs onto the MCG for Richmond in the AFL Grand Final this Saturday, he’ll be only the sixth player in 124 finals to debut on the big day. The sole purpose of this blog post is to illustrate how incredibly easy it is to figure this out, thanks to the dplyr and …

Spatial networks in R with sf and tidygraph

September 26, 2019
By Lucas van der Meer, Robin Lovelace & Lorena Abad

Street networks, shipping routes, telecommunication lines, river basins: all are examples of spatial networks, organized sys...

August 2019: “Top 40” R packages

September 25, 2019
By

Two hundred and twenty-seven new packages made it to CRAN in August. Quite a few were devoted to medical or genomic applications, and this is reflected in my “Top 40” selections, listed below in nine categories: Computational Methods, Data, Genomics, Machine Learning, Medicine and Pharma, Statistics, Time Series, Utilities, and Visualization.

Computational Methods

fmcmc v0.2-0: Provides a flexible Markov Chain Monte...

Multiple imputation support in Finalfit

September 25, 2019
By

We are using multiple imputation more frequently to “fill in” missing data in clinical datasets. Multiple datasets are created, models are run, and the results are pooled so that conclusions can be drawn. We’ve put some improvements into Finalfit on GitHub to make it easier to use with the mice package. These will go to CRAN soon but not …

EARL London review

September 25, 2019
By

We hope you enjoyed EARL London as much as we did! We’re just putting the finishing touches on our highlights recap, but until then you can view the presentations we have available here, and all the photos here. We’re so proud of the incredible speakers at this year’s EARL Conference – from tackling human trafficking, to benefitting the NHS,...

How to build analytics platforms – Part 5: Data visualization and reliable results

September 25, 2019
By

What does a modern analytics platform need to offer companies real added value?

Contradiction and connection at the same time: information content vs. simple presentation

A meaningful result or the answer to important questions is not everything. Making data and results comprehensible is just as important for companies as target-oriented analysis.¹ The more complex a...

Time series forecasting with random forest

September 25, 2019
By

This blog post looks at how we can improve predictive accuracy by combining forecasts from different models. The post Time series forecasting with random forest appeared first on STATWORX.

Preparing Data for Supervised Classification

September 24, 2019
By

Nina Zumel has been polishing up the new vtreat for Python documentation and tutorials. They are coming out so well that, to be fair to the R community, I must start back-porting this new documentation to vtreat for R. vtreat is a package for systematically preparing data for supervised machine learning tasks such …

new paper: “The metaRbolomics Toolbox in Bioconductor and beyond”

September 24, 2019
By

Forget about Python being the prime data analysis platform: there are plenty of alternatives, and R has been one of them. With CRAN, rOpenSci, and Bioconductor (doi:10.1186/gb-2004-5-10-r80), the platform has three efforts where you can publish your R work. I think of them as scholarly journals: the peer review is strong with them. Anyway, over the years I did my...

R trainings in Frankfurt!

September 24, 2019
By

R is one of the leading programming languages for data analysis. In November 2019 we bring our popular data science trainings to Frankfurt!

Introduction to R | 12–13 November 2019

The course is intended as an introduction to R and its basic functionalities and facilitates the introduction to R with practical tips and...

Super Solutions for Shiny Architecture 1 of 5: Using Session Data

September 24, 2019
By

TL;DR: Learn how to use the session argument as a global list for passing parameters between modules in advanced Shiny apps, simplifying how objects flow through your code. The session can help you organize app content and simplify object-flow logic, and it is faster than managing all of the dependencies between modules manually. Article Super Solutions for...

RcppAnnoy 0.0.13

September 23, 2019
By

A new release of RcppAnnoy is now on CRAN. RcppAnnoy is the Rcpp-based R integration of the nifty Annoy library by Erik Bernhardsson. Annoy is a small and lightweight C++ template header library for very fast approximate nearest neighbours—origina...

DFIR Redefined Part 3: visNetwork for Network Data

September 23, 2019
By

In keeping with pending presentations for the Secure Iowa Conference and (ISC)2 Security Congress, I’m continuing the DFIR Redefined: Deeper Functionality for Investigators with R series (see Part 1 and Part 2). Incident responders and investigators, faced with an inundation of data and ever-evolving threat vectors, require skills enhancements and analytics optimization. DFIR Redefined is intended to explore such...

Drake, Docker, and Gitlab-CI

September 23, 2019
By

For a number of reasons I’ve been trying out GitLab as a replacement for both GitHub and various continuous integration systems, and have been exploring configurations useful for model-fitting pipelines. I turned one of these into an example repository that shows how to use GitLab together with the Rocker Docker images and the drake build system to reproducibly run a project...
