## Working with US Census Data in R

November 6, 2018

If you need data about the American populace, there's no source more canonical than the US Census Bureau. The bureau publishes a wide range of public data sets, and not just from the main Census conducted every 10 years: more than 100 additional surveys and programs are published as well. To help R users access this rich source of...

## In-database xgboost predictions with R

November 6, 2018

Moving predictive machine learning algorithms into large-scale production environments can present many challenges. For example, problems arise when attempting to calculate prediction probabilities (“scores”) for many thousands of subjects using many thousands of features located on remote databases. xgboost (docs), a popular algorithm for classification and regression, and the model of choice in many winning Kaggle competitions, is no...

## Using httr to Detect HTTP(s) Redirects

November 6, 2018

Summary: In this short note I will write about the httr package and my need to detect whether an HTTP request had been redirected; it turns out this is quite easy. Along the way I will also show how to access information of an HTTP-...

## R plus Magento 2 REST API revisited: part 1- authentication and universal search

November 6, 2018

I wrote a post last year about getting Magento 2 data into R using the REST API. Now I provide more examples of use and a wrapper over the API that you can reuse to get data from Magento 2 into R in a somewhat more convenient way. Prerequisites Magento 2 I… The post R plus Magento 2 REST API revisited:...

## Source and List: Organizing R Shiny Apps

November 6, 2018

Keeping R Shiny code organized can be a challenge. One method to organize your Shiny UI and server code is to use a combination of R’s list and source functions. Another method to organize your Shiny code is through modularization techniques. Here, though, we’re going to concentrate on the list and source options. If you feel comfortable with Shiny and...

November 6, 2018

Earlier this year my colleague Steve Vaisey was converting code in some course notes from Stata to R. He asked me a question about tidily converting from long to wide format when you have multiple value columns. This is a little more awkward than it should be, and I’ve run into the issue several times since then. I’m writing...

## Data viz challenge: Recreating FiveThirtyEight’s ‘Deadest Names’ graphic with ggplot2

November 6, 2018

I’ve recently begun reading through the book Modern Data Science with R, by Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton. It’s quite clear and informative. One of the things I especially appreciate about it is that I’m not finding the math to be too cumbersome. That is, even for someone like me, The post Data viz...

## R-bloggers weekly – top R posts from last week (2018-10-28 till 2018-11-03)

November 6, 2018

Most liked R posts from last week, sorted based on the number of likes they got on twitter, enjoy: From webscraping data to releasing it as an R package to share with the world: a full tutorial (129 likes) How to Create a Correlation Matrix in R (124 likes) Machine Learning Basics – Random Forest (119 likes) Running R...

## More on sigr

November 6, 2018

If you’ve read our previous R Tip on using sigr with linear models, you might have noticed that the lm() summary object does in fact carry the R-squared and F statistics, both in the printed form: model_lm

## Interpreting Linear Prediction Models

Although linear models are one of the simplest machine learning techniques, they are still a powerful tool for prediction. This is largely because linear models are especially easy to interpret. Here, I discuss the most important aspects of interpreting linear models, using ordinary least-squares regression on the airquality data set as an example. The airquality data set The...

## Adding different annotation to each facet in ggplot

November 6, 2018

Help! The same annotations go on every facet! (With thanks to a student for sending me her attempt.) This is a question I get fairly often, and the answer is not straightforward, especially for those who are relatively new to R and ggplot2. In this post, I will show you how to add different annotations… Continue reading Adding different...

## Can we predict the crawling of the Google-Bot?

November 6, 2018

Logfile analysis gives website owners deep insight into how Google and other search engines crawl their pages. How often does a bot visit a page? Which pages are rarely or never visited by the bots? For which pages does the bot get errors? All these questions and more can be answered The post Can we predict...

## xts 0.11-2 on CRAN

November 6, 2018

xts version 0.11-2 was published to CRAN yesterday. xts provides data structures and functions to work with time-indexed data. This is a bug-fix release, with notable changes below: The xts method for shift.time() is now registered. Thanks to Philippe Verspeelt for the report and PR (#268, #273). An if-statement in the xts constructor will no longer try to...

## Hidden Markov Model example in r with the depmixS4 package

November 6, 2018

Recently I developed a solution using a Hidden Markov Model and was quickly asked to explain myself. What are they The post Hidden Markov Model example in r with the depmixS4 package appeared first on Daniel Oehm | Gradient Descending.

## Cluster Analysis – Part 1: Introduction

November 6, 2018

What is Cluster Analysis? Cluster analysis is a collective term for various algorithms to find group structures in data. The groups are called clusters and are usually not known a priori. In contrast, classification procedures assign the observations to already...

## Happy 10th Bday, Rcpp – and welcome release 1.0 !!

November 5, 2018

Ten years ago today I wrote the NEWS.Rd entry in this screenshot for the very first Rcpp release. So Happy Tenth Birthday, Rcpp !! It has been quite a ride. Nearly 1500 packages on CRAN, or about one in nine (!!), rely on Rcpp to...

## A knot of threads: from CSHL to LCG-UNAM to Aldo Barrientos to diversity scholarship opportunities

November 5, 2018

I can’t tell you how many times I’ve started to write this post in my mind since May 2018. Today I’m finally typing it on the computer. This will be a rather long post that ties in several threads. I’ll talk about Cold Spring Harbor’s Biology of Genomes conference and its relationship to my undergrad in Mexico. I’ll also...

## Online resources for teaching

November 5, 2018

In this session I will try to show some utilities available on the web. One of them will help us execute R code from the web, using an online compiler, without installing any software on our computers. The other can help us solve optimization problems graphically. We can draw the...

## ‘How do neural nets learn?’ A step by step explanation using the H2O Deep Learning algorithm.

November 5, 2018

In my last blogpost about Random Forests I introduced the codecentric.ai Bootcamp. The next part I published was about Neural Networks and Deep Learning. Every video of our bootcamp will have example code and tasks to promote hands-on learning. While the practical parts of the bootcamp will be using Python, below you will find the English R version of...

## Causal mediation estimation measures the unobservable

November 5, 2018

I put together a series of demos for a group of epidemiology students who are studying causal mediation analysis. Since mediation analysis is not always so clear or intuitive, I thought, of course, that going through some examples of simulating data for this process could clarify things a bit. Quite often we are interested in understanding the relationship between an...

## Tesseract 4 is here! State of the art OCR in R!

Last week Google and friends released the new major version of their OCR system: Tesseract 4. This release builds upon 2+ years of hard work and has completely overhauled the internal OCR engine. From the tesseract wiki: Tesseract 4.0 includes a new neural network-based recognition engine that delivers significantly higher accuracy (on document images) than the previous versions, in return...

## NG "roll returns" – inflection point?

November 5, 2018

After more than a decade of consistent losses from rolling a long NG position, the cumulative return has been positive for the past 12 months. This has only occurred briefly during the 'polar vortex' of early 2014 and during the 2008 commodities 'super cycle' peak. For years, long-only positions in NG have experienced losses on average in the last...

## EARL Houston: Interview with Hadley Wickham

November 5, 2018

Can you tell us about your upcoming keynote at EARL and what the key take-home messages will be for delegates? I’m going to talk about functional programming which I think is one of the most important programming techniques used with R. It’s not something you need on day 1 as a data scientist but it gives you some really...

## Peter Bull discusses the importance of human-centered design in data science.

November 5, 2018

Hugo Bowne-Anderson, the host of DataFramed, the DataCamp podcast, recently interviewed Peter Bull, a data scientist for social good and co-founder of Driven Data. Here is the podcast link. Introducing Peter Bull Hugo: Hi there, Peter, and welcome to Data Framed. Peter: ...

## Beyond Univariate, Single-Sample Data with MCHT

November 5, 2018

Introduction I’ve spent the past few weeks writing about MCHT, my new package for Monte Carlo and bootstrap hypothesis testing. After discussing how to use MCHT safely, I discussed how to use it for maximized Monte Carlo (MMC) testing, then bootstrap testing. One may think I’ve said all I want to say about the package,…Read more Beyond Univariate, Single-Sample...

## Explore Your Dataset in R

November 5, 2018

Simple exploratory data analysis (EDA) using some very easy one line commands in R.

## “A Guide to Working With Census Data in R” is now Complete!

November 5, 2018

Two weeks ago I mentioned that I was clearing my calendar until I finished writing A Guide to Working with Census Data in R. Today I’m... The post “A Guide to Working With Census Data in R” is now Complete! appeared first on AriLamstein.com.

## Suppressed data (left-censored counts) by @ellis2013nz

November 5, 2018

This is a post about dealing with tables that have been subject to cell suppression or perturbation for confidentiality reasons. While some of the data world is only just waking up to issues of confidentiality and privacy, national statistics offices ...