Post-statistics: Lies, damned lies and data science patents

August 5, 2017

Statistics is such an important field in our daily lives nowadays. Data science, an emerging field some 50 years old that is now applied to almost every human activity, might be called post-statistics, a kind of post-rock: it fuses operations research, data mining, software and performance engineering and, of course, a multitude of fields from statistics to machine learning. Even though, the reputation of statistics...

Read more »

Multiple imputation for continuous and categorical data

August 5, 2017

“The idea of imputation is both seductive and dangerous” (R.J.A. Little & D.B. Rubin). Indeed, a predicted value is treated as an observed one and the uncertainty of the prediction is ignored, leading to bad inferences with missing values. That is why Multiple Imputation is recommended. The missMDA package quickly generates several imputed datasets with quantitative variables and/or categorical

Read more »

Years as coloured bars

August 5, 2017

I keep seeing years represented by coloured bars. First it was that demographic tsunami chart. Then there are examples like the one on the right, which came up in a web search today. I even saw one (whispers) at work today. I get what they are trying to do – illustrate trends within categories over … Continue reading Years...

Read more »

Painting with Data

August 4, 2017

The accidental aRt tumblr (mentioned here a few years ago) continues to provide a steady stream of images that wouldn't look out of place in a modern art gallery, but which in fact are data visualizations (mostly attempted in R), gone wrong. (Here's a typical recent entry.) But now, Giora Simchoni has taken this concept to the next level...

Read more »

Stan Weekly Roundup, 3 August 2017

August 4, 2017

You’d almost think we were Europeans based on how much we’ve slowed down over the summer. Imad Ali, Jonah Gabry, and Ben Goodrich finished the online pkgdown-style documentation for all the Stan Development Team supported R packages. They can be accessed via http://mc-stan.org/(package_name), e.g., rstan: http://mc-stan.org/rstan; rstanarm: http://mc-stan.org/rstanarm; shinystan: http://mc-stan.org/shinystan; loo: http://mc-stan.org/loo; bayesplot: http://mc-stan.org/bayesplot. The The post Stan Weekly...

Read more »

Let’s Have Some Sympathy For The Part-time R User

August 4, 2017

When I started writing about methods for better "parametric programming" interfaces for dplyr in December of 2016, I encountered three divisions in the audience: (1) dplyr users who had such a need, and wanted such extensions; (2) dplyr users who did not have such a need ("we always know the column names"); (3) dplyr … Continue reading Let’s...

Read more »

R Markdown exercises part 2

August 4, 2017

INTRODUCTION R Markdown is one of the most popular data science tools and is used to save and execute code and to create exceptional reports which are easily shareable. The documents that R Markdown provides are fully reproducible and support a wide variety of static and dynamic output formats. Using markdown syntax, which provides an easy way Related exercise sets: How to...

Read more »

The package hdm for double selection inference with a simple example

August 4, 2017

By Gabriel Vasconcelos. In an earlier post I discussed Double Selection (DS), a procedure for inference after selecting controls. I showed an example of the consequences of ignoring the variable selection step discussed in an article by Belloni, Chernozhukov … Continue reading →

Read more »

Sampling distribution of weighted Gini coefficient by @ellis2013nz

August 4, 2017

Calculating Gini coefficients Stats NZ release a series of working papers, and a recent one caught my eye because of my interest in inequality statistics (disclaimer - I am working at Stats NZ at the moment, but on completely different things). Workin...
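As a rough illustration of what calculating a weighted Gini coefficient involves, here is a minimal base R sketch built on the Lorenz curve. The helper name `weighted_gini`, the trapezoid formulation and the toy data are assumptions made for this example; they are not taken from the post or from the Stats NZ working paper.

```r
# Weighted Gini coefficient via the area under the Lorenz curve.
# `weighted_gini` is a hypothetical helper written for this sketch.
weighted_gini <- function(x, w = rep(1, length(x))) {
  stopifnot(length(x) == length(w), all(w >= 0), all(x >= 0), sum(w * x) > 0)
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  # Cumulative population share (p) and cumulative income share (L),
  # both starting from the origin of the Lorenz curve.
  p <- c(0, cumsum(w) / sum(w))
  L <- c(0, cumsum(w * x) / sum(w * x))
  # Gini = 1 - 2 * (area under the Lorenz curve), using trapezoids.
  1 - sum(diff(p) * (head(L, -1) + tail(L, -1)))
}

# Toy data: four income levels with survey weights (made up for illustration).
incomes <- c(10, 20, 30, 40)
weights <- c(4, 3, 2, 1)
weighted_gini(incomes, weights)
```

With this formulation, giving an observation a weight of 3 is equivalent to repeating it three times with unit weights, which is a handy sanity check.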

Read more »

Saving High-Resolution ggplots: How to Preserve Semi-Transparency

August 4, 2017

This article describes solutions for preserving semi-transparency when saving ggplot2-based graphs in a high-quality postscript (.eps) file format. Contents: Create a ggplot with semi-transparent color; Save ggplots with semi-transparent colors; Use cairo-based postscript graphics devices; Export to PowerPoint. Create a ggplot with semi-transparent color: To illustrate this, we start by creating ggplot2-based survival curves using the function ggsurvplot() in the...

Read more »

Clustering with FactoMineR

August 4, 2017

Here is a course with videos that presents hierarchical clustering and its complementarity with principal component methods. Four videos present a course on clustering, how to determine the number of clusters, how to describe the clusters and how to perform the clustering when there are lots of individuals and/or lots of variables. Then you will

Read more »

Dallas Animal Services: Shelter Intake Types vs. Outcomes Analysis

August 4, 2017

Thanks to Dallas OpenData, anyone has access to the city animal shelter records. If you lost or found a pet, it could be that he or she spent some time in a shelter - I personally took lost dogs there. It's unfortunate, but every year tens of thousands of animals find their way to shelters, with a significant fraction never finding their way out. City...

Read more »

R for System Administration

August 3, 2017

Just getting back from the most fun meetup I have been to in quite some time: episode 23 (by their count) of Open Source Open Mic hosted by Matt Godbolt and Joe Walnes here in Chicago. Nothing but a sequence of lightning talks. Plus beer and pizza. S...

Read more »

An Iterative Approach to Data Science

August 3, 2017

It is the nature of boot camp.  We drink from the firehose because we only have 12 weeks to learn what university programs would spread out The post An Iterative Approach to Data Science appeared first on NYC Data Science Academy Blog.

Read more »

How we voted in South Carolina

August 3, 2017

Purpose This post seeks to explore how Greenville, SC and surrounding areas voted in the 2016 election. It also demonstrates how to retrieve data from the Data.World site. To retrieve data from this site using the tools in this post, you have to create an account (easy to do if you have a Facebook, Twitter, or Github account). You...

Read more »

Initiating development of a chatbot with plumber and ngrok

August 3, 2017

Chatbots have been all the rage for some time now, with firms from various sectors investing in bots that would reduce or remove the need to employ a call centre whilst maintaining similar levels of efficiency, if not greater. A notable example i...

Read more »

Passing user-supplied C++ functions with RcppXPtrUtils

August 3, 2017

Sitting on top of R’s external pointers, the RcppXPtr class provides a powerful and generic framework for passing user-supplied C++ functions to a C++ backend. This technique is exploited in the RcppDE package, an efficient C++ based implementation of the DEoptim package that accepts optimisation objectives as both R and compiled functions (see demo("compiled", "RcppDE") for further details). This solution has a couple of issues though: Some repetitive scaffolding...

Read more »

Impact of the conservation optimism hashtag

August 3, 2017

The hashtag #conservationoptimism became popular during the recent International Congress for Conservation Biology symposium. Michael Burgass asked me what its twitter impact was, so here is a quick analysi...

Read more »

Text categorization with deep learning, in R

August 3, 2017

Given a short review of a product, like "I couldn't put it down!", can you predict what the product is? In that case it's pretty easy — it's for a book — but this general problem of text categorization comes up in a lot of natural language analysis problems. In his talk at useR!2017 (shown below), Microsoft data scientist...

Read more »

Numerical Differentiation with Finite Differences in R

August 3, 2017

Part 1 of 7 in the series Numerical Analysis. Numerical differentiation is a method of approximating the derivative of a function at a particular value. Often, particularly in physics and engineering, a function may be too complicated to merit the work necessary to find the exact derivative, or the function itself... The post Numerical Differentiation with Finite Differences in R appeared...
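To make the idea concrete, here is a minimal base R sketch of two common finite-difference approximations. The function names `forward_diff` and `central_diff`, the step size and the test function are choices made for this illustration and are not necessarily those used in the post.

```r
# Forward difference: f'(x) ~ (f(x + h) - f(x)) / h, error of order h.
forward_diff <- function(f, x, h = 1e-6) {
  (f(x + h) - f(x)) / h
}

# Central difference: f'(x) ~ (f(x + h) - f(x - h)) / (2h), error of order h^2.
central_diff <- function(f, x, h = 1e-6) {
  (f(x + h) - f(x - h)) / (2 * h)
}

# Example function (an assumption for this sketch): f(x) = x^3,
# whose exact derivative at x = 2 is 3 * 2^2 = 12.
f <- function(x) x^3

forward_diff(f, 2)
central_diff(f, 2)
```

The central difference is usually the more accurate of the two for the same step size, at the cost of evaluating the function on both sides of the point.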

Read more »

Parallel Computing Exercises: Snow and Rmpi (Part-3)

August 3, 2017

The foreach statement, which was introduced in the previous set of exercises of this series, can work with various parallel backends. This set lets you practice working with backends provided by the snow and Rmpi packages (on a single machine with multiple CPUs). The name of the former package stands for “Simple Network of Related exercise sets: Parallel Computing...

Read more »

Rborist version 0-1.8 available from CRAN

August 3, 2017

Version 0-1.8 of the Rborist implementation of the Random Forest (TM) algorithm is now available from CRAN. Although most changes involve refactoring to accommodate future updates, there are several bug fixes and enhancements worth mentioning. New option maxLeaf allows a limit to be set on the number of terminal nodes (i.e., leaves) in each trained tree. In order not to introduce behavior dependent upon...

Read more »

Generating Quadratic Primes: Euler Problem 27

August 2, 2017

Solution to Euler Problem 27 using the R language. Find the product of the coefficients for the quadratic expression that produces the most primes. Continue reading → The post Generating Quadratic Primes: Euler Problem 27 appeared first on The Devil is in the Data.
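One way to approach this in base R is a brute-force search, sketched below. It assumes the bounds from the Project Euler problem statement (quadratics of the form n^2 + a*n + b with |a| < 1000 and |b| <= 1000, counting consecutive primes from n = 0); the helper names `is_prime` and `count_consecutive_primes` are made up for this example and this is not necessarily the approach taken in the linked post.

```r
# Trial-division primality test (hypothetical helper for this sketch).
is_prime <- function(n) {
  if (n < 2) return(FALSE)
  if (n < 4) return(TRUE)
  if (n %% 2 == 0) return(FALSE)
  d <- 3
  while (d * d <= n) {
    if (n %% d == 0) return(FALSE)
    d <- d + 2
  }
  TRUE
}

# Number of consecutive n = 0, 1, 2, ... for which n^2 + a*n + b is prime.
count_consecutive_primes <- function(a, b) {
  n <- 0
  while (is_prime(n^2 + a * n + b)) n <- n + 1
  n
}

# At n = 0 the expression equals b, so b itself must be prime.
primes_b <- Filter(is_prime, 2:1000)

best <- list(a = NA, b = NA, n = 0)
for (a in -999:999) {
  for (b in primes_b) {
    n <- count_consecutive_primes(a, b)
    if (n > best$n) best <- list(a = a, b = b, n = n)
  }
}

# Product of the coefficients of the best quadratic found.
best$a * best$b
```

Restricting b to primes cuts the search space considerably; the remaining loop is small enough to run in plain R without further optimisation.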

Read more »

Fun data: open data that is fun to analyse

August 2, 2017

Joe Russell, Adnan Fiaz. Jeremy Singer-Vine sends out a newsletter every week where he highlights a number of interesting open datasets (you can explore all the datasets here). At Mango we are all for open data so we thought we would also share some of the open datasets we think are fun to explore: Open Food Facts, Food prices, North Korea Missile Tests, Flight...

Read more »

RStudio Connect v1.5.4 – Now Supporting Plumber!

August 2, 2017

We’re thrilled to announce support for hosting Plumber APIs in RStudio Connect: version 1.5.4. Plumber is an R package that allows you to define web APIs by adding special annotations to your existing R code – allowing you to make your R functions accessible to other systems. Below you can see the auto-generated “swagger” interface for a web API written...
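For readers who have not seen Plumber's annotation style, the sketch below shows roughly what such a file can look like. The file name `plumber.R`, the endpoint paths and the function bodies are illustrative assumptions, not the API shown in the announcement. The annotations are ordinary R comments, so the functions can still be sourced and called directly.

```r
# plumber.R (hypothetical file name for this sketch)

#* Echo back the message that was sent
#* @param msg The message to echo
#* @get /echo
echo <- function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}

#* Add two numbers
#* @param a First number
#* @param b Second number
#* @post /sum
add <- function(a, b) {
  # Request parameters arrive as strings, so convert before adding.
  as.numeric(a) + as.numeric(b)
}

# To serve the file locally (requires the plumber package), something like:
# plumber::plumb("plumber.R")$run(port = 8000)
```

Keeping the endpoint logic in plain functions like this means it can be unit-tested without starting a web server.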

Read more »

Applications in energy, retail and shipping

August 2, 2017

The Solutions section of the Cortana Intelligence Gallery provides more than two dozen working examples of applying machine learning, data science and artificial intelligence to real-world problems. Each solution provides sample data, scripts for model training and evaluation, and reporting of predictions. You can deploy a complete stack in Azure to implement the solution with the click of a...

Read more »

What makes an R talk popular? Scraping useR2017 attendance information to find out!

August 2, 2017

Click here to explore the data for yourself First off — I’ll admit that was my poor attempt at a click-bait title. But if you’re still reading the next paragraph, that means it was successful! Table of contents Background Am I...

Read more »

Data wrangling : Transforming (3/3)

August 2, 2017

Data wrangling is a task of great importance in data analysis. Data wrangling is the process of importing, cleaning and transforming raw data into actionable information for analysis. It is a time-consuming process which is estimated to take about 60-80% of an analyst’s time. In this series we will go through this process. It will be Related exercise sets: Data table...

Read more »

[R] Kenntnis-Tage 2017: Register now and benefit from the Summer Special

August 2, 2017

On November 8 and 9, Kassel will once more become the meeting point for the German-speaking R community. From the usage of R in the automotive industry to risk analysis, from data mining with caret to R Markdown: the Kenntnis-Tage 2017 once again offer an exciting program – always with a focus on … „ Kenntnis-Tage 2017:...

Read more »
