Sentiment analysis using tidy data principles at DataCamp

August 23, 2017

I’ve been developing a course at DataCamp over the past several months, and I am happy to announce that it is now launched! The course is Sentiment Analysis in R: the Tidy Way and I am excited that it is now available for you to explore and learn...


Recreating and updating Minard with ggplot2

August 23, 2017

Minard's chart depicting Napoleon's 1812 march on Russia is a classic of data visualization that has inspired many homages using different time-and-place data. If you'd like to recreate the original chart, or create one of your own, Andrew Heiss has created a tutorial on using the ggplot2 package to re-envision the chart in R: The R script provided in...


Basics of data.table: Smooth data exploration

August 23, 2017

The data.table package provides perhaps the fastest way to wrangle data in R. The syntax is concise and is made to resemble SQL. After studying the basics of data.table and finishing this exercise set successfully, you will be able to start easing into using data.table for all your data manipulation needs. We will use data...


Going Bayes #rstats

August 23, 2017

Some time ago I started working with Bayesian methods, using the great rstanarm package. Besides the fantastic package vignettes, and books like Statistical Rethinking or Doing Bayesian Data Analysis, I also found the resources from Tristan Mahr helpful to better understand both Bayesian analysis and rstanarm. This motivated me to implement tools for Bayesian analysis into my


Rcpp now used by 10 percent of CRAN packages

August 23, 2017

Over the last few days, Rcpp passed another noteworthy hurdle. It is now used by over 10 percent of packages on CRAN (as measured by Depends, Imports and LinkingTo, but excluding Suggests). As of this morning 1130 packages use Rcpp out of a total of...


Simple practice: data wrangling the iris dataset

August 23, 2017

If you want to work on large data science projects (analyses and machine learning) you need to be able to perform dozens of small tasks ... For example, you'll need to be able to fluently perform dozens of little bits of data wrangling, just like this ... The post Simple practice: data wrangling the iris dataset appeared first on SHARP SIGHT...


useR!2017 Roundup

August 23, 2017

Organising useR!2017 was a challenge but a very rewarding experience. With about 1200 attendees of over 55 nationalities exploring an interesting program, we believe it is appropriate to call it a success, something the aftermovie only seems to confirm. Behind the Scenes: To give you a glimpse behind the scenes of the conference organization, Maxim Nazarov held...


Gender roles in film direction, analyzed with R

August 22, 2017

What do women do in films? If you analyze the stage directions in film scripts — as Julia Silge, Russell Goldenberg and Amber Thomas have done for this visual essay for ThePudding — it seems that women (but not men) are written to snuggle, giggle and squeal, while men (but not women) shoot, gallop and strap things to other...


Caching httr Requests? This means WAR[C]!

August 22, 2017

I’ve blathered about my crawl_delay project before and am just waiting for a rainy weekend to be able to crank out a follow-up post on it. Working on that project involved sifting through thousands of Web Archive (WARC) files. While I have a nascent package on GitHub to work with WARC files, it’s a tad...


Some Neat New R Notations

August 22, 2017

The R package seplyr supplies a few neat new coding notations. An Abacus, which gives us the term “calculus.” The first notation is an operator called the “named map builder”. This is a cute notation that essentially does the job of stats::setNames(). It allows for code such as the following: library("seplyr") names


So you (don’t) think you can review a package

August 22, 2017

Contributing to an open-source community without contributing code is an oft-vaunted idea that can seem nebulous. Luckily, putting vague ideas into action is one of the strengths of the rOpenSci Community, and their package onboarding system offers a chance to do just that. This was my first time reviewing a package, and, as with so many things in life, I...


Onboarding visdat, a tool for preliminary visualisation of whole dataframes

August 22, 2017

“Take a look at the data.” This is a phrase that comes up when you first get a dataset. It is also ambiguous. Does it mean to do some exploratory modelling? Or make some histograms, scatterplots, and boxplots? Is it both? Starting down either path, you often encounter the non-trivial growing pains of working with a new dataset. The mix ups of...


How to Create an Online Choice Simulator

August 21, 2017

What is a choice simulator? A choice simulator is an online app or an Excel workbook that allows users to specify different scenarios and get predictions. Here is an example of a choice simulator. Choice simulators have...


RStudio v1.1 Preview – Object Explorer

August 21, 2017

Today, we’re continuing our blog series on new features in RStudio 1.1. If you’d like to try these features out for yourself, you can download a preview release of RStudio 1.1. Object Explorer: You might already be familiar with the Data Viewer in RStudio, which allows for the inspection of data frames and other tabular R objects available in your R...


Introducing routr – Routing of HTTP and WebSocket in R

August 21, 2017

routr is now available on CRAN, and I couldn’t be happier. Its release marks the completion of an idea that stretches back longer than my attempts to bring network visualization and ggplot2 together (see this post for ref). While my PhD was stil...


Understanding gender roles in movies with text mining

August 21, 2017

I have a new visual essay up at The Pudding today, using text mining to explore how women are portrayed in film. The R code behind this analysis is publicly available on GitHub. I was so glad to work with the talented Russell Goldenberg and...


Tidyer BLS data with the blscrapeR package

August 21, 2017

The recent release of the blscrapeR package brings the “tidyverse” into the fold. Inspired by my recent collaboration with Kyle Walker on his excellent tidycensus package, blscrapeR has been optimized for use within the tidyverse as of the current ...


Learning things we already know about stocks

August 21, 2017

This example groups stocks together in a network that highlights associations within and between the groups using only historical price data. The result is far from ground-breaking; you can already guess the output. For the most part, the stocks get grouped together into pretty obvious business sectors. Despite the obvious result, the process of teasing out latent groupings from historic...


Using regression trees for forecasting double-seasonal time series with trend in R

August 21, 2017

After a blogging break caused by writing research papers, I managed to secure time to write something new about time series forecasting. This time I want to share with you my experiences with seasonal-trend time series forecasting using simple regression trees. The classification and regression tree (or decision tree) is a broadly used machine learning method for modeling. These trees are popular because...


Simply Mapping

August 21, 2017

Give me fuel, give me fire, reduced deprivation's my desire - First attempts with simple features The latest edition of the Scottish Index of Multiple Deprivation (SIMD) was released last year, and has been getting a bit more promotion recen...


Free simmer hexagon stickers!

August 21, 2017

Do you want to get your own simmer hexagon sticker? Just fill in this form and get one sent to you for free. Check out r-simmer.org or CRAN for more information on simmer, a discrete-event simulation package for R.


Highlights of the Data Science Track at Microsoft Ignite

August 21, 2017

I will be at the AI Summit in San Francisco next month, which means I can't make it to Ignite in Orlando this year. Which is a bit of a shame, because there's a fantastic Data Science track at Ignite. There are 25 sessions on offer, with presentations from my Microsoft colleagues on Microsoft R, Cognitive Toolkit, Bot Framework,...


Bayesian A/B Testing Made Easy

August 21, 2017

A/B Testing is a familiar task for many working in business analytics. Essentially, A/B Testing is a simple form of hypothesis testing with one control group and one treatment group. Classical frequentist methodology instructs the analyst to estimate the expected effect of the treatment, calculate the required sample size, and perform a test to determine...


Compare Tube Types with R – Repeated Measures ANOVA

August 21, 2017

Background: Sometimes we might want to compare three or four tube types for a particular analyte on a group of patients, or we might want to see if a particular analyte is stable over time in aliquoted samples. In these experiments we are essentially doing the multivariable analogue of the paired t-test. In the tube-type experiment, …


Computer Vision Algorithms for R users

August 21, 2017

Just before the summer holidays, BNOSAC presented a talk called Computer Vision and Image Recognition algorithms for R users at the UseR conference. In the talk 6 packages on Computer Vision with R were introduced in front of an audience of about 250 p...


Be careful not to control for a post-exposure covariate

August 20, 2017

A researcher was presenting an analysis of the impact various types of childhood trauma might have on subsequent substance abuse in adulthood. Obviously, a very interesting and challenging research question. The statistical model included adjustments for several factors that are plausible confounders of the relationship between trauma and substance use, such as childhood poverty. However, the model also include...


Transfer Learning with augmented Data for Logo Detection

August 20, 2017

Over the last months, I have worked on brand logo detection in R with Keras, starting with a model from scratch, then adding more data and using a pretrained model. The goal is to build a (deep) neural net that is able to identify brand logos in images. Just ...


dataMaid: Your personal assistant for cleaning up the data cleaning process

August 20, 2017

As data analysts, we all have tasks that we enjoy more than others. Some like the exploratory analysis steps, some like statistical computing, while others enjoy visualizing and communicating the results of their analyses. But we have yet to meet a dat...

