## New R Course: Sentiment Analysis in R – The Tidy Way

August 24, 2017
Hello, R users! This week we're continuing to bridge the gap between computers and human language with the launch of Sentiment Analysis in R: The Tidy Way by Julia Silge! Text datasets are diverse and ubiquitous, and sentiment analysis provides an approa...

## Practical Guide to Principal Component Methods in R

August 24, 2017
Although there are several good books on principal component methods (PCMs) and related topics, we felt that many of them are either too theoretical or too advanced. This book provides solid practical guidance to summarize, visu...
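To make "summarize and visualize" a little more concrete, here is a minimal sketch of a principal component analysis in base R using `stats::prcomp()` on the built-in `iris` measurements. The dataset and the decision to scale are illustrative assumptions; the book may well use other packages and workflows.

```r
# Minimal PCA sketch with base R (stats::prcomp) on the built-in iris data.
# Illustrative only; not necessarily the workflow used in the book.
measurements <- iris[, 1:4]   # the four numeric measurement columns

# Center and scale each variable so that no single unit dominates the components
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Scores (coordinates) of the first few observations on PC1 and PC2
print(head(pca$x[, 1:2]))
```

Scaling matters here because the four columns have different spreads; without it, the components would mostly track the highest-variance variable.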

## H2O.ai: Going for a paddle

August 24, 2017
Owen Jones, Placement Student. A quick disclaimer: This post isn’t called H2O.ai: Going for the 100m freestyle world record. I’m not trying to win a Kaggle competition. I’m not carrying out detailed, highly-controlled benchmarking tests. I’m not, in fact, claiming to be doing anything particularly useful at all. This is just me, just playing around with some code, just for...

## A simple function for installing R packages based on a folder with R scripts

August 24, 2017
Whenever I buy a new computer or format an old one, I have the problem of reinstalling my R packages. If you are a heavy user, you will likely have a significant number of packages used by...
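The general idea can be sketched in base R: scan every `.R` file in a folder for `library()` and `require()` calls, then install whatever is not yet installed. The function names `find_script_packages()` and `install_script_packages()` are hypothetical (this is not the post author's function), and the regular expression only catches the common call forms.

```r
# Hypothetical helpers, not the original post's function.

# Return the unique package names loaded via library()/require() in all
# .R/.r files under `folder` (searched recursively).
find_script_packages <- function(folder) {
  files <- list.files(folder, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  if (length(files) == 0) return(character(0))
  code <- unlist(lapply(files, readLines, warn = FALSE))
  # Matches library(pkg), library("pkg"), require('pkg'), and so on
  pattern <- "(library|require)\\([[:space:]]*['\"]?([A-Za-z0-9.]+)['\"]?"
  calls <- unlist(regmatches(code, gregexpr(pattern, code)))
  # Keep only the captured package name from each matched call
  pkgs <- sub(pattern, "\\2", calls)
  sort(unique(pkgs))
}

# Install any of those packages that are not installed yet;
# extra arguments (e.g. repos) are passed on to install.packages().
install_script_packages <- function(folder, ...) {
  pkgs <- find_script_packages(folder)
  missing <- setdiff(pkgs, rownames(installed.packages()))
  if (length(missing) > 0) install.packages(missing, ...)
  invisible(missing)
}
```

A call such as `install_script_packages("~/my-r-scripts")` (a made-up path) would then reinstall everything your scripts load. Packages loaded via `pkg::fun()` or `requireNamespace()` are not detected by this simple pattern.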

## FedData – Getting assorted geospatial data into R

August 24, 2017
The package FedData has gone through software review and is now part of rOpenSci. FedData includes functions to automate downloading geospatial data available from several federated data sources (mainly sources maintained by the US Federal government). Currently, the package enables extraction from six datasets: The National Elevation Dataset (NED) digital elevation models (1 and 1/3 arc-second; USGS) The...

## My experience in switching from Windows 10 to Linux Mint 18.2

August 24, 2017
It has been 8 months since I switched from Windows 10 to Linux Mint. In this post I’ll talk about my experience as a scholar and R user in this transition. My work is, simply put, to...

## Analyzing Google Trends Data in R

August 23, 2017
Google Trends shows the changes in the popularity of search terms over a given time (i.e., number of hits over time). It can be used to find search terms with growing or decreasing popularity or...

## Hard-nosed Indian Data Scientist Gospel Series – Part 1 : Incertitude around Tools and Technologies

August 23, 2017
Before the recession, a commercial tool was popular in the country, so there was little uncertainty around tools and technology; since the recession, however, incertitude (i.e., uncertainty) around tools and technology has preoccupied, and continues to preoccupy, data sc...

## Digit fifth powers: Euler Problem 30

August 23, 2017
Euler problem 30 is another number-crunching problem, this time dealing with numbers raised to the fifth power. Two other Euler problems dealt with raising numbers to a power. The previous problem looked at permutations of powers and problem 16 asks for … The post Digit fifth powers: Euler Problem 30 appeared first on The Devil is in the...
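For readers who want to try the problem (find all numbers equal to the sum of the fifth powers of their digits), here is a vectorized base-R brute-force sketch; it is not the original post's solution. It relies on a standard bound: a seven-digit number is at least 1,000,000, but seven nines only give 7 × 9^5 = 413,343, so no number with seven or more digits can qualify, and six-digit candidates cannot exceed 6 × 9^5 = 354,294. Single digits are skipped because 1 = 1^5 is not a sum.

```r
# Brute-force sketch for Euler problem 30 (digit fifth powers).
# Numbers above 6 * 9^5 = 354294 cannot equal their digit fifth-power sum.
n <- 10:(6 * 9^5)

# Add up the fifth power of each digit, peeling digits off with %% and %/%
total <- numeric(length(n))
rest <- n
while (any(rest > 0)) {
  total <- total + (rest %% 10)^5
  rest <- rest %/% 10
}

matches <- n[total == n]
print(matches)        # 4150 4151 54748 92727 93084 194979
print(sum(matches))   # 443839
```

Working on the whole candidate vector at once keeps the loop to six iterations (one per digit position) instead of looping over roughly 350,000 numbers.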

## Control Systems Toolbox – System Interconnection

August 23, 2017
Dynamic systems are usually represented by a model before they can be analyzed computationally. These are systems whose states change, evolve, or vary with time according to a set of defined rules. Dynamic systems could be mechanical, electrical, electronic, biological, sociological, and so on. Many such systems are usually defined by a set...
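As a rough illustration of what interconnecting systems means for transfer-function models, here is a base-R sketch. A transfer function is stored as numerator and denominator coefficient vectors in descending powers of s, and two systems are combined in series (G1·G2), in parallel (G1 + G2), and in a negative feedback loop (G / (1 + G·H)). The helper names (`poly_mul()`, `series_tf()`, and so on) are hypothetical and are not the API of the package discussed in the post.

```r
# Hypothetical helpers; coefficients are in descending powers of s.

# Polynomial product = discrete convolution of the coefficient vectors
poly_mul <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) {
    for (j in seq_along(b)) {
      out[i + j - 1] <- out[i + j - 1] + a[i] * b[j]
    }
  }
  out
}

# Polynomial sum: pad the shorter vector with leading zeros first
poly_add <- function(a, b) {
  n <- max(length(a), length(b))
  c(rep(0, n - length(a)), a) + c(rep(0, n - length(b)), b)
}

# Series: G1 * G2 = (N1 N2) / (D1 D2)
series_tf <- function(g1, g2) {
  list(num = poly_mul(g1$num, g2$num), den = poly_mul(g1$den, g2$den))
}

# Parallel: G1 + G2 = (N1 D2 + N2 D1) / (D1 D2)
parallel_tf <- function(g1, g2) {
  list(num = poly_add(poly_mul(g1$num, g2$den), poly_mul(g2$num, g1$den)),
       den = poly_mul(g1$den, g2$den))
}

# Negative feedback: G / (1 + G H) = (NG DH) / (DG DH + NG NH)
feedback_tf <- function(g, h) {
  list(num = poly_mul(g$num, h$den),
       den = poly_add(poly_mul(g$den, h$den), poly_mul(g$num, h$num)))
}

g1 <- list(num = 1, den = c(1, 1))   # 1 / (s + 1)
g2 <- list(num = 2, den = c(1, 3))   # 2 / (s + 3)
str(series_tf(g1, g2))               # num 2, den s^2 + 4s + 3
str(parallel_tf(g1, g2))             # num 3s + 5, den s^2 + 4s + 3
str(feedback_tf(g1, list(num = 1, den = 1)))   # 1 / (s + 2)
```

No pole-zero cancellation is attempted, so the results are correct but not necessarily in minimal form.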

## Sentiment analysis using tidy data principles at DataCamp

August 23, 2017
I’ve been developing a course at DataCamp over the past several months, and I am happy to announce that it is now launched! The course is Sentiment Analysis in R: the Tidy Way and I am excited that it is now available for you to explore and learn...

## Recreating and updating Minard with ggplot2

August 23, 2017
Minard's chart depicting Napoleon's 1812 march on Russia is a classic of data visualization that has inspired many homages using different time-and-place data. If you'd like to recreate the original chart, or create one of your own, Andrew Heiss has created a tutorial on using the ggplot2 package to re-envision the chart in R: The R script provided in...

## Basics of data.table: Smooth data exploration

August 23, 2017
The data.table package provides perhaps the fastest way to wrangle data in R. The syntax is concise and is made to resemble SQL. After studying the basics of data.table and successfully finishing this exercise set, you will be able to start easing into using data.table for all your data manipulation needs. We will use data...
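The SQL-like shape of data.table's `DT[i, j, by]` syntax (roughly WHERE, SELECT, GROUP BY) can be sketched on the built-in `iris` data. This assumes the data.table package is installed; the column names come from `iris`, not from the exercise set itself.

```r
library(data.table)

dt <- as.data.table(iris)

# i filters rows (WHERE), j computes columns (SELECT), by groups (GROUP BY)
summary_dt <- dt[Sepal.Length > 5,
                 .(n = .N, mean_petal = mean(Petal.Length)),
                 by = Species]
print(summary_dt)

# Ordering happens in i; chaining [...] then takes the first three rows
print(dt[order(-Sepal.Length)][1:3])

# := adds a column by reference, without copying the table
dt[, petal_ratio := Petal.Length / Petal.Width]
```

The by-reference update with `:=` is one of the main reasons data.table stays fast on large tables.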

## Going Bayes #rstats

August 23, 2017
Some time ago I started working with Bayesian methods, using the great rstanarm package. Besides the fantastic package vignettes, and books like Statistical Rethinking or Doing Bayesian Data Analysis, I also found the resources from Tristan Mahr helpful to better understand both Bayesian analysis and rstanarm. This motivated me to implement tools for Bayesian analysis into my...

## Rcpp now used by 10 percent of CRAN packages

August 23, 2017
Over the last few days, Rcpp passed another noteworthy hurdle. It is now used by over 10 percent of packages on CRAN (as measured by Depends, Imports and LinkingTo, but excluding Suggests). As of this morning 1130 packages use Rcpp out of a total of...

## Simple practice: data wrangling the iris dataset

August 23, 2017
If you want to work on large data science projects (analyses and machine learning) you need to be able to perform dozens of small tasks ... For example, you'll need to be able to fluently perform dozens of little bits of data wrangling, just like this ... The post Simple practice: data wrangling the iris dataset appeared first on SHARP SIGHT...
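A few such small wrangling steps on `iris` can be sketched in base R: derive a new column, filter rows, and summarise by group. The original post may use a different toolkit (for example dplyr); the particular steps below are illustrative choices.

```r
# Work on a copy of the built-in iris data (measurements are in cm)
flowers <- iris

# Derive a new column: sepal length-to-width ratio
flowers$sepal_ratio <- flowers$Sepal.Length / flowers$Sepal.Width

# Filter: keep only flowers with petals longer than 4 cm
long_petals <- subset(flowers, Petal.Length > 4)

# Summarise: mean sepal length per species on the full data
means <- aggregate(Sepal.Length ~ Species, data = flowers, FUN = mean)
print(means)   # setosa 5.006, versicolor 5.936, virginica 6.588
```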

## useR!2017 Roundup

August 23, 2017
Organising useR!2017 was a challenge but a very rewarding experience. With about 1200 attendees of over 55 nationalities exploring an interesting program, we believe it is appropriate to call it a success, something the aftermovie only seems to confirm. Behind the Scenes: To give you a glimpse behind the scenes of the conference organization, Maxim Nazarov held...

## Gender roles in film direction, analyzed with R

August 22, 2017
What do women do in films? If you analyze the stage directions in film scripts — as Julia Silge, Russell Goldenberg and Amber Thomas have done for this visual essay for ThePudding — it seems that women (but not men) are written to snuggle, giggle and squeal, while men (but not women) shoot, gallop and strap things to other...

## Caching httr Requests? This means WAR[C]!

August 22, 2017
I’ve blathered about my crawl_delay project before and am just waiting for a rainy weekend to be able to crank out a follow-up post on it. Working on that project involved sifting through thousands of Web Archive (WARC) files. While I have a nascent package on github to work with WARC files it’s a tad...

## Some Neat New R Notations

August 22, 2017
The R package seplyr supplies a few neat new coding notations. The first notation is an operator called the “named map builder”. This is a cute notation that essentially does the job of stats::setNames(). It allows for code such as the following: library("seplyr") names
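For comparison, the base-R function the teaser mentions, `stats::setNames()`, attaches a vector of names to a vector of values in one step. The keys and values below are made up for illustration, and the seplyr notation itself is not reproduced here.

```r
# Build a named vector ("map") from separate keys and values
keys <- c("a", "b", "c")
values <- c(10, 20, 30)
named_map <- stats::setNames(values, keys)
print(named_map)

# Look up a value by its name
print(named_map[["b"]])   # 20
```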

## So you (don’t) think you can review a package

August 22, 2017
Contributing to an open-source community without contributing code is an oft-vaunted idea that can seem nebulous. Luckily, putting vague ideas into action is one of the strengths of the rOpenSci Community, and their package onboarding system offers a chance to do just that. This was my first time reviewing a package, and, as with so many things in life, I...

## Onboarding visdat, a tool for preliminary visualisation of whole dataframes

August 22, 2017
“Take a look at the data.” This is a phrase that comes up when you first get a dataset. It is also ambiguous. Does it mean to do some exploratory modelling? Or make some histograms, scatterplots, and boxplots? Is it both? Starting down either path, you often encounter the non-trivial growing pains of working with a new dataset. The mix-ups of...

## How to Create an Online Choice Simulator

August 21, 2017
What is a choice simulator? A choice simulator is an online app or an Excel workbook that allows users to specify different scenarios and get predictions. Here is an example of a choice simulator. Choice simulators have...

## RStudio v1.1 Preview – Object Explorer

August 21, 2017
Today, we’re continuing our blog series on new features in RStudio 1.1. If you’d like to try these features out for yourself, you can download a preview release of RStudio 1.1. Object Explorer: You might already be familiar with the Data Viewer in RStudio, which allows for the inspection of data frames and other tabular R objects available in your R...

## Introducing routr – Routing of HTTP and WebSocket in R

August 21, 2017
routr is now available on CRAN, and I couldn’t be happier. Its release marks the completion of an idea that stretches back longer than my attempts to bring network visualization and ggplot2 together (see this post for ref). While my PhD was stil...

## Understanding gender roles in movies with text mining

August 21, 2017
I have a new visual essay up at The Pudding today, using text mining to explore how women are portrayed in film. The R code behind this analysis is publicly available on GitHub. I was so glad to work with the talented Russell Goldenberg and...

## Tidyer BLS data with the blscrapeR package

August 21, 2017
The recent release of the blscrapeR package brings the “tidyverse” into the fold. Inspired by my recent collaboration with Kyle Walker on his excellent tidycensus package, blscrapeR has been optimized for use within the tidyverse as of the current ...

August 21, 2017
This example groups stocks together in a network that highlights associations within and between the groups using only historical price data. The result is far from ground-breaking; you can already guess the output. For the most part, the stocks get grouped together into pretty obvious business sectors. Despite the obvious result, the process of teasing out latent groupings from historic...
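The general idea of recovering groups from price history alone can be sketched in base R with simulated data: compute return correlations, turn them into a distance, and cluster. Everything here is an assumption for illustration. The tickers are made up, the daily returns are simulated from two hypothetical sector factors, and hierarchical clustering stands in for whatever network method the original post uses.

```r
set.seed(42)
n_days <- 250

# Two hypothetical "sectors", each driven by its own common factor
f1 <- rnorm(n_days)
f2 <- rnorm(n_days)
returns <- cbind(
  A1 = f1 + rnorm(n_days, sd = 0.3),
  A2 = f1 + rnorm(n_days, sd = 0.3),
  A3 = f1 + rnorm(n_days, sd = 0.3),
  B1 = f2 + rnorm(n_days, sd = 0.3),
  B2 = f2 + rnorm(n_days, sd = 0.3),
  B3 = f2 + rnorm(n_days, sd = 0.3)
)

# Correlation distance: 0 for perfectly correlated series, larger otherwise
# (pmax guards against tiny negative values from floating-point rounding)
rho <- cor(returns)
d <- as.dist(sqrt(pmax(0, 2 * (1 - rho))))

# Average-linkage clustering, cut into two groups
groups <- cutree(hclust(d, method = "average"), k = 2)
print(groups)
```

With real data you would start from prices, convert them to returns (for example log differences), and let the correlation structure decide the number of groups rather than fixing `k = 2`.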