Dataframes and the tidyverse

February 12, 2017

The data frame is the primary structure for working with data in R. Whenever you have data that is arranged in a spreadsheet-like fashion, the default receptacle for that data in R is the data frame. In a data frame,

Video Introduction to Bayesian Data Analysis, Part 1: What is Bayes?

February 12, 2017

This is video one of a three-part introduction to Bayesian data analysis, aimed at those who aren’t necessarily well-versed in probability theory but do know a little bit of programming. I gave a version of this tutorial at the UseR 2015 conf...

Happy pbirthday class of 2016

February 12, 2017

Abstract: Continuing the analysis of first names given to newborns in Berlin in 2016, we solve the following problem: what is the probability that, in a school class of size \(n\) of these kids, at least two kids will have the same first name? The answer to the problem for classes of size 26 is 34% and...

Implementing the Gradient Descent Algorithm in R

February 12, 2017

A Brief Introduction: Linear regression is a classic supervised statistical technique for predictive modelling, based on the linear hypothesis y = mx + c, where y is the response or outcome variable, m is the gradient of the linear trend-line, x is the predictor variable and c is the intercept. The intercept is…

Using R: tibbles and the t.test function

February 12, 2017

A participant in the R course I’m teaching showed me a case where a tbl_df (the new flavour of data frame provided by the tibble package; standard in new RStudio versions) interacts badly with the t.test function. I had not seen this happen before. The reason is this: Interacting with legacy code A handful of

ROPE and Equivalence Testing: Practically Equivalent?

February 12, 2017

In a previous post, I compared equivalence tests to Bayes factors, and pointed out several benefits of equivalence tests. But a much more logical comparison, and one I have not given enough attention to so far, is the ROPE procedure using Bayesian estimation. I’d like to thank John Kruschke for feedback on a draft of this blog post....

Letting Travis keep a secret

February 12, 2017

More and more packages, be it for R or another language, are now interfacing with different application programming interfaces (APIs) which are exposed to the web. Many of these may require an API key, a token, or an account and password. This traditionally poses a problem in automated tests such as those running on the popular Travis CI service...

R in Open Data: Complaints in The Field of Freedom of Information data set from data.gov.rs

February 12, 2017

The notebooks (R, Rmd, and HTML files are provided in my GitHub repository) focus on an exploratory analysis of the open data set on the complaints in the field of freedom of information, provided at the Open Data Portal of the Republic of Serbia that is currently under development. The data set was kindly provided to the...

Text mining and word cloud fundamentals in R : 5 simple steps you should know

February 11, 2017

Text mining methods allow us to highlight the most frequently used keywords in a paragraph of text. One can create a word cloud, also referred to as a text cloud or tag cloud, which is a visual representation of text data. The procedure of creating...

Who were the notable dead of Wikipedia?

February 11, 2017

As described in my last post, I extracted all notable deaths from Wikipedia over the 2004-2016 period. In this post I want to explore this study population. Who were the notable dead? How old were the notable dead? Let me assume here most entries of th...

Extracting notable deaths from Wikipedia

February 11, 2017

I like Wikipedia. My husband likes it even more: he included it in his PhD thesis acknowledgements! I appreciate the efforts made to share knowledge, and also the apparently random stuff you can find on the website. In particular, I’ve been intrig...

Were there more notable deaths than expected in 2016?

February 11, 2017

After exploring my study population of Wikipedia deaths, I want to analyse the time series of monthly counts of notable deaths. This is not a random interest of mine: my PhD thesis was about monitoring time series of counts, the application being weekly...

Conditional ggplot2 geoms in functions (QTL plots)

February 11, 2017

When running an analysis, I am usually combining functions from multiple packages. Most of these packages come with their own plotting functions. And while they are certainly convenient in that they allow me to get a quick glance at the data or the out...

Tracking Exercise Trends with NHANES

February 11, 2017

Contributed by Thomas Kassel. He is currently enrolled in the NYC Data Science Academy 17-week remote bootcamp program taking place from January to April 2017. This post is based on his second class project, R The post Tracking Exercise Trends with NHANES appeared first on NYC Data Science Academy Blog.

Announcing the wrapr package for R

February 11, 2017

Recently Dirk Eddelbuettel pointed out that our R function debugging wrappers would be more convenient if they were available in a low-dependency micro package dedicated to little else. Dirk is a very smart person, and like most R users we are deeply in his debt; so we (Nina Zumel and myself) listened and immediately moved …

Data Hacking with RDSTK 2

February 11, 2017

RDSTK is a very versatile package. It includes functions to help you convert IP addresses to geolocations and derive statistics from them. It also allows you to input a body of text and convert it into sentiments. This is a continuation of the last exercise set, RDSTK 1. This package provides an R interface to

ggalt 0.4.0 now on CRAN

February 11, 2017

I’m uncontainably excited to report that the ggplot2 extension package ggalt is now on CRAN. The absolute best part of this package is the R community members who contributed suggestions and new geoms, stats, annotations and integration features. This release would not be possible without the PRs from: Ben Bolker, Ben Marwick, Jan Schulz, Carson...

caffeR: an R wrapper for ‘caffe’

February 11, 2017

Authors: Christof Naumzik & Stefan Feuerriegel. Caffe (http://caffe.berkeleyvision.org) provides a powerful framework for deep learning. It is developed and maintained by the Berkeley Vision and Learning Center (BVLC) and has received a great deal of traction lately. Caffe enables users to define and train custom-made neural networks without hard-coding. Furthermore, it allows users to execute …

Update on R Consortium Projects

February 10, 2017

On January 31, the R Consortium presented a webinar with updates on various projects that have been funded (thanks to the R Consortium member dues) and are underway. Each project was presented by the project leader, a member of the R community. You can watch the recording of the webinar here, but here's a brief summary of what was...

Three Tips for Training Excel Users in R

February 10, 2017

By Merav Yuravlivker, CEO of Data Society. “I’m not a coder” or “I was never good at math” is a frequent refrain I hear when I ask professionals about their data analysis skills. Through popular culture and stereotypes, most people who don’t have a background in programming automatically underestimate their ability to create amazing things

Visualizing and wrangling MCMC output in R with `MCMCvis`

February 10, 2017

Model results can be thought of as a reward for the many hours of model design, troubleshooting, re-design, etc. that analyses often require. Following the potentially exhausting mental exercise to acquire these results, I think we’d all like the interpretation …

anytime 0.2.1

February 10, 2017

An updated anytime package arrived at CRAN yesterday. This is release number nine, and the first with a little gap to the prior release on Christmas Eve as the features are stabilizing, as is the implementation. anytime is a very focused package aiming to do just one thing really well: to convert anything in integer, numeric, character,...

Shiny+Exams

February 10, 2017

Serving book exercises on the web with Shiny and Exams. As you may know, I recently published an R book on Amazon. I decided to keep R exercises out of the book and serve them freely over the web....

Poor Donald – his tweets keep getting more negative

February 10, 2017

Last summer, David Robinson did this interesting text analysis of Donald Trump’s tweets and found that the angrier ones came from Android (which Trump is known to use). But he didn’t consider how Trump’s emotional state varies over time and he certainly couldn’t have...

Percentile Calculations in Water Quality Regulations

February 9, 2017

Demonstrating the various ways percentile calculations can be undertaken in R, specifically with respect to measuring turbidity in water supplies. The post Percentile Calculations in Water Quality Regulations appeared first on The Devil is in the Data.

Any Forward Progress on p-Values?

February 9, 2017

Statisticians have long known that the use of p-values has major problems. Some of us have long called for reform, weaning the profession away from these troubling beasts. At one point, I was pleased to see Frank Harrell suggest that R should stop computing them. That is not going to happen, but last year the …

Advanced Econometrics: Nonlinearities

February 9, 2017

On Thursday, March 2nd, I will give the first lecture of the PhD course on advanced tools for econometrics, on nonlinearities. Slides are available online.

Job trends for R and Python

February 9, 2017

When we last looked at job trends from indeed.com, job listings for "R statistics" were on the rise but were still around half the volume of listings for "SAS statistics". Three-and-a-half years later, R has overtaken SAS in job listings for "statistics". I added Python to the search this time; job listings for "Python statistics" have risen at a...

RStudio Connect 1.4.2

February 9, 2017

We’re excited to announce the latest release of RStudio Connect: version 1.4.2. This release includes a number of notable features including an overhauled interface for parameterized R Markdown reports. The most notable feature in this release is the ability to publish parameterized R Markdown reports that are easier for anyone to customize. If you’re unfamiliar,
