The data frame is the primary structure for working with data in R. Whenever you have data that is arranged in a spreadsheet-like fashion, the default receptacle for that data in R is the data frame. In a data frame,

Abstract: Continuing the analysis of first names given to newborns in Berlin in 2016, we solve the following problem: what is the probability that, in a school class of size \(n\) of these kids, at least two kids have the same first name? The answer to the problem for classes of size 26 is 34% and...
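The question in this teaser is a birthday-problem variant with unequal name probabilities. The following is only a minimal sketch of one way to approach it, not the post's method: an exact formula for the equal-probability case plus a Monte Carlo estimate for arbitrary name weights. The function names and any weights passed in are hypothetical; the Berlin name frequencies are not reproduced here.

```python
import random


def collision_prob_uniform(n, k):
    """Exact probability that at least two of n kids share a name
    when each of k names is equally likely (the classic birthday problem)."""
    if n > k:
        return 1.0  # more kids than names: a shared name is certain
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (k - i) / k
    return 1.0 - p_all_distinct


def collision_prob_simulated(n, weights, trials=20000, seed=1):
    """Monte Carlo estimate of the same probability when name i is drawn
    with probability proportional to weights[i] (hypothetical frequencies)."""
    rng = random.Random(seed)
    names = list(range(len(weights)))
    hits = 0
    for _ in range(trials):
        sample = rng.choices(names, weights=weights, k=n)
        if len(set(sample)) < n:  # a repeated name appeared in the class
            hits += 1
    return hits / trials
```

Unequal name frequencies make a shared name more likely than the equal-probability formula suggests, so the uniform result is best read as a lower bound for a given number of names.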

A Brief Introduction: Linear regression is a classic supervised statistical technique for predictive modelling, based on the linear hypothesis y = mx + c, where y is the response or outcome variable, m is the gradient of the linear trend-line, x is the predictor variable and c is the intercept. The intercept is…
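As an illustration of the linear hypothesis y = mx + c, here is a minimal sketch that estimates m and c by ordinary least squares. The choice of least squares and the function name `fit_line` are assumptions for illustration; the teaser does not say how the post fits the line.

```python
def fit_line(xs, ys):
    """Estimate gradient m and intercept c of y = m*x + c
    by ordinary least squares on paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx            # gradient of the fitted trend-line
    c = mean_y - m * mean_x  # intercept: the line passes through the means
    return m, c


# Points lying exactly on y = 2x + 1 recover m = 2 and c = 1.
m, c = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
```

The sketch assumes at least two distinct x values; with identical x values the gradient is undefined and the division fails.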

A participant in the R course I’m teaching showed me a case where a tbl_df (the new flavour of data frame provided by the tibble package; standard in new RStudio versions) interacts badly with the t.test function. I had not seen this happen before. The reason is this: Interacting with legacy code A handful of

In a previous post, I compared equivalence tests to Bayes factors, and pointed out several benefits of equivalence tests. But a much more logical comparison, and one I did not give enough attention to so far, is the ROPE procedure using Bayesian estimation. I’d like to thank John Kruschke for feedback on a draft of this blog post....

More and more packages, be it for R or another language, now interface with application programming interfaces (APIs) that are exposed on the web. Many of these require an API key, a token, or an account and password, which traditionally poses a problem in automated tests such as those running on the popular Travis CI service...

The notebooks (R, Rmd, and HTML files are provided in my GitHub repository) focus on an exploratory analysis of the open data set on the complaints in the field of freedom of information, provided at the Open Data Portal of the Republic of Serbia that is currently under development. The data set was kindly provided to the...

I like Wikipedia. My husband likes it even more: he included it in his PhD thesis acknowledgements! I appreciate the effort put into sharing knowledge, and also the apparently random stuff you can find on the website. In particular, I’ve been intrig...

Contributed by Thomas Kassel. He is currently enrolled in the NYC Data Science Academy 17-week remote bootcamp program taking place from January-April 2017. This post is based on his second class project, R The post Tracking Exercise Trends with NHANES appeared first on NYC Data Science Academy Blog.

Recently Dirk Eddelbuettel pointed out that our R function debugging wrappers would be more convenient if they were available in a low-dependency micro package dedicated to little else. Dirk is a very smart person, and like most R users we are deeply in his debt; so we (Nina Zumel and myself) listened and immediately moved …

RDSTK is a very versatile package. It includes functions to help you convert IP addresses to geolocations and derive statistics from them. It also allows you to input a body of text and convert it into sentiments. This is a continuation of the last exercise set, RDSTK 1. This package provides an R interface to

I’m uncontainably excited to report that the ggplot2 extension package ggalt is now on CRAN. The absolute best part of this package is the R community members who contributed suggestions and new geoms, stats, annotations and integration features. This release would not be possible without the PRs from: Ben Bolker, Ben Marwick, Jan Schulz, Carson...

Authors: Christof Naumzik & Stefan Feuerriegel. Caffe (http://caffe.berkeleyvision.org) provides a powerful framework for deep learning. It is developed and maintained by the Berkeley Vision and Learning Center (BVLC) and has received a great deal of traction lately. Caffe enables users to define and train custom-made neural networks without hard-coding. Furthermore, it allows users to execute …

On January 31, the R Consortium presented a webinar with updates on various projects that have been funded (thanks to the R Consortium member dues) and are underway. Each project was presented by the project leader, a member of the R community. You can watch the recording of the webinar here, but here's a brief summary of what was...

by Merav Yuravlivker, CEO of Data Society “I’m not a coder” or “I was never good at math” is a frequent refrain I hear when I ask professionals about their data analysis skills. Through popular culture and stereotypes, most people who don’t have a background in programming automatically underestimate their ability to create amazing things

An updated anytime package arrived at CRAN yesterday. This is release number nine, and the first with a little gap to the prior release on Christmas Eve as the features are stabilizing, as is the implementation. anytime is a very focused package aiming to do just one thing really well: to convert anything in integer, numeric, character,...

Serving book exercises on the web with Shiny and Exams - As you may know, I recently published an R book on Amazon. I decided to keep the R exercises out of the book and serve them freely over the web....

Statisticians have long known that the use of p-values has major problems. Some of us have long called for reform, weaning the profession away from these troubling beasts. At one point, I was pleased to see Frank Harrell suggest that R should stop computing them. That is not going to happen, but last year the …

On Thursday, March 2nd, I will give the first lecture of the PhD course on advanced tools for econometrics, on nonlinearities. Slides are available online.

When we last looked at job trends from indeed.com, job listings for "R statistics" were on the rise but were still around half the volume of listings for "SAS statistics". Three-and-a-half years later, R has overtaken SAS in job listings for "statistics". I added Python to the search this time; job listings for "Python statistics" have risen at a...

We’re excited to announce the latest release of RStudio Connect: version 1.4.2. This release includes a number of notable features including an overhauled interface for parameterized R Markdown reports. The most notable feature in this release is the ability to publish parameterized R Markdown reports that are easier for anyone to customize. If you’re unfamiliar,