R Consortium grant applications due October 31

October 9, 2018

Since 2015, the R Consortium has funded projects of benefit to, and proposed by, the R community. Twice a year, the R Consortium Infrastructure Steering Committee reviews grant proposals and makes awards based on merit and funds available. (Those funds come, in turn, from the annual dues paid by R Consortium members.) If you'd like to propose a project...

Read more »

Geocoding with ggmap and the Google API

October 9, 2018

Some of the most popular articles on the Devil is in the Data show how to visualise spatial data creatively. In the old days, obtaining latitude and longitude required a physical survey; with Google Maps, this has become a lot easier. The geocode function from the ggmap package extracts longitude and latitude from Google Maps, … Continue reading "Geocoding...

Read more »

dqsample: A bias-free alternative to base::sample()

October 9, 2018

For many tasks in statistics and data science it is useful to create a random sample or permutation of a data set. Within R the function base::sample() is used for this task. Unfortunately this function uses a slightly biased algorithm for creating random integers within a given range. Most recently this issue has been discussed

Read more »

Big Data-2: Move into the big league: Graduate from R to SparkR

October 9, 2018

This post is a continuation of my earlier post Big Data-1: Move into the big league: Graduate from Python to PySpark. While the earlier post discussed parallel constructs in Python and PySpark, this post elaborates on similar key constructs in R and SparkR. While this post just focuses on the programming part of R and SparkR it … Continue reading Big...

Read more »

First steps of data exploration and visualization with Tidyverse

October 8, 2018

In this post, I will show you how to use visualization and transformation to explore your data in R. I will use several functions that come with the tidyverse package. In general, there are two types of variables, categorical and continuous. In this section, I will show the best option to examine their distributions using the...

Read more »

In regression, we assume noise is independent of all measured predictors. What happens if it isn’t?

October 8, 2018

A number of key assumptions underlie the linear regression model, among them linearity and normally distributed noise (error) terms with constant variance. In this post, I consider an additional assumption: the unobserved noise is uncorrelated with any covariates or predictors in the model. In this simple model: \(Y_i\) has both a structural and stochastic...

Read more »

Parsing Metadata with R – A Package Story

Every R package has its story. Some packages are written by experts, some by novices. Some are developed quickly, others are long in the making. This is the story of jstor, a package which I developed during my time as a student of sociology, working on a research project on the scientific elite within sociology. Writing the package has taught me many things...

Read more »

RStudio 1.2 Preview: Reticulated Python

October 8, 2018

One of the primary focuses of RStudio v1.2 is improved support for other languages frequently used with R. Last week on the blog we talked about new features for working with SQL and D3. Today we’re taking a look at enhancements we’ve made around the reticulate package (an R interface to Python). The reticulate package makes it possible to embed...

Read more »

How to build your own Neural Network from scratch in R

October 8, 2018

Last week I ran across this great post on creating a neural network in Python. It walks through the very basics of neural networks and creates a working example using Python. I enjoyed the simple hands-on approach the author used, and I was interested to see how we might make the same model using R. In this post we...

Read more »

The world (population) is changing

Last month, Max Roser presented a cartogram of the Earth’s population in 2018. He also provided some perspectives on its spatial distribution in an article on ourworldindata.org, which I recommend. Links to the article were shared in many places, including in the blog post A Map of the World Where the Sizes of Countries Are Determined by Population. The author, Jason...

Read more »

Running the Same Task in Python and R

October 8, 2018

According to a KDD poll, fewer respondents (by rate) used only R in 2017 than in 2016. At the same time, more respondents (by rate) used only Python in 2017 than in 2016. Let’s take this as an excuse to take a quick look at what happens when we try a task in both systems. … Continue reading Running...

Read more »

A question and an answer about recoding several factors simultaneously in R

October 8, 2018

Data manipulation is a breeze with amazing packages like plyr and dplyr. Recoding factors, which can be a daunting task, especially for variables that have many categories, is easily accomplished with these packages. However, it is important for those learning data science to understand how base R works. In this regard, I seek help from R...

Read more »

Andrew Gelman discusses election forecasting and polling. (Transcript)

October 8, 2018

Here is the podcast link. Introducing Andrew Gelman Hugo: Hi there, Andy, and welcome to DataFramed. Andrew: Hello. Hugo: Such a pleasure to have you...

Read more »

Announcing MCHT: An R Package for Bootstrap and Monte Carlo Hypothesis Testing

October 8, 2018

MCHT is an R package for bootstrap and Monte Carlo hypothesis testing currently available on GitHub.

Read more »

Are you buying an apartment? How to hack competition in the real estate market with data monitoring

October 8, 2018

In the last couple of years, real estate companies have shifted their focus to the digital world, and now almost all investments have an online system showing what apartments are available. This is very convenient for their potential clients, as they can easily become familiar with the apartments on offer. Things become interesting when all...

Read more »

Prettify your Shiny Tables with DT: Exercises

October 8, 2018

Have you ever wanted to make your Shiny tables interactive, more functional and better looking? The DT package, which stands for “DataTables”, provides an R interface to the JavaScript library “DataTables”. It allows creating high-standard tables by implementing the functionality and design features that are available through the “DataTables” library. Even though the DT...

Read more »

R and Python: How to Integrate the Best of Both into Your Data Science Workflow

From Executive Business Leadership to Data Scientists, we all agree on one thing: A data-driven transformation is happening. Artificial Intelligence (AI) and more specifically, Data Science, are redefining how organizations extract insights from their ...

Read more »

September 2018: Top 40 New Packages

October 7, 2018

September was another relatively slow month for new package activity on CRAN: “only” 126 new packages by my count. My Top 40 list is heavy on what I characterize as “utilities”: packages that either extend R in some fashion or make it easier to do things in R. This month, the packages I selected fall into eight categories: Data,...

Read more »

Distinguish yourself in CRAN person() with ORCID

Proper identification of individuals is crucial for acknowledging and studying their scientific work, be it journal articles or pieces of software. In this tech note, one year after CRAN started supporting ORCIDs, we shall explain why and how to use unique author identifiers in DESCRIPTION files. Why use ORCIDs on CRAN? When analyzing the authorship of CRAN packages, one can look at authors’ names and email...

Read more »

Predicting height based on DNA mutations

October 7, 2018

In this post, I show some results of predicting height based on DNA mutations. This analysis aims at reproducing the analysis of this paper using my own analysis tools in R. I use a new dataset composed of 500,000 adults from the UK, and genotyped over hund...

Read more »

Partially additive (generalized) linear model trees

October 7, 2018

The PALM tree algorithm for partially additive (generalized) linear model trees is introduced along with the R package palmtree. One potential application is modeling of treatment-subgroup interactions while adjusting for global additive effe...

Read more »

First release and update dates of R Packages statistics

October 7, 2018

R has been around a long time, and its packages have evolved through the years as well, from initial releases and updates to new packages. Like many open-source, community-driven languages, R is no exception. And getting the first…

Read more »

Mining Sent Email for Self-Knowledge

October 7, 2018

How can we use data analytics to increase our self-knowledge? Along with biofeedback from digital devices like FitBit, less structured sources such as sent emails can provide insights. E.g. here it seems my communication took a sudden more positive tu...

Read more »

Subsetting in the presence of NAs

October 6, 2018

In R, we can subset a data frame df easily by putting the conditional in square brackets after df. For example, if I want all the rows in df which have value equal to 1 in the column colA, all … Continue reading →
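The excerpt breaks off before showing the subsetting itself, so the following is only a minimal sketch of the behaviour the title points at. The names df and colA follow the excerpt; the column colB and all values are made up for illustration.

```r
# Hypothetical data frame: df and colA are named in the excerpt,
# colB and the values are assumptions for illustration only.
df <- data.frame(colA = c(1, NA, 2, 1),
                 colB = c("a", "b", "c", "d"))

# The comparison returns NA wherever colA is NA, and a logical index
# containing NA produces an all-NA row for each such entry.
with_na <- df[df$colA == 1, ]

# which() keeps only the positions that are definitely TRUE,
# so rows with a missing colA are dropped.
no_na <- df[which(df$colA == 1), ]

print(with_na)
print(no_na)
```

An equivalent condition is !is.na(df$colA) & df$colA == 1, which some readers may find more explicit than which().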

Read more »

Make a trailer for your slidedeck with av

October 6, 2018

rOpenSci post-doc hacker Jeroen Ooms has just released a cool new package, av, that he wrote “will become the video counterpart of the magick package which for working with images.” av provides bindings to the FFmpeg libraries for editing videos. It’s already become a renderer for gganimate by Thomas Lin Pedersen, but av allows more than making...

Read more »

Analyzing the Greatest Strikers in Football II: Visualizing Data

October 6, 2018

This is the second part of Analyzing the Greatest Strikers in Football. In the first part, we created the function get_goals(), which allows us to conveniently scrape detailed information on players’ career goals from transfermarkt.co.uk. In this part, we are going to explore the data.
library(tidyverse) # for data wrangling
library(lubridate) # for date formats
library(ggimage) # adding images to ggplot
library(patchwork) # attaching...

Read more »

The “Gold Standard” of Data Science Project Management

October 6, 2018

The inspiration for this post came most recently from a slide-deck by Ming Tang, a bioinformatician at Harvard, and a new Chromebook Data Science course offered by Jeffrey Leek from John Hopkin...

Read more »

Quick Significance Calculations for A/B Tests in R

October 6, 2018

Let’s take a quick look at a very important and common experimental problem: checking if the difference in success rates of two Binomial experiments is statistically significant. This can arise in A/B testing situations such as online advertising, sales, and manufacturing. We already share a free video course on a Bayesian treatment of planning … Continue reading Quick...
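The excerpt does not show which calculation the post uses, so this is only a sketch of one standard option: prop.test() from R's built-in stats package, which tests whether two success rates differ. The counts below are made up for illustration and do not come from the post.

```r
# Hypothetical A/B counts (assumptions, not data from the post).
successes <- c(A = 120, B = 150)
trials    <- c(A = 1000, B = 1000)

# Two-sample test of equal proportions; by default prop.test()
# applies a continuity correction and a two-sided alternative.
result <- prop.test(x = successes, n = trials)

print(result$estimate)  # observed success rate of each variant
print(result$p.value)   # small values suggest the rates differ
```

This is a frequentist check; the Bayesian treatment mentioned in the excerpt is a different approach and is not sketched here.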

Read more »

RcppCCTZ 0.2.4

October 6, 2018

A new release 0.2.4 of RcppCCTZ is now on CRAN. RcppCCTZ uses Rcpp to bring CCTZ to R. CCTZ is a C++ library for translating between absolute and civil times using the rules of a time zone. In fact, it is two libraries. One for dealing with civil tim...

Read more »
