nnetsauce for R

January 30, 2020

nnetsauce is now available to R users (currently as a development version). As a reminder, for those who are interested, the following page illustrates different use cases for nnetsauce, including deep learning application examples. This post from September 18 is about a variant of the Adaptive Boosting (AdaBoost) algorithm available in nnetsauce. This other post from September 25 presents a Bootstrap...

Read more »

A Shiny App for Tracking Moral Networks

January 30, 2020

Background: This post outlines a Shiny app that I made for visualising inter-participant agreement on questions relating to Haidt’s Moral Foundations (e.g., Haidt and Joseph 2008). It is part of a line of research on moral judgements, inspired by the DAFINET project, in which I aim to investigate the role of agreement with others in the robustness of moral judgements. It...

Read more »

Comparing Ensembl GTF and cDNA

January 30, 2020

It seems that most people assume Ensembl’s GTF file and cDNA FASTA file describe the same transcripts: Watch out! @ensembl's Fasta and GTF annotation files available via https://t.co/2AhCSnL7py do not match (there are transcripts in the GTF not found in the Fasta file. Anyone else expected...

Read more »

An efficient way to install and load R packages

January 30, 2020

Contents: What is an R package and how to use it? · Inefficient way to install and load R packages · More efficient way

What is an R package and how to use it? Unlike other programs, R comes with only its fundamental functionality by default. You will thus often need to install some “extensions” to perform the analyses you want. These extensions, which are collections...
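
A common base-R idiom for loading packages efficiently is to install only the packages that are missing and then attach them all in one call. The sketch below is an assumption about what such a helper can look like, not necessarily the post’s own code; `load_packages()` is a hypothetical name.

```r
# Install any missing packages, then attach all of them.
# load_packages() is a hypothetical helper, not part of base R.
load_packages <- function(pkgs) {
  # Which requested packages are not installed yet?
  missing <- pkgs[!pkgs %in% rownames(installed.packages())]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  # Attach each package; character.only = TRUE lets library() accept a string
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Base packages are used here so the example runs offline;
# in practice you would pass something like c("ggplot2", "dplyr")
load_packages(c("stats", "tools"))
```

Checking `installed.packages()` first avoids reinstalling packages on every run, and the single `lapply()` call replaces a long column of `library()` statements.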

Read more »

another easy Riddler

January 30, 2020

A quick riddle from the Riddler: In a two-person game, Abigail and Zian both choose between a and z. Abigail wins one point with probability .9 if they choose (a,a) and with probability 1 if they choose (a,z), and two points with probability .4 if they choose (z,z) and with probability .6 if they choose...

Read more »

Building the R Community in Southern Africa

January 30, 2020

By Heather Turner, Chair of Forwards, the R Foundation taskforce for underrepresented groups in the R Community. In this post I will give the background to the Forwards Southern Africa... The post Building the R Community in Southern Africa appeared first on R Consortium.

Read more »

Supplement to ‘Nonparametric estimation of the service time distribution in discrete-time queueing networks’

Great news: a scientific article I have co-authored has been accepted for publication and can now be found online here or via the DOI 10.1016/j.spa.2020.01.011. Yes, my list of publications has been amended [1]. This article has been through quite a leng...

Read more »

Create a Notebook to Explore Country-Level CO2 Emissions With a Few Clicks

Assume that you have some new data that you want to explore. The new CRAN version of the ‘ExPanDaR’ package helps by providing, with a few clicks, a (customized) R notebook containing all the building blocks of an exploratory data analysis. Install the package and start ExPanD: First, you need to install the package. I recommend installing the GitHub development version of the package as it...

Read more »

Applied Bayesian Statistics Using Stan and R

January 29, 2020

Whether researchers occasionally turn to Bayesian statistical methods out of convenience or firmly subscribe to the Bayesian paradigm for philosophical reasons, the use of Bayesian statistics in the social sciences is becoming increasingly widespread. However, seemingly high entry costs still keep many applied researchers from embracing Bayesian methods. Besides a lack of familiarity with the underlying...

Read more »

R Consortium Simplifies Membership Structure to Increase Opportunities for Silver Level Members

January 29, 2020

Two membership levels now available: Platinum and Silver. SAN FRANCISCO, January 29, 2020 – The R Consortium, a Linux Foundation project supporting the R Foundation and R community, today announced... The post R Consortium Simplifies Membership Structure to Increase Opportunities for Silver Level Members appeared first on R Consortium.

Read more »

Monitoring Website SSL/TLS Certificate Expiration Times with R, {openssl}, {pushoverr}, and {DT}

January 29, 2020

macOS R users who tend to work on the bleeding edge likely noticed some downtime this past weekend. Part of the issue was an expired SSL/TLS certificate. Moving forward, we can monitor this with R using the super spiffy {openssl} and {pushoverr} packages whilst also generating a daily report with {rmarkdown} and...

Read more »

Evaluate your R model with MLmetrics

January 28, 2020

This post will explore using the R package MLmetrics to evaluate machine learning models. MLmetrics provides several functions to calculate common metrics for ML models, including AUC, precision, recall and accuracy. Building an example model: First, we need to build a model to use as an example. For this post, we’ll be using a dataset on pulsar...
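
To see what these metrics compute, here is a minimal base-R sketch on made-up labels and predicted probabilities (`y_true`, `y_prob` and the 0.5 threshold are hypothetical, not the post’s pulsar data). MLmetrics wraps the same quantities in functions such as `Accuracy()`, `Precision()`, `Recall()` and `AUC()`; check its documentation for the exact argument order.

```r
# Hypothetical true labels (1 = positive class) and predicted probabilities
y_true <- c(1, 0, 1, 1, 0, 0, 1, 0)
y_prob <- c(0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1)
y_pred <- as.integer(y_prob >= 0.5)   # classify with a 0.5 threshold

# Confusion-matrix counts
tp <- sum(y_pred == 1 & y_true == 1)
fp <- sum(y_pred == 1 & y_true == 0)
fn <- sum(y_pred == 0 & y_true == 1)

accuracy  <- mean(y_pred == y_true)   # share of correct predictions
precision <- tp / (tp + fp)           # of predicted positives, how many are real
recall    <- tp / (tp + fn)           # of real positives, how many were found

# AUC via the rank-sum (Mann-Whitney) formulation: the probability that a
# random positive gets a higher score than a random negative
r     <- rank(y_prob)
n_pos <- sum(y_true == 1)
n_neg <- sum(y_true == 0)
auc   <- (sum(r[y_true == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

c(accuracy = accuracy, precision = precision, recall = recall, auc = auc)
# accuracy 0.75, precision 0.75, recall 0.75, auc 0.875
```

Accuracy, precision and recall depend on the chosen threshold, while AUC uses only the ranking of the scores.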

Read more »

Do my data follow a normal distribution? A note on the most widely used distribution and how to test for normality in R

January 28, 2020

Contents: What is a normal distribution? · Empirical rule · Parameters · Probabilities and standard normal distribution · Areas under the normal distribution in R and by hand (Ex. 1 to Ex. 4, each in R and by hand; Ex. 5) · Why is the normal distribution so crucial in statistics? · How to test the normality assumption · Histogram · Density plot · QQ-plot · Normality test · References

What is a normal distribution? The normal distribution is a function that defines how...
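
As a rough illustration of the tools listed in the outline, the sketch below computes an area under the normal curve with `pnorm()`, draws the three diagnostic plots and runs a Shapiro-Wilk test, all in base R. The simulated samples are assumptions for illustration only, and the post may use other functions.

```r
# Simulated data for illustration only (not the post's data)
set.seed(42)
x_norm <- rnorm(100, mean = 50, sd = 10)
x_skew <- rexp(100, rate = 1)   # clearly non-normal

# Area under N(50, 10) within one standard deviation of the mean
# (the empirical rule says about 68%)
area_1sd <- pnorm(60, mean = 50, sd = 10) - pnorm(40, mean = 50, sd = 10)
round(area_1sd, 4)  # 0.6827

# Visual checks: histogram, density plot and QQ-plot
hist(x_norm, main = "Histogram")
plot(density(x_norm), main = "Density plot")
qqnorm(x_norm); qqline(x_norm)

# Shapiro-Wilk test, H0: the sample comes from a normal distribution
shapiro.test(x_norm)$p.value  # a large p-value gives no evidence against normality
shapiro.test(x_skew)$p.value  # expected to be very small for exponential data
```

A non-significant test does not prove normality; it only means the sample gives no strong evidence against it, which is why the plots are worth checking alongside the test.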

Read more »

The significance of the region on the salary of engineers in Sweden

So far I have analysed the effect of experience, education, gender and year on the salary of engineers in Sweden. In this post, I will look at the effect of region on the salary of engineers in Sweden. Statistics Sweden uses NUTS (Nomenclature des Unités Territoriales Statistiques), which is the EU’s hierarchical regional division, to specify the...

Read more »

sparklyr 1.1: Foundations, Books, Lakes and Barriers

January 28, 2020

Today we are excited to share that sparklyr 1.1 is now available on CRAN! In a nutshell, you can use sparklyr to scale datasets across computing clusters running Apache Spark. For this particular release, we would like to highlight the following new features: Delta Lake enables database-like properties in Spark. Spark 3.0 preview is now available through sparklyr. Barrier Execution paves the way...

Read more »

RStudio, PBC

January 28, 2020

We started the RStudio project because we were excited and inspired by R. The creators of R provided a flexible and powerful foundation for statistical computing, and then made it free and open so that it could be improved collaboratively and its benefits shared by the widest possible audience. It’s better for everyone if the tools used for research...

Read more »

Shiny CRUD

January 28, 2020

NOTE: This post assumes knowledge of R and Shiny and some familiarity with databases. If you are new to R and Shiny, there are great learning resources at https://shiny.rstudio.com/. If you are comfortable with R and Shiny, but the idea of persistent d...

Read more »

Data re-Shaping in R and in Python

January 28, 2020

Nina Zumel and I have two new tutorials on fluid data wrangling/shaping. They are written in a parallel structure, with the R version of the tutorial being almost identical to the Python version. This reflects our opinion on the “which is better for data science, R or Python?” question. They both are …

Read more »

Does Australia need More Fires (but the Right Kind)? A Multi-Agent Simulation

January 28, 2020

We have all watched with great horror the catastrophic fires in Australia. For many years, scientists have been studying simulations to better understand the underlying dynamics. They tell us that “what Australia needs is more fires, but of the right kind”. What do they mean by that? One such simulation of fire is based on …

Read more »

Fisher’s exact test in R: independence test for a small sample

January 27, 2020

Contents: Introduction · Hypotheses · Example · Data · Observed frequencies · Expected frequencies · Fisher’s exact test in R · Conclusion and interpretation · References

Introduction: After presenting the Chi-square test of independence by hand and in R, this article focuses on Fisher’s exact test. Independence tests are used to determine whether there is a significant relationship between two categorical variables. There are two different types of independence test: the Chi-square test (the most common) and Fisher’s exact test. On...
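
A minimal sketch of the test in base R, using a hypothetical 2x2 table of counts (the classic “lady tasting tea” layout rather than the post’s data). The expected frequencies under independence are all below 5, which is the usual rule of thumb for preferring Fisher’s exact test over the Chi-square test.

```r
# Hypothetical 2x2 table of observed counts
tea <- matrix(c(3, 1, 1, 3), nrow = 2,
              dimnames = list(Guess = c("Milk", "Tea"),
                              Truth = c("Milk", "Tea")))

# Expected frequencies under independence: row total * column total / n.
# All cells equal 2 here, so the Chi-square approximation is unreliable
# (chisq.test() would warn about it, hence suppressWarnings()).
expected <- suppressWarnings(chisq.test(tea)$expected)
expected

# Fisher's exact test of independence (two-sided by default)
res <- fisher.test(tea)
res$p.value  # 34/70, about 0.486: no evidence against independence at the 5% level
```

The p-value is exact because it sums hypergeometric probabilities over all tables with the same margins, rather than relying on a large-sample approximation.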

Read more »

An Intuitive Look at Binomial Probability in a Bayesian Context

January 27, 2020

Binomial probability is the relatively simple case of estimating the proportion of successes in a series of yes/no trials. The perennial example is estimating the proportion of heads in a series of coin flips, where each trial is independent and can come up either heads or tails. Because of its relative simplicity, the binomial case is a great place to...
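
One standard way to make this concrete is the conjugate Beta-Binomial update: with a Beta(a, b) prior on the proportion of heads and k heads in n flips, the posterior is Beta(a + k, b + n - k). The sketch below assumes a uniform Beta(1, 1) prior and hypothetical data (7 heads in 10 flips); it is not necessarily how the post proceeds, and a grid approximation is included as a cross-check.

```r
# Hypothetical data: 7 heads in 10 independent coin flips
heads <- 7
flips <- 10

# Uniform prior on the proportion of heads: Beta(1, 1)
a_prior <- 1
b_prior <- 1

# Conjugacy: Beta prior + binomial likelihood gives a Beta posterior
a_post <- a_prior + heads
b_post <- b_prior + (flips - heads)

post_mean <- a_post / (a_post + b_post)              # (1 + 7) / (2 + 10), about 0.667
cred_int  <- qbeta(c(0.025, 0.975), a_post, b_post)  # 95% equal-tailed credible interval

# Grid approximation: prior * likelihood on a fine grid, then normalise
p_grid    <- seq(0, 1, length.out = 1001)
unnorm    <- dbinom(heads, flips, p_grid) * dbeta(p_grid, a_prior, b_prior)
post_grid <- unnorm / sum(unnorm)
grid_mean <- sum(p_grid * post_grid)                 # close to post_mean
```

The grid version does not rely on conjugacy, so the same few lines work for priors that have no closed-form posterior.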

Read more »

Some everyday data tasks: a few hints with R (revisited)

One year ago, I published a post titled ‘Some everyday data tasks: a few hints with R’. In that post, I considered four data tasks that we all need to accomplish daily, i.e. subsetting, sorting, casting and melting. In that post, I used the methods I was most familiar with. And, as a long-time R user, I have mainly incorporated into my workflow all the...

Read more »

Call BEAST2 for Bayesian evolutionary analysis from R

babette [1] is an R package for working with BEAST2 [2], a software platform for Bayesian evolutionary analysis. babette is a spin-off of my own academic research. As a PhD student, I work on models of diversification: mathematical descriptions of how species form new species. Instead of working on a species’ individuals, I work on species as evolutionary lineages. A good way to show the...

Read more »

R as a tool for Systems Administration

January 27, 2020

When talking about languages to use in production in data science, R is usually not part of the conversation, and if it is, it is referenced as a secondary language. One of the main reasons for this is that R is commonly associated with being more suitable for statistical analysis, and languages like Python and JavaScript...

Read more »

Simulating parametric survival model with parametric bootstrap to capture uncertainty

January 27, 2020

I recently released an R package on CRAN called survParamSim for parametric survival simulation, and here I want to describe in a bit more detail the motivations behind developing this package (package site: “Parametric Survival Simulation with Parameter Uncertainty • survParamSim”). The purpose of survParamSim is to pac…

Read more »

On the relationship of the sample size and the correlation

January 27, 2020

This has bugged me for some time now. There is “common knowledge” that the size of a correlation depends on the variability, i.e. the higher the variability, the higher the correlation. However, when you put this into practice, there seems to be confusion about what this really means. To analyse this I have divided this …
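
The “common knowledge” about variability can be illustrated with a small simulation of range restriction: draw correlated data, then keep only observations with x in a narrow band and recompute the correlation. The population correlation of 0.6 and the band |x| < 0.5 are arbitrary assumptions, and this sketch does not reproduce the post’s own analysis of sample size.

```r
set.seed(1)
n   <- 10000
rho <- 0.6   # assumed population correlation

# Bivariate normal data with correlation rho, using base R only
x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

r_full <- cor(x, y)   # close to 0.6

# Restrict the range of x, which shrinks its variability
keep <- abs(x) < 0.5
r_restricted <- cor(x[keep], y[keep])   # roughly 0.2

c(full = r_full, restricted = r_restricted)
```

The relationship between x and y is identical in both subsets; only the spread of x changes, yet the correlation drops sharply.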

Read more »

Going to rstudio::conf? Meet Business Science, Accelerate Your Career

January 26, 2020

I can’t tell you how excited I am to be a sponsor at rstudio::conf(2020) this year. This is my 2nd year attending, and my 1st time as a sponsor. It’s an amazing honor. My team and I are here for one reason: to help you accelerate your career. Le...

Read more »

Survival Analysis – Fitting Weibull Models for Improving Device Reliability in R

January 26, 2020

It’s time to get our hands dirty with some survival analysis! In this post, I’ll explore reliability modeling techniques that are applicable to Class III medical device testing. My goal is to expand on what I’ve been learning about GLMs and get comfortable fitting data to Weibull distributions. I don’t have a ton of experience with Weibull analysis so...
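
For readers new to Weibull fitting, here is a minimal sketch of maximum-likelihood estimation for right-censored reliability data in base R. The simulated data (shape 1.5, scale 1000 hours, test stopped at 1500 hours) are assumptions, not the post’s data. In practice `survival::survreg(Surv(time, status) ~ 1, dist = "weibull")` is the usual tool, with the Weibull shape given by `1 / fit$scale` and the scale by `exp(coef(fit))`.

```r
# Negative log-likelihood for right-censored Weibull data.
# status = 1 for an observed failure, 0 for a unit still running (censored).
weibull_nll <- function(log_par, time, status) {
  shape <- exp(log_par[1])   # optimise on the log scale to keep parameters positive
  scale <- exp(log_par[2])
  ll_fail <- dweibull(time[status == 1], shape, scale, log = TRUE)   # density for failures
  ll_cens <- pweibull(time[status == 0], shape, scale,
                      lower.tail = FALSE, log.p = TRUE)              # survival for censored units
  -(sum(ll_fail) + sum(ll_cens))
}

# Simulated test: true shape 1.5, true scale 1000 hours,
# testing stopped at 1500 hours (Type I censoring)
set.seed(123)
t_true <- rweibull(2000, shape = 1.5, scale = 1000)
time   <- pmin(t_true, 1500)
status <- as.integer(t_true <= 1500)

fit <- optim(c(0, log(mean(time))), weibull_nll, time = time, status = status)
est <- setNames(exp(fit$par), c("shape", "scale"))
est   # should be close to shape 1.5 and scale 1000
```

A shape above 1 indicates a failure rate that increases with time (wear-out), which is often the quantity of interest in device reliability testing.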

Read more »

rstudio::conf 2020 Workshops

January 26, 2020

rstudio::conf 2020 got underway today with a huge training event featuring eighteen workshops taught by some of the most experienced and sought-after instructors in the R Community. The workshops covered a wide range of topics including the Tidyverse, machine learning, deep learning, JavaScript, Shiny, R Markdown, package building, geospatial statistics, visualization, teaching R and working as an RStudio professional...

Read more »
