R for trial and model-based cost-effectiveness analysis

November 14, 2019

29-30 June 2020, University College London

Training event (29 June): Torrington (1-19) 113 - Public Cluster, 1-19 Torrington Place (https://goo.gl/maps/RtR3Ypug2Dq), University College London, United Kingdom

Main workshop (30 June): Room G12, 1-19 Torrington Place (https://goo.gl/maps/RtR3Ypug2Dq), University College London, United Kingdom

Background and objectives

It is our pleasure to announce a workshop and training event...

The hidden diagnostic plots for the lm object

November 14, 2019

When plotting an lm object in R, one typically sees a 2 by 2 panel of diagnostic plots, much like the one below: This link has an excellent explanation of each of these 4 plots, and I highly recommend giving …
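For context, here is a minimal sketch of what the title alludes to, using base R only. Calling `plot()` on an `lm` object draws plots 1, 2, 3 and 5 by default; the `which` argument also accepts 4 and 6, which are not drawn unless requested. The `mtcars` model is an illustrative assumption and not the post's own example.

```r
# Fit a small linear model on a built-in dataset (illustrative choice).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Default call: a 2 x 2 panel showing plots 1, 2, 3 and 5.
op <- par(mfrow = c(2, 2))
plot(fit)

# Plots 4 (Cook's distance) and 6 (Cook's distance vs leverage)
# are skipped by default; request them explicitly via `which`.
par(mfrow = c(1, 2))
plot(fit, which = c(4, 6))

# Restore the previous graphics layout.
par(op)
```

Passing `which = 1:6` in a single call draws all six plots in sequence.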

Gold-Mining Week 11 (2019)

November 14, 2019

The Week 11 Gold Mining and Fantasy Football Projection Roundup is now available. The post Gold-Mining Week 11 (2019) appeared first on Fantasy Football Analytics.

IPO Exploration Part Two

November 13, 2019

In a previous post, we explored IPOs and IPO returns by sector and year since 2004. Today, let’s investigate how portfolios formed with those IPOs have performed. We will need to grab the price histories of the tickers, then form portfolios, then calculate their performance, and then rank those performances in some way. Since there are several hundred IPOs for...

workloopR: Analysis of work loops and other data from muscle physiology experiments in R

Studies of muscle physiology often rely on closed-source, proprietary software not only for recording data but also for data wrangling and analyses. Although specialized software might be necessary to record data from highly specialized equipment, data wrangling and analyses should be free from this constraint. It’s becoming more common for researchers to provide code along with published papers (but usually...

Machine Learning in R: Start with an End-to-End Test

November 13, 2019

As a data scientist, you will likely be asked one day to automate your analysis and port your models to production environments. When that happens you cross the blurry line between data science and software engineering, and become a machine learning engineer. I’d like to share a few tips on how to make that transition

Durban EDGE DataQuest

November 12, 2019

The Durban EDGE (Economic Development and Growth in eThekwini) DataQuest was held at UKZN (Westville Campus) on 13 November 2019. Participants were tasked with creating something interesting and useful with the civic data on the new Durban EDGE Open Data Portal developed by Open Data Durban. These datasets were available: EThekwini Water and Sanitation; Durban Skills Audit 2016; EThekwini Financial Statistics Survey; EThekwini Rate...

The Colour of Everything

November 12, 2019

I’m happy to announce that farver 2.0 has landed on CRAN. This is a big release comprising a rewrite of much of the internals along with a range of new functions and improvements. Read on to find out what this is all about.

The case for farver

The first version of farver really came out of necessity as I identified a major performance...

Automating update of a fiscal database for the Euro Area

November 12, 2019

Our purpose is to write a program to automatically update a quarterly fiscal database for the Euro Area. The main difficulty of this exercise is to build long series that go back as far as the 1980s. We use two sources to build the database: the historical database developed in Paredes et al. (2014), which stops in 2013, and the latest Eurostat...

When Cross-Validation is More Powerful than Regularization

November 12, 2019

Regularization is a way of avoiding overfit by restricting the magnitude of model coefficients (or in deep learning, node weights). A simple example of regularization is the use of ridge or lasso regression to fit linear models in the presence of collinear variables or (quasi-)separation. The intuition is that smaller coefficients are less sensitive to …
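To make the idea of restricting coefficient magnitude concrete, here is a small base R sketch of ridge regression on two nearly collinear predictors. The data are simulated, the model has no intercept, and the helper `ridge_coef()` is a hypothetical name introduced for illustration; none of this is taken from the post itself, and in practice a dedicated package would normally be used.

```r
set.seed(1)

# Two nearly collinear predictors (simulated; not data from the post).
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)
y  <- x1 + rnorm(n)
X  <- cbind(x1, x2)

# Ridge estimate: solve (X'X + lambda * I) beta = X'y.
# lambda = 0 reduces to ordinary least squares.
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

beta_ols   <- ridge_coef(X, y, lambda = 0)
beta_ridge <- ridge_coef(X, y, lambda = 1)

# Compare coefficient magnitudes with and without the penalty.
print(rbind(ols = beta_ols, ridge = beta_ridge))
```

The penalty shrinks the coefficient vector toward zero, so the ridge estimates have a smaller overall magnitude than the unpenalized ones, which collinearity can push far apart.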

Logistic Regression in R: A Classification Technique to Predict Credit Card Default

November 12, 2019

Logistic Regression is one of the most popular classification techniques. In this sneak peek from Data Science Dojo's bootcamp, you'll learn about this popular algorithm and go through a real-world problem to practice.
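As a rough illustration of the technique, the sketch below fits a logistic regression with base R's `glm()` and turns the predicted probabilities into class labels. The data frame `credit`, its columns `default` and `balance`, and the 0.5 cutoff are all assumptions made up for this example; they do not come from the bootcamp material.

```r
set.seed(42)

# Simulated stand-in for credit data: a higher balance raises the
# probability of default (hypothetical columns, not the real dataset).
n       <- 500
balance <- runif(n, 0, 10)
p       <- plogis(-4 + 0.8 * balance)
default <- rbinom(n, size = 1, prob = p)
credit  <- data.frame(default, balance)

# Logistic regression is a GLM with a binomial family (logit link).
fit <- glm(default ~ balance, data = credit, family = binomial)

# Predicted probabilities, then a 0.5 cutoff to obtain class labels.
prob_hat  <- predict(fit, type = "response")
class_hat <- as.integer(prob_hat > 0.5)

# Confusion table of predicted versus observed classes.
print(table(predicted = class_hat, observed = credit$default))
```

The cutoff is a design choice: lowering it flags more accounts as likely defaults at the cost of more false alarms.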

AzureR updates: AzureStor, AzureVM, AzureGraph, AzureContainers

November 12, 2019

Some major updates to AzureR packages this week! As well as last week's AzureRMR update, there are changes to AzureStor, AzureVM, AzureGraph and AzureContainers. All of these are live on CRAN.

AzureStor 3.0.0

There are substantial enhancements to multiple-file transfers (up and down). You can supply a vector of pathnames to storage_upload/download as the source and destination arguments. Alternatively...

Azure AI and Machine Learning talk series

November 12, 2019

At last week's Microsoft Ignite conference in Orlando, our team delivered a series of 6 talks about AI and machine learning applications with Azure. The videos from each talk are linked below, and you can watch every talk from the conference online (no registration necessary). Each of our talks also comes with a companion GitHub repository, where you can...

My AP Statistics Class First R Programming Assignment Using RStudio

November 12, 2019

My AP Stats class started their first R programming assignment this week. I gave them the code to type in and play with. This will give them some experience with RStudio and basic function commands. I have a total of six assignments for them to complete over the next few months. All

RcppAnnoy 0.0.14

November 12, 2019

A new minor release of RcppAnnoy is now on CRAN, following the previous 0.0.13 release in September. RcppAnnoy is the Rcpp-based R integration of the nifty Annoy library by Erik Bernhardsson. Annoy is a small and lightweight C++ template header libr...

dplyr and Oracle database with odbc on windows

November 12, 2019

RStudio makes Oracle easier to access from R via odbc and the Connections pane. Personally, I find it’s not so easy. Now that it finally works for me, I will detail some snippets here. After dozens of tries, it seems good to share some tricks. This blog post is also a notepad for me. Oracle and R configuration is a step where we potentially waste...

Teach R to see by Borrowing a Brain

November 12, 2019

It has been an old dream to teach a computer to see, i.e. to hold something in front of a camera and let the computer tell you what it sees. For decades it has been exactly that: a dream – because we as human beings are able to see, we just don’t know how we …

An API for @racently

November 11, 2019

@racently is a side project that I have been nursing along for a couple of years. It addresses a problem that I have as a runner: my race results are distributed across a variety of web sites. This makes it difficult to create a single view on my running performance (or lack thereof) over time. I suspect that I...

Trying the ckanr Package

November 11, 2019

How resources are grouped in CKAN
Initialising ckanr and exploring groups of resources
Connect to CKAN with dplyr and download from one resource
Downloading all resources from a dataset

In previous blog posts (Hacking dbplyr for CKAN, Getting Open Data into R from CKAN) I have been exploring how to download data from the NHS Scotland open data platform into R. I’ve recently...

Community Call – Last Night, Testing Saved my Life

To the uninitiated, software testing may seem variously boring, daunting or bogged down in obscure terminology. However, it has the potential to be enormously useful for people developing software at any level of expertise, and can often be put into practice with relatively little effort. Our 1-hour Call will include two speakers and at least 20 minutes for Q &...

What can we really expect to learn from a pilot study?

November 11, 2019

I am involved with a very interesting project - the NIA IMPACT Collaboratory - where a primary goal is to fund a large group of pragmatic pilot studies to investigate promising interventions to improve health care and quality of life for people living with Alzheimer’s disease and related dementias. One of my roles on the project team is to...

Using R and H2O Isolation Forest For Data Quality

November 11, 2019

Introduction: We will identify anomalous patterns in data. This process is useful not only for finding inconsistencies and errors but also for finding abnormal data behavior, and it can even help detect cyber attacks on organizations. This article has more information for reference: Data Quality and Anomaly Detection Thoughts For Web Analytics. Before starting, we need the following software installed and working: - R...

Free Training: Mastering Data Structures in R

November 11, 2019

Next week I will be delivering a free online R training. This is a new course I've created called Mastering Data Structures in R. This course is for you if:
- You are new to R, and want a rigorous introduction to R as a programming language
- You know how to analyze data in R, but want to

The post Free Training:...

Scraping Machinery Parts

November 10, 2019

I’ve been exploring the feasibility of aggregating data on prices of replacement parts for heavy machinery. There are a number of websites which list this sort of data. I’m focusing on the static sites for the moment. I’m using R with {rvest} (and a few other Tidyverse packages thrown in for good measure).

library(glue)
library(dplyr)
library(purrr)
library(stringr)
library(rvest)

The data are paginated. Fortunately the URL...

Geocoding with Tidygeocoder

November 10, 2019

Tidygeocoder is a newly published R package which provides a tidyverse-style interface for geocoding. It returns latitude and longitude coordinates in tibble format from addresses using the US Census or Nominatim (OSM) geocoder services. In this post I will demonstrate how to use it for plotting a few Washington, DC landmarks...

Statistical uncertainty with R and pdqr

November 10, 2019

CRAN has accepted my 'pdqr' package. Here are important examples of how it can be used to describe and evaluate statistical uncertainty.

Prologue

I am glad to announce that my latest, long-written R package ‘pdqr’ has been accepted to CRAN. It provides tools for creating, transforming and summarizing custom random variables...

A comparison of methods for predicting clothing classes using the Fashion MNIST dataset in RStudio and Python (Part 1)

November 10, 2019

Florianne Verkroost is a PhD candidate at Nuffield College at the University of Oxford. With a passion for data science and a background in mathematics and econometrics, she applies her interdisciplinary knowledge to computationally address societal problems of inequality. In this series of blog posts, I will compare different machine and deep learning methods to predict clothing categories from images...

Cleaning the Table

November 10, 2019

While I’m talking about getting data into R this weekend, here’s another quick example that came up in class this week. The mortality data in the previous example were nice and clean coming in the door. That’s usually not the case. Data can be and usually is messy in all kinds of ways. One of the most common, particularly...
