A Single Parameter Family Characterizing Probability Model Performance

October 29, 2020 | jmount

We’ve been writing on the distribution density shapes expected for probability models in ROC (receiver operating characteristic) plots, double density plots, and normal/logit-normal density frameworks. I thought I would re-approach the issue with a specific family of examples. Let’s define a “probability model” as a ...
[Read more...]

inverse Gaussian trick [or treat?]

October 28, 2020 | xi'an

When preparing my mid-term exam for my undergrad mathematical statistics course, I wanted to use the inverse Gaussian distribution IG(μ,λ) as an example of an exponential family and include a random generator question. As shown above by a Fortran code from Michael, Schucany and Haas, a simple version can be ...
[Read more...]
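The excerpt stops before the generator itself. As a hedged illustration, and not the Fortran code from the post, here is a minimal vectorised R sketch of the transformation method of Michael, Schucany and Haas, assuming μ > 0 and λ > 0; the function name `rinvgauss_msh` is hypothetical. It draws a chi-squared(1) variate, computes the smaller root x of the associated quadratic, and keeps x with probability μ/(μ + x), otherwise returning the other root μ²/x.

```r
# Hypothetical helper: IG(mu, lambda) draws via the Michael, Schucany and Haas
# transformation (sketch; assumes mu > 0 and lambda > 0).
rinvgauss_msh <- function(n, mu, lambda) {
  stopifnot(mu > 0, lambda > 0)
  y <- rnorm(n)^2                         # chi-squared(1) variates
  # smaller of the two roots x solving lambda * (x - mu)^2 / (mu^2 * x) = y
  x <- mu + (mu^2 * y) / (2 * lambda) -
    (mu / (2 * lambda)) * sqrt(4 * mu * lambda * y + mu^2 * y^2)
  u <- runif(n)
  # keep the small root with probability mu / (mu + x), else use mu^2 / x
  ifelse(u <= mu / (mu + x), x, mu^2 / x)
}

set.seed(2020)
draws <- rinvgauss_msh(1e5, mu = 2, lambda = 3)
mean(draws)  # should be near mu = 2
var(draws)   # should be near mu^3 / lambda = 8/3
```

The random choice between the two roots is what makes the output exactly IG(μ,λ) rather than a deterministic function of the chi-squared draw.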

The Double Density Plot Contains a Lot of Useful Information

October 27, 2020 | jmount

The double density plot contains a lot of useful information. This is a plot that shows the distribution of a continuous model score, conditioned on the binary categorical outcome to be predicted. As with most density plots, the y-axis is an abstract quantity called density, picked such that the area […]
[Read more...]
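The excerpt describes the plot only in words. A minimal base-R sketch, using made-up synthetic scores rather than any data from the post: estimate one density of the model score per outcome class and overlay the two curves. The variable names and the `area()` helper are hypothetical; `area()` only checks that each curve encloses roughly unit area, which is the usual normalisation of a density axis.

```r
# Hypothetical synthetic data: a binary outcome and a model score in (0, 1).
set.seed(2020)
y <- rbinom(1000, size = 1, prob = 0.3)
score <- plogis(rnorm(1000, mean = ifelse(y == 1, 1, -1)))

# One kernel density estimate per outcome class: the two curves of the plot.
d_pos <- density(score[y == 1])
d_neg <- density(score[y == 0])

plot(d_neg, col = "darkblue", ylim = range(0, d_pos$y, d_neg$y),
     main = "Double density plot", xlab = "model score")
lines(d_pos, col = "darkorange")
legend("topright", legend = c("outcome = 0", "outcome = 1"),
       col = c("darkblue", "darkorange"), lty = 1)

# Riemann-sum check on the evenly spaced grid returned by density():
# each curve should enclose an area close to 1.
area <- function(d) sum(d$y) * diff(d$x[1:2])
area(d_pos)
area(d_neg)
```

Because each class is normalised separately, the curves show the shape of the score distribution within each outcome, not the class prevalence.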

A CRAN Downloads Experiment

October 27, 2020 | Swimming + Data Science

I’ve done an experiment regarding package downloads from CRAN (or the RStudio CRAN mirror at least) and now it’s time to share the results.
library(dplyr)
library(ggplot2)
library(purrr)
library(dlstats)
library(flextable)

flextable_style <- function(x) {
  x %>%
    flextable() %>%
    bold(part = "header") %>% # bold header
    bg(bg = "#D3D3D3", part = "header") %>% # puts gray background behind the header row
    align_nottext_col(align = "center", header = TRUE, footer = TRUE) %>% # center alignment
    autofit()
}
When the first version (0.0.1.0) of SwimmeR was released on CRAN in October of 2019, it had very few features: just a couple ...
[Read more...]

artificial EM

October 27, 2020 | xi'an

When addressing an X validated question on the use of the EM algorithm when estimating a Normal mean, my first comment was that it was inappropriate since there is no missing data structure to anchor by (right preposition?). However, I then reflected upon the infinite number of ways to demarginalise ...
[Read more...]

Tapping Yelp data with Apache Drill from Mac using {sergeant}

October 26, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("tidyverse",
    "sergeant",
    "tictoc"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)
At Redwall, we have been in nonstop exploration of new data sets over the last couple of years. As our data grows and the targets of interest get bigger, we have been finding the old method of loading CSVs from disk, and ...
[Read more...]