Predicting large and imbalanced data set using the R package tidymodels

April 13, 2020 | 0 Comments

Introduction Data exploration Summary of the variables Missing values imbalanced data building the recipe Building the workflow random forest model model training model evaluation Model tuning: logistic regression model Session information Introduction The super easy way, at least for me, to deploy machine learning models is by making use of ... [Read more...]

Scraping Failed Tabulizer PDFs with AWS Textract – Part 4

April 13, 2020 | 0 Comments

# Libraries packages 0) { install.packages(setdiff(packages, rownames(installed.packages()))) } invisible(lapply(packages, library, character.only = TRUE)) knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%') Introduction In Evaluating Mass Muni CAFR Tabulizer Results - Part 3, we discovered that we were able to accurately extract ~95% of ... [Read more...]

inSilecoMisc 0.4.0 (part 1/2)

April 13, 2020 | 0 Comments

inSilecoMisc inSilecoMisc is an R 📦 I have been maintaining for four years now. It was originally designed as a convenient way to share handy functions. Instead of stacking them in my .Rprofile, I created a package and made it available on GitHub. inSilecoMisc is therefore a set of miscellaneous functions, ... [Read more...]

SLOPE 0.2.0

April 13, 2020 | 0 Comments

Introduction to SLOPE SLOPE (Bogdan et al. 2015) stands for sorted L1 penalized estimation and is a generalization of OSCAR (Bondell and Reich 2008). As the name suggests, SLOPE is a type of \(\ell_1\)-regularization. More specifically, SLOPE fits generalized linear models regularized with the sorted \(\ell_1\) norm. The objective in SLOPE ...
[Read more...]

PCA and the #TidyTuesday best hip hop songs ever

April 13, 2020 | 0 Comments

Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m exploring a different part of the tidymodels framework; I’m showing how to implement principal component analysis via recipes with this week’...
[Read more...]

rOpenSci Dev Guide 0.4.0: Updates

April 13, 2020 | 0 Comments

rOpenSci Software Peer Review’s guidance has been compiled in an online book for more than one year now. We’ve just released its fourth version. To find out what’s new in our dev guide 0.4.0, you can read the changelog, or this blog post for more digested information. Note ... [Read more...]

Dr. Julia Silge InteRview

April 13, 2020 | 0 Comments

Today I interviewed Dr. Julia Silge, the creator of janeaustenr::, tidytext::, qualtRics::, and author of Text Mining with R. I’m still recovering from a hand surgery, and this time the interview was done by using a voice-to-text app and email. ... [Read more...]

Where does the output of Rscript go?

April 13, 2020 | 0 Comments

We often run R interactively, through RStudio or in the terminal. But you can also run R scripts without manual intervention, using Rscript. But where does the output go? Warning: This post is very Linux/Unix (macOS) centred; I don’t know how this works in Windows. Also I’m using ... [Read more...]

Yes, unbalanced randomization can improve power, in some situations

April 13, 2020 | 0 Comments

Last time I provided some simulations that suggested that there might not be any efficiency-related benefits to using unbalanced randomization when the outcome is binary. This is a quick follow-up to provide a counter-example where the outcome in a two-group comparison is continuous. If the groups have different amounts of ...
[Read more...]

Multilevel Correlations: A New Method for Common Problems

April 13, 2020 | 0 Comments

In this tutorial, we will introduce multilevel correlations (or hierarchical / random-effects correlations) and how to compute them using the new correlation package from the easystats suite. You can install the updated version and load the package as follows: install.packages("correlation") library(correlation) Data Imagine we have an experiment in ...
[Read more...]

wrapped Normal distribution

April 13, 2020 | 0 Comments

One version of the wrapped Normal distribution on (0,1) is expressed as a sum of Normal distributions with means shifted by all relative integers which, while a parameterised density, has imho no particular statistical appeal over the use of other series. It was nonetheless the centre of a series of questions ...
[Read more...]

The ‘spam comments’ puzzle: tidy simulation of stochastic processes in R

April 13, 2020 | 0 Comments

Previously in this series: The “lost boarding pass” puzzle The “deadly board game” puzzle The “knight on an infinite chessboard” puzzle The “largest stock profit or loss” puzzle The “birthday paradox” puzzle The “Spelling Bee honeycomb” puzzle Feller’s “coin-tossing” puzzle I love 538’s Riddler column, and the April 10 puzzle ... [Read more...]

Tutorial: Web Scraping in R with rvest

April 13, 2020 | 0 Comments

Learn how to do web scraping in R by using the rvest package to scrape data about the weather in this free R web scraping tutorial. The post Tutorial: Web Scraping in R with rvest appeared first on Dataquest.
[Read more...]

Biterm topic modelling for short texts

April 13, 2020 | 0 Comments

A few weeks ago, we published an update of the BTM (Biterm Topic Models for text) package on CRAN. Biterm Topic Models are especially useful if you want to find topics in collections of short texts. Short texts are typically a Twitter message, a short answer on a survey, the ... [Read more...]

Hosting a Virtual useR Meetup

April 13, 2020 | 0 Comments

By Rachael Dempsey, Senior Enterprise Advocate at RStudio / Greater Boston useR Organizer Last month, the Boston useR Group held our very first virtual meetup and opened this up to... The post Hosting a Virtual useR Meetup appeared first on R Consortium.
[Read more...]

K is for Keep or Drop Variables

April 13, 2020 | 0 Comments

A few times in this series, I've wanted to display part of a dataset, such as key variables, like Title, Rating, and Pages. The tidyverse allows you to easily keep or drop variables, either temporarily or permanently, with the select function. For inst...
[Read more...]

psychonetrics 0.7, meta-analysis preprint and online SEM course

April 13, 2020 | 0 Comments

Version 0.7 of the psychonetrics package is now on CRAN! This version is a major restructure of the package leading to a lot of new functionality as well as much faster computations. In addition, a new pre-print is now online describing meta-analysis procedures now implemented in psychonetrics. Free course on Structural ... [Read more...]

pins 0.4: Versioning

April 12, 2020 | 0 Comments

A new release of pins is available on CRAN today. This release adds support for time travel across dataset versions, which improves collaboration and protects your code from breaking when remote resources change unexpectedly. [Read more...]

Build a static website with R Shiny

April 12, 2020 | 0 Comments

Sounds stupid? Yes, it’s kind of throwing away 99% of Shiny’s power; and you can always build a static website with R Markdown, blogdown, or bookdown. Anyway, please keep reading as it will save you time if you are an R user who wants to make a portfolio website ... [Read more...]