April 2020

Evaluating Mass Muni CAFR Tabulizer Results – Part 3

April 13, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "rlist",
    "stringr",
    "pdftools",
    "readxl"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction: This post is a continuation of Tabulizer and pdftools Together as Super-powers - Part 2, where we showed how combining pdftools and tabulizer could lead to better, more scalable data extraction from a large number of slightly varying PDFs. Although the full process used to extract data from all ... [Read more...]

Scraping Failed Tabulizer PDFs with AWS Textract – Part 4

April 13, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "stringr",
    "rlist",
    "paws.machine.learning",
    "paws.storage",
    "paws.common",
    "tabulizer",
    "pdftools",
    "keyring",
    "listviewer"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction: In Evaluating Mass Muni CAFR Tabulizer Results - Part 3, we found that tabulizer accurately extracted ~95% of the targeted data, but that this might not be good enough for some applications. In this post, we will show how to subset specific pages of PDFs using ... [Read more...]

inSilecoMisc 0.4.0 (part 1/2)

April 13, 2020 | R-bloggers on inSileco

inSilecoMisc: inSilecoMisc is an R package I have been maintaining for four years now. It was originally designed as a convenient way to share handy functions. Instead of stacking them in my .Rprofile, I created a package and made it available on GitHub. inSilecoMisc is therefore a set of miscellaneous functions, ... [Read more...]

SLOPE 0.2.0

April 13, 2020 | R on Johan Larsson

Introduction to SLOPE SLOPE (Bogdan et al. 2015) stands for sorted L1 penalized estimation and is a generalization of OSCAR (Bondell and Reich 2008). As the name suggests, SLOPE is a type of \(\ell_1\)-regularization. More specifically, SLOPE fits generalized linear models regularized with the sorted \(\ell_1\) norm. The objective in SLOPE ...
[Read more...]
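
To make the sorted \(\ell_1\) norm mentioned in the excerpt concrete, here is a minimal base R sketch. It pairs the largest penalty weight with the largest absolute coefficient, the second largest with the second largest, and so on, then sums the products. The function name `sorted_l1_norm` and the example vectors are illustrative assumptions and are not taken from the SLOPE package.

```r
# Sorted L1 norm: sum_j lambda_(j) * |beta|_(j), with both sequences
# sorted in decreasing order. `sorted_l1_norm` is a hypothetical helper
# written for illustration; it is not part of the SLOPE package API.
sorted_l1_norm <- function(beta, lambda) {
  stopifnot(length(beta) == length(lambda), all(lambda >= 0))
  sum(sort(lambda, decreasing = TRUE) * sort(abs(beta), decreasing = TRUE))
}

beta   <- c(1, -3, 2)   # example coefficients (made up)
lambda <- c(3, 2, 1)    # example non-increasing penalty weights (made up)

penalty <- sorted_l1_norm(beta, lambda)

# With a constant lambda sequence the penalty reduces to the ordinary
# lasso penalty, lambda * sum(abs(beta)).
lasso_like <- sorted_l1_norm(beta, rep(0.5, 3))
```

Because the weights are matched to coefficients by rank rather than by position, larger coefficients receive larger penalties, which is what distinguishes this norm from a plain weighted \(\ell_1\) penalty.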

PCA and the #TidyTuesday best hip hop songs ever

April 13, 2020 | Rstats on Julia Silge

Lately I’ve been publishing screencasts demonstrating how to use the tidymodels framework, from first steps in modeling to how to tune more complex models. Today, I’m exploring a different part of the tidymodels framework; I’m showing how to implement principal component analysis via recipes with this week’...
[Read more...]

Dr. Julia Silge InteRview

April 13, 2020 | Pachá

Today I interviewed Dr. Julia Silge, the creator of janeaustenr::, tidytext::, qualtRics::, and author of Text Mining with R. I’m still recovering from hand surgery, so this time the interview was done using a voice-to-text app and email. ... [Read more...]

Where does the output of Rscript go?

April 13, 2020 | Roel M. Hogervorst

We often run R interactively, through RStudio or in the terminal. But you can also run R scripts without manual intervention, using Rscript. But where does the output go? Warning: This post is very Linux/Unix (macOS) centred; I don’t know how this works in Windows. Also I’m using ... [Read more...]
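
As a small illustration of the question the excerpt raises, the sketch below runs a throwaway script through Rscript and captures the two output streams separately. It relies only on base R behaviour: `cat()` and `print()` write to standard output, while `message()` writes to standard error. The script contents and the temporary file names are assumptions made up for this example and do not come from the original post.

```r
# Write a tiny throwaway script (hypothetical content, for illustration only).
script <- tempfile(fileext = ".R")
writeLines(c(
  'cat("written with cat\\n")',
  'print("written with print")',
  'message("written with message")'
), script)

# Locate the Rscript binary that belongs to the running R installation.
rscript <- file.path(R.home("bin"), "Rscript")

# Run the script non-interactively, sending stdout and stderr to separate files.
out_file <- tempfile()
err_file <- tempfile()
status <- system2(rscript, c("--vanilla", shQuote(script)),
                  stdout = out_file, stderr = err_file)

out_lines <- readLines(out_file)  # output of cat() and print()
err_lines <- readLines(err_file)  # output of message()
```

From a shell, the same separation would be done with redirection when calling Rscript; the point here is only that messages and regular output travel on different streams.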

Multilevel Correlations: A New Method for Common Problems

April 13, 2020 | R on easystats

In this tutorial, we will introduce multilevel correlations (or hierarchical / random-effects correlations) and how to compute them using the new correlation package from the easystats suite. You can install the updated version and load the package as follows:
install.packages("correlation")
library(correlation)
Data Imagine we have an experiment in which 10 individuals completed a ...
[Read more...]

wrapped Normal distribution

April 13, 2020 | xi'an

One version of the wrapped Normal distribution on (0,1) is expressed as a sum of Normal distributions with means shifted by all relative integers which, while a parameterised density, has imho no particular statistical appeal over the use of other series. It was nonetheless the centre of a series of questions ...
[Read more...]
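
The construction described in the excerpt, a sum of Normal densities with means shifted by every integer, can be sketched in base R by truncating the infinite sum. The function name `dwrapped_normal`, the truncation bound `k_max`, and the parameter values below are assumptions chosen for illustration; they are not taken from the original post.

```r
# Wrapped Normal density on (0, 1): sum over integer shifts k of the
# Normal density evaluated at x + k. The infinite sum is truncated to
# k in -k_max..k_max (k_max is an assumed cut-off; the omitted terms are
# negligible unless sigma is very large).
dwrapped_normal <- function(x, mu = 0, sigma = 1, k_max = 50) {
  shifts <- -k_max:k_max
  vapply(x,
         function(xi) sum(dnorm(xi + shifts, mean = mu, sd = sigma)),
         numeric(1))
}

# Sanity check: the density should integrate to (approximately) 1 over (0, 1).
total_mass <- integrate(dwrapped_normal, lower = 0, upper = 1,
                        mu = 0.3, sigma = 0.2)$value
```

Evaluating the function at `x` and at `x + 1` gives the same value up to truncation error, which reflects the periodicity that the wrapping introduces.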

Biterm topic modelling for short texts

April 13, 2020 | Super User

A few weeks ago, we published an update of the BTM (Biterm Topic Models for text) package on CRAN. Biterm Topic Models are especially useful if you want to find topics in collections of short texts. Short texts are typically a Twitter message, a short answer on a survey, the ... [Read more...]

Hosting a Virtual useR Meetup

April 13, 2020 | R Consortium

By Rachael Dempsey, Senior Enterprise Advocate at RStudio / Greater Boston useR Organizer. Last month, the Boston useR Group held our very first virtual meetup and opened this up to... The post Hosting a Virtual useR Meetup appeared first on R Consortium.
[Read more...]

K is for Keep or Drop Variables

April 13, 2020 | Unknown

A few times in this series, I've wanted to display part of a dataset, such as key variables, like Title, Rating, and Pages. The tidyverse allows you to easily keep or drop variables, either temporarily or permanently, with the select function. For inst...
[Read more...]