Articles by R on Redwall Analytics

Learning SQL and Exploring XBRL with secdatabase.com – Part 1

September 9, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "DBI",
    "reticulate",
    "keyring",
    "RAthena"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)
Introduction In A Walk Though of Accessing Financial Statements with XBRL in R - Part 1, we showed how to use R to extract Apple financial statement data from the SEC Edgar website. This would be a cumbersome process to scale across sectors, but works well for a single company. ...
[Read more...]

Using drake for ETL and building Shiny app for 900k CT real estate sales

July 21, 2020 | R on Redwall Analytics

# R Libraries for this blogdown post
# See Github for libraries used in drake project
library(data.table)
library(DT)

knitr::opts_chunk$set(
  fig.width = 15,
  fig.height = 8,
  out.width = '100%')
Introduction The State of Connecticut requires each of its 169 municipalities to report real estate sales used in the assessment process. All reported transactions by towns are published on the Office of Policy and Management (OPM) website. In the past, annual databases were disclosed with differing storage formats each year (...
[Read more...]

Visualizing Big MT Cars with Python plotnine-Part 2

May 11, 2020 | R on Redwall Analytics

# R Libraries
library("reticulate")

knitr::opts_chunk$set(
  fig.width = 15,
  fig.height = 8,
  out.width = '100%')
# Choose Python 3.7 miniconda
reticulate::use_condaenv(
  condaenv = "r-reticulate",
  required = TRUE
  )
# Install Python packages
lapply(c("plotnine"), function(package) {
       conda_install("r-reticulate", package, pip = TRUE)
})
# Python libraries
from datatable import *
import numpy as np
import plotnine as p9 
import re
Introduction In this post, we start out where we left off in Exploring Big MT Cars with Python datatable and plotnine-Part 1. In the chunk below, we load our cleaned up big MT Cars data set in order to be able to refer directly to the variable ...
[Read more...]

Exploring Big MT Cars with Python datatable-Part 1

May 6, 2020 | R on Redwall Analytics

# R Libraries
library("reticulate")
library("skimr")

knitr::opts_chunk$set(
  fig.width = 15,
  fig.height = 8,
  out.width = '100%')
# Install Python packages
lapply(c("datatable", "pandas"), function(package) {
       conda_install("r-reticulate", package, pip = TRUE)
})
# Python libraries
from datatable import *
import numpy as np
import re
import pprint
Introduction As mentioned in our last series Parsing Mass Municipal PDF CAFRs with Tabulizer, pdftools and AWS Textract - Part 1 and A Walk Though of Accessing Financial Statements with XBRL in R - Part 1, this is a year of clean-up. Redwall Analytics is going through this year, ... [Read more...]

Evaluating Mass Muni CAFR Textract Results – Part 5

April 23, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "reticulate",
    "paws.machine.learning",
    "paws.common",
    "keyring",
    "pdftools",
    "listviewer",
    "readxl"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction In Evaluating Mass Muni CAFR Tabulizer Results - Part 3, we showed how to use pdftools and tabulizer to subset a group of PDFs, the AWS paws SDK package to store the PDF in S3, and Textract machine learning to extract a block response object using its “asynchronous” process. ... [Read more...]

Evaluating Mass Muni CAFR Tabulizer Results – Part 3

April 13, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "rlist",
    "stringr",
    "pdftools",
    "readxl"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction This post is a continuation of Tabulizer and pdftools Together as Super-powers - Part 2, where we showed how combining pdftools and tabulizer could lead to better, more scalable data extraction from a large number of slightly varying PDFs. Although the full process used to extract data from all ... [Read more...]

Scraping Failed Tabulizer PDFs with AWS Textract – Part 4

April 13, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "stringr",
    "rlist",
    "paws.machine.learning",
    "paws.storage",
    "paws.common",
    "tabulizer",
    "pdftools",
    "keyring",
    "listviewer"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction In Evaluating Mass Muni CAFR Tabulizer Results - Part 3, we discovered that we were able to accurately extract ~95% of targeted data using tabulizer, but that might not have been good enough for some applications. In this post, we will show how to subset specific pages of PDFs using ... [Read more...]

Tabulizer and pdftools Together as Super-powers – Part 2

April 5, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "stringr",
    "rlist",
    "tabulizer",
    "pdftools",
    "parallel",
    "DT"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction This post is a continuation of Parsing Mass Municipal PDF CAFRs with Tabulizer, pdftools and AWS Textract - Part 1, which dealt with extracting data from PDFs using R. When Redwall discovered pdftools, and its pdf_data() function, which maps out every word on a PDF page ... [Read more...]

Parsing Mass Municipal PDF CAFRs with Tabulizer, pdftools and AWS Textract – Part 1

March 30, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "rlist",
    "stringr",
    "DT",
    "janitor",
    "readxl",
    "xlsx"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction Redwall Analytics had the pleasure of collaborating with Marc Joffe, of Reason Foundation, in its October 2018 post Replicating Yankee Institute Risk Score Over 15 Years for 150 Connecticut towns. This involved taking a well organized public dataset from the State’s website, and analyzing and building an application to view ... [Read more...]

Tracking R&D spending by 700 Listed US Pharma Companies – Part 2

February 17, 2020 | R on Redwall Analytics

# Re-load data previously stored for purposes of this blog post
library(data.table)  # provides fread()
pharma <- 
  fread("~/Desktop/David/Projects/xbrl_investment/data/pharma_inc.csv")
Introduction In A Walk Though of Accessing Financial Statements with XBRL in R - Part 1, we went through the first steps of pulling XBRL data for a single company from Edgar into R. Although an improvement over manually plugging numbers into Excel, there is still a way ...
[Read more...]

A Through the Cycle Geo-Spatial Analysis of CT Town Finances

February 10, 2019 | R on Redwall Analytics

Introduction In an earlier post, Reviewing Fairfield County Municipal Fiscal Indicators Since 2001, we used 17 years of individual Town Comprehensive Annual Financial Reports (CAFR), aggregated in Connecticut’s Municipal Fiscal Indicators, to compare 15 Fairfield County towns. The challenge was that the graphs became crowded even with that small number of ... [Read more...]

Analysis of Connecticut Tax Load by Income Bracket

January 8, 2019 | R on Redwall Analytics

Introduction This brief study finds that Connecticut residents pay $62-63 billion annually in total taxes (including: Federal, State, Municipal Real Estate, Sales, FICA, Medicare) on adjusted gross income of $165-167 billion (an effective tax rate of 37-38%). Some taxes, such as FICA and Medicare, might be considered forms of savings ...
[Read more...]
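
The effective tax rate quoted in the excerpt above follows from dividing total taxes by adjusted gross income. The sketch below is only a quick consistency check on the rounded figures in the excerpt, not the study's own calculation; the function name and the way the range endpoints are paired are assumptions made for illustration.

```python
# Figures quoted in the excerpt, in billions of US dollars.
TAXES_LOW, TAXES_HIGH = 62.0, 63.0
AGI_LOW, AGI_HIGH = 165.0, 167.0


def effective_rate(total_taxes, adjusted_gross_income):
    """Return total taxes as a fraction of adjusted gross income."""
    return total_taxes / adjusted_gross_income


# Assumption: pairing the lowest taxes with the highest income (and the
# reverse) brackets the widest range the rounded figures allow.
rate_low = effective_rate(TAXES_LOW, AGI_HIGH)
rate_high = effective_rate(TAXES_HIGH, AGI_LOW)

print(f"{rate_low:.1%} to {rate_high:.1%}")
```

Both endpoints round to the 37-38% range stated in the excerpt.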