Articles by R on Redwall Analytics

Exploring Stock Market Listing Mortality since 1986

August 28, 2021 | R on Redwall Analytics

# Libraries
if(!require("pacman")) {
  install.packages("pacman")
}
pacman::p_load(
  data.table,
  re2,
  scales,
  ggplot2,
  plotly, 
  DT,
  patchwork,
  survival,
  ggfortify)

# Set knitr params
knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)
NOTE: The read time for this post is overstated because of the formatting of the Plotly code. There are ~2,500 words, so read time should be ~10 minutes.
# Load function to plot dual y-axis plot
source("train_sec.R")

# Get data series from FRED
symbols <- c("CP", "GDP", "WASCUR")
start_date <- '1947-01-01'
end_date <- '2021-07-30'
quantmod::getSymbols(
  Symbols = symbols,
  src = "FRED",
  start_date = start_date,
  end_date = end_date
)
[1] "CP"     "GDP"    "WASCUR"
# Merge series and convert to dt
d <- as.data.table(merge(WASCUR/GDP, CP/GDP, join = "inner"))

# Build superimposed dual y-axis line plot
sec <- with(d, train_sec(CP, WASCUR))
p <- 
  ggplot(d, aes(index)) +
    geom_line(aes(y = CP),
              colour = "blue", 
              size = 1) +
    geom_line(aes(y = sec$fwd(WASCUR)),
              colour = "red", 
              size = 1) +
    scale_y_continuous(
      "Corporate Profits to GDP",
      labels = scales::percent,
      sec.axis = sec_axis(
        ~ sec$rev(.),
        name = "Compensation of Employees to GDP",
        labels = scales::percent)
    ) +
    scale_x_date(date_breaks = "10 years",
                 date_labels = "%Y") + 
    labs(title = "Labor vs Capital",
         x = "Year",
         caption = "Source: Lots of places") +
    theme_bw(base_size = 22)
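The plot code above sources train_sec.R, which is not shown in this excerpt. As a rough sketch only, and assuming the helper simply maps the range of the secondary series linearly onto the range of the primary series, a function with the same interface (a list with fwd and rev) could look like the following. The body is an assumption, not the author's file, and the example numbers at the end are made up for illustration.

```r
# Hypothetical stand-in for train_sec.R (assumed behaviour, not the original file).
# Returns a pair of functions that linearly rescale values between the range of
# the secondary series and the range of the primary series.
train_sec <- function(primary, secondary, na.rm = TRUE) {
  from <- range(secondary, na.rm = na.rm)  # range of the series on the right axis
  to   <- range(primary, na.rm = na.rm)    # range of the series on the left axis

  # Linear map of x from the interval `from` onto the interval `to`
  rescale_linear <- function(x, from, to) {
    (x - from[1]) / diff(from) * diff(to) + to[1]
  }

  list(
    fwd = function(x) rescale_linear(x, from, to),  # secondary -> primary scale
    rev = function(x) rescale_linear(x, to, from)   # primary -> secondary scale
  )
}

# Illustrative values only: profits share on the left axis, wages share on the right
profits <- c(0.05, 0.08, 0.10)
wages   <- c(0.50, 0.45, 0.40)
sec <- train_sec(profits, wages)
wages_on_left_axis <- sec$fwd(wages)          # what geom_line() would draw
wages_recovered <- sec$rev(wages_on_left_axis)  # what sec_axis() would label
```

With a helper of this shape, sec$fwd() puts the second series on the primary axis for drawing, and sec$rev() inside sec_axis() converts the axis breaks back so the right-hand labels show the original units.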
Introduction The rise in monopoly power particularly ...
[Read more...]

When Yahoo Finance doesn’t have de-listed tickers needed

August 18, 2021 | R on Redwall Analytics

# Libraries
if(!require("pacman")) {
  install.packages("pacman")
}
pacman::p_load(
  data.table
  )

# Set knitr params
knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)
Introduction As we discussed in our last post, Introducing the Redwall ‘Red Flag’ Explorer with New Constructs Data, we were able to test the response of 125,000 quarterly and annual financial statements to the incidence of “red flag” ratios, but some of the most interesting ... [Read more...]

Introducing the Redwall ‘Red Flag’ Explorer with New Constructs Data

August 8, 2021 | R on Redwall Analytics

# Libraries
if(!require("pacman")) {
  install.packages("pacman")
}
pacman::p_load(
  data.table,
  scales,
  ggplot2,
  plotly, 
  DT)

# Set knitr params
knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)

# Load annual data only
path <- 
  "~/Desktop/David/Projects/new_constructs_targets/_targets/objects/"
red_flags <- 
  readRDS(paste0(path, "nc_annual_red_flags"))
annual_data <- 
  readRDS(paste0(path, "nc_annual_final"))
Key Findings 1999-2000 was an exceptional period for both “Red Flag” prevalence and return differentiation, though apparent benefits of the strategy appear in most periods. Approximately 2.0% of filings we checked had 5 or more “Red Flags” among annual and quarterly filings, so sparsity is ...
[Read more...]

Tapping Yelp data with Apache Drill from Mac using {sergeant}

October 26, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("tidyverse",
    "sergeant",
    "tictoc"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)
Introduction At Redwall, we have been in nonstop exploration of new data sets over the last couple of years. As our data grows and the targets of interest get bigger, we have been finding the old method of loading CSVs from disk, and ...
[Read more...]

Building a career changer resume with R {vitae} package

October 6, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("vitae",
    "tibble",
    "spelling"
    )

if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)
Introduction This will be a post about building a resume (curriculum vitae) with the R {vitae} package, by a professional who somehow managed to spend 25 years without one. I am also making one of the more unusual career transitions, moving from investment research sales to look for interesting challenges ... [Read more...]

Exploring 30 years of local CT weather history with R

September 21, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "ggplot2",
    "stringr",
    "skimr",
    "janitor",
    "glue"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%',
  cache = TRUE
)
EPA AirData Air Quality Monitors Introduction As our journey with open source software continues, there is a growing list of things we have tried but were unable to figure out, or that took too long, so we moved on. Sometimes it's a blog or Twitter post, others a new package ...
[Read more...]

Learning SQL and Exploring XBRL with secdatabase.com – Part 1

September 9, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "DBI",
    "reticulate",
    "keyring",
    "RAthena"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%'
)
Introduction In A Walk Though of Accessing Financial Statements with XBRL in R - Part 1, we showed how to use R to extract Apple financial statement data from the SEC EDGAR website. This would be a cumbersome process to scale across sectors, but works well for a single company. ...
[Read more...]

Using drake for ETL and building Shiny app for 900k CT real estate sales

July 21, 2020 | R on Redwall Analytics

# R Libraries for this blogdown post
# See Github for libraries used in drake project
library(data.table)
library(DT)

knitr::opts_chunk$set(
  fig.width = 15,
  fig.height = 8,
  out.width = '100%')
Introduction The State of Connecticut requires each of its 169 municipalities to report real estate sales used in the assessment process. All reported transactions by towns are published on the Office of Policy and Management (OPM) website. In the past, annual databases were disclosed with differing storage formats each year (...
[Read more...]

Visualizing Big MT Cars with Python plotnine-Part 2

May 11, 2020 | R on Redwall Analytics

# R Libraries
library("reticulate")

knitr::opts_chunk$set(
  fig.width = 15,
  fig.height = 8,
  out.width = '100%')
# Choose Python 3.7 miniconda
reticulate::use_condaenv(
  condaenv = "r-reticulate",
  required = TRUE
  )
# Install Python packages
lapply(c("plotnine"), function(package) {
       conda_install("r-reticulate", package, pip = TRUE)
})
# Python libraries
from datatable import *
import numpy as np
import plotnine as p9 
import re
Introduction In this post, we start out where we left off in Exploring Big MT Cars with Python datatable and plotnine-Part 1. In the chunk below, we load our cleaned up big MT Cars data set in order to be able to refer directly to the variable ...
[Read more...]

Exploring Big MT Cars with Python datatable-Part 1

May 6, 2020 | R on Redwall Analytics

# R Libraries
library("reticulate")
library("skimr")

knitr::opts_chunk$set(
  fig.width = 15,
  fig.height = 8,
  out.width = '100%')
# Install Python packages
lapply(c("datatable", "pandas"), function(package) {
       conda_install("r-reticulate", package, pip = TRUE)
})
# Python libraries
from datatable import *
import numpy as np
import re
import pprint
Introduction As mentioned in our last series Parsing Mass Municipal PDF CAFRs with Tabulizer, pdftools and AWS Textract - Part 1 and A Walk Though of Accessing Financial Statements with XBRL in R - Part 1, this is a year of clean-up. Redwall Analytics is going through this year, ... [Read more...]

Evaluating Mass Muni CAFR Textract Results – Part 5

April 23, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "reticulate",
    "paws.machine.learning",
    "paws.common",
    "keyring",
    "pdftools",
    "listviewer",
    "readxl"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction In Evaluating Mass Muni CAFR Tabulizer Results - Part 3, we showed how to use pdftools and tabulizer to subset a group of PDFs, the AWS paws SDK package to store the PDF in s3, and Textract machine learning to extract a block response object using its “asynchronous” process. ... [Read more...]

Evaluating Mass Muni CAFR Tabulizer Results – Part 3

April 13, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "rlist",
    "stringr",
    "pdftools",
    "readxl"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction This post is a continuation of Tabulizer and pdftools Together as Super-powers - Part 2, where we showed how combining pdftools and tabulizer could lead to better, more scalable data extraction on a large number of slightly varying PDFs. Although the full process used to extract data from all ... [Read more...]

Scraping Failed Tabulizer PDFs with AWS Textract – Part 4

April 13, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "stringr",
    "rlist",
    "paws.machine.learning",
    "paws.storage",
    "paws.common",
    "tabulizer",
    "pdftools",
    "keyring",
    "listviewer"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction In Evaluating Mass Muni CAFR Tabulizer Results - Part 3, we discovered that we were able to accurately extract ~95% of targeted data using tabulizer, but that might not have been good enough for some applications. In this post, we will show how to subset specific pages of PDFs using ... [Read more...]

Tabulizer and pdftools Together as Super-powers – Part 2

April 5, 2020 | R on Redwall Analytics

# Libraries
packages <- 
  c("data.table",
    "stringr",
    "rlist",
    "tabulizer",
    "pdftools",
    "parallel",
    "DT"
    )

if (length(setdiff(packages,rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))  
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(comment=NA, fig.width=12, fig.height=8, out.width = '100%')
Introduction This post will be a continuation of Parsing of Mass Municipal PDF CAFR’s with Tabulizer, pdftools and AWS Textract - Part 1, dealing with extracting data from PDFs using R. When Redwall discovered pdftools, and its pdf_data() function, which maps out every word on a PDF page ... [Read more...]