New package: GetEdgarData

[This article was first published on R on msperlin, and kindly contributed to R-bloggers.]

Introduction

Every company traded in the US stock market must report its quarterly and yearly documents to the SEC and to the public in general. This includes its accounting statements (10-Q for quarterly reports, 10-K for annual reports) and any other corporate event that is relevant to investors.

EDGAR is the interface where we can search for a company’s filing information. By looking up a company’s CIK code, one can find all of its previous filings. A complete list of available forms can be found at this link.

Package GetEdgarData allows the user to import the financial documents from such filings directly into R. Unlike other packages, the information is not taken from the filing’s XML files, but from the structured datasets published by the SEC’s DERA (Division of Economic and Risk Analysis) section. This means we can import a large amount of structured financial data very quickly. The downside is that the available data only starts in 2009.
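
To give a feel for what "structured datasets" means here, the sketch below joins two toy tables shaped like the submission and numeric files of a DERA quarterly archive. The raw column names (adsh, ddate, uom, value and so on) are assumptions based on the public DERA layout, not something taken from the package's code, and the three rows reuse the Apple figures printed in the example output further down this post.

```r
# Toy extracts mimicking two tab-delimited tables of a DERA archive.
# ASSUMPTION: column names follow the public DERA layout; the package may
# rename or reshape them (e.g. adsh -> id_file, ddate -> ref_date).
sub_txt <- paste(
  'adsh\tcik\tname\tform\tfy\tfp',
  '0000320193-18-000145\t320193\tAPPLE INC\t10-K\t2018\tFY',
  sep = '\n')

num_txt <- paste(
  'adsh\ttag\tddate\tqtrs\tuom\tvalue',
  '0000320193-18-000145\tNetIncomeLoss\t20180331\t1\tUSD\t13822000000',
  '0000320193-18-000145\tNetIncomeLoss\t20180630\t1\tUSD\t11519000000',
  '0000320193-18-000145\tNetIncomeLoss\t20180930\t1\tUSD\t14125000000',
  sep = '\n')

# Parse both tables (tab is the default separator of read.delim)
sub <- read.delim(text = sub_txt, stringsAsFactors = FALSE)
num <- read.delim(text = num_txt, stringsAsFactors = FALSE)

# One row per reported number, enriched with the filing's metadata
df <- merge(sub, num, by = 'adsh')

# Dates arrive as integers such as 20180331; convert them to Date
df$ddate <- as.Date(as.character(df$ddate), format = '%Y%m%d')

print(df[, c('name', 'form', 'tag', 'ddate', 'value')])
```

Because every number is already a row in a flat table, no XML parsing is needed; a join on the filing id is enough to attach company information to each value.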

Like many other packages I’ve written for data grabbing, the queries are saved locally using package memoise. This means that the second time you ask for a particular year of data, the function will load a local copy and will not download the data again.
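
The caching idea can be sketched with memoise directly. Here fetch_year() is a hypothetical stand-in for a slow download (it is not a GetEdgarData function), and the cache folder is an arbitrary temporary directory; how the package itself configures its cache may differ.

```r
library(memoise)

# Counter used only to show how many times the "download" really runs
n_downloads <- 0

# HYPOTHETICAL stand-in for an expensive download of one year of data
fetch_year <- function(year) {
  n_downloads <<- n_downloads + 1
  data.frame(year = year, value = year * 2)
}

# Store results on disk, keyed by the function arguments
cache_dir <- tempfile('edgar-cache-sketch')
fetch_year_cached <- memoise(fetch_year,
                             cache = cache_filesystem(cache_dir))

first  <- fetch_year_cached(2018)  # runs fetch_year()
second <- fetch_year_cached(2018)  # read back from the cache folder

cat('Real downloads performed:', n_downloads, '\n')
```

A cache folder in a permanent location, rather than a temporary one, is what would let the saved data survive across R sessions.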

Both new packages, GetEdgarData and GetQuandlData (blog post), are going to be part of the second edition of my book “Analyzing Financial Data with R” (see first edition here). I expect to publish the new book in early 2020.

Installation

# not on CRAN yet (needs further testing)
#install.packages('GetEdgarData')

# from github
devtools::install_github('msperlin/GetEdgarData')

Example 01 – Apple’s Quarterly Net Profit

The first step in using GetEdgarData is finding information about available companies:

library(GetEdgarData)
library(tidyverse)

my_year <- 2018
type_form <- '10-K'

df_info <- get_info_companies(years = my_year, 
                              type_data = 'yearly', 
                              type_form = type_form)

glimpse(df_info)
## Observations: 450
## Variables: 13
## $ current_name     <chr> "AIR PRODUCTS & CHEMICALS INC /DE/", "ALICO INC…
## $ former_name      <chr> NA, "ALICO LAND DEVELOPMENT CO", "SKYWORKS SOLU…
## $ change_date_name <dbl> NA, 19740219, 20020627, NA, 19730319, 19920703,…
## $ sic_code         <dbl> 2810, 100, 3674, 3674, 3674, 3590, 1311, 3841, …
## $ country          <chr> "US", "US", "US", "US", "US", "US", "US", "US",…
## $ state            <chr> "PA", "FL", "MA", "MA", "CA", "WI", "HI", "NJ",…
## $ city             <chr> "ALLENTOWN", "FT. MYERS,", "WOBURN", "NORWOOD",…
## $ cik              <dbl> 2969, 3545, 4127, 6281, 6951, 6955, 10048, 1079…
## $ id_file          <chr> "0000002969-18-000044", "0000003545-18-000108",…
## $ form             <chr> "10-K", "10-K", "10-K", "10-K", "10-K", "10-K",…
## $ year             <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,…
## $ quarter          <chr> "FY", "FY", "FY", "FY", "FY", "Q4", "FY", "FY",…
## $ sic_desc         <chr> "INDUSTRIAL INORGANIC CHEMICALS", "AGRICULTURAL…
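
Since companies are selected by their official name, it helps to search the current_name column before requesting data. The sketch below uses a small toy vector in place of df_info$current_name: the first entry comes from the output above, while 'HYPOTHETICAL PINEAPPLE CO' is made up to show why anchoring the pattern matters.

```r
# Toy stand-in for df_info$current_name (last entry is invented)
company_names <- c('AIR PRODUCTS & CHEMICALS INC /DE/',
                   'APPLE INC',
                   'HYPOTHETICAL PINEAPPLE CO')

# A loose pattern also catches names that merely contain the word
loose_matches <- company_names[grepl('APPLE', company_names)]

# Anchoring at the start of the string narrows the search
exact_matches <- company_names[grepl('^APPLE', company_names)]

print(loose_matches)
print(exact_matches)
```

With the real table, the same grepl() call can be applied to df_info$current_name to list candidate names.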

We find information about 450 companies that filed 10-K documents in 2018. Digging deeper, we find that the official name of Apple is ‘APPLE INC’. Let’s use it to download the company’s financial information since 2009.

my_company <- 'APPLE INC'
my_years <- 2009:2018
type_data <- 'quarterly'

df_fin_reports <- get_edgar_fin_data(companies = my_company,
                                     years = my_years,
                                     type_data = type_data)

glimpse(df_fin_reports)
## Observations: 752
## Variables: 17
## $ current_name     <chr> "APPLE INC", "APPLE INC", "APPLE INC", "APPLE I…
## $ former_name      <chr> "APPLE COMPUTER INC", "APPLE COMPUTER INC", "AP…
## $ change_date_name <dbl> 19970808, 19970808, 19970808, 19970808, 1997080…
## $ sic_code         <dbl> 3571, 3571, 3571, 3571, 3571, 3571, 3571, 3571,…
## $ cik              <dbl> 320193, 320193, 320193, 320193, 320193, 320193,…
## $ id_file          <chr> "0000320193-18-000145", "0000320193-18-000145",…
## $ form             <chr> "10-K", "10-K", "10-K", "10-K", "10-K", "10-K",…
## $ year             <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,…
## $ quarter          <chr> "FY", "FY", "FY", "FY", "FY", "FY", "FY", "FY",…
## $ tag              <chr> "NetIncomeLoss", "NetIncomeLoss", "NetIncomeLos…
## $ version          <chr> "us-gaap/2018", "us-gaap/2018", "us-gaap/2018",…
## $ ref_date         <date> 2018-03-31, 2018-06-30, 2018-09-30, 2018-03-31…
## $ unit_ref         <chr> "USD", "USD", "USD", "USD", "USD", "USD", "USD"…
## $ value_ref        <dbl> 1.3822e+10, 1.1519e+10, 1.4125e+10, 2.3422e+10,…
## $ qtrs             <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ coreg            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
## $ sic_desc         <chr> "ELECTRONIC COMPUTERS", "ELECTRONIC COMPUTERS",…

Now we filter for net income (tag ‘NetIncomeLoss’) and plot the resulting data frame:

net_income <- df_fin_reports %>%
  filter(tag == 'NetIncomeLoss')

p <- ggplot(net_income, 
            aes(x = ref_date, y = value_ref)) +
  geom_col() + 
  labs(title = 'APPLE Quarterly Net Income (10-Q)',
       subtitle = paste0(min(my_years), ' - ', max(my_years)),
       x = '',
       y = 'Net Income ($)',
       caption = paste0('Data from EDGAR <https://www.sec.gov/edgar/searchedgar/companysearch.html>', '\n',
                        'Downloaded with package GetEdgarData') )

print(p)
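
A point that may be worth checking before plotting: geom_col() stacks rows that share the same x value, so if a reference date shows up more than once (for instance, because a later filing repeats an earlier quarter), the bar for that date would be inflated. Whether this happens in the data returned by the package is an assumption here, not something confirmed above. The toy data below reuses two of the printed values with hypothetical filing ids and shows one simple way of keeping a single one-quarter figure per date.

```r
# Toy data: the 2018-03-31 figure is reported by two HYPOTHETICAL filings
toy <- data.frame(
  id_file   = c('filing-A', 'filing-A', 'filing-B'),
  ref_date  = as.Date(c('2018-03-31', '2018-06-30', '2018-03-31')),
  qtrs      = c(1, 1, 1),
  value_ref = c(1.3822e10, 1.1519e10, 1.3822e10),
  stringsAsFactors = FALSE
)

# Keep figures that cover a single quarter only
single_q <- toy[toy$qtrs == 1, ]

# Keep the first row found for each reference date
dedup <- single_q[!duplicated(single_q$ref_date), ]

# How many dates were reported more than once before cleaning?
n_repeated <- sum(table(single_q$ref_date) > 1)

print(dedup)
```

On the real data frame, the same check on ref_date would show whether any cleaning is needed before calling ggplot().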

Example 02 – Quarterly Net Profit of Many Companies

The package is really handy for fetching information for many companies. The SEC/DERA stores the data of all companies by year, and the package creates a local cache of the downloaded data. This means that, once we fetch a year of data for one company, the data for all other companies in that year is already available locally.

Let’s see an example by selecting four random companies and recreating the previous graph:

set.seed(5)
my_companies <- sample(df_info$current_name, 4)
my_years <- 2009:2018
type_data <- 'quarterly'

net_income <- get_edgar_fin_data(companies = my_companies,
                                 years = my_years,
                                 type_data = type_data) %>%
  filter(tag == 'NetIncomeLoss')

p <- ggplot(net_income, 
            aes(x = ref_date, y = value_ref)) +
  geom_col() + facet_wrap(~current_name, scales = 'free') + 
  labs(title = 'Quarterly Net Income for Four Random Companies',
       subtitle = paste0(min(my_years), ' - ', max(my_years)),
       x = '',
       y = 'Net Income ($)',
       caption = paste0('Data from EDGAR <https://www.sec.gov/edgar/searchedgar/companysearch.html>', '\n',
                        'Downloaded in R with package GetEdgarData') )

print(p)

Give it a try and, if you find any problem or bug, let me know.
