New package: simfinR

[This article was first published on R on msperlin, and kindly contributed to R-bloggers].

Introduction

In my latest post I wrote about package GetEdgarData, which downloads structured data from the SEC. While working on that project, I soon realized that the data available in the SEC/DERA section is not complete. For example, all Q4 statements are missing, since companies file quarterly reports (10-Q) only for the first three quarters and cover the fourth quarter in the annual report (10-K). This seems to be a common way of releasing financial documents: I’ve found the same problem with data from the Brazilian exchange.

It came to my attention that there is an alternative way of fetching corporate data and adjusted prices, the SimFin project. From its own website:

Our core goal is to make financial data as freely available as possible because we believe that having the right tools for investing/research shouldn't be the privilege of those that can afford to spend thousands of dollars per year on data.

The platform is free, with a daily limit of 2000 API calls. This is not bad and should suffice for most users. If you need more calls, the premium version costs just 10 euros a month, a fraction of what other data vendors usually charge.
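To get a feel for how far 2000 calls go, here is a back-of-the-envelope sketch. It assumes, hypothetically, that each combination of company, statement type, period and year costs exactly one API call; the actual number of calls simfinR makes per request may differ, and cached requests should not hit the API at all. The helper `n_api_calls()` is made up for this illustration.

```r
# Rough API-call budget for a statement request.
# ASSUMPTION (hypothetical): one API call per combination of
# company, statement type, period and fiscal year.
n_api_calls <- function(id_companies, type_statements, periods, years) {
  combos <- expand.grid(id = id_companies,
                        type = type_statements,
                        period = periods,
                        year = years,
                        stringsAsFactors = FALSE)
  nrow(combos)
}

daily_limit <- 2000

# Apple (simfin id 111052), profit/loss, four quarters, 2009 to 2018
calls_apple_q <- n_api_calls(111052, 'pl', c('Q1', 'Q2', 'Q3', 'Q4'), 2009:2018)
calls_apple_q                      # 40

# four companies (placeholder ids), annual statements, 2010 to 2018
calls_four_fy <- n_api_calls(1:4, 'pl', 'FY', 2010:2018)
calls_four_fy                      # 36

# how many Apple-sized quarterly requests fit in one day
floor(daily_limit / calls_apple_q) # 50
```

Under this assumption, even a ten-year quarterly history for one company uses only 2% of the free daily quota.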

Package simfinR, available on GitHub and soon on CRAN, handles all calls to the SimFin API. It first checks that the requested data exists and only then calls the API. As usual, all API queries are cached locally using package memoise. This means that the second time you ask for the same data about a company/year, the function loads the local copy instead of calling the web API, which also saves calls from your daily quota.
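The "check first, then call, then cache" pattern can be sketched in a few lines of base R. This is not simfinR's code: `fetch_from_api()` and `get_statement()` are hypothetical stand-ins, the web request is simulated, and the cache is an in-memory environment rather than files on disk. With package memoise, a similar effect comes from wrapping the fetching function, e.g. `memoise::memoise(f, cache = memoise::cache_filesystem('some_folder'))`.

```r
# Counts how many (simulated) web requests were made
n_web_calls <- 0

# Hypothetical stand-in for a real call to the web API
fetch_from_api <- function(id, year) {
  n_web_calls <<- n_web_calls + 1
  data.frame(id = id, year = year, acc_value = NA_real_)
}

# In-memory cache keyed by "id_year"
cache <- new.env()

get_statement <- function(id, year, available_ids) {
  # 1) make sure the requested company exists before spending a call
  if (!id %in% available_ids) {
    stop('Company id ', id, ' not found in the list of available companies')
  }
  # 2) return the local copy if this id/year was fetched before
  key <- paste(id, year, sep = '_')
  if (exists(key, envir = cache, inherits = FALSE)) {
    return(get(key, envir = cache, inherits = FALSE))
  }
  # 3) otherwise call the (simulated) API and store the result
  out <- fetch_from_api(id, year)
  assign(key, out, envir = cache)
  out
}

available_ids <- c(111052, 171401)
first  <- get_statement(111052, 2018, available_ids)  # calls the "API"
second <- get_statement(111052, 2018, available_ids)  # served from cache
n_web_calls                                           # 1
```

Requesting an id that is not in `available_ids` stops with an error before any call is spent.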

Package GetEdgarData, however, will be discontinued. I’ll keep the files on GitHub but will no longer develop it. Writing and maintaining R packages takes a lot of time, and I feel that simfinR has far more potential.

As mentioned before, both new packages, GetQuandlData and simfinR, will be part of my next book, “Analyzing Financial and Economic Data with R”, which should be released in early 2020.

Installation

# not in CRAN yet (need to test it further)
#install.packages('simfinR')

# from Github
devtools::install_github('msperlin/simfinR')

Example 01 – Apple’s Annual and Quarterly Net Profit

The first step in using simfinR is finding information about available companies:

library(simfinR)
library(tidyverse)

# You need to get your own api key at https://simfin.com/
my_apy_key <- readLines('~/Dropbox/.api_key_simfin.txt')

# get info
df_info_companies <- simfinR_get_available_companies(my_apy_key)

# check it
glimpse(df_info_companies)
## Observations: 2,564
## Variables: 3
## $ simId  <int> 171401, 901704, 901866, 45730, 378251, 896477, 418866, 79…
## $ ticker <chr> "ZYXI", "ZYNE", "ZVO", "ZUMZ", "ZTS", "ZS", "ZNGA", "ZIOP…
## $ name   <chr> "ZYNEX INC", "Zynerba Pharmaceuticals, Inc.", "Zovio Inc"…

We find information about 2,564 companies. Digging deeper, we find that the simfin id of Apple is 111052. Let’s use it to download the annual profit/loss statements from 2009 to 2018.
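The lookup itself is a simple search on the company table. The sketch below uses a small hand-built stand-in, `df_demo`, instead of the real `df_info_companies`: the first two rows are copied from the `glimpse()` output above, and the Apple row combines the id quoted in the text with the name "APPLE INC"; its ticker "AAPL" is an assumption about how the table lists it.

```r
# Hand-built stand-in for df_info_companies (columns simId, ticker, name)
df_demo <- data.frame(
  simId  = c(171401L, 901704L, 111052L),
  ticker = c('ZYXI', 'ZYNE', 'AAPL'),   # 'AAPL' is assumed
  name   = c('ZYNEX INC', 'Zynerba Pharmaceuticals, Inc.', 'APPLE INC'),
  stringsAsFactors = FALSE
)

# exact match on the ticker
df_demo[df_demo$ticker == 'AAPL', ]

# case-insensitive search on the company name
idx <- grepl('apple', df_demo$name, ignore.case = TRUE)
apple_id <- df_demo$simId[idx]
apple_id   # 111052
```

With the full table, a name search may return several matches, since other companies can have "apple" in their name, so check the result before passing it to `simfinR_get_fin_statements()`.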

id_companies <- 111052 # id of APPLE INC
type_statements <- 'pl' # profit/loss
periods <- 'FY' # full fiscal year
years <- 2009:2018

df_fin_FY <- simfinR_get_fin_statements(id_companies,
                                     type_statements = type_statements,
                                     periods = periods,
                                     year = years,
                                     api_key = my_apy_key)

glimpse(df_fin_FY)
## Observations: 580
## Variables: 13
## $ company_name   <chr> "APPLE INC", "APPLE INC", "APPLE INC", "APPLE INC…
## $ company_sector <chr> "Computer Hardware", "Computer Hardware", "Comput…
## $ type_statement <fct> pl, pl, pl, pl, pl, pl, pl, pl, pl, pl, pl, pl, p…
## $ period         <fct> FY, FY, FY, FY, FY, FY, FY, FY, FY, FY, FY, FY, F…
## $ year           <int> 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2…
## $ ref_date       <date> 2009-12-31, 2009-12-31, 2009-12-31, 2009-12-31, …
## $ acc_name       <chr> "Revenue", "Sales & Services Revenue", "Financing…
## $ acc_value      <dbl> 4.2905e+10, NA, NA, NA, -2.5683e+10, NA, NA, NA, …
## $ tid            <chr> "1", "3", "5", "6", "2", "7", "8", "9", "4", "10"…
## $ uid            <chr> "1", "0", "0", "0", "2", "0", "0", "0", "4", "10"…
## $ parent_tid     <chr> "4", "1", "1", "1", "4", "2", "2", "2", "19", "19…
## $ display_level  <chr> "0", "1", "1", "1", "0", "1", "1", "1", "0", "0",…
## $ check_possible <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …
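The output is in long format: each row holds a single account (`acc_name`) for a single period, with its value in `acc_value`. For ratios between accounts, a wide table with one column per account is often handier. Below is a minimal base R sketch on a toy table that mimics those columns; the numbers are hypothetical placeholders, not SimFin data.

```r
# Toy long-format table mimicking the year/acc_name/acc_value columns
df_long <- data.frame(
  year      = c(2017L, 2017L, 2018L, 2018L),
  acc_name  = c('Revenue', 'Net Income', 'Revenue', 'Net Income'),
  acc_value = c(100, 20, 120, 25),
  stringsAsFactors = FALSE
)

# one row per year, one column per account
df_wide <- reshape(df_long,
                   idvar = 'year',
                   timevar = 'acc_name',
                   direction = 'wide')

# drop the "acc_value." prefix that reshape() adds to the new columns
names(df_wide) <- sub('^acc_value\\.', '', names(df_wide))

# ratios between accounts become a one-liner
df_wide$net_margin <- df_wide[['Net Income']] / df_wide$Revenue
df_wide
```

In the tidyverse, `tidyr::pivot_wider(names_from = acc_name, values_from = acc_value)` does the same job. With the real `df_fin_FY`, select the identifying columns first (e.g. `year` and `period`), because columns such as `tid` and `uid` differ from row to row.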

Now we keep only the “Net Income” (profit/loss) account and plot it for all years:

net_income <- df_fin_FY %>% 
              filter(acc_name == 'Net Income')

p <- ggplot(net_income,
            aes(x = ref_date, y = acc_value)) +
  geom_col()  + 
  labs(title = 'Yearly Profit of APPLE INC',
       x = '',
       y = 'Yearly Profit',
       subtitle = '',
       caption = 'Data from simfin <https://simfin.com/>') + 
  theme_bw()

print(p)

Not bad!

We can also grab data for all quarters:

type_statements <- 'pl' # profit/loss
periods <- c('Q1', 'Q2', 'Q3', 'Q4') # all four fiscal quarters
years <- 2009:2018

df_fin_quarters <- simfinR_get_fin_statements(id_companies,
                                     type_statements = type_statements,
                                     periods = periods,
                                     year = years,
                                     api_key = my_apy_key)

glimpse(df_fin_quarters)
## Observations: 2,320
## Variables: 13
## $ company_name   <chr> "APPLE INC", "APPLE INC", "APPLE INC", "APPLE INC…
## $ company_sector <chr> "Computer Hardware", "Computer Hardware", "Comput…
## $ type_statement <fct> pl, pl, pl, pl, pl, pl, pl, pl, pl, pl, pl, pl, p…
## $ period         <fct> Q1, Q1, Q1, Q1, Q1, Q1, Q1, Q1, Q1, Q1, Q1, Q1, Q…
## $ year           <int> 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2009, 2…
## $ ref_date       <date> 2009-03-31, 2009-03-31, 2009-03-31, 2009-03-31, …
## $ acc_name       <chr> "Revenue", "Sales & Services Revenue", "Financing…
## $ acc_value      <dbl> 1.188e+10, NA, NA, NA, -7.373e+09, NA, NA, NA, 4.…
## $ tid            <chr> "1", "3", "5", "6", "2", "7", "8", "9", "4", "10"…
## $ uid            <chr> "1", "0", "0", "0", "2", "0", "0", "0", "4", "10"…
## $ parent_tid     <chr> "4", "1", "1", "1", "4", "2", "2", "2", "19", "19…
## $ display_level  <chr> "0", "1", "1", "1", "0", "1", "1", "1", "0", "0",…
## $ check_possible <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, …

And plot the results:

net_income <- df_fin_quarters %>% 
              filter(acc_name == 'Net Income')

p <- ggplot(net_income,
            aes(x = period, y = acc_value)) +
  geom_col() + facet_grid(~year, scales = 'free') + 
  labs(title = 'Quarterly Profit of APPLE INC',
       x = 'Quarters',
       y = 'Net Profit') + 
  theme_bw()

print(p)

A nice and impressive profit record. The first quarter (Q1) seems to show the best performance, probably due to the end-of-year holidays: Apple’s fiscal year ends in September, so its fiscal Q1 runs from October to December.
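Because fiscal and calendar quarters can differ, it helps to map a fiscal quarter to the calendar months it covers. The helper `fiscal_quarter_months()` below is a hypothetical sketch that only needs the month in which the fiscal year ends (9 for Apple). Note that in the output above, Q1 2009 carries `ref_date` 2009-03-31, which looks like a calendar-quarter label rather than the end of Apple’s fiscal Q1 (December 2008); that is my reading of the output, not documented behavior.

```r
# Calendar months covered by a fiscal quarter ('Q1' to 'Q4'),
# given the month (1 to 12) in which the fiscal year ends.
fiscal_quarter_months <- function(quarter, fy_end_month = 12) {
  q <- as.integer(sub('Q', '', quarter))
  # fiscal Q1 starts in the month right after the fiscal year end
  first_month <- (fy_end_month %% 12) + 1
  months <- first_month + (q - 1) * 3 + 0:2
  # wrap months past December back to 1..12
  month.abb[(months - 1) %% 12 + 1]
}

fiscal_quarter_months('Q1', fy_end_month = 9)  # "Oct" "Nov" "Dec"
fiscal_quarter_months('Q4', fy_end_month = 9)  # "Jul" "Aug" "Sep"
fiscal_quarter_months('Q1')                    # "Jan" "Feb" "Mar"
```

For a September year end, fiscal Q1 of a given fiscal year falls in the last three months of the previous calendar year.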

Example 02 – Annual Net Profit of Many Companies

Package simfinR can also fetch information for many companies in a single call. Let’s run another example by selecting four random companies and plotting their annual net profit/loss:

set.seed(5)
my_ids <- sample(df_info_companies$simId, 4)
type_statements <- 'pl' # profit/loss
periods <- 'FY' # full fiscal year
years <- 2010:2018

df_fin <- simfinR_get_fin_statements(id_companies = my_ids,
                                     type_statements = type_statements,
                                     periods = periods,
                                     year = years,
                                     api_key = my_apy_key)

net_income <- df_fin %>% 
              filter(acc_name == 'Net Income')

p <- ggplot(net_income,
            aes(x = ref_date, y = acc_value)) +
  geom_col() + 
  labs(title = 'Annual Profit/Loss of Four Companies',
       x = '',
       y = 'Net Profit/Loss') + 
  facet_wrap(~company_name, scales = 'free_y') + 
  theme_bw()

print(p)

Example 03 – Fetching Price Data

The simfin project also provides adjusted prices of stocks. Have a look:

# same seed as Example 02, so the same four companies are sampled
set.seed(5)
my_ids <- sample(df_info_companies$simId, 4)

# fetch adjusted prices for the sampled companies
df_price <- simfinR_get_price_data(id_companies = my_ids,
                                   api_key = my_apy_key)

p <- ggplot(df_price,
            aes(x = ref_date, y = close_adj)) +
  geom_line() + 
  labs(title = 'Adjusted stock prices for four companies',
       x = '',
       y = 'Adjusted Stock Prices') + 
  facet_wrap(~company_name, scales = 'free_y') + 
  theme_bw()

print(p)

As you can see, the data is comprehensive and should suffice for many different corporate finance research topics.
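As a small example of such a use, adjusted prices are the natural input for computing returns. The sketch below works on a toy table, `df_demo`, with the same `company_name`, `ref_date` and `close_adj` columns used in the plot above; the prices are hypothetical. It computes simple returns separately for each company, so the first observation of each company is `NA`.

```r
# Toy price table with the columns used above; prices are hypothetical
df_demo <- data.frame(
  company_name = rep(c('COMPANY A', 'COMPANY B'), each = 3),
  ref_date     = rep(as.Date(c('2019-01-02', '2019-01-03', '2019-01-04')), 2),
  close_adj    = c(100, 110, 99, 50, 50, 55),
  stringsAsFactors = FALSE
)

# sort by company and date so returns use consecutive prices
df_demo <- df_demo[order(df_demo$company_name, df_demo$ref_date), ]

# simple return p_t / p_{t-1} - 1, computed within each company
df_demo$ret <- ave(df_demo$close_adj, df_demo$company_name,
                   FUN = function(p) c(NA, p[-1] / p[-length(p)] - 1))
df_demo
```

Grouping with `ave()` keeps the first price of one company from being compared with the last price of another.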

Give it a try and, if you’ve found any problem or bug, please let me know at .
