Package GetDFPData

[This article was first published on R on msperlin, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Financial statements of companies traded at B3 (formerly Bovespa), the Brazilian stock exchange, are available in its website. Accessing the data for a single company is straightforward. In the website one can find a simple interface for accessing this dataset. An example is given here. However, gathering and organizing the data for a large scale research, with many companies and many dates, is painful. Financial reports must be downloaded or copied individually and later aggregated. Changes in the accounting format thoughout time can make this process slow, unreliable and irreproducible.

Package GetDFPData provides a R interface to all annual financial statements available in the website and more. It not only downloads the data but also organizes it in a tabular format and allows the use of inflation indexes. Users can select companies and a time period to download all available data. Several information about current companies, such as sector and available quarters are also at reach. The main purpose of the package is to make it easy to access financial statements in large scale research, facilitating the reproducibility of corporate finance studies with B3 data.

The positive aspects of GetDFDData are:

  • Easy and simple R and web interface
  • Changes in accounting format are internally handled by the software
  • Access to corporate events in the FRE system such as dividend payments, changes in stock holder composition, changes in governance listings, board composition and compensation, debt composition, and a lot more!
  • The output data is automatically organized using tidy data principles (long format)
  • A cache system is employed for fast data acquisition
  • Completely free and open source!

Installation

The package is available in CRAN (release version) and in Github (development version). You can install any of those with the following code:

# Release version in CRAN
install.packages('GetDFPData') # not in CRAN yet

# Development version in Github
devtools::install_github('msperlin/GetDFPData')

Shinny interface

The web interface of GetDFPData is available at http://www.msperlin.com/shiny/GetDFPData/.

How to use GetDFPData

The starting point of GetDFPData is to find the official names of companies in B3. Function gdfpd.search.company serves this purpose. Given a string (text), it will search for a partial matches in companies names. As an example, let’s find the official name of Petrobras, one of the largest companies in Brazil:

library(GetDFPData)
library(tibble)

gdfpd.search.company('petrobras',cache.folder = tempdir())
## 
## Reading info file from github
## Found 46811 lines for 701 companies  [Actives =  520  Inactives =  182 ]
## Last file update:  2017-10-19
## Caching RDATA into tempdir()
## 
## Found 2 companies:
## PETROBRAS DISTRIBUIDORA SA             | situation = ATIVO | first date = 2016-12-31 | last date - 2017-12-31
## PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS | situation = ATIVO | first date = 1998-12-31 | last date - 2017-12-31
## [1] "PETROBRAS DISTRIBUIDORA SA"            
## [2] "PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS"

Its official name in Bovespa records is PETRÓLEO BRASILEIRO S.A. - PETROBRAS. Data for quarterly and annual statements are available from 1998 to 2017. The situation of the company, active or canceled, is also given. This helps verifying the availability of data.

The content of all available financial statements can be accessed with function gdfpd.get.info.companies. It will read and parse a .csv file from my github repository. This will be periodically updated for new information. Let’s try it out:

df.info <- gdfpd.get.info.companies(type.data = 'companies', cache.folder = tempdir())
## 
## Reading info file from github
## Found 46811 lines for 701 companies  [Actives =  520  Inactives =  182 ]
## Last file update:  2017-10-19
## Caching RDATA into tempdir()
glimpse(df.info)
## Observations: 702
## Variables: 16
## $ name.company         <chr> "521 PARTICIPAÇOES S.A. - EM LIQUIDAÇÃO E...
## $ id.company           <int> 16330, 16284, 108, 20940, 21725, 19313, 1...
## $ cnpj                 <dbl> 1.547749e+12, 1.851771e+12, 6.066481e+13,...
## $ date.registration    <date> 1997-07-11, 1997-05-30, 1969-05-16, 2007...
## $ date.constitution    <date> 1996-07-30, 1997-04-02, 1944-08-18, 1976...
## $ city                 <chr> "RIO DE JANEIRO", "RIO DE JANEIRO", "PORT...
## $ estate               <chr> "RJ", "RJ", "RS", "SP", "SP", "SP", "SP",...
## $ situation            <chr> "ATIVO", "ATIVO", "CANCELADA", "CANCELADA...
## $ situation.operations <chr> "LIQUIDAÇÃO EXTRAJUDICIAL", "FASE OPERACI...
## $ listing.segment      <chr> NA, "Tradicional", "Tradicional", "Tradic...
## $ main.sector          <chr> NA, "Financeiro e Outros", "Materiais Bás...
## $ sub.sector           <chr> NA, "Outros", "Siderurgia e Metalurgia", ...
## $ segment              <chr> NA, "Outros", "Siderurgia", "Açucar e Alc...
## $ tickers              <chr> NA, "QVQP3B", NA, NA, "ADHM3", "AELP3", "...
## $ first.date           <date> 1998-12-31, 2001-12-31, 2009-12-31, 2009...
## $ last.date            <date> 2017-12-31, 2017-12-31, 2009-12-31, 2009...

This file includes several information that are gathered from Bovespa: names of companies, official numeric ids, listing segment, sectors, traded tickers and, most importantly, the available dates. The resulting dataframe can be used to filter and gather information for large scale research such as downloading financial data for a specific sector.

Downloading financial information for ONE company

All you need to download financial data with GetDFPData are the official names of companies, which can be found with gdfpd.search.company, the desired starting and ending dates and the type of financial information (individual or consolidated). Let’s try it for PETROBRAS:

name.companies <- 'PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS'
first.date <- '2017-01-01'
last.date  <- '2018-01-01'

df.reports <- gdfpd.GetDFPData(name.companies = name.companies, 
                               first.date = first.date,
                               last.date = last.date,
                               cache.folder = tempdir())
## Found cache file. Loading data..
## 
## Downloading data for 1 companies
## First Date: 2017-01-01
## Laste Date: 2018-01-01
## Inflation index: dollar
## 
## Downloading inflation data
##  Caching inflation RDATA into tempdir()  Done
## 
## Inputs looking good! Starting download of files:
## 
## PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS
##  Available periods: 2017-12-31
## 
## 
## Processing 9512 - PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS
##  Finding info from Bovespa | downloading and reading data | saving cache
##  Processing 9512 - PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS | date 2017-12-31
##      Acessing DFP data | downloading file | reading file | saving cache
##      Acessing FRE data | downloading file | reading file | saving cache
##      Acessing FCA data | downloading file | reading file | saving cache

The resulting object is a tibble, a data.frame type of object that allows for list columns. Let’s have a look in its content:

glimpse(df.reports)
## Observations: 1
## Variables: 43
## $ company.name                     <chr> "PETRÓLEO BRASILEIRO  S.A.  -...
## $ company.code                     <int> 9512
## $ cnpj                             <chr> "33000167000101"
## $ date.company.constitution        <date> 1953-10-03
## $ date.cvm.registration            <date> 1977-07-20
## $ company.tickers                  <chr> "PETR3;PETR4"
## $ min.date                         <date> 2017-12-31
## $ max.date                         <date> 2017-12-31
## $ n.periods                        <int> 1
## $ company.segment                  <chr> "Corporate Governance - Level 2"
## $ current.stockholders             <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ current.stock.composition        <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.files                    <list> [<2, 2>]
## $ fr.assets                        <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ fr.liabilities                   <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ fr.income                        <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ fr.cashflow                      <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ fr.assets.consolidated           <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ fr.liabilities.consolidated      <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ fr.income.consolidated           <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ fr.cashflow.consolidated         <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ fr.auditing.report               <list> [<PETRÓLEO BRASILEIRO  S.A. ...
## $ history.dividends                <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.stockholders             <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.capital.issues           <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.mkt.value                <list> [<PETRÓLEO BRASILEIRO  S.A. ...
## $ history.capital.increases        <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.capital.reductions       <list> [<>]
## $ history.stock.repurchases        <list> [<>]
## $ history.other.stock.events       <list> [<>]
## $ history.compensation             <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.compensation.summary     <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.transactions.related     <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.debt.composition         <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.governance.listings      <list> [<PETRÓLEO BRASILEIRO  S.A. ...
## $ history.board.composition        <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.committee.composition    <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.family.relations         <list> [<>]
## $ history.family.related.companies <list> [<>]
## $ history.auditing                 <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.responsible.docs         <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.stocks.details           <list> [<c("PETRÓLEO BRASILEIRO  S....
## $ history.dividends.details        <list> [<PETRÓLEO BRASILEIRO  S.A. ...

Object df.reports only has one row since we only asked for data of one company. The number of rows increases with the number of companies, as we will soon learn with the next example. All financial statements for the different years are available within df.reports. For example, the assets statements for all desired years of PETROBRAS are:

df.income.long <- df.reports$fr.income[[1]]

glimpse(df.income.long)
## Observations: 26
## Variables: 6
## $ name.company       <chr> "PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS", "...
## $ ref.date           <date> 2017-12-31, 2017-12-31, 2017-12-31, 2017-1...
## $ acc.number         <chr> "3.01", "3.02", "3.03", "3.04", "3.04.01", ...
## $ acc.desc           <chr> "Receita de Venda de Bens e/ou Serviços", "...
## $ acc.value          <dbl> 227964000, -156109000, 71855000, -44876000,...
## $ acc.value.infl.adj <dbl> 68925439.9, -47199915.3, 21725524.6, -13568...

The resulting dataframe is in the long format, ready for processing. In the long format, financial statements of different years are stacked. In the wide format, we have the year as columns of the table.

If you want the wide format, which is the most common way that financial reports are presented, you can use function gdfpd.convert.to.wide. See an example next:

df.income.wide <- gdfpd.convert.to.wide(df.income.long)

knitr::kable(df.income.wide )
acc.number acc.desc name.company 2017-12-31
3.01 Receita de Venda de Bens e/ou Serviços PETRÓLEO BRASILEIRO S.A. - PETROBRAS 227964000
3.02 Custo dos Bens e/ou Serviços Vendidos PETRÓLEO BRASILEIRO S.A. - PETROBRAS -156109000
3.03 Resultado Bruto PETRÓLEO BRASILEIRO S.A. - PETROBRAS 71855000
3.04 Despesas/Receitas Operacionais PETRÓLEO BRASILEIRO S.A. - PETROBRAS -44876000
3.04.01 Despesas com Vendas PETRÓLEO BRASILEIRO S.A. - PETROBRAS -18490000
3.04.02 Despesas Gerais e Administrativas PETRÓLEO BRASILEIRO S.A. - PETROBRAS -6465000
3.04.03 Perdas pela Não Recuperabilidade de Ativos PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0
3.04.04 Outras Receitas Operacionais PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0
3.04.05 Outras Despesas Operacionais PETRÓLEO BRASILEIRO S.A. - PETROBRAS -26635000
3.04.06 Resultado de Equivalência Patrimonial PETRÓLEO BRASILEIRO S.A. - PETROBRAS 6714000
3.05 Resultado Antes do Resultado Financeiro e dos Tributos PETRÓLEO BRASILEIRO S.A. - PETROBRAS 26979000
3.06 Resultado Financeiro PETRÓLEO BRASILEIRO S.A. - PETROBRAS -21860000
3.06.01 Receitas Financeiras PETRÓLEO BRASILEIRO S.A. - PETROBRAS 2917000
3.06.02 Despesas Financeiras PETRÓLEO BRASILEIRO S.A. - PETROBRAS -24777000
3.07 Resultado Antes dos Tributos sobre o Lucro PETRÓLEO BRASILEIRO S.A. - PETROBRAS 5119000
3.08 Imposto de Renda e Contribuição Social sobre o Lucro PETRÓLEO BRASILEIRO S.A. - PETROBRAS -5565000
3.08.01 Corrente PETRÓLEO BRASILEIRO S.A. - PETROBRAS -1494000
3.08.02 Diferido PETRÓLEO BRASILEIRO S.A. - PETROBRAS -4071000
3.09 Resultado Líquido das Operações Continuadas PETRÓLEO BRASILEIRO S.A. - PETROBRAS -446000
3.10 Resultado Líquido de Operações Descontinuadas PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0
3.10.01 Lucro/Prejuízo Líquido das Operações Descontinuadas PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0
3.10.02 Ganhos/Perdas Líquidas sobre Ativos de Operações Descontinuadas PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0
3.11 Lucro/Prejuízo do Período PETRÓLEO BRASILEIRO S.A. - PETROBRAS -446000
3.99 Lucro por Ação - (Reais / Ação) PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0
3.99.01 Lucro Básico por Ação PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0
3.99.02 Lucro Diluído por Ação PETRÓLEO BRASILEIRO S.A. - PETROBRAS 0

Downloading financial information for SEVERAL companies

If you are doing serious research, it is likely that you need financial statements for more than one company. Package GetDFPData is specially designed for handling large scale download of data. Let’s build a case with two selected companies:

my.companies <- c('PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS',
                  'BANCO DO ESTADO DO RIO GRANDE DO SUL SA')

first.date <- '2016-01-01'
last.date  <- '2017-01-01'
type.statements <- 'individual'

df.reports <- gdfpd.GetDFPData(name.companies = my.companies, 
                               first.date = first.date,
                               last.date = last.date,
                               cache.folder = tempdir())
## Found cache file. Loading data..
## 
## Downloading data for 2 companies
## First Date: 2016-01-01
## Laste Date: 2017-01-01
## Inflation index: dollar
## 
## Downloading inflation data
##  Found cache file. Loading data..    Done
## 
## Inputs looking good! Starting download of files:
## 
## BANCO DO ESTADO DO RIO GRANDE DO SUL SA
##  Available periods: 2016-12-31
## PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS
##  Available periods: 2016-12-31
## 
## 
## Processing 1210 - BANCO DO ESTADO DO RIO GRANDE DO SUL SA
##  Finding info from Bovespa | downloading and reading data | saving cache
##  Processing 1210 - BANCO DO ESTADO DO RIO GRANDE DO SUL SA | date 2016-12-31
##      Acessing DFP data | downloading file | reading file | saving cache
##      Acessing FRE data | downloading file | reading file | saving cache
##      Acessing FCA data | downloading file | reading file | saving cache
## Processing 9512 - PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS
##  Finding info from Bovespa
##      Found BOV cache file
##  Processing 9512 - PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS | date 2016-12-31
##      Acessing DFP data | downloading file | reading file | saving cache
##      Acessing FRE data | downloading file | reading file | saving cache
##      Acessing FCA data | downloading file | reading file | saving cache

And now we can check the resulting tibble:

glimpse(df.reports)
## Observations: 2
## Variables: 43
## $ company.name                     <chr> "BANCO DO ESTADO DO RIO GRAND...
## $ company.code                     <int> 1210, 9512
## $ cnpj                             <chr> "92702067000196", "3300016700...
## $ date.company.constitution        <date> 1928-09-12, 1953-10-03
## $ date.cvm.registration            <date> 1977-07-20, 1977-07-20
## $ company.tickers                  <chr> "BRSR3;BRSR5;BRSR6", "PETR3;P...
## $ min.date                         <date> 2016-12-31, 2016-12-31
## $ max.date                         <date> 2016-12-31, 2016-12-31
## $ n.periods                        <int> 1, 1
## $ company.segment                  <chr> "Corporate Governance - Level...
## $ current.stockholders             <list> [<c("BANCO DO ESTADO DO RIO ...
## $ current.stock.composition        <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.files                    <list> [<2, 2>, <2, 2>]
## $ fr.assets                        <list> [<c("BANCO DO ESTADO DO RIO ...
## $ fr.liabilities                   <list> [<c("BANCO DO ESTADO DO RIO ...
## $ fr.income                        <list> [<c("BANCO DO ESTADO DO RIO ...
## $ fr.cashflow                      <list> [<c("BANCO DO ESTADO DO RIO ...
## $ fr.assets.consolidated           <list> [<c("BANCO DO ESTADO DO RIO ...
## $ fr.liabilities.consolidated      <list> [<c("BANCO DO ESTADO DO RIO ...
## $ fr.income.consolidated           <list> [<c("BANCO DO ESTADO DO RIO ...
## $ fr.cashflow.consolidated         <list> [<c("BANCO DO ESTADO DO RIO ...
## $ fr.auditing.report               <list> [<BANCO DO ESTADO DO RIO GRA...
## $ history.dividends                <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.stockholders             <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.capital.issues           <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.mkt.value                <list> [<BANCO DO ESTADO DO RIO GRA...
## $ history.capital.increases        <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.capital.reductions       <list> [<>, <>]
## $ history.stock.repurchases        <list> [<>, <>]
## $ history.other.stock.events       <list> [<>, <>]
## $ history.compensation             <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.compensation.summary     <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.transactions.related     <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.debt.composition         <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.governance.listings      <list> [<BANCO DO ESTADO DO RIO GRA...
## $ history.board.composition        <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.committee.composition    <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.family.relations         <list> [<>, <>]
## $ history.family.related.companies <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.auditing                 <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.responsible.docs         <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.stocks.details           <list> [<c("BANCO DO ESTADO DO RIO ...
## $ history.dividends.details        <list> [<BANCO DO ESTADO DO RIO GRA...

Every row of df.reports will provide information for one company. Metadata about the corresponding dataframes such as min/max dates is available in the first columns. Keeping a tabular structure facilitates the organization and future processing of all financial data. We can use tibble df.reports for creating other dataframes in the long format containing data for all companies. See next, where we create dataframes with the assets and liabilities of all companies:

df.assets <- do.call(what = rbind, args = df.reports$fr.assets)
df.liabilities <- do.call(what = rbind, args = df.reports$fr.liabilities)

df.assets.liabilities <- rbind(df.assets, df.liabilities)

As an example, let’s use the resulting dataframe for calculating and analyzing a simple liquidity index of a company, the total of current (liquid) assets (Ativo circulante) divided by the total of current short term liabilities (Passivo Circulante), over time.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
my.tab <- df.assets.liabilities %>%
  group_by(name.company, ref.date) %>%
  summarise(Liq.Index = acc.value[acc.number == '1.01']/ acc.value[acc.number == '2.01'])

my.tab
## # A tibble: 2 x 3
## # Groups:   name.company [?]
##   name.company                            ref.date   Liq.Index
##   <chr>                                   <date>         <dbl>
## 1 BANCO DO ESTADO DO RIO GRANDE DO SUL SA 2016-12-31     1.18 
## 2 PETRÓLEO BRASILEIRO  S.A.  - PETROBRAS  2016-12-31     0.716

Now we can visualize the information using ggplot2:

library(ggplot2)

p <- ggplot(my.tab, aes(x = ref.date, y = Liq.Index, fill = name.company)) +
  geom_col(position = 'dodge' )
print(p)

Exporting financial data

The package includes function gdfpd.export.DFP.data for exporting the financial data to an Excel or zipped csv files. See next:

my.basename <- 'MyExcelData'
my.format <- 'csv' # only supported so far
gdfpd.export.DFP.data(df.reports = df.reports, 
                      base.file.name = my.basename,
                      type.export = my.format)

The resulting Excel file contains all data available in df.reports.

To leave a comment for the author, please follow the link and comment on their blog: R on msperlin.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)