Package GetLattesData

September 4, 2017
(This article was first published on R and Finance, and kindly contributed to R-bloggers)

Downloading and reading bibliometric data from Lattes

Lattes is a large and unique platform for
academic curricula. There you can find information about the academic
work of all Brazilian scholars. It includes the institution of PhD,
current employer, field of work, metadata for all publications and much
more. It is a unique and reliable source of information for
bibliometric studies.

I’ve been working with Lattes data for some time. Here I present a short
list of papers that have used this data.

Package GetLattesData wraps the functions that I’ve been
using for accessing the dataset. Its main innovation is the possibility
of downloading data directly from Lattes, without any kind of manual
work.

Installation

The package is not yet on CRAN. It should be there in a couple of days.
In the meantime, you can install it using devtools.

#install.packages('devtools')
devtools::install_github('msperlin/GetLattesData')

Example of usage

Let’s consider a simple example of downloading information for a group
of scholars. I selected a couple of colleagues at my university. Their
Lattes ids can be easily found on the Lattes website. After searching
for a name, note the internet address of the resulting CV, such as
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4713546D3.
The Lattes id is the final 10-character code of this address. In our
case, it is 'K4713546D3'.
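
When collecting ids for many scholars, copying the code by hand gets tedious. The following is a minimal sketch of pulling the id out of a CV address with base R. Note that `extract_lattes_id` is a hypothetical helper written for illustration, not a function of GetLattesData, and it assumes the id always appears as the `id` query parameter with 10 alphanumeric characters, as in the address above.

```r
# Hypothetical helper (not part of GetLattesData): extract the Lattes id
# from one or more CV addresses. Returns NA when no valid id is found.
extract_lattes_id <- function(urls) {
  # capture the value of the 'id' query parameter
  ids <- sub('^.*[?&]id=([^&#]+).*$', '\\1', urls)

  # sub() returns its input unchanged when the pattern does not match
  ids[which(ids == urls)] <- NA_character_

  # assumption: a valid id has exactly 10 alphanumeric characters
  ids[which(!grepl('^[A-Za-z0-9]{10}$', ids))] <- NA_character_

  ids
}

cv.url <- 'http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4713546D3'
my.id <- extract_lattes_id(cv.url)
print(my.id)
```

Addresses without a recognizable id come back as `NA`, so they can be filtered out before the ids are passed on.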

Since we all work in the business department of UFRGS, the quality of
our publications is assessed locally using the Qualis ranking for the field
'ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO'.
Qualis is the Brazilian journal ranking system. You can read more about
Qualis on Wikipedia and
here.

Now, based on these two pieces of information, the vector of ids and the
Qualis field, we can use GetLattesData to download all up-to-date
information about the researchers:

library(GetLattesData)

# ids from EA-UFRGS
my.ids <- c('K4713546D3', 'K4440252H7', 
            'K4783858A0', 'K4723925J2')

# qualis for the field of management
field.qualis <- 'ADMINISTRAÇÃO PÚBLICA E DE EMPRESAS, CIÊNCIAS CONTÁBEIS E TURISMO'

l.out <- gld_get_lattes_data(id.vec = my.ids, field.qualis = field.qualis)

## 
## Downloading file  /tmp/Rtmp9ODS2F/K4713546D3_2017-09-04.zip
## Downloading file  /tmp/Rtmp9ODS2F/K4440252H7_2017-09-04.zip
## Downloading file  /tmp/Rtmp9ODS2F/K4783858A0_2017-09-04.zip
## Downloading file  /tmp/Rtmp9ODS2F/K4723925J2_2017-09-04.zip
## Reading  K4713546D3_2017-09-04.zip -  Marcelo Scherer Perlin  found 18  papers
## Reading  K4440252H7_2017-09-04.zip -  Marcelo Brutti Righi    found 42  papers
## Reading  K4783858A0_2017-09-04.zip -  João Luiz Becker    found 58  papers
## Reading  K4723925J2_2017-09-04.zip -  Denis Borenstein    found 64  papers

The output l.out is a list with two items:

names(l.out)

## [1] "tpesq"   "tpublic"

The first is a dataframe with information about researchers:

tpesq <- l.out$tpesq
str(tpesq)

## 'data.frame':    4 obs. of  9 variables:
##  $ name           : chr  "Marcelo Scherer Perlin" "Marcelo Brutti Righi" "João Luiz Becker" "Denis Borenstein"
##  $ last.update    : Date, format: "2017-08-29" "2017-08-02" ...
##  $ phd.institution: chr  "University of Reading" "Universidade Federal de Santa Maria" "University Of California At Los Angeles" "University of Strathclyde"
##  $ phd.start.year : chr  "2007" "2013" "1982" "1991"
##  $ phd.end.year   : chr  "2010" "2015" "1986" "1995"
##  $ country.origin : Factor w/ 1 level "Brasil": 1 1 1 1
##  $ Major Field    : chr  "CIENCIAS_SOCIAIS_APLICADAS" "CIENCIAS_SOCIAIS_APLICADAS" "CIENCIAS_SOCIAIS_APLICADAS" "ENGENHARIAS"
##  $ Minor Field    : chr  "Administração" "Administração" "Administração" "Engenharia de Produção"
##  $ id.file        : chr  "K4713546D3_2017-09-04.zip" "K4440252H7_2017-09-04.zip" "K4783858A0_2017-09-04.zip" "K4723925J2_2017-09-04.zip"

The second is a dataframe with information about all publications,
including Qualis and SJR (SCImago Journal Rank):

tpublic <- l.out$tpublic
str(tpublic)

## 'data.frame':    182 obs. of  13 variables:
##  $ name         : chr  "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" "Marcelo Scherer Perlin" ...
##  $ article.title: chr  "Análise do Perfil dos Acadêmicos e de suas Publicações Científicas em Administração" "The Brazilian scientific output published in journals: A study based on a large CV database" "THE FORECASTING POWER OF INTERNET SEARCH QUERIES IN THE BRAZILIAN FINANCIAL MARKET" "A multistage stochastic programming asset-liability management model: an application to the Brazilian pension fund industry" ...
##  $ year         : chr  "2017" "2017" "2017" "2017" ...
##  $ language     : chr  "Português" "Inglês" "Inglês" "Inglês" ...
##  $ journal.title: chr  "RAC. Revista de Administração Contemporânea (Impresso)" "Journal of Informetrics" "RAM. REVISTA DE ADMINISTRAÇÃO MACKENZIE (ONLINE)" "OPTIMIZATION AND ENGINEERING" ...
##  $ ISSN         : chr  "1415-6555" "1751-1577" "1678-6971" "1389-4420" ...
##  $ start.page   : chr  "62" "18" "184" "349" ...
##  $ end.page     : chr  "83" "31" "210" "368" ...
##  $ order.aut    : chr  "2" "1" "3" "3" ...
##  $ n.authors    : chr  "3" "5" "3" "5" ...
##  $ qualis       : chr  "A2" NA "B1" "A2" ...
##  $ SJR          : num  NA 2.029 NA 0.481 NA ...
##  $ H.SJR        : int  NA 50 NA 29 NA NA 45 NA NA NA ...
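
Note that several columns that look numeric in the output above (year, order.aut, n.authors) are stored as character. A minimal sketch of converting them before any numeric analysis follows. The small dataframe `tpublic.head` is a hypothetical stand-in built from the first four values shown in the `str()` output, so the example runs without downloading anything.

```r
# Stand-in for the first four rows of tpublic, with values copied from
# the str() output above (hypothetical object, for illustration only)
tpublic.head <- data.frame(
  year      = c('2017', '2017', '2017', '2017'),
  order.aut = c('2', '1', '3', '3'),
  n.authors = c('3', '5', '3', '5'),
  SJR       = c(NA, 2.029, NA, 0.481),
  stringsAsFactors = FALSE
)

# columns stored as character that hold numbers
num.cols <- c('year', 'order.aut', 'n.authors')
tpublic.head[num.cols] <- lapply(tpublic.head[num.cols], as.numeric)

# check the resulting column classes
print(vapply(tpublic.head, class, character(1)))
```

The same two lines of conversion should apply to the full `tpublic` dataframe, assuming its columns match the structure shown above.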

An application of GetLattesData

Based on GetLattesData and other packages, it is easy to create
academic reports for a large number of researchers. As an example, the
code below plots the number of publications for each researcher, broken
down by Qualis ranking.

library(ggplot2)

p <- ggplot(tpublic, aes(x = qualis)) +
  geom_bar(position = 'identity') + facet_wrap(~name) +
  labs(x = paste0('Qualis: ', field.qualis))
print(p)
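
The plot above only shows Qualis categories that occur in the data. A minimal sketch of making the full scale explicit follows, so that categories without publications are counted as zero. It assumes the Qualis scale in use has the eight levels A1, A2, B1 to B5 and C; the vector `qualis.sample` is a hypothetical stand-in holding the first four values from the `str()` output of tpublic.

```r
# Assumption: the eight levels of the Qualis scale, from best to worst
qualis.levels <- c('A1', 'A2', 'B1', 'B2', 'B3', 'B4', 'B5', 'C')

# first four qualis values shown in the str() output of tpublic
qualis.sample <- c('A2', NA, 'B1', 'A2')

# a factor with fixed levels keeps empty categories in the counts
qualis.fct <- factor(qualis.sample, levels = qualis.levels)
qualis.counts <- table(qualis.fct)
print(qualis.counts)
```

Journals without a Qualis rating (`NA`) are left out of the counts by default; `table(qualis.fct, useNA = 'ifany')` would include them.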

We can also use dplyr to do some simple assessment of academic
productivity:

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

my.tab <- tpublic %>%
  group_by(name) %>%
  summarise(n.papers = n(),
            max.SJR = max(SJR, na.rm = TRUE),
            mean.SJR = mean(SJR, na.rm = TRUE),
            n.A1.qualis = sum(qualis == 'A1', na.rm = TRUE),
            n.A2.qualis = sum(qualis == 'A2', na.rm = TRUE),
            median.authorship = median(as.numeric(order.aut), na.rm = TRUE))

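knitr::kable(my.tab)

|name                   | n.papers| max.SJR|  mean.SJR| n.A1.qualis| n.A2.qualis| median.authorship|
|:----------------------|--------:|-------:|---------:|-----------:|-----------:|-----------------:|
|Denis Borenstein       |       64|   3.674| 1.3165610|          22|          15|                 2|
|João Luiz Becker       |       58|   3.885| 0.8090000|           5|          13|                 2|
|Marcelo Brutti Righi   |       42|   1.767| 0.3961111|           6|          14|                 1|
|Marcelo Scherer Perlin |       18|   2.029| 0.7755000|           2|           3|                 1|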
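
The `na.rm` arguments in the summary matter because many journals have no Qualis rating or no SJR value, and a single `NA` would otherwise propagate to the whole result. A small sketch of the difference, using the first four `qualis` and `SJR` values from the `str()` output of tpublic (the vector names here are hypothetical):

```r
# first four values of tpublic$qualis and tpublic$SJR, from the str() output
qualis.sample <- c('A2', NA, 'B1', 'A2')
sjr.sample    <- c(NA, 2.029, NA, 0.481)

# without na.rm, the missing value makes the whole count NA
n.A2.naive <- sum(qualis.sample == 'A2')
print(n.A2.naive)

# with na.rm = TRUE, missing ratings are ignored
n.A2 <- sum(qualis.sample == 'A2', na.rm = TRUE)
print(n.A2)

# the mean is taken over the two journals that have an SJR value
mean.sjr <- mean(sjr.sample, na.rm = TRUE)
print(mean.sjr)
```

One consequence is that mean.SJR in the table is an average over journals indexed by SJR only, not over all of a researcher's papers.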
