‘inegiR’ – an R package for Mexican official statistics

November 4, 2015
(guest post by Eduardo Flores)

Introduction

For anyone interested in using data from INEGI (the official statistics agency of Mexico), it was sometimes a hassle to look up all the information in their BIE database.

Of course, their interface is useful for users who need a few data series quickly, thanks to its export-to-Excel functions. But for users like myself, it soon became a hassle to download Excel files and then import them into R.

Furthermore, the API has documentation that is somewhat friendly but requires some knowledge of XML or JSON.

I soon found myself writing some functions to work with this API, and so the “inegiR” package was born on GitHub and later uploaded to CRAN.

The package uses two main functions to query a data series or the information in the DENUE database. (For users who are not aware: the DENUE is a repository of millions of businesses across the country, updated via census).

The remaining functions serve as convenient wrappers for common tasks, for example inflacion_general() to download monthly inflation data. Other functions make on-the-fly transformations easier, such as YoY(), which calculates the percentage change from a year ago (year-over-year).

Here are some examples.

Example 1: downloading a data series

Install

To get the CRAN version (as of Nov-2015):

install.packages("inegiR")
library(inegiR)

To download the development version from GitHub, using devtools:

#install.packages("devtools")
library(devtools)
install_github("Eflores89/inegiR")
  # dependencies: zoo, XML, plyr, jsonlite
library(inegiR)

Download data

There are roughly two ways to download data series: the “general” and the “short” way (provided there is a wrapper function available).

In the first case, the function parses a URL provided by the user. The URLs for each data series can be found on the INEGI development site. You must also sign up for an API token on that same site with your email.

Let us save the imaginary token:

token <- "abc123"

Now, I wish to find the rate of inflation (which in the case of INEGI is a percent change of the INPC data series).

This is the corresponding URL for the INPC data series:

urlINPC <- "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/216064/00000/es/false/xml/"

JSON format is also accepted and is interchangeable (do not use the “?callback?” sign provided by INEGI’s documentation):

urlINPC2 <- "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/216064/00000/es/false/json/"
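Since the two URLs above differ only in their last segment, they can also be assembled programmatically. The helper below is a minimal sketch and is not part of inegiR: the function name build_inegi_url, its argument names, and my reading of what each path segment means (indicator id, geographic area, language, a true/false flag, format) are assumptions based only on the two example URLs.

```r
# Hypothetical helper (not part of inegiR): assemble an indicator URL
# following the pattern of the two example URLs shown above.
# The meaning attached to each segment is an assumption.
build_inegi_url <- function(indicator,
                            area = "00000",
                            language = "es",
                            flag = FALSE,
                            format = c("xml", "json")) {
  format <- match.arg(format)
  base <- "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador"
  paste0(base, "/", indicator, "/", area, "/", language, "/",
         tolower(as.character(flag)), "/", format, "/")
}

url_xml  <- build_inegi_url("216064")
url_json <- build_inegi_url("216064", format = "json")
```

With the default arguments this reproduces the XML URL used in this post, so the result can be passed to serie_inegi() in the same way.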

Now, we are going to download this data series as a data.frame.

INPC <- serie_inegi(urlINPC, token)

# take a look
tail(INPC)
# Fechas         Valores
# 2014-12-01   116.05900000
# 2015-01-01   115.95400000
# 2015-02-01   116.17400000
# 2015-03-01   116.64700000
# 2015-04-01   116.34500000
# 2015-05-01   115.76400000

The optional “metadata” parameter in serie_inegi allows us to also download the metadata information from the data series, which includes date of update, units, frequency, etc.

If “metadata” is set to TRUE, the information is parsed as a list of two elements: the metadata and the data frame.

INPC_Metadata <- serie_inegi(urlINPC, token, metadata = TRUE)
class(INPC_Metadata)
# [1] "list"

To access any of these elements, treat the result as a list:

# date of last update
INPC_Metadata$MetaData$UltimaActualizacion
# [1] "2015/06/09"

Now that we have the INPC data series, we must apply a year-over-year change. For this we use the handy YoY() function, which lets us choose the number of periods to compare over (12 for a year-over-year change on a monthly series):

Inflation <- YoY(INPC$Valores, 
                 lapso = 12, 
                 decimal=FALSE)

# if we want a data frame, we can build it like this
Inflation_df <- cbind.data.frame(Fechas = INPC$Fechas, 
                                 Inflation = Inflation)

tail(Inflation_df)
# Fechas        Inflation
# 2014-12-01    4.081322
# 2015-01-01    3.065642
# 2015-02-01    3.000266
# 2015-03-01    3.137075
# 2015-04-01    3.062327
# 2015-05-01    2.876643
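For readers curious about what a year-over-year change amounts to, here is a minimal base R sketch. It is not the package's source code: the name yoy_sketch and its arguments are hypothetical, and I am assuming the result is a percent change padded with NA for the first periods so that it keeps the length of the input.

```r
# Hypothetical helper (not the inegiR implementation): percent change of each
# observation versus the observation `lag` periods earlier.
yoy_sketch <- function(x, lag = 12, as_decimal = FALSE) {
  stopifnot(is.numeric(x), lag >= 1)
  n <- length(x)
  change <- rep(NA_real_, n)          # first `lag` values have no comparison
  if (n > lag) {
    idx <- (lag + 1):n
    change[idx] <- x[idx] / x[idx - lag] - 1
  }
  if (as_decimal) change else change * 100
}

# Synthetic index that grows 1% every period (made-up data, not INEGI's)
index  <- 100 * 1.01^(0:23)
growth <- yoy_sketch(index, lag = 12)
```

On this synthetic series, every value after the first 12 equals (1.01^12 - 1) * 100, which is the compounded 12-period growth rate.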

If we want to plot the series, we can use ggplot2:

library(ggplot2)

ggplot(Inflation_df, 
       aes(x = Fechas, y = Inflation))+
  geom_line()+
  labs(title = "Historical Inflation Rate for Mexico", 
       x = "Date", y = "Rate")

The result looks like this:

[Figure: inflation]

You can also easily “trim” the outputs of these functions with the ultimos() function, which is just a fancy “tail” that lets you choose the number of observations.
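To make the comparison with “tail” concrete, here is roughly the same trimming done in base R. This is only a sketch on made-up data: last_n is a hypothetical name, and I am not claiming that ultimos() is implemented this way.

```r
# Hypothetical stand-in for the trimming step, using base R's tail()
last_n <- function(df, n = 12) {
  tail(df, n = n)
}

# Made-up monthly series with the same column names used in this post
monthly <- data.frame(
  Fechas  = seq(as.Date("2012-01-01"), by = "month", length.out = 36),
  Valores = seq_len(36)
)

recent <- last_n(monthly, n = 24)
```

Here recent keeps the final 24 rows of the 36-month series, starting in January 2013.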

For example, if we wanted to graph only the last 24 months:

ggplot(ultimos(Inflation_df, n = 24), 
       aes(x = Fechas, y = Inflation))+
  geom_line()+
  labs(title = "Inflation Rate for Mexico (24 months)", 
       x = "Date", y = "Rate")

[Figure: inflation2]

This method works for any URL obtained from the INEGI documentation, but for the most used indicators, the package has built-in shortcut wrappers.

Let us obtain the same data series (inflation) via one of these specified shortcut functions:

Inflation_fast <- inflacion_general(token)
tail(Inflation_fast)
# Fechas        Inflacion
# 2014-12-01    4.081322
# 2015-01-01    3.065642
# 2015-02-01    3.000266
# 2015-03-01    3.137075
# 2015-04-01    3.062327
# 2015-05-01    2.876643

As you can see, the function basically does all the transformations in one step.

Example 2: downloading statistics from DENUE

To access the DENUE, it is necessary to look at another API here and obtain a different token for these queries.

# new token
token_denue <- "abcdef1234"

To download the businesses within a certain radius, we need a pair of coordinates. Let’s use the ones for the main square of Monterrey, Mexico:

latitud_macro <- "25.669194"
longitud_macro <- "-100.309901"

Now, we download the list of businesses within a 250-meter radius into a data.frame.

NegociosMacro <- denue_inegi(latitud = latitud_macro, 
                             longitud = longitud_macro, 
                             token_denue)

Let’s see only the first rows and columns…

head(NegociosMacro)[,1:2]
#     id                                       Nombre
# 2918696                   ESTACIONAMIENTO GRAN PLAZA
# 2918698             TEATRO DE LA CIUDAD DE MONTERREY
# 2918723                           CONGRESO DE ESTADO
# 2918793               SECRETARIA DE SALUD DEL ESTADO
# 2974150                           BIBLIOTECA CENTRAL
# 2974215      SOTANO RECURSOS HUMANOS Y ADQUISICIONES

You can also change some of the parameters, for example to search a 1 km radius and keep only businesses with “Restaurante” in the description:

RestaurantsMacro <- denue_inegi(latitud = latitud_macro, 
                                longitud = longitud_macro, 
                                token_denue, 
                                metros = 1000, 
                                keyword = "Restaurante")
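Because the result is an ordinary data.frame, a similar filter can also be applied locally after downloading. The sketch below uses a small made-up data frame with the id and Nombre columns shown earlier; the business names are invented for illustration, and matching on the name is only an approximation of the keyword search done by the API.

```r
# Made-up example data with the same first two columns as the DENUE output
negocios <- data.frame(
  id = c(1, 2, 3),
  Nombre = c("RESTAURANTE EL EJEMPLO",
             "BIBLIOTECA CENTRAL",
             "Restaurante La Muestra"),
  stringsAsFactors = FALSE
)

# Keep rows whose name contains "restaurante", ignoring case
restaurantes <- negocios[grepl("restaurante", negocios$Nombre,
                               ignore.case = TRUE), ]
```

Filtering locally avoids a second API call when the wider data set has already been downloaded.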

I really hope this is useful and streamlines the process of getting INEGI data into R and finally into models, which is what I will be using it for!


