(guest post by Eduardo Flores)
For anyone interested in using data from INEGI (the official statistics agency of Mexico), it was sometimes a hassle to look up all the information in their BIE database.
Of course, their interface is useful for users who need a data series quickly, thanks to its export-to-Excel functions. But for users like myself, it soon became a hassle to download Excel files and then import them into R.
Furthermore, the API documentation is somewhat friendly but requires some knowledge of XML or JSON.
I soon found myself writing some functions to work with this API, and so the “inegiR” package was born on GitHub and later uploaded to CRAN.
The package uses two main functions to query a data series or the information in the DENUE database. (For users who are not aware: the DENUE is a repository of millions of businesses across the country, updated via census).
The remaining functions serve as convenient wrappers for common tasks. For example, inflacion_general() downloads monthly inflation data. Other functions make on-the-fly transformations easier, such as YoY(), which calculates the percentage change from a year ago (year-over-year).
Here are some examples.
Example 1: downloading a data series
To get the CRAN version (as of Nov-2015):
install.packages("inegiR")
To download the dev version from GitHub, using devtools:
#install.packages("devtools")
library(devtools)
install_github("Eflores89/inegiR")
#dependencies: zoo, XML, plyr, jsonlite
library(inegiR)
There are roughly two ways to download data series: the “general” and the “short” way (provided there is a wrapper function available).
In the first case, the function parses a URL provided by the user. The URLs for each data series can be found on the INEGI development site. You must also sign up for an API token on that same site with your email.
Let us save the imaginary token:
token <- "abc123"
Now, I wish to find the rate of inflation (which in the case of INEGI is a percent change of the INPC data series).
This is the corresponding URL for the INPC data series:
urlINPC <- "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/216064/00000/es/false/xml/"
JSON format is also accepted and is interchangeable (do not use the “?callback?” sign provided by INEGI’s documentation):
urlINPC2 <- "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/216064/00000/es/false/json/"
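Since the two URLs differ only in their final segment, one can be derived from the other. The snippet below is a minimal sketch of that idea; to_json_url is a hypothetical helper written for this illustration and is not part of inegiR.

```r
url_xml <- "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/216064/00000/es/false/xml/"

# Hypothetical helper (not part of inegiR): swap the trailing
# "/xml/" format segment for "/json/", leaving the rest untouched.
to_json_url <- function(url) {
  sub("/xml/$", "/json/", url)
}

url_json <- to_json_url(url_xml)
url_json
```

A URL that does not end in "/xml/" is returned unchanged, so the helper is safe to call on a URL that already requests JSON.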
Now, we are going to download this data series as a data.frame.
INPC <- serie_inegi(urlINPC, token)
# take a look
tail(INPC)
# Fechas Valores
# 2014-12-01 116.05900000
# 2015-01-01 115.95400000
# 2015-02-01 116.17400000
# 2015-03-01 116.64700000
# 2015-04-01 116.34500000
# 2015-05-01 115.76400000
The optional “metadata” parameter in serie_inegi allows us to also download the metadata information from the data series, which includes date of update, units, frequency, etc.
If “metadata” is set to TRUE, the information is parsed as a list of two elements: the metadata and the data frame.
INPC_Metadata <- serie_inegi(urlINPC, token, metadata = TRUE)
class(INPC_Metadata)
# "list"
To access any of these elements, simply index the result as you would any list:
# date of last update
INPC_Metadata$MetaData$UltimaActualizacion
# "2015/06/09"
Now that we have the INPC data series, we must apply a year-over-year change. For this we use the handy YoY() function, which lets us choose the number of periods to compare over (12 if you want year-over-year for a monthly series):
Inflation <- YoY(INPC$Valores, lapso = 12, decimal = FALSE)
# if we want a data frame, we simply build it like this
Inflation_df <- cbind.data.frame(Fechas = INPC$Fechas, Inflation = Inflation)
tail(Inflation_df)
# Fechas Inflation
# 2014-12-01 4.081322
# 2015-01-01 3.065642
# 2015-02-01 3.000266
# 2015-03-01 3.137075
# 2015-04-01 3.062327
# 2015-05-01 2.876643
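For readers curious about the arithmetic behind a year-over-year change, here is a rough base R sketch. The function yoy_sketch, its arguments, and the toy index values are all assumptions made for illustration; this is not the package's actual implementation of YoY(), which may treat the first periods differently.

```r
# Hypothetical helper (not part of inegiR): percentage change of each
# value versus the value `lag` periods earlier. The first `lag` positions
# have nothing to compare against, so they are returned as NA.
yoy_sketch <- function(x, lag = 12, as_percent = TRUE) {
  stopifnot(is.numeric(x), lag >= 1)
  n <- length(x)
  if (n <= lag) {
    return(rep(NA_real_, n))
  }
  change <- x[(lag + 1):n] / x[1:(n - lag)] - 1
  if (as_percent) {
    change <- change * 100
  }
  c(rep(NA_real_, lag), change)
}

# Toy index values (made up), compared over a lag of 3 periods
toy_index <- c(100, 102, 104, 110, 112.2, 114.4)
toy_change <- yoy_sketch(toy_index, lag = 3)
toy_change
```

With a monthly series, lag = 12 compares each month with the same month one year earlier, which is the comparison used for the inflation rate above.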
If we want to graph the result, we can simply use ggplot2:
library(ggplot2)
ggplot(Inflation_df, aes(x = Fechas, y = Inflation)) +
  geom_line() +
  labs(title = "Historical Inflation Rate for Mexico", x = "Date", y = "Rate")
[Figure: line chart, "Historical Inflation Rate for Mexico"]
You can also easily “trim” the outputs of these functions with the ultimos() function, which is just a fancy “tail” that lets you choose the number of observations.
For example, if we wanted to graph only the last 24 months:
ggplot(ultimos(Inflation_df, n = 24), aes(x = Fechas, y = Inflation)) +
  geom_line() +
  labs(title = "Inflation Rate for Mexico (24 months)", x = "Date", y = "Rate")
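As a point of comparison, the same kind of trimming can be sketched with base R's tail(). The data frame below is made up purely for illustration and does not contain INEGI figures.

```r
# Made-up monthly data for illustration only (not INEGI figures)
demo_df <- data.frame(
  Fechas = seq(as.Date("2012-01-01"), by = "month", length.out = 36),
  Valor  = seq_len(36)
)

# Keep only the last 24 observations, in their original order
last_24 <- tail(demo_df, n = 24)
nrow(last_24)
range(last_24$Fechas)
```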
This method works for any URL obtained from the INEGI documentation, but for the most used indicators, the package has built-in shortcut wrappers.
Let us obtain the same data series (inflation) via one of these specified shortcut functions:
Inflation_fast <- inflacion_general(token)
tail(Inflation_fast)
# Fechas Inflacion
# 2014-12-01 4.081322
# 2015-01-01 3.065642
# 2015-02-01 3.000266
# 2015-03-01 3.137075
# 2015-04-01 3.062327
# 2015-05-01 2.876643
As you can see, the function basically does all the transformations in one step.
Example 2: downloading statistics from DENUE
To access the DENUE, it is necessary to use a separate API and obtain a different token for these queries.
# new token
token_denue <- "abcdef1234"
To download the businesses in a certain radius, we need a few coordinates. Let’s use the ones around Monterrey, Mexico’s main square:
Now, we download into a data.frame the list of businesses in a 250 meter radius.
NegociosMacro <- denue_inegi(latitud = latitud_macro, longitud = longitud_macro, token_denue)
Let’s see only the first rows and columns…
head(NegociosMacro)[, 1:2]
# id Nombre
# 2918696 ESTACIONAMIENTO GRAN PLAZA
# 2918698 TEATRO DE LA CIUDAD DE MONTERREY
# 2918723 CONGRESO DE ESTADO
# 2918793 SECRETARIA DE SALUD DEL ESTADO
# 2974150 BIBLIOTECA CENTRAL
# 2974215 SOTANO RECURSOS HUMANOS Y ADQUISICIONES
The search parameters can also be changed. For example, we can widen the search to a 1 km radius and keep only businesses with “Restaurante” in the description:
RestaurantsMacro <- denue_inegi(latitud = latitud_macro,
                                longitud = longitud_macro,
                                token_denue,
                                metros = 1000,
                                keyword = "Restaurante")
I really hope this is useful and streamlines the process of getting INEGI data into R and finally into models, which is what I will be using it for!