inegiR v2

[This article was first published on En El Margen - R-English, and kindly contributed to R-bloggers.]

After a lot of slacking around, I finally got around to finishing the upgraded version of the inegiR package on CRAN. This version bundles quite a few changes, which I explain in this post.

New language

The biggest change up front is the migration to English in both function names and documentation. The rationale is to make the package more accessible to developers around the world (I have received a few emails asking for translations). Also, the non-ASCII characters were not helpful. For Mexican users, I assume that if you know R, you can probably find your way around an English document.

To avoid breaking workflows, I left the legacy functions intact, except that they now emit a warning to use the English version instead. An example is the commercial growth rate function:

# english
rate_commerce()
# spanish (old version)
tasa_comercio()
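
One common way to keep an old name alive while nudging users toward the new one is base R's .Deprecated(). The sketch below is only an illustration of that pattern, not the package's actual code: rate_example() and tasa_ejemplo() are hypothetical names invented for this example.

```r
# Hypothetical stand-in for a new English-named function:
# period-over-period growth rate, in percent.
rate_example <- function(x) {
  diff(x) / head(x, -1) * 100
}

# Hypothetical legacy wrapper: emit a deprecation warning,
# then forward every argument to the new function.
tasa_ejemplo <- function(...) {
  .Deprecated(new = "rate_example")
  rate_example(...)
}

# The old name still works; the warning is silenced here only to keep output clean.
old_result <- suppressWarnings(tasa_ejemplo(c(100, 110, 121)))
```

Calling tasa_ejemplo() without suppressWarnings() returns the same values and prints a warning that points to rate_example().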

Route API

With some help from Arturo Cárdenas and a revamp of the Sakbé API at INEGI, I was able to add functions to access route information.

The two main ones are:

# to search for a destiny id
inegi_destiny()
# to get route information
inegi_route()

The first thing to understand is that INEGI has categorized sites in Mexico according to a “destiny id”. For example, the International Airport in Mexico City is destiny id #57. The inegi_destiny() function helps you find a destiny id from a text search, sort of like googling a place and getting an address. Here is an example with a plaza in Monterrey:

# download on CRAN or newest dev version (if not accepted yet)
# install.packages("inegiR")
# or... 
# devtools::install_github("eflores89/inegiR")
library(inegiR)
library(knitr)
# to search for Macroplaza destiny id
token <- "mytoken"
destiny1 <- inegi_destiny("Macroplaza", token = token)
kable(destiny1)
| ID | ID_DEST | STATE | NAME | GEO_STRING | TYPE | LAT | LONG |
|:---|---:|:---|:---|:---|:---|---:|---:|
| destino | 6940 | N.L. | Macroplaza, Monterrey | {"type":"Point","coordinates":[-100.309991587,25.668862054]} | Point | -100.3100 | 25.66886 |
| destino | 20237 | B.C. | Macroplaza del Valle, Mexicali | {"type":"Point","coordinates":[-115.50790804,32.62128025]} | Point | -115.5079 | 32.62128 |
| destino | 17891 | Coah. | Macroplaza, Acuña | {"type":"Point","coordinates":[-100.978421457,29.3299882860001]} | Point | -100.9784 | 29.32999 |
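
Since a search can return several matches, the id usually has to be picked out of the result before it can be passed on. A minimal sketch, assuming the returned data.frame has the ID_DEST, STATE and NAME columns shown in the output above; destiny_preview is a hand-typed copy of those rows so the snippet runs without a token.

```r
# Hand-typed copy of three columns from the search result above,
# so the snippet runs without a token or network access.
destiny_preview <- data.frame(
  ID_DEST = c(6940, 20237, 17891),
  STATE   = c("N.L.", "B.C.", "Coah."),
  NAME    = c("Macroplaza, Monterrey",
              "Macroplaza del Valle, Mexicali",
              "Macroplaza, Acu\u00f1a"),
  stringsAsFactors = FALSE
)

# Keep only the N.L. match and pull out its destiny id.
from_id <- destiny_preview$ID_DEST[destiny_preview$STATE == "N.L."]
```

With a real result, the same subsetting would apply to destiny1 directly.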

Once you know two destiny ids, you can use the API to learn about potential routes between them. The inegi_route() function returns a list with two objects: a data.frame of route information (kilometers, toll cost, etc.) and another data.frame with all the coordinates along the route. Intuitively, if you join all the dots, you can see the route you would take.

To illustrate, I’m going to use the first result and see what the route would be from there to the U.S. border (which is the other id), driving a normal car and taking a tolled highway. The documentation explains the names and options of the parameters.

route <- inegi_route(from = 6940, to = 7426, token = token, pref = 1, vehicle = 1)
str(route)
# List of 2
#  $ ROUTE          :'data.frame':	1 obs. of  6 variables:
#   ..$ KMS       : num 222
#   ..$ TIME_MINS : num 151
#   ..$ TIME_HRS  : num 2.52
#   ..$ HAS_TOLL  : logi TRUE
#   ..$ TOLL_COST : num 364
#   ..$ TOTAL_COST: logi NA
#  $ COORDINATE_PATH:'data.frame':	1176 obs. of  2 variables:
#   ..$ V1: num [1:1176] -100 -100 -100 -100 -100 ...
#   ..$ V2: num [1:1176] 25.7 25.7 25.7 25.7 25.7 ...

As you can see, the returned object is a list of two data.frame objects. The first gives us basic statistics about the route.

kable(route$ROUTE)
| KMS | TIME_MINS | TIME_HRS | HAS_TOLL | TOLL_COST | TOTAL_COST |
|---:|---:|---:|:---|---:|:---|
| 222.36 | 151.11 | 2.5185 | TRUE | 364 | NA |

The total cost is NA because the calc_cost parameter defaults to FALSE. When it is set to TRUE, the function additionally looks up the price of gasoline in the Sakbé API and estimates the cost of the trip. Be warned: this is very experimental and just a rule of thumb (see the documentation for a further explanation). Once the gasoline cost is calculated, any tolls are added and a total cost is returned. To do this, just change the parameter.

route2 <- inegi_route(from = 6940, to = 7426, token = token, pref = 1, vehicle = 1, 
                      calc_cost = TRUE)
kable(route2$ROUTE)
| KMS | TIME_MINS | TIME_HRS | HAS_TOLL | TOLL_COST | TOTAL_COST |
|---:|---:|---:|:---|---:|---:|
| 222.36 | 151.11 | 2.5185 | TRUE | 364 | 757.1729 |

All prices are reported in Mexican pesos.
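
If the total really is gasoline plus tolls, as described above, the gasoline part can be backed out from the two reported figures. This is only arithmetic on the numbers in the output; it does not reproduce how the package estimates fuel consumption, and the variable names are made up for the example.

```r
# Figures copied from the route output above.
kms        <- 222.36
toll_cost  <- 364
total_cost <- 757.1729

# Assuming total = gasoline + tolls, the gasoline part is the remainder.
fuel_cost <- total_cost - toll_cost

# Implied gasoline cost per kilometer, in pesos.
fuel_per_km <- fuel_cost / kms
```

For this trip the implied gasoline cost is a little under 400 pesos, so tolls make up almost half of the total.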

The second element in the list is the data.frame containing all point references in the route. As I said before, just connect the dots. Here is a preview:

kable(head(route$COORDINATE_PATH))
| LONGITUD | LATITUD | INDEX |
|---:|---:|---:|
| -100.3125 | 25.66238 | 1 |
| -100.3125 | 25.66231 | 2 |
| -100.3124 | 25.66225 | 3 |
| -100.3124 | 25.66222 | 4 |
| -100.3124 | 25.66220 | 5 |
| -100.3124 | 25.66215 | 6 |
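
"Connecting the dots" can also be checked numerically: summing the distances between consecutive points should land close to the reported kilometers when done over the whole path. The sketch below does this for the six preview points only, using the haversine great-circle formula; haversine_km() is a helper written for this example, not part of inegiR, and the 6371 km Earth radius is the usual mean-radius approximation.

```r
# First six points of the path, copied from the preview above.
path <- data.frame(
  LONGITUD = c(-100.3125, -100.3125, -100.3124, -100.3124, -100.3124, -100.3124),
  LATITUD  = c(25.66238, 25.66231, 25.66225, 25.66222, 25.66220, 25.66215)
)

# Great-circle (haversine) distance in kilometers; vectorized over points.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Distance of each segment between consecutive points, then the total.
n <- nrow(path)
segments_km <- haversine_km(path$LONGITUD[-n], path$LATITUD[-n],
                            path$LONGITUD[-1], path$LATITUD[-1])
path_km <- sum(segments_km)
```

To draw the path instead, plot(path$LONGITUD, path$LATITUD, type = "l") joins the points in order.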

For this particular route, I plotted the dots in Google Maps to show this better.

[Image: the route plotted in Google Maps]

New GDP catalog

Another huge issue that users reported was finding the relevant indicator ids on the INEGI website. As experienced users know, every economic data series has a unique id in the API. However, there is no catalog that lets you look up the ids you need. I have petitioned INEGI multiple times but got nowhere.

My personal workaround was to look up the series in the BIE application (a web browser version of the API) and download the data as an .iqy file. From there, I would hack my way through the file to find the unique ids being called. This was very time-consuming and error-prone.

So, to help each other out in this endeavour, I created a catalog of ids. This version has all the sub-levels of GDP (down to the fourth level of disaggregation), and I plan to update the catalog on a rolling basis. Any help would be appreciated.

You can see the catalog by calling the dataset like this:

data("inegi_catalog")
kable(head(inegi_catalog[,1:7]))
# for more rows, see docs!
| NAME | LEVEL_2 | LEVEL_3 | LEVEL_4 | UNITS | BASE | FREQUENCY |
|:---|:---|:---|:---|:---|---:|:---|
| PIB | TOTAL | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB – IMPUESTOS A PRODUCTOS NETOS | IMPUESTOS A PRODUCTOS NETOS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB – VALOR AGREGADO BRUTO | VALOR AGREGADO BRUTO | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB – ACTIVIDADES PRIMARIAS | ACTIVIDADES PRIMARIAS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB – ACTIVIDADES PRIMARIAS – AGRICULTURA | ACTIVIDADES PRIMARIAS | AGRICULTURA | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
| PIB – ACTIVIDADES SECUNDARIAS | ACTIVIDADES SECUNDARIAS | TOTAL | TOTAL | MILLIONS OF 2008 PESOS | 2008 | TRIMESTRAL |
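
The level columns make the catalog easy to filter with ordinary subsetting. A minimal sketch, assuming the LEVEL_2 to LEVEL_4 columns behave as in the preview above (with "TOTAL" marking an aggregate at that level); catalog_preview is a hand-typed copy of those columns so the snippet runs without the package installed.

```r
# Hand-typed copy of the level columns from the six preview rows above.
catalog_preview <- data.frame(
  LEVEL_2 = c("TOTAL", "IMPUESTOS A PRODUCTOS NETOS", "VALOR AGREGADO BRUTO",
              "ACTIVIDADES PRIMARIAS", "ACTIVIDADES PRIMARIAS",
              "ACTIVIDADES SECUNDARIAS"),
  LEVEL_3 = c("TOTAL", "TOTAL", "TOTAL", "TOTAL", "AGRICULTURA", "TOTAL"),
  LEVEL_4 = rep("TOTAL", 6),
  stringsAsFactors = FALSE
)

# Everything under primary activities, at any depth.
primary <- catalog_preview[catalog_preview$LEVEL_2 == "ACTIVIDADES PRIMARIAS", ]

# Second-level aggregates only: a named LEVEL_2 with no further breakdown.
second_level <- catalog_preview[catalog_preview$LEVEL_2 != "TOTAL" &
                                  catalog_preview$LEVEL_3 == "TOTAL", ]
```

The same expressions should work on inegi_catalog itself after data("inegi_catalog").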

Compact metadata and series helper

Two other common headaches came up with past versions. First, the inegi_series() function only accepted the full URL, even though most of the time the only thing that changed between calls was the id number. So I added a simple function that builds the entire URL string for the API call.

GDP_ID <- 381016
inegi_code(GDP_ID)
# "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/381016/00000/es/false/xml/"
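
Conceptually, a helper like this only needs to paste the id between a fixed prefix and suffix. The function below is a hypothetical re-creation for illustration, not the package's source: build_indicator_url() is an invented name, and the prefix and suffix are simply copied from the URL printed above.

```r
# Hypothetical helper, not inegiR's implementation: wrap an indicator id
# in the fixed parts of the URL shown in the output above.
build_indicator_url <- function(id) {
  prefix <- "http://www3.inegi.org.mx/sistemas/api/indicadores/v1//Indicador/"
  suffix <- "/00000/es/false/xml/"
  # format() keeps large numeric ids from being printed in scientific notation.
  paste0(prefix, format(id, scientific = FALSE, trim = TRUE), suffix)
}

gdp_url <- build_indicator_url(381016)
```

Because the id is the only moving part, a vector of ids can be turned into a vector of URLs with vapply(ids, build_indicator_url, character(1)).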

The second headache had to do with downloading multiple ids. The list returned by inegi_series() with the metadata parameter set to TRUE is a bit clunky to use in a loop or an apply function. So I added a compact function that returns all the information in a tidy data.frame:

token_inegi <- "mytoken"
df <- compact_inegi_series(inegi_code(381016), token_inegi)
kable(head(df))
| Values | Dates | Name | Update | Region | Units | Indicator | Frequency |
|---:|:---|:---|:---|:---|:---|---:|:---|
| 7945204 | 1993-01-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 7939362 | 1993-04-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 7954943 | 1993-07-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 8268036 | 1993-10-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 8210538 | 1994-01-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
| 8413362 | 1994-04-01 | Producto interno bruto, a precios de mercado | 2017/08/22 | Nacional | Millones de pesos a precios de 2008 | 381016 | Trimestral |
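
Because every call returns a data.frame with the same columns, several series can be stacked with lapply() and rbind(). The sketch below shows only that pattern: fake_series() is a stand-in that fabricates two dummy rows per id so the code runs offline, and 999999 is a made-up placeholder id, not a real indicator.

```r
# Stand-in for an API call: returns a small data.frame with dummy values,
# tagged with the id it was asked for. Not real INEGI data.
fake_series <- function(id) {
  data.frame(
    Values    = c(1, 2),
    Dates     = as.Date(c("1993-01-01", "1993-04-01")),
    Indicator = id
  )
}

# 381016 comes from the post; 999999 is a made-up placeholder.
ids <- c(381016, 999999)

# One data.frame per id, stacked into a single long table.
all_series <- do.call(rbind, lapply(ids, fake_series))
```

With the package and a valid token, the body of the loop would presumably become compact_inegi_series(inegi_code(id), token_inegi), following the call shown above.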

I hope this update is useful to everyone doing data science with Mexican stats. Any new suggestions or questions are welcome via Twitter or a GitHub issue.
