Occupation classification is an important step in tasks such as labour market analysis, epidemiological studies and official statistics. To assist research on the labour market, ESCO (European Skills, Competences, Qualifications and Occupations) has defined a taxonomy of occupations. Occupations are specified and organized in a hierarchical structure based on the International Standard Classification of Occupations (ISCO).
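The ISCO hierarchy is encoded in the digits of the code: one digit identifies a major group, two a sub-major group, three a minor group and four a unit group. Moving up the hierarchy therefore amounts to truncating the code. The helper `isco_parent()` below is a hypothetical illustration of this, not part of labourR.

```r
# Hypothetical helper (not part of labourR): truncate a 4-digit ISCO-08
# unit-group code to a coarser level of the hierarchy.
# Level 1 = major group, 2 = sub-major group, 3 = minor group, 4 = unit group.
isco_parent <- function(code, level) {
  stopifnot(level %in% 1:4)
  substr(as.character(code), 1, level)
}

# Unit group 2511 (Systems analysts) and its ancestors at every level
sapply(1:4, function(l) isco_parent("2511", l))
# returns c("2", "25", "251", "2511")
```

This digit-prefix property is what makes it cheap to report a result at any requested `isco_level`.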
labourR is a new package that performs occupation coding for multilingual free-form text (e.g. a job title) using the ESCO hierarchical classification model.
The initial motivation was to retrieve the work experience history from a Curriculum Vitae generated with the Europass online CV editor. Document vectorization is performed using the ESCO model, and fuzzy matching is supported through various string distance metrics.
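As a rough illustration of fuzzy matching, the sketch below ranks a tiny label table by normalised Levenshtein distance using base R's `adist()`. The table, its ISCO codes and the `fuzzy_match()` helper are illustrative assumptions. This is a much simpler stand-in: labourR's own ESCO-based vectorization and its choice of string metrics are not reproduced here.

```r
# Illustrative label table; the ISCO codes are assumptions, not taken from ESCO
labels <- data.frame(
  label = c("data scientist", "data analyst", "architect", "cashier", "civil engineer"),
  isco  = c("2511", "2511", "2161", "5230", "2142"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of labourR): return the k labels closest to `query`
fuzzy_match <- function(query, labels, k = 3) {
  q <- tolower(query)
  # adist() computes the generalized Levenshtein (edit) distance; dividing by
  # the longer string length keeps every score in [0, 1]
  d <- as.vector(adist(q, labels$label)) / pmax(nchar(q), nchar(labels$label))
  ord <- order(d)[seq_len(min(k, nrow(labels)))]
  data.frame(label = labels$label[ord], isco = labels$isco[ord], dist = d[ord],
             stringsAsFactors = FALSE)
}

# A misspelled job title still lands on the intended label
fuzzy_match("Data Scientst", labels, k = 2)
```

Normalising by string length keeps short and long labels comparable; without it, short labels would tend to win simply because they need fewer edits.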
- Classifies multilingual free text using the ESCO-ISCO hierarchy of occupations.
- Computations are fully vectorized and memory efficient.
- Includes facilities to assist research in information mining of labour market data.
You can install the released version of labourR from CRAN with:
install.packages("labourR")
library(labourR)

corpus <- data.frame(
  id = 1:3,
  text = c("Data Scientist", "Junior Architect Engineer", "Cashier at McDonald's")
)
Given an ISCO level, the top suggested ISCO group is returned.
num_leaves specifies the number of ESCO occupations used to perform a plurality vote:
classify_occupation(corpus = corpus, isco_level = 3, lang = "en", num_leaves = 5)
#>    id iscoGroup                                          preferredLabel
#> 1:  1       251       Software and applications developers and analysts
#> 2:  2       214 Engineering professionals (excluding electrotechnology)
#> 3:  3       523                              Cashiers and ticket clerks
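The plurality vote can be sketched as follows, assuming the ISCO unit-group codes of the `num_leaves` best-matching ESCO occupations are already known: each code is truncated to the requested ISCO level and the most frequent group wins. `plurality_vote()` and the input codes are hypothetical. This is a plain unweighted vote, and labourR's weighting and tie-breaking may differ.

```r
# Hypothetical helper (not part of labourR): plurality vote over the ISCO
# codes of the top-k matched occupations, reported at `isco_level`
plurality_vote <- function(isco_codes, isco_level) {
  groups <- substr(as.character(isco_codes), 1, isco_level)
  counts <- table(groups)
  # table() sorts its names and which.max() returns the first maximum,
  # so ties go to the smallest group code
  names(counts)[which.max(counts)]
}

# Five hypothetical leaf matches (num_leaves = 5) for a single job title
top_leaves <- c("2511", "2512", "2521", "2511", "2519")
plurality_vote(top_leaves, isco_level = 3)
# returns "251"
```

Voting at the coarser level pools evidence: at level 4 the leaves above split across four unit groups, but at level 3 four of the five agree on group 251.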
For further information, see the package vignettes.
labourR 1.0.0: Automatic Coding of Occupation Titles was first posted on July 24, 2020 at 7:07 am.