Hetu-package for R is meant for algorithmic handling of Finnish personal identity numbers (PINs). The package is especially useful for those who wish to extract information from or validate a large number of PINs at a time.
The toolset for analyzing Finnish PINs was initially developed as part of the sorvi-package, but was later split into a separate package. The development of the hetu-package reached an important milestone in fall 2020, when it was published on CRAN.
The development of the hetu-package is closely related to sweidnumbr, a similar package meant for analyzing Swedish personal identity numbers (PINs) and organizational identity numbers (OINs). Where applicable, the hetu-package uses the same function names as sweidnumbr.[1]
Finnish personal identification code, hetu
A personal identification code (also called a national identification number, national identity number, personal identification number or PIN) is meant to be a unique identifier for an individual. The Finnish personal identification number (henkilötunnus, hetu for short) consists of a date of birth (DDMMYY), a century marker (-, + or A), a personal number (NNN) and a check character (C). Males have an odd personal number and females an even personal number.[2]
Personal identity codes are widely used in the public and private sectors alike. They are not confidential or secret information, but as with all personal data, handling hetu-codes requires consent from the individual or another valid reason.
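The structure described above can be illustrated with a minimal base R sketch. The helper `parse_pin()` is hypothetical and is not part of the hetu-package; the mapping of century markers to centuries (+ for the 1800s, - for the 1900s, A for the 2000s) is stated here as an assumption, since the text above only lists the markers.

```r
# parse_pin() is a hypothetical helper for illustration only; it is not
# part of the hetu package and performs no validity checks.
parse_pin <- function(pin) {
  stopifnot(is.character(pin), all(nchar(pin) == 11))
  century_marker <- substr(pin, 7, 7)
  # Assumed mapping: "+" = 1800s, "-" = 1900s, "A" = 2000s
  century <- c("+" = 1800, "-" = 1900, "A" = 2000)[century_marker]
  personal_number <- as.integer(substr(pin, 8, 10))
  data.frame(
    pin = pin,
    day = as.integer(substr(pin, 1, 2)),
    month = as.integer(substr(pin, 3, 4)),
    year = unname(century) + as.integer(substr(pin, 5, 6)),
    personal_number = personal_number,
    # Odd personal number = male, even = female
    sex = ifelse(personal_number %% 2 == 1, "Male", "Female"),
    check_character = substr(pin, 11, 11),
    stringsAsFactors = FALSE
  )
}

parse_pin(c("010101-0101", "111111-111C"))
```

The sketch only splits the string into its elements. It does not check that the date exists or that the check character is correct.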
Algorithmic handling of hetu-pins
Analyzing and extracting information from a single Finnish personal identity number is rather straightforward even with the naked eye. The hetu-package excels at handling large numbers of PINs, which would otherwise be cumbersome.
Hetu-package has functions to extract the following information:
- hetu_date / pin_date: date of birth
- hetu_sex / pin_sex: sex, Male or Female
- hetu_age / pin_age: age in years, months or days (at the time of the query or at a desired date)
- hetu_ctrl / pin_ctrl: validity check for the PIN, TRUE or FALSE
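As a rough illustration of what a validity check involves, the following base R sketch tests only the format and the check character of a PIN. It assumes the commonly documented rule that the check character is found by dividing the nine-digit number DDMMYYNNN by 31 and using the remainder as an index into the string `0123456789ABCDEFHJKLMNPRSTUVWXY`. The helpers `check_character()` and `pin_checksum_ok()` are hypothetical, and this is not necessarily how `hetu_ctrl` is implemented; in particular, the sketch does not check that the date exists.

```r
# Hypothetical helpers for illustration; not part of the hetu package.
check_character <- function(pin) {
  alphabet <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  # Nine digits: DDMMYY followed by NNN (the century marker is skipped)
  digits <- paste0(substr(pin, 1, 6), substr(pin, 8, 10))
  alphabet[as.numeric(digits) %% 31 + 1]
}

pin_checksum_ok <- function(pin) {
  # Format check: six digits, a century marker, three digits, one check character
  well_formed <- grepl("^[0-9]{6}[-+A][0-9]{3}[0-9A-Y]$", pin)
  result <- rep(FALSE, length(pin))
  ok <- pin[well_formed]
  result[well_formed] <- check_character(ok) == substr(ok, 11, 11)
  result
}

pin_checksum_ok(c("010101-0101", "111111-111C", "010101-0102"))
```

The first two PINs pass the checksum test, while the third fails because its check character does not match the remainder.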
Use of hetu-package
Installing the package in R from CRAN:
install.packages("hetu")
Loading the package and setting a few imaginary PINs for testing:
library(hetu)
example_pins <- c("010101-0101", "111111-111C")
The hetu-function is the backbone of the package. It returns the majority of the extractable information as a simple data frame:
hetu(example_pins)
There are several ways to extract a specific piece of information, such as date of birth, from a group of PINs. If the output of the hetu-function is saved as an object, its columns can be subsetted like those of any other data frame. For convenience, the same information can also be extracted with the extract-parameter of the hetu-function or with one of the specialized functions:
# Extracting sex
hetu(example_pins, extract = "sex")
## [1] "Female" "Male"
hetu_sex(example_pins)
## [1] "Female" "Male"
# Extracting date of birth
hetu(example_pins, extract = "date")
## [1] "1901-01-01" "1911-11-11"
hetu_date(example_pins)
## [1] "1901-01-01" "1911-11-11"
# Extracting information on validity
hetu(example_pins, extract = "valid.pin")
## [1] TRUE TRUE
hetu_ctrl(example_pins)
## [1] TRUE TRUE
# Information that can be extracted only with extract-parameter
hetu(example_pins, extract = "p.num")
## [1] "010" "111"
In contrast to other information, extracting age works only with a specialized function. In this example we will also introduce the ability to generate random PINs with rhetu-function:
example_pins2 <- rhetu(5, start = "1950-01-01", end = "1995-05-07")
# Age in years
hetu_age(example_pins2)
## The age in years has been calculated at 2021-01-31.
## [1] 33 69 62 31 43
# Age in months
hetu_age(example_pins2, timespan = "months")
## The age in months has been calculated at 2021-01-31.
## [1] 403 839 752 383 521
# Age in 2011
hetu_age(example_pins2, date = "2011-01-01")
## The age in years has been calculated at 2011-01-01.
## [1] 23 59 52 21 33
# Visualization: boxplot grouped by sex
example_pins3 <- rhetu(20, start = "1950-01-01", end = "1995-05-07", p.male = 0.5)
boxplot(hetu_age(example_pins3) ~ hetu_sex(example_pins3), xlab = "", ylab = "Age in years", col = c("cyan", "magenta"))
## The age in years has been calculated at 2021-01-31.
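For readers curious about the arithmetic behind an age in full years at a given date, here is a minimal base R sketch. The helper `age_in_years()` is hypothetical and is not claimed to be how `hetu_age` computes its result; it simply counts completed years between a date of birth and a reference date.

```r
# age_in_years() is a hypothetical base R helper for illustration;
# it is not the implementation used by hetu_age().
age_in_years <- function(birth_date, at = Sys.Date()) {
  birth <- as.POSIXlt(as.Date(birth_date))
  ref <- as.POSIXlt(as.Date(at))
  years <- ref$year - birth$year
  # TRUE where the birthday has not yet occurred in the reference year
  before_birthday <- ref$mon < birth$mon |
    (ref$mon == birth$mon & ref$mday < birth$mday)
  years - before_birthday
}

# The day before and the day of a 31st birthday
age_in_years("1990-05-07", at = "2021-05-06")
age_in_years("1990-05-07", at = "2021-05-07")
```

The first call returns 30 and the second 31, because the count increases only once the birthday has been reached.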
In some cases, diagnostic information on invalid PINs might be useful:
hetu_diagnostic("321399-000G")
##           hetu is.temp valid.p.num valid.checksum correct.checksum valid.date
## 21 321399-000G   FALSE       FALSE          FALSE            FALSE      FALSE
##    valid.day valid.month valid.year valid.length valid.century
## 21     FALSE       FALSE       TRUE         TRUE          TRUE
# Print only certain columns
hetu_diagnostic("321399-000G", extract = c("valid.p.num", "valid.length"))
##           hetu valid.p.num valid.length
## 21 321399-000G       FALSE         TRUE
Business Identity Numbers (Y-tunnus, BID)
As in sweidnumbr, the hetu-package has two functions that can be used with Finnish Business Identity Numbers (y-tunnus). Finnish business identity numbers have the form 1234567-8, where the last digit is a check digit.[3] The following functions are available:
- bid_ctrl(bid): checks the validity of the BID, TRUE or FALSE
- rbid(n): generates n BIDs
example_bids <- rbid(2)
example_bids
## [1] "7128741-6" "1963928-5"
bid_ctrl(example_bids)
## [1] TRUE TRUE
No additional information can be extracted from BIDs.
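To make the role of the check digit concrete, the following base R sketch recomputes it. It assumes the commonly documented rule for Finnish business IDs: the first seven digits are multiplied by the weights 7, 9, 10, 5, 8, 4 and 2, the products are summed, and the sum is divided by 11. A remainder of 0 gives check digit 0, a remainder of 1 means the number is not in use, and any other remainder is subtracted from 11. The helpers `bid_check_digit()` and `bid_checksum_ok()` are hypothetical and are not claimed to match the internals of `bid_ctrl`.

```r
# Hypothetical helpers for illustration; not part of the hetu package.
bid_check_digit <- function(bid) {
  weights <- c(7, 9, 10, 5, 8, 4, 2)
  digits <- as.integer(strsplit(substr(bid, 1, 7), "")[[1]])
  remainder <- sum(digits * weights) %% 11
  if (remainder == 0) {
    0L
  } else if (remainder == 1) {
    NA_integer_  # no valid check digit exists for this remainder
  } else {
    as.integer(11 - remainder)
  }
}

bid_checksum_ok <- function(bid) {
  vapply(bid, function(x) {
    # Format check: seven digits, a hyphen, one check digit
    if (!grepl("^[0-9]{7}-[0-9]$", x)) return(FALSE)
    expected <- bid_check_digit(x)
    !is.na(expected) && expected == as.integer(substr(x, 9, 9))
  }, logical(1), USE.NAMES = FALSE)
}

bid_checksum_ok(c("7128741-6", "1963928-5", "7128741-7"))
```

The first two BIDs are the randomly generated examples from above and pass the test; the third has an altered check digit and fails.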
[1] More information about sweidnumbr can be found, for example, in this blog post: Magnusson, Mans & Bulow, Erik. 2015. R made personal (at least for swedes)!. URL: https://ropengov.org/2015/08/r-made-personal-at-least-for-swedes/
[3] Finnish Patent and Registration Office. The Business Information System (BIS). URL: https://www.prh.fi/en/kaupparekisteri/rekisterointipalvelut/ytj.html