The hetu R package provides tools to work with Finnish personal identity numbers (hetu, short for the Finnish term “henkilötunnus”). Some functions can also be used with Finnish Business ID numbers (y-tunnus).
Where possible, we have unified the syntax with sweidnumbr.
Install the current development version in R:
devtools::install_github("ropengov/hetu")
Test the installation by loading the library:
library(hetu)
We also recommend setting the UTF-8 encoding:
Sys.setlocale(locale="UTF-8")
Finnish personal identification numbers (Finnish: henkilötunnus, or hetu for short) are used to identify citizens. A hetu PIN consists of eleven characters, DDMMYYCZZZQ, where DDMMYY is the day, month and year of birth, C is the century marker, ZZZ is the individual number and Q is the control character.
Males have an odd individual number and females an even one. The control character is determined by dividing the nine-digit number DDMMYYZZZ by 31 and using the remainder (modulo 31) to pick the corresponding character from the string “0123456789ABCDEFHJKLMNPRSTUVWXY”. For example, if the remainder is 0, the control character is 0, and if the remainder is 12, the control character is C.
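The control character rule above can be sketched in base R. This is only an illustration of the arithmetic, not the package's own implementation; the helper name control_char_sketch is hypothetical.

```r
# Hypothetical helper for illustration; not part of the hetu package.
control_char_sketch <- function(ddmmyy, zzz) {
  ctrl_chars <- "0123456789ABCDEFHJKLMNPRSTUVWXY"
  # Join birth date and individual number into DDMMYYZZZ, then take modulo 31
  remainder <- as.numeric(paste0(ddmmyy, zzz)) %% 31
  # R strings are 1-indexed, so remainder 0 maps to position 1
  substring(ctrl_chars, remainder + 1, remainder + 1)
}

control_char_sketch("111111", "111")  # "C" (remainder 12)
control_char_sketch("010101", "010")  # "1" (remainder 1)
```

These results agree with the control characters of the example PINs 111111-111C and 010101-0101 used later in this document.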
A valid individual number is between 002 and 899. Individual numbers 900-999 are not in normal use and are reserved for temporary or artificial PINs. Organizations such as insurance companies or hospitals sometimes use these temporary PINs if the individual is not a Finnish citizen or a permanent resident, or if the exact identity of the individual cannot be determined at the time. Artificial or temporary PINs are not intended for continuous, long-term use, and PIN validity checking algorithms usually do not accept them.
Temporary PINs carry the same kind of information about an individual’s birth date and sex as regular PINs. Temporary PINs can also be safely used for testing purposes, as such a number cannot be linked to any real person.
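The rules for the individual number (odd for males, even for females, 900-999 for temporary PINs) can be illustrated with a short base R sketch. The helper individual_number_sketch and its column names are hypothetical and chosen for this example only; the package's own functions are introduced below.

```r
# Hypothetical helper for illustration; not part of the hetu package.
individual_number_sketch <- function(pin) {
  # Characters 8-10 of DDMMYYCZZZQ hold the individual number ZZZ
  zzz <- as.integer(substr(pin, 8, 10))
  data.frame(
    pin = pin,
    individual_number = zzz,
    sex = ifelse(zzz %% 2 == 1, "Male", "Female"),  # odd = male, even = female
    is_temp = zzz >= 900,                           # 900-999 = temporary / artificial
    stringsAsFactors = FALSE
  )
}

individual_number_sketch(c("010101-0101", "111111-111C", "010101A900R"))
```

The sketch only reads the individual number; it does not check the date or the control character.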
The basic hetu function can be used to view the information included in a Finnish personal identification number. The output is a data frame.
example_pin <- "111111-111C"
hetu(example_pin)
#>          hetu  sex p.num ctrl.char       date day month year century valid.pin
#> 1 111111-111C Male   111         C 1911-11-11  11    11 1911       -      TRUE
The output can be made prettier, for example by using knitr:
knitr::kable(hetu(example_pin))
| hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin |
|---|---|---|---|---|---|---|---|---|---|
| 111111-111C | Male | 111 | C | 1911-11-11 | 11 | 11 | 1911 | - | TRUE | 
The hetu function also accepts vectors with several identification numbers as input:
example_pins <- c("010101-0101", "111111-111C")
knitr::kable(hetu(example_pins))
| hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin |
|---|---|---|---|---|---|---|---|---|---|
| 010101-0101 | Female | 010 | 1 | 1901-01-01 | 1 | 1 | 1901 | - | TRUE | 
| 111111-111C | Male | 111 | C | 1911-11-11 | 11 | 11 | 1911 | - | TRUE | 
The hetu function does not print warning messages if the input vector contains invalid PINs. The validity of specific PINs can be determined by looking at the valid.pin column.
hetu(c("010101-0102", "111311-111C", "010101-0101"))
#>          hetu    sex p.num ctrl.char       date day month year century
#> 1 010101-0102 Female   010         2 1901-01-01   1     1 1901       -
#> 2 111311-111C   Male   111         C       <NA>  11    NA 1911       -
#> 3 010101-0101 Female   010         1 1901-01-01   1     1 1901       -
#>   valid.pin
#> 1     FALSE
#> 2     FALSE
#> 3      TRUE
Information contained in the PIN can be extracted with the generic extract parameter. Valid values for extraction are hetu, sex, personal.number, ctrl.char, date, day, month, year, century, valid.pin and is.temp.
is.temp can be extracted only if allow.temp is set to TRUE. If allow.temp is set to FALSE (default), temporary PINs are filtered from the output and information provided by is.temp would be meaningless.
hetu(example_pins, extract = "sex")
#> [1] "Female" "Male"
hetu(example_pins, extract = "ctrl.char")
#> [1] "1" "C"
Some fields can be extracted with specialized functions. Extracting sex with the hetu_sex function:
hetu_sex(example_pins)
#> [1] "Female" "Male"
Extracting age at the current date and at a given date with the hetu_age function:
hetu_age(example_pins)
#> The age in years has been calculated at 2022-05-20.
#> [1] 121 110
hetu_age(example_pins, date = "2012-01-01")
#> The age in years has been calculated at 2012-01-01.
#> [1] 111 100
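The age at a given date can be reproduced from the birth dates alone. The following base R sketch counts completed years; age_in_years_sketch is a hypothetical helper, and hetu_age itself may be implemented differently.

```r
# Hypothetical helper for illustration; not part of the hetu package.
age_in_years_sketch <- function(birth_date, at_date) {
  birth <- as.POSIXlt(as.Date(birth_date))
  at <- as.POSIXlt(as.Date(at_date))
  years <- at$year - birth$year
  # Subtract one year if the birthday has not yet been reached in the target year
  before_birthday <- (at$mon < birth$mon) |
    (at$mon == birth$mon & at$mday < birth$mday)
  years - as.integer(before_birthday)
}

age_in_years_sketch(c("1901-01-01", "1911-11-11"), "2012-01-01")  # 111 100
```

The result matches the hetu_age output above for the same reference date.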
hetu_age(example_pins, timespan = "months")
#> The age in months has been calculated at 2022-05-20.
#> [1] 1456 1326
Birth dates also have their own function, hetu_date.
hetu_date(example_pins)
#> [1] "1901-01-01" "1911-11-11"
The output of the basic hetu function includes information on the validity of each PIN, which can be extracted by calling the hetu function with valid.pin as the extract parameter.
The validity of the PINs can also be determined by using the hetu_ctrl function, which produces a vector:
hetu_ctrl(c("010101-0101", "111111-111C")) # TRUE TRUE
#> [1] TRUE TRUE
hetu_ctrl("010101-1010") # FALSE
#> [1] FALSE
The package functions can be made to accept artificial or temporary personal identification numbers. Artificial and temporary PINs can be used normally by allowing them through the allow.temp parameter.
example_temp_pin <- "010101A900R"
knitr::kable(hetu(example_temp_pin, allow.temp = TRUE))
| hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin | is.temp |
|---|---|---|---|---|---|---|---|---|---|---|
| 010101A900R | Female | 900 | R | 2001-01-01 | 1 | 1 | 2001 | A | TRUE | TRUE | 
If allow.temp is not set to TRUE, a vector that mixes regular and temporary PINs returns only the regular PINs. Temporary PINs are omitted without a visible error message, so users need to be cautious if they want to use temporary PINs.
If temporary PINs are not explicitly allowed and the input vector consists of temporary PINs only, the function will return an error.
example_temp_pins <- c("010101A900R", "010101-0101")
hetu_ctrl("010101A900R", allow.temp = FALSE)
#> [1] NA
knitr::kable(hetu(example_temp_pins))
|   | hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 010101-0101 | Female | 010 | 1 | 1901-01-01 | 1 | 1 | 1901 | - | TRUE |
When allow.temp is set to TRUE, all PINs are handled as if they were regular PINs.
knitr::kable(hetu(example_temp_pins, allow.temp = TRUE))
| hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin | is.temp |
|---|---|---|---|---|---|---|---|---|---|---|
| 010101A900R | Female | 900 | R | 2001-01-01 | 1 | 1 | 2001 | A | TRUE | TRUE | 
| 010101-0101 | Female | 010 | 1 | 1901-01-01 | 1 | 1 | 1901 | - | TRUE | FALSE | 
hetu_ctrl("010101A900R", allow.temp = TRUE)
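#> [1] TRUE
Unless they are explicitly allowed, the validation function hetu_ctrl does not accept artificial / temporary PINs as valid. In the hetu_ctrl example above with allow.temp = FALSE, it returns NA rather than TRUE for such a PIN.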
knitr::kable(hetu(example_temp_pins)) #FALSE TRUE
|   | hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin |
|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 010101-0101 | Female | 010 | 1 | 1901-01-01 | 1 | 1 | 1901 | - | TRUE |
knitr::kable(hetu(example_temp_pins, allow.temp = TRUE)) #TRUE TRUE
| hetu | sex | p.num | ctrl.char | date | day | month | year | century | valid.pin | is.temp |
|---|---|---|---|---|---|---|---|---|---|---|
| 010101A900R | Female | 900 | R | 2001-01-01 | 1 | 1 | 2001 | A | TRUE | TRUE | 
| 010101-0101 | Female | 010 | 1 | 1901-01-01 | 1 | 1 | 1901 | - | TRUE | FALSE | 
Random PINs can be generated by using the rhetu function.
rhetu(n = 4)
#> [1] "070502-3401" "030388-1862" "290391-7615" "151219A8600"
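To show how such a PIN is assembled from its parts, here is a minimal base R sketch. It is not how rhetu is implemented; the helper random_pin_sketch is hypothetical, and it assumes birth dates in the 1900s only, so the century marker is always "-".

```r
# Hypothetical helper for illustration; rhetu() in the package may work differently.
# Assumption: all birth dates fall in 1900-1999, so the century marker is "-".
random_pin_sketch <- function(n, start_date = "1990-01-01", end_date = "1999-12-31") {
  days <- seq(as.Date(start_date), as.Date(end_date), by = "day")
  birth <- sample(days, n, replace = TRUE)
  ddmmyy <- format(birth, "%d%m%y")
  # Regular individual numbers are 002-899, zero-padded to three digits
  zzz <- sprintf("%03d", sample(2:899, n, replace = TRUE))
  # Control character: DDMMYYZZZ modulo 31, looked up in the character table
  ctrl_chars <- strsplit("0123456789ABCDEFHJKLMNPRSTUVWXY", "")[[1]]
  q <- ctrl_chars[as.numeric(paste0(ddmmyy, zzz)) %% 31 + 1]
  paste0(ddmmyy, "-", zzz, q)
}

set.seed(1)
random_pin_sketch(3)
```

The generated strings follow the DDMMYYCZZZQ layout with a consistent control character; they are synthetic and not tied to any real person by construction of this sketch alone, so temporary PINs remain the safer choice for test data.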
rhetu(n = 4, start.date = "1990-01-01", end.date = "2005-01-01")
#> [1] "151190-6358" "040494-121Y" "021297-2170" "280899-296L"
The proportion of males in the generated sample can be changed with the parameter p.male. The default is 0.4.
random_sample <- rhetu(n = 4, p.male = 0.8)
table(random_sample)
#> random_sample
#> 030799+449L 120845-060R 220783-518Y 260661-539R 
#>           1           1           1           1
The default proportion of artificial / temporary PINs is 0.0, meaning that no artificial / temporary PINs are generated by default.
temp_sample <- rhetu(n = 4, p.temp = 0.5)
table(hetu(temp_sample, allow.temp = TRUE, extract = "is.temp"))
#> 
#> FALSE 
#>     4
In addition to the information mentioned in the section Extracting specific information, the user can choose to print additional columns containing information about the checks done on PINs. The diagnostic checks produce TRUE or FALSE for the following categories: valid.p.num, valid.ctrl.char, correct.ctrl.char, valid.date, valid.day, valid.month, valid.year, valid.length and valid.century, where FALSE means that the PIN fails that check.
diagnosis_example <- c("010101-0102", "111111-111Q", 
"010101B0101", "320101-0101", "011301-0101", 
"010101-01010", "010101-0011")
head(hetu(diagnosis_example, diagnostic = TRUE), 3)
#>          hetu    sex p.num ctrl.char       date day month year century
#> 1 010101-0102 Female   010         2 1901-01-01   1     1 1901       -
#> 2 111111-111Q   Male   111         Q 1911-11-11  11    11 1911       -
#> 3 010101B0101 Female   010         1       <NA>   1     1   NA       B
#>   valid.pin valid.p.num valid.ctrl.char correct.ctrl.char valid.date valid.day
#> 1     FALSE        TRUE            TRUE             FALSE       TRUE      TRUE
#> 2     FALSE        TRUE           FALSE             FALSE       TRUE      TRUE
#> 3     FALSE        TRUE            TRUE              TRUE      FALSE      TRUE
#>   valid.month valid.year valid.length valid.century
#> 1        TRUE       TRUE         TRUE          TRUE
#> 2        TRUE       TRUE         TRUE          TRUE
#> 3        TRUE       TRUE         TRUE         FALSE
Diagnostic information can be examined more closely by using subset or by using the separate hetu_diagnostic function. The user can print all diagnostic information for all PINs in the dataset:
tail(hetu_diagnostic(diagnosis_example), 3)
#>           hetu is.temp valid.p.num valid.ctrl.char correct.ctrl.char valid.date
#> 5  011301-0101   FALSE        TRUE            TRUE             FALSE      FALSE
#> 6 010101-01010   FALSE        TRUE            TRUE              TRUE       TRUE
#> 7  010101-0011   FALSE       FALSE            TRUE             FALSE       TRUE
#>   valid.day valid.month valid.year valid.length valid.century
#> 5      TRUE       FALSE       TRUE         TRUE          TRUE
#> 6      TRUE        TRUE       TRUE        FALSE          TRUE
#> 7      TRUE        TRUE       TRUE         TRUE          TRUE
By using the extract parameter, the user can choose which columns will be printed in the output table. Valid extract values are listed in the function’s help file.
hetu_diagnostic(diagnosis_example, extract = c("valid.century", "correct.checksum"))
#> Error in hetu_diagnostic(diagnosis_example, extract = c("valid.century", : Trying to extract invalid diagnostic(s)
The call above fails, presumably because correct.checksum is not among the diagnostic column names shown in the earlier output (which lists correct.ctrl.char instead).
Because of the way PINs are handled inside the hetu function, the diagnostic function can show unexpected warning messages or introduce NAs by coercion if the date part of the PIN is too long. This may make it impossible to handle the PIN at all!
# Faulty example
hetu_diagnostic(c("01011901-01010"))
The package can also generate Finnish Business ID codes (y-tunnus) and check their validity. Unlike with personal identification numbers, no additional information can be extracted from Business IDs.
As with hetu PINs, random Finnish Business IDs (y-tunnus) can be generated by using the rbid function.
bid_sample <- rbid(3)
bid_sample
#> [1] "0991107-0" "8377128-0" "1286283-9"
The validity of Finnish Business IDs can be checked with bid_ctrl, a function similar to hetu_ctrl.
bid_ctrl(c("0737546-2", "1572860-0")) # TRUE TRUE
#> [1] TRUE TRUE
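This document does not describe how the Business ID check digit is computed. As background, the commonly documented rule multiplies the seven digits by the weights 7, 9, 10, 5, 8, 4 and 2, sums the products and takes the remainder modulo 11: a remainder of 0 gives check digit 0, a remainder of 1 means no valid ID exists, and otherwise the check digit is 11 minus the remainder. The base R sketch below illustrates that rule for a single ID; bid_check_sketch is a hypothetical helper, not the implementation of bid_ctrl.

```r
# Hypothetical helper for illustration; not part of the hetu package.
# Handles one Business ID of the form "NNNNNNN-C" at a time.
bid_check_sketch <- function(bid) {
  digits <- as.integer(strsplit(substr(bid, 1, 7), "")[[1]])
  check <- as.integer(substr(bid, 9, 9))
  remainder <- sum(digits * c(7, 9, 10, 5, 8, 4, 2)) %% 11
  if (remainder == 1) return(FALSE)  # no valid check digit exists for remainder 1
  expected <- if (remainder == 0) 0 else 11 - remainder
  check == expected
}

bid_check_sketch("0737546-2")  # TRUE
bid_check_sketch("1572860-0")  # TRUE
bid_check_sketch("0737546-1")  # FALSE
```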
bid_ctrl("0737546-1") # FALSE
#> [1] FALSE
Data frames generated by the hetu function also work well with tidyverse/dplyr workflows.
library(hetu)
library(tidyverse)
library(dplyr)
# Generate data for this example
hdat <- tibble(pin = rhetu(n = 4, start.date = "1990-01-01", end.date = "2005-01-01"))
# Extract all the hetu information to tibble format
hdat <- hdat %>%
  mutate(result = map(.x = pin, .f = hetu::hetu)) %>%
  unnest(cols = c(result))
hdat
This work can be freely used, modified and distributed under the open license specified in the DESCRIPTION file.
Kindly cite the work as follows:
citation("hetu")
#> 
#> Kindly cite the hetu R package as follows:
#> 
#>   Pyry Kantanen, Mans Magnusson, Jussi Paananen and Leo Lahti (rOpenGov
#>   2022). hetu: Structural Handling of Finnish Personal Identity Codes.
#>   R package version 1.0.7 URL: http://github.com/rOpenGov/hetu
#> 
#> A BibTeX entry for LaTeX users is
#> 
#>   @Misc{,
#>     title = {hetu: Structural Handling of Finnish Personal Identity Codes},
#>     author = {Pyry Kantanen and Mans Magnusson and Jussi Paananen and Leo Lahti},
#>     url = {https://github.com/rOpenGov/hetu},
#>     year = {2022},
#>     note = {R package version 1.0.7},
#>   }
#> 
#> Many thanks for all contributors!
This vignette was created with
sessionInfo()
#> R version 4.2.0 (2022-04-22)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur/Monterey 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] C/fi_FI.UTF-8/fi_FI.UTF-8/C/fi_FI.UTF-8/fi_FI.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] hetu_1.0.7
#> 
#> loaded via a namespace (and not attached):
#>  [1] lubridate_1.8.0 digest_0.6.29   R6_2.5.1        backports_1.4.1
#>  [5] jsonlite_1.8.0  magrittr_2.0.3  evaluate_0.15   highr_0.9      
#>  [9] stringi_1.7.6   rlang_1.0.2     cli_3.3.0       jquerylib_0.1.4
#> [13] bslib_0.3.1     generics_0.1.2  checkmate_2.1.0 rmarkdown_2.14 
#> [17] tools_4.2.0     stringr_1.4.0   parallel_4.2.0  xfun_0.31      
#> [21] yaml_2.3.5      fastmap_1.1.0   compiler_4.2.0  htmltools_0.5.2
#> [25] knitr_1.39      sass_0.4.1