In this vignette a worked example on creating Precision-Recall and
Receiver Operator Characteristics curves are provided. Throughout this
vignette the Banknote
Authentication-dataset is used. The banknote-dataset is
a part of {SLmetrics} which is list of features and
targets, and can be called as follows:
The banknote dataset classification tasks achieves
between 95% and 99% accuracy and therefore makes a bad case for
demonstrating Precision-Recall and Receiver Operator Characteristics
curves. To alleviate this, random noise will be injected to the original
dataset as follows:
# 1) set seed
set.seed(1903)
# 2) extract indices
# for shuffling
noise <- sample(
    x = 1:nrow(banknote$features),
    size = nrow(banknote$features) * 0.50
)
# 3) reshuffle
# features and target
noise <- cbind(
    banknote$features[sample(noise),],
    target = banknote$target[sample(noise)]
)The data.frame is constructed as follows:
# 1) convert to data.frame
# and head
head(
    banknote <- cbind(
        banknote$features,
        target = banknote$target
        )
)
#>   variance skewness curtosis  entropy    target
#> 1  3.62160   8.6661  -2.8073 -0.44699 authentic
#> 2  4.54590   8.1674  -2.4586 -1.46210 authentic
#> 3  3.86600  -2.6383   1.9242  0.10645 authentic
#> 4  3.45660   9.5228  -4.0112 -3.59440 authentic
#> 5  0.32924  -4.4552   4.5718 -0.98880 authentic
#> 6  4.36840   9.6718  -3.9606 -3.16250 authentic
# 2) introduce random
# noise to the data
# NOTE: wrapped in `try()` in case 
# noise is removed
try(
    expr = {
        banknote <- rbind(
        banknote,
        noise
    )
    },
    silent = TRUE
)
# 3) convert target to binary
# value
banknote$target <- as.numeric(
    banknote$target == "inauthentic"
)To predict whether the banknotes are authentic or inauthentic a logistic regression will be trained on a training sample, and evaluated on a the test sample.
To train and test test the model a training/test split with 80% and 20%.
To evaluate the performance we will extract the response probabilities
# 1) extract class
# probabilites
class_probabilities <- predict(
    object  = model,
    newdata = subset(test, select = -target),
    type    = "response"
)
# 2) calculate class
class_probabilities <- as.matrix(
    cbind(
        class_probabilities,
        1 - class_probabilities
    )
)# 1) create actual
# value
actual <- factor(
    x = test$target,
    levels = c(1, 0),
    labels = c("inauthentic", "authentic")
)# 1) construct precision-recall 
# object
print(
    precision_recall <- prROC(
        actual   = actual,
        response = class_probabilities
    )
)
#>    threshold level       label  recall precision
#> 1        Inf     1 inauthentic 0.00000     1.000
#> 2      0.919     1 inauthentic 0.00535     1.000
#> 3      0.917     1 inauthentic 0.01070     1.000
#> 4      0.909     1 inauthentic 0.01604     1.000
#> 5      0.906     1 inauthentic 0.02139     1.000
#> 6      0.903     1 inauthentic 0.02674     1.000
#> 7      0.901     1 inauthentic 0.03209     1.000
#> 8      0.898     1 inauthentic 0.03209     0.857
#> 9      0.898     1 inauthentic 0.03743     0.875
#> 10     0.895     1 inauthentic 0.04278     0.889
#>  [ reached 'max' / getOption("max.print") -- omitted 816 rows ]The Precision-Recall object can be visualized by using
plot()
# 1) construct Receiver Operator Characteristics 
# object
print(
    receiver_operator_characteristics <- ROC(
        actual   = actual,
        response = class_probabilities
    )
)
#>    threshold level       label     tpr     fpr
#> 1        Inf     1 inauthentic 0.00000 0.00000
#> 2      0.919     1 inauthentic 0.00535 0.00000
#> 3      0.917     1 inauthentic 0.01070 0.00000
#> 4      0.909     1 inauthentic 0.01604 0.00000
#> 5      0.906     1 inauthentic 0.02139 0.00000
#> 6      0.903     1 inauthentic 0.02674 0.00000
#> 7      0.901     1 inauthentic 0.03209 0.00000
#> 8      0.898     1 inauthentic 0.03209 0.00444
#> 9      0.898     1 inauthentic 0.03743 0.00444
#> 10     0.895     1 inauthentic 0.04278 0.00444
#>  [ reached 'max' / getOption("max.print") -- omitted 816 rows ]The Receiver Operator Characteristics object can be visualized by
using plot()