In the MixMatrix package, there are two functions for
training a linear or quadratic classifier. The usage is fairly similar
to the function MASS::lda() or MASS::qda(),
however it requires the input as an array and the group variable
provided as a vector (that is, it cannot handle data frames or the
formula interface directly, which is reasonable, as there is no
immediately clear way to make that work for a collection of
matrices).
set.seed(20180222)
library('MixMatrix')
A <- rmatrixnorm(30, mean = matrix(0, nrow=2, ncol=3)) # creating the three groups
B <- rmatrixnorm(30, mean = matrix(c(1, 0), nrow = 2, ncol = 3))
C <- rmatrixnorm(30, mean = matrix(c(0, 1), nrow = 2, ncol = 3))
ABC <- array(c(A,B,C), dim = c(2,3,90)) # combining into on array
groups <- factor(c(rep("A", 30), rep("B", 30), rep("C", 30))) # labels
prior = c(30, 30, 30) / 90 # equal prior
matlda <- matrixlda(x = ABC, grouping = groups, prior = prior) # perform LDA
matqda <- matrixqda(x = ABC, grouping = groups, prior = prior) # perform QDAThis does not sphere the data or extract an SVD or Fisher discriminant scores - it is a simple linear/quadratic discriminant function based on the likelihood function.
The matrixlda function presumes equal covariance
matrices among the groups while matrixqda fits separate
covariance for each groups.
It is possible to set variance or mean restrictions from the
MLmatrixnorm and MLmatrixt functions using the
... argument. These restrictions are applied to all
groups.
The predict method for these objects works in much the
same way as for lda or qda objects: provide
the function and the new data, then it will return the MAP class
assignments and the posterior. If no new data is provided, it will
attempt to run it on the original data if it is available in the
environment.
ABC[, , c(1, 31, 61)] # true class memberships: A, B, C
#> , , 1
#> 
#>             [,1]       [,2]      [,3]
#> [1,] -0.07020924  0.8559433 0.9827602
#> [2,]  1.30097854 -0.6893054 0.2830575
#> 
#> , , 2
#> 
#>           [,1]     [,2]        [,3]
#> [1,] 2.2374662 2.408268 -0.37053822
#> [2,] 0.9653824 0.630897 -0.08850761
#> 
#> , , 3
#> 
#>            [,1]       [,2]      [,3]
#> [1,] -1.6961217 -1.5642705 0.2045711
#> [2,]  0.7782369  0.1842012 0.3869149
#predict the membership of the first observation of each group
predict(matlda, ABC[, , c(1, 31, 61)])
#> $class
#> [1] B B A
#> Levels: A B C
#> 
#> $posterior
#>            [,1]        [,2]       [,3]
#> [1,] 0.27340107 0.690217317 0.03638161
#> [2,] 0.03833953 0.949049289 0.01261118
#> [3,] 0.54647035 0.001273576 0.45225607
#predict the membership of the first observation of each group
predict(matqda,  ABC[, , c(1, 31, 61)])
#> $class
#> [1] B B A
#> Levels: A B C
#> 
#> $posterior
#>            [,1]        [,2]       [,3]
#> [1,] 0.24302341 0.735815885 0.02116070
#> [2,] 0.03295848 0.963641160 0.00340036
#> [3,] 0.54611977 0.007871269 0.44600896In this example, points from classes A, B, and C were selected and they ended up being classified as B, B, and A. The QDA and LDA rules had only minor disagreements, which is to be expected when they do truly have the same covariances.
Suppose there are two populations, \(\pi_1\) and \(\pi_2\) with prior probabilities of belonging to these classes, \(p_1\) and \(p_2\). Define a function, \(c(1|2)\) as the cost of misclassifying a member of population \(\pi_2\) as a member of class \(1\) (and vice versa). Further, define \(P(1|2)\) as the probability of classifying a member of population \(\pi_2\) as a member of class \(1\) (and vice versa). Then we define the expected cost of misclassification as:
\[ECM = c(2|1)P(2|1)p_1 + c(1|2)P(1|2)p_2 \]
A reasonable classification rule based on ECM is the following:
Classify as class \(1\) if:
\[ \frac{f_1(x)}{f_2(x)} \geq \frac{c(1|2) p_2}{c(2|1)p_1} \]
Where \(f_i(x)\) is the probability density function for group \(\pi_i\) evaluated at \(x\).
Recall the probability density function for a matrix variate normal distribution:
\[f(\mathbf{X};\mathbf{M}, \mathbf{U}, \mathbf{V}) = \frac{\exp\left( -\frac{1}{2} \, \mathrm{tr}\left[ \mathbf{V}^{-1} (\mathbf{X} - \mathbf{M})^{T} \mathbf{U}^{-1} (\mathbf{X} - \mathbf{M}) \right] \right)}{(2\pi)^{np/2} |\mathbf{V}|^{n/2} |\mathbf{U}|^{p/2}} \]
\(\mathbf{X}\) and \(\mathbf{M}\) are \(n \times p\), \(\mathbf{U}\) is \(n \times n\) and describes the covariance relationship between the rows, and \(\mathbf{V}\) is \(p \times p\) and describes the covariance relationship between the columns.
A decision rule for this case:
\[\begin{eqnarray} R(\mathbf{X}) & = & \mathrm{trace}\big[ -\frac{1}{2}(\mathbf{V}_1^{-1} \mathbf{X}^{T} \mathbf{U}_1^{-1} \mathbf{X} - \mathbf{V}_2^{-1} \mathbf{X}^{T} \mathbf{U}_2^{-1} \mathbf{X}) \\ & & +(\mathbf{V}_1^{-1} \mathbf{M}_1^{T} \mathbf{U}_1^{-1} - \mathbf{V}_2^{-1} \mathbf{M}_2^{T} \mathbf{U}_2^{-1}) \mathbf{X} \\ & & -\frac{1}{2}(\mathbf{V}_1^{-1} \mathbf{M}_1^{T} \mathbf{U}_1^{-1} \mathbf{M}_1 - \mathbf{V}_2^{-1} \mathbf{M}_2^{T} \mathbf{U}_2^{-1} \mathbf{M}_2) \big] \\ & & -\frac{1}{2}(p (\log|\mathbf{U}\_1|-\log|\mathbf{U}\_2|)+ n(\log|\mathbf{V}\_1|-\log|\mathbf{V}\_2|) ) \end{eqnarray}\]
If \(R(\mathbf{X}) \geq \log(c(1|2)p_2) - \log(c(2|1)p_1)\), assign \(\mathbf{X}\) to group \(1\), otherwise assign to group \(2\).
In the multivariate case, this is equivalent to the LDA/QDA rules - term (1) is the quadratic form which vanishes in case of equal covariances between groups, term (2) is the LDA term, and terms (3) and (4) are constants which depend on the parameters and not \(\mathrm{X}\).
Typically, the models we have used have implicitly used an equal probability prior and an equal cost of misclassification, but other inputs can be specified. In case of equal priors and equal cost of misclassification, this term is 0.
If the two groups have the same covariances, then this simplifies. The classification rule is then: \[\begin{eqnarray} R(\mathbf{X}) & = & \mathrm{trace}\big[ (\mathbf{V}^{-1} (\mathbf{M}_1 -\mathbf{M}_2)^{T}\mathbf{U}^{-1}) \mathbf{X} \\ & & -\frac{1}{2}(\mathbf{V}^{-1} \mathbf{M}_1^{T} \mathbf{U}^{-1} \mathbf{M}_1 - \mathbf{V}^{-1} \mathbf{M}_2^{T} \mathbf{U}^{-1} \mathbf{M}_2) \big] \\ & \geq & \log(c(1|2)p_2) - \log(c(2|1)p_1) \end{eqnarray}\]
Classify to group \(1\) if the last term is true. Note this is linear in \(\mathbf{X}\). This is directly analogous to LDA in the multivariate case.
The factor \(R\) for each group \(g\) in a QDA setting is:
\[\begin{eqnarray} R_g(\mathbf{X}) & = & \mathrm{trace}\big[ -\frac{1}{2}(\mathbf{V}_g^{-1} \mathbf{X}^{T} \mathbf{U}_g^{-1} \mathbf{X} +(\mathbf{V}_g^{-1} \mathbf{M}_g^{T} \mathbf{U}_g^{-1}) \mathbf{X} -\frac{1}{2}(\mathbf{V}_g^{-1} \mathbf{M}_g^{T} \mathbf{U}_g^{-1} \mathbf{M}_g) \big] \\ & & -\frac{1}{2}(p (\log|\mathbf{U}\_g|)+ n(\log|\mathbf{V}\_g|) ) + \log p_g \end{eqnarray}\] When \(U_i = U_j\) for all groups \(i,j\), several of these terms cancel. The posterior probability is:
\[ P(\mathbf{X} \in g) = \frac{ \exp (R_g (\mathbf{X}))}{\sum_i \exp (R_i(\mathbf{X}))} \] with the bottom sum being over all groups \(i\).
The objects returned by matrixlda and
matrixqda are S3 classes. See below example output:
matlda
#> $prior
#>         A         B         C 
#> 0.3333333 0.3333333 0.3333333 
#> 
#> $counts
#>  A  B  C 
#> 30 30 30 
#> 
#> $means
#> , , 1
#> 
#>            [,1]        [,2]       [,3]
#> [1,] 0.08629672 -0.06362916 -0.3045685
#> [2,] 0.03851982 -0.01375580  0.3978759
#> 
#> , , 2
#> 
#>           [,1]       [,2]       [,3]
#> [1,] 1.2872487  1.1199220 1.01282774
#> [2,] 0.3044151 -0.3797203 0.06539076
#> 
#> , , 3
#> 
#>            [,1]       [,2]       [,3]
#> [1,] 0.07377351 -0.3278233 -0.3031961
#> [2,] 0.92568372  1.3738475  0.9353985
#> 
#> 
#> $scaling
#> [1] 1.002299
#> 
#> $U
#>            [,1]       [,2]
#> [1,] 1.00000000 0.03243759
#> [2,] 0.03243759 0.97819549
#> 
#> $V
#>             [,1]        [,2]        [,3]
#> [1,]  1.00000000  0.01988966 -0.04879263
#> [2,]  0.01988966  0.93839973 -0.04563610
#> [3,] -0.04879263 -0.04563610  0.93321444
#> 
#> $lev
#> [1] "A" "B" "C"
#> 
#> $N
#> [1] 90
#> 
#> $method
#> [1] "normal"
#> 
#> $nu
#> NULL
#> 
#> $call
#> matrixlda(x = ABC, grouping = groups, prior = prior)
#> 
#> attr(,"class")
#> [1] "matrixlda"
matqda
#> $prior
#>         A         B         C 
#> 0.3333333 0.3333333 0.3333333 
#> 
#> $counts
#>  A  B  C 
#> 30 30 30 
#> 
#> $means
#> , , 1
#> 
#>            [,1]        [,2]       [,3]
#> [1,] 0.08629672 -0.06362916 -0.3045685
#> [2,] 0.03851982 -0.01375580  0.3978759
#> 
#> , , 2
#> 
#>           [,1]       [,2]       [,3]
#> [1,] 1.2872487  1.1199220 1.01282774
#> [2,] 0.3044151 -0.3797203 0.06539076
#> 
#> , , 3
#> 
#>            [,1]       [,2]       [,3]
#> [1,] 0.07377351 -0.3278233 -0.3031961
#> [2,] 0.92568372  1.3738475  0.9353985
#> 
#> 
#> $U
#> , , 1
#> 
#>            [,1]       [,2]
#> [1,] 1.00000000 0.02620514
#> [2,] 0.02620514 0.95989647
#> 
#> , , 2
#> 
#>            [,1]       [,2]
#> [1,]  1.0000000 -0.0173514
#> [2,] -0.0173514  0.9456145
#> 
#> , , 3
#> 
#>          [,1]     [,2]
#> [1,] 1.000000 0.129851
#> [2,] 0.129851 1.061265
#> 
#> 
#> $V
#> , , 1
#> 
#>             [,1]        [,2]        [,3]
#> [1,]  1.05498245 -0.07418973  0.09476411
#> [2,] -0.07418973  0.98800103 -0.04390729
#> [3,]  0.09476411 -0.04390729  0.79903040
#> 
#> , , 2
#> 
#>             [,1]       [,2]        [,3]
#> [1,]  1.09941477 0.12678075 -0.03952663
#> [2,]  0.12678075 0.94606849  0.03910645
#> [3,] -0.03952663 0.03910645  0.85913748
#> 
#> , , 3
#> 
#>              [,1]         [,2]       [,3]
#> [1,]  0.833193637  0.008466708 -0.2084547
#> [2,]  0.008466708  0.875328749 -0.1142510
#> [3,] -0.208454704 -0.114250988  1.1520533
#> 
#> 
#> $lev
#> [1] "A" "B" "C"
#> 
#> $N
#> [1] 90
#> 
#> $method
#> [1] "normal"
#> 
#> $nu
#> NULL
#> 
#> $call
#> matrixqda(x = ABC, grouping = groups, prior = prior)
#> 
#> attr(,"class")
#> [1] "matrixqda"This vignette was built using rmarkdown.
sessionInfo()
#> R version 4.4.1 (2024-06-14)
#> Platform: x86_64-redhat-linux-gnu
#> Running under: Fedora Linux 40 (Xfce)
#> 
#> Matrix products: default
#> BLAS/LAPACK: FlexiBLAS OPENBLAS-OPENMP;  LAPACK version 3.11.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C             
#>  [3] LC_TIME=en_US.utf8        LC_COLLATE=C             
#>  [5] LC_MONETARY=en_US.utf8    LC_MESSAGES=en_US.utf8   
#>  [7] LC_PAPER=en_US.utf8       LC_NAME=C                
#>  [9] LC_ADDRESS=C              LC_TELEPHONE=C           
#> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C      
#> 
#> time zone: America/Chicago
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] MixMatrix_0.2.8
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.47        
#>  [5] glue_1.7.0        cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1
#>  [9] rmarkdown_2.28    lifecycle_1.0.4   cli_3.6.3         sass_0.4.9       
#> [13] jquerylib_0.1.4   compiler_4.4.1    tools_4.4.1       evaluate_1.0.0   
#> [17] bslib_0.8.0       Rcpp_1.0.13       yaml_2.3.10       rlang_1.1.4      
#> [21] jsonlite_1.8.9knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(20180222)
library('MixMatrix')
A <- rmatrixnorm(30, mean = matrix(0, nrow=2, ncol=3)) # creating the three groups
B <- rmatrixnorm(30, mean = matrix(c(1, 0), nrow = 2, ncol = 3))
C <- rmatrixnorm(30, mean = matrix(c(0, 1), nrow = 2, ncol = 3))
ABC <- array(c(A,B,C), dim = c(2,3,90)) # combining into on array
groups <- factor(c(rep("A", 30), rep("B", 30), rep("C", 30))) # labels
prior = c(30, 30, 30) / 90 # equal prior
matlda <- matrixlda(x = ABC, grouping = groups, prior = prior) # perform LDA
matqda <- matrixqda(x = ABC, grouping = groups, prior = prior) # perform QDA
ABC[, , c(1, 31, 61)] # true class memberships: A, B, C
#predict the membership of the first observation of each group
predict(matlda, ABC[, , c(1, 31, 61)])
#predict the membership of the first observation of each group
predict(matqda,  ABC[, , c(1, 31, 61)])
matlda
matqda
sessionInfo()