library(tensorMiss)We go through some basic functions related to tensor in ‘tensorMiss’ in this vignette. First, we start with constructing a tensor, which is by using the base class ‘array’ in R. The following tensor is an order-\(3\) tensor with dimensions \(3\times 4\times 2\).
example <- array(1:24, dim=c(3,4,2))As we can see, subsetting and truncating in multi-dimensional arrays are trivial, and we display them below by an example to inject missingness inside the tensor.
example[3,1,1] <- NA
print(example)
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]    1    4    7   10
#> [2,]    2    5    8   11
#> [3,]   NA    6    9   12
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]   13   16   19   22
#> [2,]   14   17   20   23
#> [3,]   15   18   21   24We now quickly go through the basic unfolding and refolding functions as an example.
example.1 <- unfold(example, 1)
print(example.1)
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,]    1    4    7   10   13   16   19   22
#> [2,]    2    5    8   11   14   17   20   23
#> [3,]   NA    6    9   12   15   18   21   24Without doubt, the refolding of an unfolding matrix returns back to the original tensor, given the correct dimension.
refold(example.1, 1, dim(example))==example
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,]   NA TRUE TRUE TRUE
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,] TRUE TRUE TRUE TRUE
#> [2,] TRUE TRUE TRUE TRUE
#> [3,] TRUE TRUE TRUE TRUELastly, k-mode matrix product is performed in the following. See Kolda and Bader (2009) for more details on tensor data.
ttm(example, matrix(1:4,nrow=2), 3)
#> , , 1
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]   40   52   64   76
#> [2,]   44   56   68   80
#> [3,]   NA   60   72   84
#> 
#> , , 2
#> 
#>      [,1] [,2] [,3] [,4]
#> [1,]   54   72   90  108
#> [2,]   60   78   96  114
#> [3,]   NA   84  102  120A factor-based imputation method is proposed by Cen and Lam (2024) on tensor time series. The ‘tensor_gen’ function initialises a zero-mean mode-\(K\) tensor time series with factor structure, so that at each time \(t\in\{1,2,\dots,T\}\),
\[
\mathcal{Y}_t = \mathcal{F}_t \times_1 \mathbf{A}_1 \times_2
\mathbf{A}_2 \times_3 \cdots \times_K \mathbf{A}_K + \mathcal{E}_t ,
\] where \(\mathcal{Y}_t\in\mathbb{R}^{d_1\times
d_2\times\cdots\times d_K}\) is the generated order-\(K\) tensor data, \(\mathcal{F}_t\in\mathbb{R}^{r_1\times
r_2\times\cdots\times r_K}\) is the (possibly weak) core factor
series, each \(\mathbf{A}_k\in\mathbb{R}^{d_k\times r_k}\)
for \(k\in\{1,\dots,K\}\) is the
mode-\(k\) factor loading matrix, and
\(\mathcal{E}_t\) is the error series
with the same dimension as \(\mathcal{Y}_t\). Weak cross-sectional and
serial correlations are allowed in the fibres of the error series. See
Cen and Lam (2024) for the details.
The purpose of the imputation given a tensor time series with
missingness is to estimate/impute the missing entries. First, the data
can be initialised by the ‘tensor_gen’ function. For reproducibility, a
seed parameter is required, which is 2023 by default.
K <- 3 #order 3 at each time
TT <- 20 #time length
d <- c(30,30,30) #spatial dimensions
r <- c(2,2,2) #rank of core tensor
re <- c(2,2,2) #rank of common component in error
eta <- list(c(0,0), c(0,0), c(0,0)) #strong factors
coef_f <- c(0.7, 0.3, -0.4, 0.2, -0.1) #AR(5) coefficients for core factor
coef_fe <- c(-0.7, -0.3, -0.4, 0.2, 0.1) #AR(5) coefficients for common component in error
coef_e <- c(0.8, 0.4, -0.4, 0.2, -0.1) #AR(5) coefficients for idiosyncratic part in error
data_test <- tensor_gen(K,TT,d,r,re,eta, coef_f, coef_fe, coef_e)Missing entries are represented by ‘NA’s in the data. For example, each entry is randomly missing with probability 0.3 using the ’miss_gen’ function. More missing patterns are available with the function.
data_miss <- miss_gen(data_test$X)With the ‘miss_factor_est’ function, the factor structure can be
estimated in one go. The number of factors could be either provided or
not provided, in the latter case the function estimates the number of
factors using the eigenvalue-ratio-based estimator. For the details of
estimation, see Cen and Lam (2024).
As an example, the factor loading error measured by column space
distance are computed using the ‘fle’ function.
est_result <- miss_factor_est(data_miss, r)
fle(est_result$A[[1]], data_test$A[[1]])
#> [1] 0.01580778Lastly, we can gauge the imputation performance using relative MSE (rMSE) or even the introduced q-MSE. Setting q as the length of the input vector would essentially output the relative MSE. Examples of rMSE and q-MSE with q as 100 are demonstrated.
qMSE(c(data_test$C), c(est_result$imputation), length(c(data_test$C))) #rMSE
#> [1] 0.0006705599
qMSE(c(data_test$C), c(est_result$imputation), 100) #q-MSE
#> [1] 5.78442e-06Under certain conditions (Cen and Lam 2024), the residue between the row of the estimated loading matrix and its corresponding true row under some rotation can be shown to be asymptotically normal. A consistent covariance matrix estimator can be computed by the ‘sigmaD’ function. For instance, to compute the residue on the first row of the estimated mode-2 loading matrix, the covariance matrix estimator is obtained below. The rotation matrix is also computed afterwards.
# computing the covariance matrix estimator
r2 <- r[2]
A2 <- data_test$A[[2]]
beta <- floor(0.2*(TT^0.25 * (d[2])^0.25))
D2 <- diag(x=(svd(est_result$covMatrix[[2]])$d)[1:r2], nrow=r2, ncol=r2)
# HAC_cov: HAC-type covariance matrix estimator
HAC_cov <- sigmaD(2, D2, est_result$A[[2]], est_result$imputation, data_miss, 1, beta)
# computing the rotation matrix
Amk <- data_test$A[[3]] %x% data_test$A[[1]]
R_ast <- 0
for (t in 1:TT){
  R_ast <- R_ast + unfold(data_test$Ft[t,,,],2) %*% t(Amk) %*% Amk %*% t(unfold(data_test$Ft[t,,,],2))
}
R_ast <- A2 %*% R_ast %*% t(A2)
R_ast <- R_ast/TT
Z2 <- diag(x = diag(t(A2) %*% A2), nrow=r2, ncol=r2)
Q2 <- A2 %*% diag(x=diag(solve(Z2))^0.5, nrow=r2, ncol=r2)
# H2: rotation matrix
H2 <- solve(D2) %*% t(est_result$A[[2]]) %*% R_ast %*% Q2 %*% solve(t(Q2)%*% Q2)Eventually, the standardised residue is shown below and should follow a standard normal distribution when dimensions are increased.
HAC_cov.eigen <- eigen(HAC_cov)
HAC_cov.sqrt <- HAC_cov.eigen$vectors %*% diag(sqrt(HAC_cov.eigen$values)) %*% solve(HAC_cov.eigen$vectors)
A2_1 <- (solve(HAC_cov.sqrt) %*% D2) %*% (matrix(est_result$A[[2]], nrow=d[2], ncol=r2)[1,] - (H2 %*% Q2[1,]))
A2_1
#>           [,1]
#> [1,]  3.482375
#> [2,] -0.951442