---
title: "F. Complex multiblock analysis"
output: 
  rmarkdown::html_vignette:
    toc: true
vignette: >
  %\VignetteIndexEntry{F. Complex multiblock analysis}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options: 
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width=6, 
  fig.height=4
)
# Put this in the YAML header at the top to render to tex
#output: 
#  pdf_document: 
#    keep_tex: true
# Original:
#  rmarkdown::html_vignette:
#    toc: true
```

```{r}
# Start the multiblock R package
library(multiblock)
```

# Complex data structures

The following methods for complex data structures are available in the _multiblock_ package (function names in parentheses):

* L-PLS - Partial Least Squares in L configuration (_lpls_)
* SO-PLS-PM - Sequential and Orthogonalised PLS Path Modelling (_sopls_pm_)

## L-PLS

To showcase L-PLS we use data simulated specifically for the L-shaped configuration. Regression
using L-PLS can run either outwards (exo), from _X1_ to _X2_ and _X3_, or inwards (endo), from _X2_ and _X3_
to _X1_. In the outward case, the prediction target is either _X2_ or _X3_, given _X1_. Cross-validation
is performed over either the rows of _X1_ or the columns of _X1_.

```{}
   ______N 
  |       |
  |       |
  |  X3   |
  |       |
 K|_______|
             
             
   ______N       ________J 
  |       |     |         |
  |       |     |         |
  |  X1   |     |   X2    |
  |       |     |         |
 I|_______|    I|_________|
```


## Simulated L-shaped data

We simulate two latent components in an L shape, with blocks of dimensions (30x20),
(30x5) and (6x20) for _X1_, _X2_ and _X3_, respectively.

```{r}
set.seed(42)

# Simulate data set
sim <- lplsData(I = 30, N = 20, J = 5, K = 6, ncomp = 2)

# Split into separate blocks
X1  <- sim$X1; X2 <- sim$X2; X3 <- sim$X3
```
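
The linkage drawn in the L-PLS diagram can be made concrete with a small base R sketch. This is not the algorithm used by `lplsData`; the names `T_rows`, `P_cols` and the `_toy` matrices are hypothetical and only illustrate that _X1_ shares its rows with _X2_ and its columns with _X3_.

```{r}
set.seed(1)
n_I <- 30; n_N <- 20; n_J <- 5; n_K <- 6; n_comp <- 2

# Hypothetical latent structure (an assumption for illustration):
# row scores shared by X1 and X2, column loadings shared by X1 and X3
T_rows <- matrix(rnorm(n_I * n_comp), n_I, n_comp)
P_cols <- matrix(rnorm(n_N * n_comp), n_N, n_comp)

# Small additive noise of a given shape
toy_noise <- function(nr, nc) matrix(rnorm(nr * nc, sd = 0.1), nr, nc)

X1_toy <- T_rows %*% t(P_cols) + toy_noise(n_I, n_N)                            # I x N
X2_toy <- T_rows %*% matrix(rnorm(n_comp * n_J), n_comp, n_J) + toy_noise(n_I, n_J)  # I x J
X3_toy <- matrix(rnorm(n_K * n_comp), n_K, n_comp) %*% t(P_cols) + toy_noise(n_K, n_N)  # K x N

# Rows of X1 match rows of X2; columns of X1 match columns of X3
rbind(X1 = dim(X1_toy), X2 = dim(X2_toy), X3 = dim(X3_toy))
```

The same checks, `nrow(X1) == nrow(X2)` and `ncol(X1) == ncol(X3)`, are a quick way to verify that real blocks are oriented correctly before fitting.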

## Exo-L-PLS

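The first L-PLS model is fitted outwards (exo), from _X1_ towards _X2_ and _X3_. Because there are two possible targets, each prediction must specify a direction through the _exo.direction_ argument.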

```{r fig.width=5, fig.height=5}
# exo-L-PLS:
lp.exo  <- lpls(X1,X2,X3, ncomp = 2) # type = "exo" is default

# Predict X2
pred.exo.X2 <- predict(lp.exo, X1new = X1, exo.direction = "X2")

# Predict X3
pred.exo.X3 <- predict(lp.exo, X1new = X1, exo.direction = "X3")

# Correlation loading plot
plot(lp.exo)
```


## Endo-L-PLS

The second L-PLS model is fitted inwards (endo), from _X2_ and _X3_ towards _X1_.

```{r}
# endo-L-PLS:
lp.endo <- lpls(X1,X2,X3, ncomp = 2, type = "endo")

# Predict X1 from X2 and X3 (in this case fitted values):
pred.endo.X1 <- predict(lp.endo, X2new = X2, X3new = X3)
```

## L-PLS cross-validation

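Cross-validation of L-PLS requires a choice of direction, since _X1_ is linked to _X2_ through its samples (rows) and to _X3_ through its variables (columns).
Segments over the rows of _X1_ are given in _segments1_ (horizontal) and segments over its columns in _segments2_ (vertical). The cross-validation routines compute RMSECV values and perform cross-validated predictions.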

```{r}
# LOO cross-validation horizontally
lp.cv1 <- lplsCV(lp.exo, segments1 = as.list(1:dim(X1)[1]), trace = FALSE)

# LOO cross-validation vertically
lp.cv2 <- lplsCV(lp.exo, segments2 = as.list(1:dim(X1)[2]), trace = FALSE)

# Three-fold CV, horizontal (list(), not as.list(), to get three segments)
lp.cv3 <- lplsCV(lp.exo, segments1 = list(1:10, 11:20, 21:30), trace = FALSE)

# Three-fold CV, horizontal, inwards model
lp.cv4 <- lplsCV(lp.endo, segments1 = list(1:10, 11:20, 21:30), trace = FALSE)
```
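
Segment lists for other cross-validation schemes can be built with base R. The sketch below uses hypothetical names (`loo_rows`, `consecutive_rows`, `random_cols` and zero-filled `cv_X*` placeholder blocks) and assumes, without having checked the package source, that holding out rows removes them from _X1_ and _X2_ while holding out columns removes them from _X1_ and _X3_.

```{r}
n_rows <- 30   # rows of X1, shared with X2
n_cols <- 20   # columns of X1, shared with X3

# Leave-one-out: one index per segment
loo_rows <- as.list(seq_len(n_rows))

# Three consecutive folds of ten rows each
consecutive_rows <- unname(split(seq_len(n_rows), rep(1:3, each = 10)))

# Three random folds over the columns
set.seed(123)
random_cols <- unname(split(sample(n_cols), rep(1:3, length.out = n_cols)))
lengths(random_cols)

# Placeholder blocks with the dimensions used in this vignette
cv_X1 <- matrix(0, n_rows, n_cols)
cv_X2 <- matrix(0, n_rows, 5)
cv_X3 <- matrix(0, 6, n_cols)

# Horizontal hold-out (assumed): rows leave X1 and X2, X3 is untouched
held <- consecutive_rows[[1]]
rbind(X1 = dim(cv_X1[-held, ]), X2 = dim(cv_X2[-held, ]), X3 = dim(cv_X3))

# Vertical hold-out (assumed): columns leave X1 and X3, X2 is untouched
held <- random_cols[[1]]
rbind(X1 = dim(cv_X1[, -held]), X2 = dim(cv_X2), X3 = dim(cv_X3[, -held]))
```

Lists of this form have the same structure as the _segments1_ and _segments2_ arguments used above: one integer vector of held-out indices per segment.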


## SO-PLS Path Modelling

The following example uses the _potato_ data and the _wine_ data to showcase some of the functions available for SO-PLS-PM analyses.

### Single SO-PLS-PM model

A model with four blocks, three input blocks and the _Sensory_ response block, is fitted with 5 components per input block. We set _computeAdditional_
to _TRUE_ to turn on computation of the additional explained variance contributed by each block as it is added to the model.

```{r}
# Load potato data
data(potato)

# Single path
pot.pm <- sopls_pm(potato[1:3], potato[['Sensory']], c(5,5,5), computeAdditional=TRUE)

# Report of explained variances and optimal number of components.
# Bootstrapping can be enabled to assess stability.
# (LOO cross-validation is default)
pot.pm
```
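
The notion of additional explained variance per added block can be illustrated in base R. This sketch swaps PLS for ordinary least squares and uses fitted rather than cross-validated variances, so it shows the sequential and orthogonalised principle only, not what `sopls_pm` computes. All names (`blkA`, `blkB`, `resp`, `expl_*`) are hypothetical.

```{r}
set.seed(1)
n_obs <- 50

# Two toy input blocks; blkB partly overlaps with blkA
blkA <- matrix(rnorm(n_obs * 2), n_obs, 2)
blkB <- 0.5 * blkA[, 1] + matrix(rnorm(n_obs * 2), n_obs, 2)
resp <- blkA[, 1] + blkB[, 2] + rnorm(n_obs, sd = 0.5)

# Centre everything so that no intercept is needed
blkA <- scale(blkA, scale = FALSE)
blkB <- scale(blkB, scale = FALSE)
resp <- resp - mean(resp)

# Step 1: fit the response on the first block
fitA   <- lm.fit(blkA, resp)
expl_A <- 1 - sum(fitA$residuals^2) / sum(resp^2)

# Step 2: orthogonalise the second block with respect to the first,
# then fit what is left of the response on the orthogonalised block
blkB_orth <- lm.fit(blkA, blkB)$residuals
fitB      <- lm.fit(blkB_orth, fitA$residuals)

expl_total      <- 1 - sum(fitB$residuals^2) / sum(resp^2)
expl_additional <- expl_total - expl_A

round(c(first_block = expl_A, added_by_second = expl_additional, total = expl_total), 3)
```

Because the second block is orthogonalised before it enters, its contribution cannot be confused with variance that the first block has already explained.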

### Multiple paths in an SO-PLS-PM model

A model containing five blocks is fitted. Explained variances for all
sub-paths are estimated.

```{r}
# Load wine data
data(wine)

# All paths in the forward direction
wine.pm.multiple <- sopls_pm_multiple(wine, ncomp = c(4,2,9,8))

# Report of direct, indirect and total explained variance per sub-path.
# Bootstrapping can be enabled to assess stability.
wine.pm.multiple
```
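
One common reading of these three quantities, taken here as an assumption rather than read off the package source, is: the total effect is the variance explained by a block alone, the direct effect is what the block adds after the intermediate blocks have entered, and the indirect effect is the difference. The base R sketch below uses ordinary least squares instead of PLS and fitted instead of cross-validated variances; the names `pathA`, `pathB`, `pathY` and `r2_fit` are hypothetical.

```{r}
set.seed(2)
n_obs <- 100

# Toy path: pathA -> pathB -> pathY, plus a weaker direct link pathA -> pathY
pathA <- matrix(rnorm(n_obs * 2), n_obs, 2)
pathB <- pathA %*% matrix(c(1, 0.5, -0.5, 1), 2, 2) +
  matrix(rnorm(n_obs * 2, sd = 0.7), n_obs, 2)
pathY <- 0.8 * pathB[, 1] + 0.3 * pathA[, 2] + rnorm(n_obs, sd = 0.5)

# Centre all blocks so that no intercept is needed
pathA <- scale(pathA, scale = FALSE)
pathB <- scale(pathB, scale = FALSE)
pathY <- pathY - mean(pathY)

# Fraction of response variance explained by a least squares fit
r2_fit <- function(X, y) 1 - sum(lm.fit(X, y)$residuals^2) / sum(y^2)

total_effect    <- r2_fit(pathA, pathY)                       # pathA alone
direct_effect   <- r2_fit(cbind(pathB, pathA), pathY) -
                   r2_fit(pathB, pathY)                       # added after pathB
indirect_effect <- total_effect - direct_effect               # mediated via pathB

round(c(direct = direct_effect, indirect = indirect_effect, total = total_effect), 3)
```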