library(gt)
library(gMCP)
library(lrstat)
library(graphicalMCP)C:\10gSfZ83ae05fe14aa8-testing.R
Consider a confirmatory clinical trial comparing a test treatment (treatment) against the control treatment (control) for a disease. There are two doses of treatment: the low dose and the high dose. There are three endpoints included in the multiplicity adjustment strategy, which are the primary endpoint (PE) and two secondary endpoints (SE1 and SE2). In total, there are six null hypotheses: \(H_1\), \(H_3\) and \(H_5\) are the primary hypothesis and two secondary hypotheses respectively for the low dose versus control; \(H_2\), \(H_4\) and \(H_6\) are the primary hypothesis and two secondary hypotheses respectively for the high dose versus control.
There are clinical considerations which constrain the structure of multiple comparison procedures, and which can be flexibly incorporated using graphical approaches. First, the low and the high doses are considered equally important, which means that rejecting the primary hypothesis for either dose versus control leads to a successful trial. Regarding secondary hypotheses, each one is tested only if its corresponding primary hypothesis has been rejected. This means that \(H_3\) and \(H_5\) are tested only after \(H_1\) has been rejected; \(H_4\) and \(H_6\) are tested only after \(H_2\) has been rejected.
In addition, there are some statistical considerations to complete the graph. The primary hypotheses \(H_1\) and \(H_2\) will have an equal hypothesis weight of 0.5. The secondary hypotheses have a hypothesis weight of 0. When a primary hypothesis has been rejected, its weight will propagate along three outgoing edges: one to the other primary hypothesis and two to its descendant secondary hypotheses. The edge to the other primary hypothesis will have a transition weight of 0.5; the two edges to the descendant secondary hypotheses will have an equal transition weight of 0.25. Between the secondary hypotheses for each dose-control comparison, we will have an edge of a transition weight of 1 (or very close to 1 to allow \(\epsilon\) edges). The hypothesis weights for a dose-control comparison group will be propagated to the primary hypothesis for the other dose-control comparison, but only after all hypotheses in the first dose-control comparison group have been deleted. With these specifications, we can create the following graph.
hypotheses <- c(0.5, 0.5, 0, 0, 0, 0)
epsilon <- 1e-5
transitions <- rbind(
  c(0, 0.5, 0.25, 0, 0.25, 0),
  c(0.5, 0, 0, 0.25, 0, 0.25),
  c(0, 0, 0, 0, 1, 0),
  c(epsilon, 0, 0, 0, 0, 1 - epsilon),
  c(0, epsilon, 1 - epsilon, 0, 0, 0),
  c(0, 0, 0, 1, 0, 0)
)
hyp_names <- c("H1", "H2", "H3", "H4", "H5", "H6")
g <- graph_create(hypotheses, transitions, hyp_names)
plot_layout <- rbind(
  c(0.15, 0.5),
  c(0.65, 0.5),
  c(0, 0),
  c(0.5, 0),
  c(0.3, 0),
  c(0.8, 0)
)
plot(
  g,
  layout = plot_layout,
  eps = epsilon,
  edge_curves = c(pairs = 0.8),
  vertex.size = 35
)C:\10gSfZ83ae05fe14aa8-testing.R
Given a set of p-values for \(H_1, \ldots,
H_6\), the graphical multiple comparison procedure can be
performed to control the family-wise error rate (FWER) at the
significance level alpha. For one-sided p-values,
alpha is often set to 0.025 (default). First, we perform a
Bonferroni-based procedure using the closure principle (Bretz et al. 2009). After running the
procedure, none of these hypotheses is rejected. These results are
identical to using graph_test_shortcut().
p_values <- c(0.015, 0.013, 0.01, 0.007, 0.1, 0.0124)
test_results <- graph_test_closure(
  g,
  p = p_values,
  alpha = 0.025,
  verbose = TRUE,
  test_values = TRUE
)
test_results$outputs$adjusted_p
#>    H1    H2    H3    H4    H5    H6 
#> 0.026 0.026 0.028 0.028 0.100 0.028
test_results$outputs$rejected
#>    H1    H2    H3    H4    H5    H6 
#> FALSE FALSE FALSE FALSE FALSE FALSE
# Same testing results as
# 'graph_test_shortcut(g, p = p_values, alpha = 0.025)$outputs$rejected'C:\10gSfZ83ae05fe14aa8-testing.R
To investigate the closure and how every intersection hypothesis is
rejected, one could obtain the detailed output by specifying
verbose = TRUE. Because all hypotheses are tested using
Bonferroni tests, the adjusted p-value for every intersection hypothesis
is the same as the adjusted p-value for group 1 (meaning all hypotheses
are in the same group for Bonferroni tests). An intersection hypothesis
is rejected if its adjusted p-value is less than or equal to
alpha.
Adjusted p-values are initially calculated by groups of hypotheses. In this case there is only one group, which includes all hypotheses, but there can be more. The adjusted p-value for an intersection hypothesis is the minimum across groups within that intersection. Finally, the adjusted p-value for an individual hypothesis for the whole procedure is equal to the maximum of adjusted p-values across intersections involving that hypothesis.
test_results_verbose <-
  graph_test_closure(
    g,
    p = p_values,
    alpha = 0.025,
    verbose = TRUE
  )
head(test_results_verbose$details$results)
#>   Intersection  H1  H2 H3 H4 H5 H6 adj_p_grp1 adj_p_inter reject_intersection
#> 1       111111 0.5 0.5  0  0  0  0      0.026       0.026               FALSE
#> 2       111110 0.5 0.5  0  0  0 NA      0.026       0.026               FALSE
#> 3       111101 0.5 0.5  0  0 NA  0      0.026       0.026               FALSE
#> 4       111100 0.5 0.5  0  0 NA NA      0.026       0.026               FALSE
#> 5       111011 0.5 0.5  0 NA  0  0      0.026       0.026               FALSE
#> 6       111010 0.5 0.5  0 NA  0 NA      0.026       0.026               FALSEC:\10gSfZ83ae05fe14aa8-testing.R
An equivalent way to obtain rejections is via adjusting significance
levels. A hypothesis is rejected if its p-value is less than or equal to
its adjusted significance level. One can obtain adjusted significance
levels for every hypothesis in every intersection hypothesis in the
closure by specifying test_values = TRUE. A hypothesis is
rejected by the closed testing procedure if all intersection
hypotheses involving it have been rejected. An intersection hypothesis
is rejected if any null hypotheses within it have been
rejected.
test_results_test_values <-
  graph_test_closure(
    g,
    test_types = "b",
    p = p_values,
    alpha = 0.025,
    test_values = TRUE
  )
head(test_results_test_values$test_values$results)
#>   Intersection Hypothesis       Test      p Weight Alpha Inequality_holds
#> 1       111111         H1 bonferroni 0.0150    0.5 0.025            FALSE
#> 2       111111         H2 bonferroni 0.0130    0.5 0.025            FALSE
#> 3       111111         H3 bonferroni 0.0100    0.0 0.025            FALSE
#> 4       111111         H4 bonferroni 0.0070    0.0 0.025            FALSE
#> 5       111111         H5 bonferroni 0.1000    0.0 0.025            FALSE
#> 6       111111         H6 bonferroni 0.0124    0.0 0.025            FALSEC:\10gSfZ83ae05fe14aa8-testing.R
The framework of graphicalMCP allows a mixture of tests to improve the performance from the Bonferroni-based graphical approaches. We can group hypotheses into multiple subgroups and perform a separate test for every subgroup. Currently, graphicalMCP supports Bonferroni, parametric and Simes tests. To connect results from different subgroups of hypotheses, Bonferroni tests are used. Here, we will show two examples. The first example applies parametric tests to primary hypotheses (Xi et al. 2017), and the second example applies Simes tests to two subgroups of secondary hypotheses in addition to parametric tests to primary hypotheses.
In this example, we assume that the test statistics for primary hypotheses follow an asymptotic bivariate normal distribution. Under their null hypotheses \(H_1\) and \(H_2\), the mean of the distribution is 0. The correlation between test statistics for \(H_1\) and \(H_2\) can be calculated as a function of sample size. Assume that the sample size for the control, the low and the high doses is \(n_0\), \(n_1\) and \(n_2\), respectively. Then the correlation between test statistics for \(H_1\) and \(H_2\) is \(\rho_{12}=\left(\frac{n_1}{n_1+n_0}\right)^{1/2}\left(\frac{n_2}{n_2+n_0}\right)^{1/2}\). Under the equal randomization, this correlation is 0.5.
For the intersection hypothesis \(H_1\cap
H_2\cap H_3\cap H_4\cap H_5\cap H_6\), hypothesis weights are
0.5, 0.5, 0, 0, 0, and 0, respectively for \(H_1,\ldots, H_6\). Assume one-sided
p-values for these hypotheses are \(p_1,\ldots,p_6\), respectively. The
intersection hypothesis is rejected by the Bonferroni test if \(p_1\leq 0.5\alpha\) or \(p_2\leq 0.5\alpha\). Alternatively, a
parametric test utilizes the correlation between test statistics \(t_1=\Phi^{-1}(1 - p_1)\) and \(t_2=\Phi^{-1}(1 - p_2)\). The intersection
hypothesis can be rejected if \(p_1\leq
c\times 0.5\alpha\) or \(p_2\leq
c\times 0.5\alpha\), where the \(c\) value is the adjustment due to the
correlation between \(t_1\) and \(t_2\). More specifically, \(c\) can be solved as a solution to \(Pr\{(p_1\leq c\times 0.5\alpha)\cup (p_2\leq
c\times 0.5\alpha)\}=\alpha\). For a given correlation \(\rho_{12}\), the \(c\) value can be solved using the
uniroot function and the mvtnorm package. For
example, if \(\rho_{12}=0.5\), the
\(c\) value is 1.078. Then we can
obtain the adjusted significance level for \(H_1\) and \(H_2\) as \(c\times 0.5\alpha\). Alternatively, we can
calculate the adjusted p-value for \(H_1\cap
H_2\cap H_3\cap H_4\cap H_5\cap H_6\) as \(Pr\{(P_1\leq \min{\{p_1, p_2\}})\cup (P_2\leq
\min{\{p_1, p_2\}})\}\).
To implement this procedure, we need to create two subgroups: one for
\(H_1\) and \(H_2\), and one for the rest of the
hypotheses. For the first subgroup of hypotheses, we apply parametric
tests; for the second subgroup (and its subsets), we apply Bonferroni
tests. Three additional inputs are needed: to specify the grouping
information with test_groups, to identify the types of
tests for every subgroup with test_types, and to provide
the correlation matrix for parametric tests with test_corr.
Assume the correlation is 0.5 between test statistics for primary
hypotheses. The procedure rejects \(H_1\) and \(H_2\), but no others. Compared with the
Bonferroni-based graphical approach which didn’t rejected any
hypothesis, this illustrates the power improvement of using parametric
tests over Bonferroni tests.
corr_12 <- matrix(0.5, nrow = 2, ncol = 2)
diag(corr_12) <- 1
test_results_parametric <-
  graph_test_closure(
    g,
    p = p_values,
    alpha = 0.025,
    test_groups = list(c(1, 2), 3:6),
    test_types = c("parametric", "bonferroni"),
    test_corr = list(corr_12, NA),
    test_values = TRUE
  )
test_results_parametric$outputs$adjusted_p
#>         H1         H2         H3         H4         H5         H6 
#> 0.02413846 0.02413846 0.02800000 0.02800000 0.10000000 0.02800000
test_results_parametric$outputs$rejected
#>    H1    H2    H3    H4    H5    H6 
#>  TRUE  TRUE FALSE FALSE FALSE FALSE
head(test_results_parametric$test_values$results)
#>   Intersection Hypothesis       Test      p      c_value Weight Alpha
#> 1       111111         H1 parametric 0.0150 1.0782936582    0.5 0.025
#> 2       111111         H2 parametric 0.0130 1.0782936582    0.5 0.025
#> 3       111111         H3 bonferroni 0.0100                 0.0 0.025
#> 4       111111         H4 bonferroni 0.0070                 0.0 0.025
#> 5       111111         H5 bonferroni 0.1000                 0.0 0.025
#> 6       111111         H6 bonferroni 0.0124                 0.0 0.025
#>   Inequality_holds
#> 1            FALSE
#> 2             TRUE
#> 3            FALSE
#> 4            FALSE
#> 5            FALSE
#> 6            FALSEC:\10gSfZ83ae05fe14aa8-testing.R
In addition to using parametric tests for primary hypotheses, there are other ways to improve the Bonferroni-based graphical approaches. One way is to apply Simes tests to secondary hypotheses(Bretz et al. 2011; Lu 2016). Simes tests improve over Bonferroni tests because they may reject an intersection hypothesis if all p-values are relatively small, even if they’re larger than adjusted significance levels of Bonferroni tests.
For the intersection hypothesis \(H_3\cap H_5\), hypothesis weights are 0.5 and 0.5, respectively for \(H_3\) and \(H_5\). The intersection hypothesis is rejected by the Bonferroni test if \(p_3\leq 0.5\alpha\) or \(p_5\leq 0.5\alpha\). In addition to these conditions, the Simes test can also reject the intersection hypothesis if both \(p_3\) and \(p_5\) are less than or equal to \(\alpha\). In order to control the Type I error for the Simes test, a distributional requirement is needed, which is called \(MTP_2\) (Sarkar 1998). In this case, it means that the correlation between test statistics for \(H_3\) and \(H_5\) should be non-negative.
To illustrate the possibility of using mixed tests in this example, we keep parametric tests for primary hypotheses and apply Simes tests for secondary hypotheses. There are two sets of secondary hypotheses: \(H_3\) and \(H_5\) for the secondary hypotheses for the low dose versus control, and \(H_4\) and \(H_6\) for the secondary hypotheses for the high dose versus control. It is believed that the correlation between test statistics for \(H_3\) and \(H_5\) is non-negative and it is the same case for \(H_4\) and \(H_6\). Thus one can apply Simes tests to \(H_3\) and \(H_5\), and separately to \(H_4\) and \(H_6\). Note that this is different from applying Simes tests to all \(H_3\ldots,H_6\) which requires a stronger \(MTP_2\) condition.
To implement this procedure, we create three subgroups: one for \(H_1\) and \(H_2\), one for \(H_3\) and \(H_5\), and one for \(H_4\) and \(H_6\). For the first subgroup of hypotheses, we apply the parametric tests; for the second and the third subgroups, we apply separate Simes tests. Assume the correlation is 0.5 between test statistics for primary hypotheses. The procedure rejects \(H_1\), \(H_2\), \(H_4\), \(H_6\), and \(H_3\). Compared to the results of only using parametric tests for primary hypotheses (and Bonferroni tests for secondary hypotheses), \(H_3\), \(H_4\) and \(H_6\) are the additional hypotheses rejected because of using Simes tests, showing the power improvement of using Simes tests.
test_results_parametric_simes <-
  graph_test_closure(
    g,
    p = p_values, alpha = 0.025,
    test_groups = list(c(1, 2), c(3, 5), c(4, 6)),
    test_types = c("parametric", "simes", "simes"),
    test_corr = list(corr_12, NA, NA),
    test_values = TRUE
  )
test_results_parametric_simes$outputs$adjusted_p
#>         H1         H2         H3         H4         H5         H6 
#> 0.02413846 0.02413846 0.02480008 0.02480000 0.10000000 0.02480008
test_results_parametric_simes$outputs$rejected
#>    H1    H2    H3    H4    H5    H6 
#>  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
head(test_results_parametric_simes$test_values$results)
#>   Intersection Hypothesis       Test      p      c_value Weight Alpha
#> 1       111111         H1 parametric 0.0150 1.0782936582    0.5 0.025
#> 2       111111         H2 parametric 0.0130 1.0782936582    0.5 0.025
#> 3       111111         H3      simes 0.0100                 0.0 0.025
#> 5       111111         H4      simes 0.0070                 0.0 0.025
#> 4       111111         H5      simes 0.1000                 0.0 0.025
#> 6       111111         H6      simes 0.0124                 0.0 0.025
#>   Inequality_holds
#> 1            FALSE
#> 2             TRUE
#> 3            FALSE
#> 5            FALSE
#> 4            FALSE
#> 6            FALSEC:\10gSfZ83ae05fe14aa8-testing.R
Given the above graph, the trial team is often interested in the
power of the trial. For a single null hypothesis, the power is the
probability to reject the null hypothesis at the significance level
alpha when the alternative hypothesis is true (i.e. a true
positive). For multiple null hypotheses, there could be multiple version
of power. For example, the power to reject at least one hypothesis and
the power to reject all hypotheses, given the alternative hypotheses are
true. With the graphical multiple comparison procedures, it is also
important to understand the power to reject each hypothesis, given the
multiplicity adjustment. Sometimes, a team may want to customize
definitions of power to define success. Thus power calculation is an
important aspect of trial design.
Assume that the primary endpoint is about the occurrence of an
unfavorable clinical event. To evaluate the treatment effect, the
proportion of patients with this event is calculated and it is the lower
the better. Assume that the proportions are 0.181 for the low and the
high doses, and 0.3 for control. Using the equal randomization among the
three treatment groups, the clinical trial team chooses a total sample
size of 600 with 200 per group. This leads to a marginal power of 80%
for \(H_1\) and \(H_2\), respectively, using the two-sample
test for difference in proportions with unpooled variance each at
one-sided significance level 0.025. In this calculation, we use the
marginal power to combine the information from the treatment effect, any
nuisance parameter, and sample sizes for each hypothesis. Note that the
significance level used for the marginal power should be the same as
alpha which is used in the power calculation as the
significance level for the FWER control. In addition, the marginal power
has a one-to-one relationship with the noncentrality parameter, which is
illustrated below.
alpha <- 0.025
prop <- c(0.3, 0.181, 0.181)
sample_size <- rep(200, 3)
unpooled_variance <-
  prop[-1] * (1 - prop[-1]) / sample_size[-1] +
  prop[1] * (1 - prop[1]) / sample_size[1]
noncentrality_parameter_primary <-
  -(prop[-1] - prop[1]) / sqrt(unpooled_variance)
marginal_power_primary <- pnorm(
  qnorm(alpha, lower.tail = FALSE),
  mean = noncentrality_parameter_primary,
  sd = 1,
  lower.tail = FALSE
)
names(marginal_power_primary) <- c("H1", "H2")
marginal_power_primary
#>        H1        H2 
#> 0.8028315 0.8028315C:\10gSfZ83ae05fe14aa8-testing.R
Assume that the secondary endpoint (SE1) is about the change in total medication score from baseline, which is a continuous outcome. Also assume that the secondary endpoint (SE2) is about the change in another medication score from baseline, which is a continuous outcome and it contains fewer categories compared to SE1. To evaluate the treatment effect, the mean change is calculated and it is the more reduction the better. Assume that the mean change from baseline for SE1 is the reduction of 7.5 and 8.25, respectively for the low and the high doses, and 5 for control. Also assume that the mean change from baseline for SE2 is the reduction of 8 and 9, respectively for the low and the high doses, and 6 for control. Further assume a known common standard deviation of 10. Given the sample size of 200 per treatment group, the marginal power is 71% and 90% for \(H_3\) and \(H_4\), respectively and 52% and 85% for \(H_5\) and \(H_6\), respectively using the two-sample \(z\)-test for the difference in means each at the one-sided significance level 0.025.
mean_change_se1 <- c(5, 7.5, 8.25)
sd <- rep(10, 3)
variance <- sd[-1]^2 / sample_size[-1] + sd[1]^2 / sample_size[1]
noncentrality_parameter_se1 <-
  (mean_change_se1[-1] - mean_change_se1[1]) /
    sqrt(variance)
marginal_power_se1 <- pnorm(
  qnorm(alpha, lower.tail = FALSE),
  mean = noncentrality_parameter_se1,
  sd = 1,
  lower.tail = FALSE
)
names(marginal_power_se1) <- c("H3", "H4")
marginal_power_se1
#>        H3        H4 
#> 0.7054139 0.9014809
mean_change_se2 <- c(6, 8, 9)
noncentrality_parameter_se2 <-
  (mean_change_se2[-1] - mean_change_se2[1]) /
    sqrt(variance)
marginal_power_se2 <- pnorm(
  qnorm(alpha, lower.tail = FALSE),
  mean = noncentrality_parameter_se2,
  sd = 1,
  lower.tail = FALSE
)
names(marginal_power_se2) <- c("H5", "H6")
marginal_power_se2
#>        H5        H6 
#> 0.5159678 0.8508384C:\10gSfZ83ae05fe14aa8-testing.R
In addition to the marginal power, we also need to make assumptions about the joint distribution of test statistics. In this example, we assume that they follow a multivariate normal distribution which means they’re defined by the noncentrality parameters above and the correlation matrix \(R\). To obtain the correlations, it is helpful to understand that there are two types of correlations in this example. The correlation between two dose-control comparisons for the same endpoint and the correlation between endpoints. The former correlation can be calculated as a function of sample size. For example, the correlation between test statistics for \(H_1\) and \(H_2\) is \(\rho_{12}=\left(\frac{n_1}{n_1+n_0}\right)^{1/2}\left(\frac{n_2}{n_3+n_0}\right)^{1/2}\). Under the equal randomization, this correlation is 0.5. The correlation between test statistics for \(H_3\) and \(H_4\) and for \(H_5\) and \(H_6\) is the same as the above. On the other hand, the correlation between endpoints for the same dose-control comparison is often estimated based on prior knowledge or from previous trials. Without the information, we assume it to be \(\rho_{13}=\rho_{15}=\rho_{24}=\rho_{26}=\rho_{35}=\rho_{46}=0.5\). In practice, one could set this correlation as a parameter and try multiple values to assess the sensitivity of this assumption. Regarding the correlation between test statistics for \(H_1\) and \(H_4 (H_6)\) and for \(H_2\) and \(H_3 (H_5)\), they are even more difficult to estimate. Here we use a simple product rule, which means that this correlation is a product of correlations of the two previously assumed correlations. For example, \(\rho_{14}=\rho_{12}*\rho_{24}\) and \(\rho_{23}=\rho_{12}*\rho_{13}\). In practice, further assumptions may be made and tested, instead of using the product rule.
corr <- matrix(0, nrow = 6, ncol = 6)
corr[1, 2] <-
  corr[3, 4] <-
  corr[5, 6] <-
  sqrt(
    sample_size[2] / sum(sample_size[1:2]) *
      sample_size[3] / sum(sample_size[c(1, 3)])
  )
rho <- 0.5
corr[1, 3] <-
  corr[1, 5] <-
  corr[2, 4] <-
  corr[2, 6] <-
  corr[3, 5] <-
  corr[4, 6] <-
  rho
corr[1, 4] <- corr[1, 6] <- corr[2, 3] <- corr[2, 5] <- corr[1, 2] * rho
corr[3, 6] <- corr[1, 3] * corr[1, 6]
corr[4, 5] <- corr[1, 4] * corr[1, 6]
corr <- corr + t(corr)
diag(corr) <- 1
colnames(corr) <- hyp_names
rownames(corr) <- hyp_names
corr
#>      H1   H2    H3     H4     H5    H6
#> H1 1.00 0.50 0.500 0.2500 0.5000 0.250
#> H2 0.50 1.00 0.250 0.5000 0.2500 0.500
#> H3 0.50 0.25 1.000 0.5000 0.5000 0.125
#> H4 0.25 0.50 0.500 1.0000 0.0625 0.500
#> H5 0.50 0.25 0.500 0.0625 1.0000 0.500
#> H6 0.25 0.50 0.125 0.5000 0.5000 1.000C:\10gSfZ83ae05fe14aa8-testing.R
As mentioned earlier, there are multiple versions of “power” when
there are multiple hypotheses. Commonly used “power” definitions
include: - Local power: The probability of each hypothesis being
rejected (with multiplicity adjustment) - Expected no. of rejections:
The expected number of rejections - Power to reject 1 or more: The
probability to reject at least one hypothesis - Power to reject all: The
probability to reject all hypotheses These are the default outputs from
the graph_calculate_power function. In addition, an user
could customize success criteria to define other versions of
“power”.
success_fns <- list(
  # Probability to reject H1
  H1 = function(x) x[1],
  # Expected number of rejections
  `Expected no. of rejections` =
    function(x) x[1] + x[2] + x[3] + x[4] + x[5] + x[6],
  # Probability to reject at least one hypothesis
  `AtLeast1` = function(x) x[1] | x[2] | x[3] | x[4] | x[5] | x[6],
  # Probability to reject all hypotheses
  `All` = function(x) x[1] & x[2] & x[3] & x[4] & x[5] & x[6],
  # Probability to reject both H1 and H2
  `H1andH2` = function(x) x[1] & x[2],
  # Probability to reject all hypotheses for the low dose or the high dose
  `(H1andH3andH5)or(H2andH4andH6)` <-
    function(x) (x[1] & x[3] & x[5]) | (x[2] & x[4] & x[6])
)C:\10gSfZ83ae05fe14aa8-testing.R
Given the above inputs, we can calculate power via simulation for the
graphical multiple comparison procedure at one-sided significance level
alpha = 0.025 using sim_n = 1e5 simulations
and the random seed 1234. There are three procedures to compare: the
Bonferroni-based procedure, the procedure using parametric tests for
primary hypotheses, and the procedure using parametric tests for primary
hypotheses and Simes tests for two sets of secondary hypotheses
separately. The local power for hypotheses \(H_1, \ldots, H_6\) is - 0.760, 0.752,
0.510, 0.665, 0.391, and 0.625, respectively using the Bonferroni-based
procedure, - 0.764, 0.756, 0.511, 0.668, 0.392, and 0.628, respectively
using the procedure using parametric tests for primary hypotheses, -
0.764, 0.757, 0.521, 0.673, 0.402, and 0.633, respectively using the
procedure using parametric tests for primary hypotheses and Simes tests
for two sets of secondary hypotheses separately. Note that the local
power is improved for all hypotheses after parametric tests and Simes
tests being applied over the Bonferroni-based procedure.
set.seed(1234)
power_bonferroni <- graph_calculate_power(
  g,
  alpha = 0.025,
  sim_corr = corr,
  sim_n = 1e5,
  power_marginal = c(
    marginal_power_primary,
    marginal_power_se1,
    marginal_power_se2
  ),
  sim_success = success_fns
)
round(power_bonferroni$power$power_local, 3)
#>    H1    H2    H3    H4    H5    H6 
#> 0.760 0.752 0.510 0.665 0.391 0.625
set.seed(1234)
power_parametric <- graph_calculate_power(
  g,
  alpha = 0.025,
  sim_corr = corr,
  sim_n = 1e5,
  power_marginal = c(
    marginal_power_primary,
    marginal_power_se1,
    marginal_power_se2
  ),
  test_groups = list(c(1, 2), 3:6),
  test_types = c("parametric", "bonferroni"),
  test_corr = list(corr_12, NA),
  sim_success = success_fns
)
round(power_parametric$power$power_local, 3)
#>    H1    H2    H3    H4    H5    H6 
#> 0.764 0.756 0.511 0.668 0.392 0.628
set.seed(1234)
power_parametric_simes <- graph_calculate_power(
  g,
  alpha = 0.025,
  sim_corr = corr,
  sim_n = 1e5,
  power_marginal = c(
    marginal_power_primary,
    marginal_power_se1,
    marginal_power_se2
  ),
  test_groups = list(c(1, 2), c(3, 5), c(4, 6)),
  test_types = c("parametric", "simes", "simes"),
  test_corr = list(corr_12, NA, NA),
  sim_success = success_fns
)
round(power_parametric_simes$power$power_local, 3)
#>    H1    H2    H3    H4    H5    H6 
#> 0.764 0.757 0.521 0.673 0.402 0.633C:\10gSfZ83ae05fe14aa8-testing.R
To see the detailed outputs of all simulated p-values and rejection
decisions for all hypotheses, we can specify
verbose = TRUE. This will produce a lot of outputs. To
allow flexible printing functions, a user can change the following: -
The indented space with the default setting of indent = 2 -
The precision of numeric values (i.e., the number of decimal places)
with the default setting of precision = 6
set.seed(1234)
power_verbose_output_parametric_simes <- graph_calculate_power(
  g,
  alpha = 0.025,
  sim_corr = corr,
  sim_n = 1e5,
  power_marginal = c(
    marginal_power_primary,
    marginal_power_se1,
    marginal_power_se2
  ),
  test_groups = list(c(1, 2), c(3, 5), c(4, 6)),
  test_types = c("parametric", "simes", "simes"),
  test_corr = list(corr_12, NA, NA),
  sim_success = success_fns,
  verbose = TRUE
)
head(power_verbose_output_parametric_simes$details$p_sim, 10)
#>                 H1           H2           H3           H4           H5
#>  [1,] 0.0236514114 6.962444e-03 3.835101e-03 0.0615622415 0.0034034237
#>  [2,] 0.0367030430 4.681135e-02 6.857853e-02 0.0284812435 0.1496042440
#>  [3,] 0.0157541925 6.543826e-03 1.152738e-03 0.0007655046 0.0836186744
#>  [4,] 0.0053369856 8.699959e-07 9.952719e-03 0.0002429371 0.0369937655
#>  [5,] 0.0342665326 1.788794e-01 6.202961e-03 0.0305855529 0.0461642736
#>  [6,] 0.0011139852 1.783703e-02 6.083855e-02 0.0087065723 0.3865009844
#>  [7,] 0.2540774514 2.101193e-01 2.616673e-02 0.0277425681 0.0092489689
#>  [8,] 0.0908135583 4.433230e-02 1.903718e-01 0.0418782572 0.3996511810
#>  [9,] 0.0739568465 4.728371e-02 3.778643e-01 0.0296172396 0.4446759145
#> [10,] 0.0003353917 1.048743e-03 1.246528e-05 0.0032298980 0.0001601294
#>                 H6
#>  [1,] 0.0018417304
#>  [2,] 0.0552141940
#>  [3,] 0.0255279965
#>  [4,] 0.0001345536
#>  [5,] 0.0666051161
#>  [6,] 0.0867387423
#>  [7,] 0.0297658510
#>  [8,] 0.1097032475
#>  [9,] 0.0620564532
#> [10,] 0.0114581872
print(power_verbose_output_parametric_simes,
  indent = 4,
  precision = 6
)
#> 
#> Test parameters ($inputs) ------------------------------------------------------
#>     Initial graph
#> 
#>     --- Hypothesis weights ---
#>     H1: 0.5
#>     H2: 0.5
#>     H3: 0.0
#>     H4: 0.0
#>     H5: 0.0
#>     H6: 0.0
#> 
#>     --- Transition weights ---
#>             H1      H2      H3      H4      H5      H6
#>     H1 0.00000 0.50000 0.25000 0.00000 0.25000 0.00000
#>     H2 0.50000 0.00000 0.00000 0.25000 0.00000 0.25000
#>     H3 0.00000 0.00000 0.00000 0.00000 1.00000 0.00000
#>     H4 0.00001 0.00000 0.00000 0.00000 0.00000 0.99999
#>     H5 0.00000 0.00001 0.99999 0.00000 0.00000 0.00000
#>     H6 0.00000 0.00000 0.00000 1.00000 0.00000 0.00000
#> 
#>     Alpha = 0.025
#> 
#>     Parametric testing correlation:     H1  H2
#>                                     H1 1.0 0.5
#>                                     H2 0.5 1.0
#> 
#>     Test types
#>     parametric: (H1, H2)
#>          simes: (H3, H5)
#>          simes: (H4, H6)
#> 
#> Simulation parameters ($inputs) ------------------------------------------------
#>     Testing 100,000 simulations with multivariate normal params:
#> 
#>                           H1       H2       H3       H4       H5       H6
#>     Marginal power: 0.802831 0.802831 0.705414 0.901481 0.515968 0.850838
#> 
#>     Correlation:        H1     H2     H3     H4     H5     H6
#>                  H1 1.0000 0.5000 0.5000 0.2500 0.5000 0.2500
#>                  H2 0.5000 1.0000 0.2500 0.5000 0.2500 0.5000
#>                  H3 0.5000 0.2500 1.0000 0.5000 0.5000 0.1250
#>                  H4 0.2500 0.5000 0.5000 1.0000 0.0625 0.5000
#>                  H5 0.5000 0.2500 0.5000 0.0625 1.0000 0.5000
#>                  H6 0.2500 0.5000 0.1250 0.5000 0.5000 1.0000
#> 
#> Power calculation ($power) -----------------------------------------------------
#>                                      H1      H2      H3      H4      H5      H6
#>                    Local power: 0.76437 0.75656 0.52086 0.67302 0.40234 0.63292
#> 
#>     Expected no. of rejections: 3.75007
#>      Power to reject 1 or more: 0.86277
#>            Power to reject all: 0.32537
#> 
#>                Success measure   Power
#>                             H1 0.76437
#>     Expected no. of rejections 3.75007
#>                       AtLeast1 0.86277
#>                            All 0.32537
#>                        H1andH2 0.65816
#>                                0.63324
#> 
#> Simulation details ($details) --------------------------------------------------
#>         p_sim_H1    p_sim_H2    p_sim_H3    p_sim_H4    p_sim_H5    p_sim_H6 rej_H1
#>      2.36514e-02 6.96244e-03 3.83510e-03 6.15622e-02 3.40342e-03 1.84173e-03  FALSE
#>      3.67030e-02 4.68113e-02 6.85785e-02 2.84812e-02 1.49604e-01 5.52142e-02  FALSE
#>      1.57542e-02 6.54383e-03 1.15274e-03 7.65505e-04 8.36187e-02 2.55280e-02   TRUE
#>      5.33699e-03 8.69996e-07 9.95272e-03 2.42937e-04 3.69938e-02 1.34554e-04   TRUE
#>      3.42665e-02 1.78879e-01 6.20296e-03 3.05856e-02 4.61643e-02 6.66051e-02  FALSE
#>      1.11399e-03 1.78370e-02 6.08385e-02 8.70657e-03 3.86501e-01 8.67387e-02   TRUE
#>      2.54077e-01 2.10119e-01 2.61667e-02 2.77426e-02 9.24897e-03 2.97659e-02  FALSE
#>      9.08136e-02 4.43323e-02 1.90372e-01 4.18783e-02 3.99651e-01 1.09703e-01  FALSE
#>      7.39568e-02 4.72837e-02 3.77864e-01 2.96172e-02 4.44676e-01 6.20565e-02  FALSE
#>      3.35392e-04 1.04874e-03 1.24653e-05 3.22990e-03 1.60129e-04 1.14582e-02   TRUE
#>      rej_H2 rej_H3 rej_H4 rej_H5 rej_H6
#>        TRUE  FALSE  FALSE  FALSE   TRUE
#>       FALSE  FALSE  FALSE  FALSE  FALSE
#>        TRUE   TRUE   TRUE  FALSE  FALSE
#>        TRUE   TRUE   TRUE  FALSE   TRUE
#>       FALSE  FALSE  FALSE  FALSE  FALSE
#>        TRUE  FALSE  FALSE  FALSE  FALSE
#>       FALSE  FALSE  FALSE  FALSE  FALSE
#>       FALSE  FALSE  FALSE  FALSE  FALSE
#>       FALSE  FALSE  FALSE  FALSE  FALSE
#>        TRUE   TRUE   TRUE   TRUE   TRUE
#>     ... (Use `print(x, rows = <nn>)` for more)C:\10gSfZ83ae05fe14aa8-testing.R