Correlate PhiSpace scores with a response variable

This function fits a regression or classification model to identify which PhiSpace cell state scores are most strongly associated with a response variable. It supports continuous responses (regression, via PLS) and discrete responses (classification, via PLSDA or DWD).

correlatePhiSpace(
  spe,
  response,
  method = c("PLSDA", "PLS", "DWD"),
  ncomp = NULL,
  reducedDimName = "PhiSpace",
  center = TRUE,
  scale = FALSE,
  dwd_params = list(),
  seed = NULL
)

Arguments

spe

A SpatialExperiment or SingleCellExperiment object that stores PhiSpace scores in one of its reducedDim slots (see reducedDimName).

response

Either a character string naming a column in colData(spe), or a vector of response values. For the classification methods (PLSDA, DWD), the response should be a factor; any other type is coerced to a factor.
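
The coercion to a factor can be previewed in base R before calling the function. The sketch below uses a made-up character vector (the labels are illustrative, not from the package) and shows that levels are ordered alphabetically unless they are set explicitly. Whether level order affects the sign of the reported importance scores is not stated in this documentation, so treat that as something to check.

```r
# Hypothetical response labels for four cells.
response <- c("tumour", "normal", "tumour", "normal")

# Default coercion: levels are sorted alphabetically.
y_default <- as.factor(response)
levels(y_default)

# Setting the level order explicitly makes the reference class unambiguous.
y_explicit <- factor(response, levels = c("tumour", "normal"))
levels(y_explicit)

# Two levels, so a binary method such as DWD is applicable.
nlevels(y_explicit)
```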

method

Character string specifying the analysis method. One of:

"PLS" - Partial Least Squares regression (for continuous response)
"PLSDA" - Partial Least Squares Discriminant Analysis (for discrete response)
"DWD" - Distance Weighted Discrimination (for binary classification)

ncomp

Integer specifying the number of components to use. If NULL (default), it is set automatically according to the method:

For PLSDA: number of classes - 1
For PLS: min(10, ncol(PhiSpace)/2)
For DWD: not applicable (DWD finds a single discriminant direction)
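
The defaults listed above can be sketched as a small helper. This is an illustration only: default_ncomp is a hypothetical name, not a function exported by the package, and the use of floor() for an odd number of score columns is an assumption, since the rounding rule is not documented here.

```r
# Hypothetical helper mirroring the documented defaults; not the package's code.
default_ncomp <- function(method, response, n_features) {
  switch(
    method,
    # Number of classes minus one.
    PLSDA = nlevels(as.factor(response)) - 1L,
    # floor() is an assumption: the rounding of ncol(PhiSpace)/2 is not documented.
    PLS = as.integer(min(10, floor(n_features / 2))),
    # Not applicable: DWD finds a single discriminant direction.
    DWD = NA_integer_
  )
}

default_ncomp("PLSDA", c("a", "b", "c", "a"), n_features = 30)
default_ncomp("PLS", c(0.1, 0.5, 0.9, 1.3), n_features = 7)
default_ncomp("DWD", c("a", "b"), n_features = 30)
```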

reducedDimName

Character string specifying which reducedDim slot contains the PhiSpace scores. Default is "PhiSpace".

center

Logical indicating whether to center the PhiSpace scores before analysis. Default is TRUE.

scale

Logical indicating whether to scale the PhiSpace scores before analysis. Default is FALSE.
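
To see what center and scale do to a score matrix, the sketch below applies base R's scale() to a toy matrix. It assumes the function centres and scales column-wise, as scale() does; the documentation does not confirm the internal implementation, and the matrix and its column names are made up.

```r
set.seed(1)

# Toy stand-in for a PhiSpace score matrix: 6 cells by 3 cell-state scores.
scores <- matrix(
  rnorm(18, mean = 2),
  nrow = 6,
  dimnames = list(NULL, c("Tcell", "Bcell", "Myeloid"))
)

# center = TRUE, scale = FALSE (the documented defaults): column means become zero.
centred <- scale(scores, center = TRUE, scale = FALSE)
colMeans(centred)

# center = TRUE, scale = TRUE: columns also get unit standard deviation.
standardised <- scale(scores, center = TRUE, scale = TRUE)
apply(standardised, 2, sd)
```

Leaving scale = FALSE keeps the original relative magnitudes of the scores, so features with larger variance carry more weight in the fit.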

dwd_params

List of additional parameters for DWD (only used if method="DWD"):

kernel: kernel function (default: vanilladot())
qval: q-value parameter for DWD (default: 1)
lambda: regularization parameter or sequence (default: auto-tuned via CV)
cv_folds: number of cross-validation folds (default: 5)

seed

Integer seed for reproducibility. Default is NULL (no seed set).

Value

A list with class "PhiSpaceCorrelation" containing:

method: The method used
importance_scores: Matrix of feature importance scores (coefficients or weights)
scores: Predicted scores or component scores for each observation
model: The fitted model object
feature_ranking: Data frame with features ranked by importance
response_summary: Summary information about the response variable
parameters: List of parameters used in the analysis

Details

Method Selection:

Use PLS for continuous responses (e.g., survival time, gene expression)
Use PLSDA for multi-class classification (e.g., disease subtypes, clusters)
Use DWD for binary classification when interpretability is important

Feature Importance:

For PLS/PLSDA: regression coefficients from the final component
For DWD: the discriminant weights

The importance scores indicate which cell state features (PhiSpace dimensions) are most strongly associated with the response. A positive value indicates a positive association with the response, and a negative value indicates a negative association.
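
One way to read signed importance scores is to rank features by absolute value while keeping the sign as the direction of association. The sketch below does this for made-up scores. Ranking by absolute value is an assumption about how feature_ranking is built, and the feature names and values are purely illustrative.

```r
# Hypothetical importance scores for five cell-state features.
importance <- c(
  Tcell = 0.42, Bcell = -0.77, Myeloid = 0.05,
  Fibroblast = -0.13, Endothelial = 0.31
)

# Order by magnitude, largest first (assumed ranking rule).
ord <- order(abs(importance), decreasing = TRUE)

ranking <- data.frame(
  feature = names(importance)[ord],
  importance = unname(importance[ord]),
  direction = unname(ifelse(importance[ord] > 0, "positive", "negative")),
  row.names = NULL
)
ranking
```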

Examples

if (FALSE) { # \dontrun{
# PLSDA example: associate PhiSpace scores with clusters
result <- correlatePhiSpace(
  spe = query_spe,
  response = "PhiClust",
  method = "PLSDA"
)

# View top important features
head(result$feature_ranking, 10)

# Plot importance scores
plot(result, nfeat = 20)

# PLS example: associate with a continuous variable
result <- correlatePhiSpace(
  spe = query_spe,
  response = "spatial_distance",
  method = "PLS",
  ncomp = 5
)

# DWD example: binary classification
result <- correlatePhiSpace(
  spe = query_spe,
  response = "cancer_vs_normal",
  method = "DWD",
  dwd_params = list(qval = 1, cv_folds = 10)
)
} # }