Performs k-means clustering on principal components derived from PhiSpace scores or any other matrix-like data. This is useful for identifying spatial niches or clusters based on cell state compositions or gene expression patterns.
clusterPhiSpace(
x,
k = NULL,
k_range = NULL,
select_k_method = c("silhouette", "elbow"),
ncomp = NULL,
reducedDimName = "PhiSpace",
use_assay = NULL,
nstart = 20,
iter.max = 500,
algorithm = c("Lloyd", "Hartigan-Wong", "MacQueen", "Forgy"),
center = TRUE,
scale = FALSE,
seed = NULL,
return_pca = TRUE,
store_in_colData = FALSE,
cluster_name = "PhiClust"
)
x: Either a SpatialExperiment/SingleCellExperiment object containing PhiSpace scores in reducedDim, or a matrix-like object (matrix, sparse matrix, data frame) with observations in rows and features in columns.
k: Integer specifying the number of clusters. Provide either k or k_range, but not both.
k_range: Integer vector of length 2 giving the lower and upper bounds of the k values to test (e.g., c(5, 15)). The optimal k is chosen with the method given by select_k_method. Provide either k or k_range, but not both.
select_k_method: Character string specifying the method used to select the optimal k when k_range is provided. Options: "silhouette" (average silhouette width) or "elbow" (total within-cluster sum of squares). Default is "silhouette".
ncomp: Integer specifying the number of principal components to use for clustering, or the string "all". If NULL (default), uses min(30, nfeatures - 1). If "all", uses nfeatures - 1 components.
reducedDimName: Character string specifying which reducedDim slot contains the PhiSpace scores. Only used when x is a SingleCellExperiment or SpatialExperiment. Default is "PhiSpace".
use_assay: Character string specifying which assay to use when extracting data from a SingleCellExperiment/SpatialExperiment object instead of reducedDim. If NULL (default), uses the reducedDim specified by reducedDimName. Common options: "logcounts", "counts", "normcounts".
nstart: Integer specifying the number of random starts for k-means. Default is 20.
iter.max: Integer specifying the maximum number of k-means iterations. Default is 500.
algorithm: Character string specifying the k-means algorithm. Options are "Lloyd", "Hartigan-Wong", "MacQueen", "Forgy". Default is "Lloyd".
center: Logical indicating whether to center the data before PCA. Default is TRUE.
scale: Logical indicating whether to scale the data before PCA. Default is FALSE.
seed: Integer seed for reproducibility. Default is NULL (no seed set).
return_pca: Logical indicating whether to return the PCA results. Default is TRUE.
store_in_colData: Logical indicating whether to store cluster assignments in the colData of the input object (only applicable when x is an SCE/SPE). Default is FALSE.
cluster_name: Character string specifying the colData column name for cluster assignments when store_in_colData is TRUE. Default is "PhiClust".
A list with class "PhiSpaceClustering" containing:
Factor vector of cluster assignments
Matrix of cluster centers in PC space
Full kmeans object from stats::kmeans
PCA results (if return_pca = TRUE)
Matrix of PC scores used for clustering
Selected k value (relevant when k_range is used)
List with selection metrics (if k_range was used)
List of parameters used
Updated object (if store_in_colData = TRUE and x is SCE/SPE)
This function implements a common workflow for spatial clustering:
Extract data (from reducedDim, assay, or use directly if matrix)
Perform PCA to reduce dimensionality
Select top principal components
Apply k-means clustering
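The steps above can be sketched with base R alone. This is a minimal illustration on simulated data, not the function's actual implementation; the `scores` matrix is a made-up stand-in for PhiSpace scores, and the parameter values simply mirror the defaults listed in the arguments.

```r
# Simulated stand-in for a PhiSpace score matrix (assumption, not real data):
# 200 observations x 12 features, drawn from two well-separated groups
set.seed(1)
scores <- rbind(
  matrix(rnorm(100 * 12, mean = 0), ncol = 12),
  matrix(rnorm(100 * 12, mean = 3), ncol = 12)
)

# Perform PCA, then keep the top components (default rule: min(30, nfeatures - 1))
ncomp <- min(30, ncol(scores) - 1)
pca <- stats::prcomp(scores, center = TRUE, scale. = FALSE)
pc_scores <- pca$x[, seq_len(ncomp), drop = FALSE]

# Apply k-means to the retained PC scores
km <- stats::kmeans(pc_scores, centers = 2, nstart = 20, iter.max = 500,
                    algorithm = "Lloyd")
clusters <- factor(km$cluster)
table(clusters)
```

Here `km$centers` holds the cluster centers in PC space, which corresponds to the "cluster centers in PC space" element described in the return value.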
The function can either:
Use a fixed k value (specify k parameter)
Automatically select k from a range (specify k_range parameter)
When k_range is provided, the function tests all k values in the range and selects the optimal k using either:
Silhouette method (default): Maximizes average silhouette width
Elbow method: Identifies elbow point in total within-cluster sum of squares
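As a rough illustration of the two selection criteria, the sketch below computes the average silhouette width and the total within-cluster sum of squares for each candidate k on simulated data. It assumes the `cluster` package for `silhouette()`; how clusterPhiSpace locates the elbow point is not stated here, so the sketch only records the elbow metric and picks k by silhouette.

```r
library(cluster)  # provides silhouette()

# Simulated PC scores (assumption): three tight groups in two dimensions
set.seed(42)
pc_scores <- rbind(
  matrix(rnorm(60 * 2, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(60 * 2, mean = 5, sd = 0.5), ncol = 2),
  cbind(rnorm(60, mean = 0, sd = 0.5), rnorm(60, mean = 5, sd = 0.5))
)

k_values <- 2:6
d <- dist(pc_scores)  # pairwise distances, reused for every k

# One row per candidate k: average silhouette width and total within-cluster SS
metrics <- t(sapply(k_values, function(k) {
  km <- kmeans(pc_scores, centers = k, nstart = 20, iter.max = 500)
  sil <- silhouette(km$cluster, d)
  c(k = k,
    avg_silhouette = mean(sil[, "sil_width"]),
    tot_withinss = km$tot.withinss)
}))

# Silhouette method: choose the k with the largest average silhouette width
best_k <- unname(metrics[which.max(metrics[, "avg_silhouette"]), "k"])
metrics
best_k
```

Plotting `tot_withinss` against `k` gives the usual elbow curve; the silhouette criterion is easier to automate because it has a clear maximum.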
The function accepts multiple input types:
SingleCellExperiment/SpatialExperiment: Uses reducedDim (default) or assay
Matrix: Standard R matrix with observations in rows
Sparse matrix: dgCMatrix or similar sparse formats
Data frame: Coerced to matrix
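One plausible way to bring the non-SCE input types to a common dense matrix is sketched below. The helper `to_dense` is hypothetical and only illustrates the coercion idea; the function's internal handling may differ, for example by avoiding densification of large sparse matrices.

```r
library(Matrix)  # sparse matrix classes such as dgCMatrix

# Small example inputs with observations in rows and features in columns
dense <- matrix(c(1, 0, 0, 2, 0, 3), nrow = 3,
                dimnames = list(NULL, c("f1", "f2")))
sparse <- Matrix(dense, sparse = TRUE)
df <- as.data.frame(dense)

# Hypothetical helper: coerce a data frame or sparse matrix to a numeric matrix
to_dense <- function(x) {
  if (is.data.frame(x) || inherits(x, "Matrix")) {
    x <- as.matrix(x)
  }
  stopifnot(is.matrix(x), is.numeric(x))
  x
}

dim(to_dense(dense))
dim(to_dense(sparse))
dim(to_dense(df))
```

Densifying a sparse matrix can use a lot of memory for large assays, which is one reason clustering on PhiSpace scores in reducedDim is usually the lighter option.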
The number of PCs to use affects clustering resolution:
Fewer PCs (e.g., 10-15): Broader, more general clusters
More PCs (e.g., 30-50): Finer, more specific clusters
Default (min(30, nfeatures - 1)): A reasonable balance for most applications
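To get a feel for what a given ncomp retains, one can inspect the cumulative proportion of variance explained by the leading components. This is a generic PCA heuristic on simulated data, offered as an aid for choosing ncomp; it is not something the function is documented to compute.

```r
# Simulated data (assumption): 300 observations x 40 features
set.seed(7)
x <- matrix(rnorm(300 * 40), ncol = 40)
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# Default rule for ncomp as described in the arguments
default_ncomp <- min(30, ncol(x) - 1)

# Cumulative proportion of variance explained by the first j components
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)

# Compare a few candidate values of ncomp
round(cum_var[c(10, 15, default_ncomp)], 2)
```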
The algorithm argument selects the k-means variant:
Lloyd (default): Standard algorithm, good balance of speed and quality
Hartigan-Wong: Often better results but slower
MacQueen: Faster but may converge to local optima
Forgy: Alternative name for the Lloyd algorithm in stats::kmeans
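The algorithm options can be compared directly with stats::kmeans, which the return value indicates is the underlying routine. The sketch below runs each option on the same simulated data with the same seed; `run_kmeans` is a hypothetical helper introduced only for this illustration.

```r
# Simulated PC scores (assumption): two separated groups in three dimensions
set.seed(3)
pc_scores <- rbind(
  matrix(rnorm(50 * 3, mean = 0), ncol = 3),
  matrix(rnorm(50 * 3, mean = 4), ncol = 3)
)

# Hypothetical helper: fit k-means with a fixed seed so every algorithm
# receives the same random starting centers
run_kmeans <- function(algorithm) {
  set.seed(123)
  kmeans(pc_scores, centers = 2, nstart = 20, iter.max = 500,
         algorithm = algorithm)
}

algorithms <- c("Lloyd", "Hartigan-Wong", "MacQueen", "Forgy")
fits <- lapply(setNames(algorithms, algorithms), run_kmeans)

# Total within-cluster sum of squares per algorithm (lower is better)
sapply(fits, function(f) f$tot.withinss)

# "Lloyd" and "Forgy" are two names for the same algorithm in stats::kmeans
identical(fits$Lloyd$cluster, fits$Forgy$cluster)
```

On easy, well-separated data like this the algorithms tend to agree; differences are more likely with many overlapping clusters.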
if (FALSE) { # \dontrun{
library(SingleCellExperiment)  # provides reducedDim()

# lung_data is assumed to be a SingleCellExperiment/SpatialExperiment
# with PhiSpace scores stored in reducedDim(lung_data, "PhiSpace")

# Example 1: Using SingleCellExperiment with PhiSpace scores
result <- clusterPhiSpace(
  x = lung_data,
  k = 9,
  ncomp = 30,
  seed = 123
)

# Example 2: Using matrix directly
phi_matrix <- reducedDim(lung_data, "PhiSpace")
result <- clusterPhiSpace(
  x = phi_matrix,
  k = 9,
  ncomp = 30
)

# Example 3: Cluster on normalized counts instead
result <- clusterPhiSpace(
  x = lung_data,
  use_assay = "logcounts",
  k = 9,
  ncomp = 50
)

# Example 4: Using data frame
df <- as.data.frame(reducedDim(lung_data, "PhiSpace"))
result <- clusterPhiSpace(x = df, k = 9)

# Example 5: Automatic k selection
result <- clusterPhiSpace(
  x = phi_matrix,
  k_range = c(5, 15),
  select_k_method = "silhouette"
)
} # }