taxonomy_mapping.Rd
This function maps query cells to a reference taxonomy using up to five methods (correlation-based, tree-based, MapMyCells hierarchical, MapMyCells flat, and Seurat-based). For each method it returns the best-matching reference cell type (or, in some cases, several top matches) along with associated QC metrics.
taxonomy_mapping(
AIT.anndata,
query.data,
corr.map = TRUE,
tree.map = TRUE,
mapmycells.hierarchical.map = TRUE,
mapmycells.flat.map = TRUE,
seurat.map = TRUE,
label.cols = AIT.anndata$uns$hierarchy,
genes.to.use = paste0("highly_variable_genes_", AIT.anndata$uns$mode),
mapmycells_params_list = list()
)
AIT.anndata: A reference taxonomy object.
query.data: A logCPM-normalized matrix to be annotated.
corr.map: Should correlation mapping be performed? (see methods)
tree.map: Should tree mapping be performed? (see methods)
mapmycells.hierarchical.map: Should MapMyCells' hierarchical mapping be performed? (see methods)
mapmycells.flat.map: Should MapMyCells' flat mapping be performed? (see methods)
seurat.map: Should Seurat mapping be performed? (see methods)
label.cols: Column names of the annotations to map against. This only works for metadata that represent clusters or groups of clusters (e.g., subclass, supertype, neighborhood, class). Defaults to the columns listed in AIT.anndata$uns$hierarchy, and corresponds closely to the variable called "hierarchy" in other functions.
genes.to.use: The set of genes to use for correlation calculation and/or Seurat integration (default: the highly variable genes associated with the current mode, i.e., the AIT.anndata$var column named paste0("highly_variable_genes_", AIT.anndata$uns$mode)). Can be (1) a character vector of gene names, (2) a TRUE/FALSE (logical) vector indicating which genes to include, or (3) a column name in AIT.anndata$var corresponding to a logical vector of variable genes.
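The three accepted forms of genes.to.use can be illustrated with a small resolver. This is a minimal sketch, not the package's code. `resolve_genes_to_use` is a hypothetical helper. `var_df` stands in for AIT.anndata$var (a data.frame with gene names as row names). The column name `highly_variable_genes_standard` assumes a mode called "standard".

```r
# Hypothetical helper (not part of the package API): resolve the three
# accepted forms of `genes.to.use` into a character vector of gene names.
# `var_df` stands in for AIT.anndata$var; its row names are gene names.
resolve_genes_to_use <- function(genes.to.use, var_df) {
  all_genes <- rownames(var_df)
  if (is.logical(genes.to.use)) {
    # Form (2): logical vector aligned with the rows of var_df
    stopifnot(length(genes.to.use) == length(all_genes))
    return(all_genes[which(genes.to.use)])
  }
  if (is.character(genes.to.use) && length(genes.to.use) == 1 &&
      genes.to.use %in% colnames(var_df)) {
    # Form (3): name of a logical column in var_df
    return(all_genes[which(as.logical(var_df[[genes.to.use]]))])
  }
  # Form (1): character vector of gene names, restricted to known genes
  intersect(genes.to.use, all_genes)
}

# Toy example; the column name assumes a mode called "standard"
var_df <- data.frame(
  highly_variable_genes_standard = c(TRUE, FALSE, TRUE),
  row.names = c("Gad1", "Slc17a7", "Pvalb")
)
resolve_genes_to_use("highly_variable_genes_standard", var_df)  # "Gad1" "Pvalb"
resolve_genes_to_use(c(FALSE, TRUE, FALSE), var_df)             # "Slc17a7"
resolve_genes_to_use(c("Pvalb", "NotAGene"), var_df)            # "Pvalb"
```

In this sketch, a length-one string is checked against the column names first. A gene that shares its name with a var column would therefore be read as a column, and the real function may resolve that case differently.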
Mapping methods currently available in taxonomy_mapping include:
corr.map: This method calculates the Pearson correlation between each cell and each cluster median, and returns the cluster with the highest correlation along with the associated correlation score. Despite being a very simple method, this works quite well in some circumstances.
tree.map: Historical implementation of tree mapping used to assign cell types to patch-seq cells in several studies of mouse visual cortex. This method requires a dendrogram. It iteratively walks down the tree from the root node to the leaf nodes, choosing the most likely branch at each node based on a distinct set of marker genes. By subsampling genes, this method provides a bootstrapping probability/confidence. The implementation of tree mapping herein is not fully tested, so use it with caution.
mapmycells.hierarchical.map: Current version of the iterative (or hierarchical) mapping used in MapMyCells; this method imports the Python cell_type_mapper library. It requires a leveled hierarchy, e.g., cluster columns corresponding to cell type definitions at different levels of resolution such as "cluster", "subclass", and "class". It performs correlation-based mapping with different marker genes at each level, iterating through the levels in a similar way to tree mapping. Like tree mapping, this method provides a bootstrapping probability/confidence by subsampling genes. We find that this method works quite well in some circumstances.
mapmycells.flat.map: A single-level implementation of hierarchical mapping. It is essentially the same as corr.map, except that it uses a prespecified set of marker genes to calculate the correlation and it outputs bootstrapping probabilities.
seurat.map: Historical implementation of Seurat mapping used to assign cell types to patch-seq cells in a study of human temporal cortex. This method performs integration and label transfer using the FindTransferAnchors and TransferData functions in Seurat (v4.4.0), with a prespecified set of variable genes and a reasonable set of parameters. We are not maintaining this method for compatibility with Seurat version 5.0 or higher, so this function will likely fail outside of the Docker environment.
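The corr.map idea is to correlate each query cell with each cluster median and keep the best match. It can be sketched in base R, but this is an illustration under stated assumptions rather than the package's implementation. `corr_map_sketch` is a hypothetical name. Both matrices are assumed to be genes x cells with the same gene order, and the toy data are made up for the example.

```r
# Hypothetical sketch of corr.map-style mapping (not the package's code).
# ref:        genes x cells reference matrix
# ref_labels: cluster label of each reference cell (column of ref)
# query:      genes x cells query matrix with the same genes, in the same order
corr_map_sketch <- function(ref, ref_labels, query) {
  clusters <- sort(unique(ref_labels))
  # Median expression of each gene within each cluster (genes x clusters)
  medians <- sapply(clusters, function(cl) {
    apply(ref[, ref_labels == cl, drop = FALSE], 1, stats::median)
  })
  # Pearson correlation of each query cell with each cluster median
  # (query cells x clusters)
  cors <- stats::cor(query, medians, method = "pearson")
  best <- apply(cors, 1, which.max)
  data.frame(
    cell = colnames(query),
    cluster = clusters[best],
    score = cors[cbind(seq_len(nrow(cors)), best)],
    stringsAsFactors = FALSE
  )
}

# Toy data: three genes, each marking one of clusters A, B, C
ref <- matrix(c(5, 0, 0,  6, 1, 0,  0, 5, 1,  1, 6, 0,  0, 1, 7,  1, 0, 6),
              nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), paste0("r", 1:6)))
ref_labels <- c("A", "A", "B", "B", "C", "C")
query <- matrix(c(4, 1, 0,  0, 0, 5), nrow = 3,
                dimnames = list(c("g1", "g2", "g3"), c("q1", "q2")))
res <- corr_map_sketch(ref, ref_labels, query)
res$cluster  # "A" "C"
```

The `score` column plays the role of the associated correlation score: the Pearson correlation with the winning cluster median.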
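The bootstrapping probability reported by the tree and MapMyCells methods comes from subsampling genes. One way to picture it is to repeat the mapping on random subsets of marker genes and count how often each cluster wins. The sketch below applies that idea to flat correlation mapping. It is a hypothetical illustration: `bootstrap_flat_map`, `n_iter`, and `frac` are assumed names and defaults. It is not the cell_type_mapper algorithm, whose subsampling and scoring details may differ.

```r
# Hypothetical sketch of bootstrapped flat correlation mapping, in the spirit
# of mapmycells.flat.map; it is NOT the cell_type_mapper implementation.
# medians: genes x clusters matrix of cluster medians (row names = genes)
# query:   genes x cells query matrix (row names = genes)
# markers: marker genes present in both (at least two are needed)
bootstrap_flat_map <- function(medians, query, markers, n_iter = 100, frac = 0.9) {
  clusters <- colnames(medians)
  n_keep <- min(length(markers), max(2, floor(frac * length(markers))))
  votes <- matrix(0, nrow = ncol(query), ncol = length(clusters),
                  dimnames = list(colnames(query), clusters))
  rows <- seq_len(ncol(query))
  for (i in seq_len(n_iter)) {
    genes <- sample(markers, n_keep)  # random subset of the marker genes
    cors <- stats::cor(query[genes, , drop = FALSE],
                       medians[genes, , drop = FALSE])
    best <- apply(cors, 1, which.max)
    votes[cbind(rows, best)] <- votes[cbind(rows, best)] + 1
  }
  probs <- votes / n_iter  # fraction of iterations each cluster won
  top <- apply(probs, 1, which.max)
  data.frame(cell = colnames(query), cluster = clusters[top],
             bootstrap_prob = probs[cbind(rows, top)],
             stringsAsFactors = FALSE)
}

set.seed(42)
medians <- matrix(c(10, 8, 0, 1,  0, 1, 9, 10), nrow = 4,
                  dimnames = list(paste0("g", 1:4), c("A", "B")))
query <- matrix(c(9, 7, 1, 0,  1, 0, 8, 9), nrow = 4,
                dimnames = list(paste0("g", 1:4), c("q1", "q2")))
res <- bootstrap_flat_map(medians, query, markers = paste0("g", 1:4))
res
```

With these well-separated toy clusters, every gene subset agrees, so both cells get a bootstrap probability of 1. A cell lying between clusters would split its votes and receive a lower probability.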
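Level-by-level mapping, as described for mapmycells.hierarchical.map, works roughly like this. First pick a label at the coarsest level. Then restrict the next level to the children of that label and repeat the mapping with that level's marker genes. The hypothetical `hierarchical_map_sketch` below assumes the levels in `ref_meta` are ordered from coarse to fine and that each fine label has a single parent. It omits bootstrapping and is not the cell_type_mapper implementation.

```r
# Hypothetical sketch of level-by-level correlation mapping, loosely following
# the description of mapmycells.hierarchical.map (no bootstrapping; NOT the
# cell_type_mapper algorithm).
# ref:      genes x cells reference matrix (row names = genes)
# ref_meta: data.frame, one character column per level, ordered coarse -> fine
# query:    genes x cells query matrix with the same gene names
# markers:  named list; markers[[level]] holds the marker genes for that level
hierarchical_map_sketch <- function(ref, ref_meta, query, markers) {
  lvls <- colnames(ref_meta)
  out <- matrix(NA_character_, nrow = ncol(query), ncol = length(lvls),
                dimnames = list(colnames(query), lvls))
  for (j in seq_len(ncol(query))) {
    keep <- rep(TRUE, nrow(ref_meta))  # reference cells still eligible
    for (lev in lvls) {
      genes <- markers[[lev]]
      labs <- ref_meta[[lev]][keep]
      cand <- sort(unique(labs))
      if (length(cand) == 1) {
        best <- cand  # only one child under the label chosen above
      } else {
        sub_ref <- ref[genes, keep, drop = FALSE]
        meds <- sapply(cand, function(cl)
          apply(sub_ref[, labs == cl, drop = FALSE], 1, stats::median))
        r <- stats::cor(query[genes, j], meds)
        best <- cand[which.max(r)]
      }
      out[j, lev] <- best
      # Restrict the next (finer) level to children of the chosen label
      keep <- keep & ref_meta[[lev]] == best
    }
  }
  as.data.frame(out, stringsAsFactors = FALSE)
}

# Toy reference: two classes, with class Neuron split into clusters N1 and N2
ref <- matrix(c(8, 0, 7, 0, 3,  9, 1, 6, 1, 3,  8, 0, 0, 7, 3,
                7, 1, 1, 8, 3,  0, 9, 0, 0, 3,  1, 8, 1, 0, 3),
              nrow = 5, dimnames = list(paste0("g", 1:5), paste0("r", 1:6)))
ref_meta <- data.frame(class = rep(c("Neuron", "Glia"), c(4, 2)),
                       cluster = c("N1", "N1", "N2", "N2", "G1", "G1"),
                       stringsAsFactors = FALSE)
query <- matrix(c(9, 0, 1, 6, 3,  0, 7, 0, 1, 3), nrow = 5,
                dimnames = list(paste0("g", 1:5), c("q1", "q2")))
markers <- list(class = c("g1", "g2", "g5"), cluster = c("g3", "g4", "g5"))
res <- hierarchical_map_sketch(ref, ref_meta, query, markers)
res  # q1: Neuron / N2; q2: Glia / G1
```

Restricting each level to the children of the previous choice means a fine-level label can never contradict the coarse-level call. The cost is that an early mistake cannot be corrected further down.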
Mapping results from all methods.