Create a taxonomy mode for mapping to taxonomy subset (or child)
buildTaxonomyMode.Rd
This function creates a new mode for mapping to a subset of a taxonomy. Among other steps, this includes filtering cells, setting new variable genes and/or marker genes, and calculating mapping statistics.
Usage
buildTaxonomyMode(
AIT.anndata,
mode.name,
retain.cells = NULL,
retain.clusters = NULL,
subsample = 100,
highly_variable_genes = NULL,
marker_genes = NULL,
embeddings = NULL,
number.of.pcs = 30,
add.dendrogram.markers = FALSE,
addMapMyCells = TRUE,
overwrite = FALSE,
save.normalized.data = TRUE,
write.taxonomy = TRUE,
...
)
Arguments
- AIT.anndata
A reference taxonomy anndata object.
- mode.name
A name to identify the new taxonomy version.
- retain.cells
A boolean vector with one entry per cell indicating which cells should be retained (TRUE) or filtered (FALSE) -OR- a character vector with sample names indicating which cells should be retained. Default is to retain the cells included in "standard" mode.
- retain.clusters
A character vector with cluster names (e.g., values in the "cluster_id" column) indicating which clusters should be retained. Default is to retain all clusters with at least 2 retained cells (clusters with exactly 1 cell can cause some functions to crash).
- subsample
The number of cells to retain per cluster (default = 100). Note that subsampling happens AFTER retain.cells and retain.clusters filtering.
- highly_variable_genes
Set of features defined as highly variable genes OR a number of binary genes to calculate (we recommend ~1000 - ~5000, for <100 to ~5000 cell types). If a feature list is provided, provide either as a named list of vectors, or as a single vector (in which case the name "highly_variable_genes_mode.name" will be used). "highly_variable_genes_mode.name" will also be used for calculated variable genes. Optional input, but for proper mapping we recommend including either highly_variable_genes or marker_genes. If nothing is provided (default=NULL), standard mode markers will be used for mapping algorithms that rely on these gene lists, which may cause problems.
- marker_genes
Set of features defined as marker genes. Provide either as a named list of vectors, or as a single vector (in which case the name "marker_genes_mode.name" will be used).
- embeddings
Dimensionality reduction coordinate data.frame with 2 columns or a string with the column name for marker_genes or variable_genes from which a UMAP should be calculated. If coordinates are provided, rownames must be equal to colnames of counts. Either provide as a named list or as a single data.frame (in which case the name "default_mode.name" will be used). Optional - if nothing is provided (default=NULL) the relevant subset of the default standard embedding will be used.
- number.of.pcs
Number of principal components to use for calculating UMAP coordinates (default=30). This is only used if embeddings corresponds to a variable gene column from which a UMAP should be calculated.
- add.dendrogram.markers
If TRUE (default=FALSE), will also add dendrogram markers to prep the taxonomy for tree mapping
- addMapMyCells
If TRUE (default), will also prep this mode of the taxonomy for hierarchical MapMyCells mapping
- overwrite
Whether mode.name should be overwritten if it already exists (default = FALSE).
- save.normalized.data
If TRUE (default), will save normalized data when writing out h5ad file. Otherwise, will remove normalized data to save space (in which case it will be recalculated automatically upon loadTaxonomy).
- ...
Additional variables to be passed to addDendrogramMarkers.
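
The order of the cell-filtering arguments (retain.cells, then retain.clusters, then subsample) can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper `sketch_filter`, the toy `cells` table, and its `sample_id` column are hypothetical and exist only for illustration. As a simplification, a NULL retain.cells here keeps every cell, rather than the cells in "standard" mode.

```r
# Hypothetical toy metadata: 12 cells in 4 clusters (not from a real taxonomy).
set.seed(42)
cells <- data.frame(
  sample_id  = paste0("cell", 1:12),
  cluster_id = c(rep("A", 5), rep("B", 4), rep("C", 2), "D"),
  stringsAsFactors = FALSE
)

# Hypothetical helper; it mimics only the documented order of operations.
sketch_filter <- function(cells, retain.cells = NULL,
                          retain.clusters = NULL, subsample = 100) {
  # Step 1: retain.cells is a logical vector (one entry per cell)
  # or a character vector of sample names.
  if (is.null(retain.cells)) {
    keep <- rep(TRUE, nrow(cells))  # simplification: keep all cells
  } else if (is.logical(retain.cells)) {
    stopifnot(length(retain.cells) == nrow(cells))
    keep <- retain.cells
  } else {
    keep <- cells$sample_id %in% retain.cells
  }

  # Step 2: retain.clusters defaults to clusters with >= 2 retained cells.
  if (is.null(retain.clusters)) {
    counts <- table(cells$cluster_id[keep])
    retain.clusters <- names(counts)[counts >= 2]
  }
  keep <- keep & cells$cluster_id %in% retain.clusters

  # Step 3: subsample AFTER filtering, at most `subsample` cells per cluster.
  idx <- which(keep)
  picked <- unlist(lapply(split(idx, cells$cluster_id[idx]), function(i) {
    if (length(i) > subsample) i[sample.int(length(i), subsample)] else i
  }), use.names = FALSE)

  cells[sort(picked), ]
}

# Drop one cell from cluster C, then cap each cluster at 3 cells.
kept <- sketch_filter(cells,
                      retain.cells = cells$sample_id != "cell11",
                      subsample = 3)
table(kept$cluster_id)
```

In this toy data, clusters C and D drop out because each is left with a single retained cell, while A and B are capped at three cells each.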