This function looks for unexpected combinations of marker gene expression (e.g., GAD1 + SLC17A7) and for particularly high or low values of the indicated QC metrics, and flags any clusters meeting those criteria as potential outliers. In theory, this should find things such as poor-quality clusters and clusters of doublets. The specific genes and thresholds are currently hard-coded, but might be updated in later iterations.
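The marker co-expression idea can be illustrated with a minimal sketch in base R. This is not the function's actual code: the toy data, the strict `>` comparisons, the single shared proportion threshold, and the helper names `detected`, `prop`, and `flagged` are all assumptions made for illustration.

```r
# Toy expression values (hypothetical): 2 marker genes x 6 cells in 3 clusters.
norm.dat <- matrix(
  c(5, 5, 5, 5, 0, 0,   # GAD1
    0, 0, 5, 5, 5, 5),  # SLC17A7
  nrow = 2, byrow = TRUE,
  dimnames = list(c("GAD1", "SLC17A7"), paste0("cell", 1:6))
)
cluster <- setNames(c("cl1", "cl1", "cl2", "cl2", "cl3", "cl3"),
                    colnames(norm.dat))

expr.th <- 3    # assumed: a gene is "detected" in a cell above this value
prop.th <- 0.4  # assumed: a gene is "on" in a cluster above this proportion

# Proportion of cells per cluster in which each gene is detected
# (result is a genes x clusters matrix).
detected <- norm.dat > expr.th
prop <- sapply(split(names(cluster), cluster), function(cells) {
  rowMeans(detected[, cells, drop = FALSE])
})

# Flag clusters in which both markers are "on" at the same time.
coexpr <- prop["GAD1", ] > prop.th & prop["SLC17A7", ] > prop.th
flagged <- names(which(coexpr))
print(flagged)
```

In this toy example only the cluster whose cells express both GAD1 and SLC17A7 is flagged, which is the kind of pattern a cluster of doublets might produce.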
check_outlier(
  anno,
  cluster,
  norm.dat,
  select.cells = colnames(norm.dat),
  keep.cl = NULL,
  neun.thresh = 0.5,
  neun.colname = "facs_population_plan",
  neun.val = "NeuN-pos",
  qc.metrics = c("Genes.Detected.CPM", "percent_reads_aligned_total", "complexity_cg"),
  test.genes = c("SNAP25", "GAD1", "GAD2", "SLC17A7", "SLC17A6"),
  expr.th = 3,
  prop.th = c(0.4, 0.4, 0.4, 0.4, 0.4),
  min.prop.th = 0.8,
  plot = TRUE,
  plot.path = "output/"
)
anno: dataframe which must include the column names listed in `neun.colname` and `qc.metrics`. A "cluster" column is added from the `cluster` parameter below.
cluster: cluster labels for all cells, with sample_id as their names
norm.dat: expression dataframe with cells as columns and gene names as rows, CPM normalized
select.cells: cells to include, given as column names of norm.dat (default is all columns of norm.dat)
keep.cl: clusters to definitely keep in the analysis (i.e., to exclude from consideration as outlier clusters); default is NULL
neun.thresh: fraction of cells expressing NeuN for a cluster to be considered NeuN positive (default is 0.5)
neun.colname: column name in anno with the NeuN information (default is "facs_population_plan")
neun.val: value in the `neun.colname` column of anno that marks NeuN-positive cells (default is "NeuN-pos")
qc.metrics: required columns from the anno dataframe; default is "Genes.Detected.CPM", "percent_reads_aligned_total", "complexity_cg"
test.genes: CURRENTLY NOT USED. This function will eventually allow a pre-defined set of genes to be entered; default is "SNAP25", "GAD1", "GAD2", "SLC17A7", "SLC17A6"
expr.th: expression threshold for detecting test genes (default is 3)
prop.th: proportion threshold of detected genes by cluster (default is 0.4, 0.4, 0.4, 0.4, 0.4)
min.prop.th: at least one of the last 4 test genes should have a detection proportion of at least this value (default is 0.8)
plot: whether to generate the exploratory plots (default is TRUE)
plot.path: path for the plots (default is "output/")
Returns the clusters flagged as outliers and produces exploratory plots.
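As a rough guide to the input shapes described above, the following sketch builds toy inputs and checks them before a call. All values are made up for illustration, "NeuN-neg" is a hypothetical label, and the helper names `required` and `missing.cols` are not part of the function. The call itself is left as a comment because data this small is only meant to show the expected layout.

```r
# Hypothetical toy inputs, shaped as the argument descriptions require.
cells <- paste0("cell", 1:4)
genes <- c("SNAP25", "GAD1", "GAD2", "SLC17A7", "SLC17A6")

# Expression: genes as rows, cells as columns (values are made up).
expr <- matrix(0, nrow = length(genes), ncol = length(cells),
               dimnames = list(genes, cells))
expr["SNAP25", ] <- 6
expr["GAD1", c("cell1", "cell2")] <- 5
expr["SLC17A7", c("cell3", "cell4")] <- 5
norm.dat <- as.data.frame(expr)

# Cluster labels, named by sample_id.
cluster <- setNames(c("cl1", "cl1", "cl2", "cl2"), cells)

# Annotation with the NeuN column and the default QC metric columns.
anno <- data.frame(
  sample_id = cells,
  facs_population_plan = c("NeuN-pos", "NeuN-pos", "NeuN-pos", "NeuN-neg"),
  Genes.Detected.CPM = c(5000, 5200, 4800, 1500),
  percent_reads_aligned_total = c(90, 88, 91, 60),
  complexity_cg = c(0.5, 0.5, 0.5, 0.3),
  stringsAsFactors = FALSE
)

# Check that anno has every required column and that names line up.
required <- c("facs_population_plan", "Genes.Detected.CPM",
              "percent_reads_aligned_total", "complexity_cg")
missing.cols <- setdiff(required, colnames(anno))
stopifnot(length(missing.cols) == 0)
stopifnot(identical(names(cluster), colnames(norm.dat)))

# With real data and the package loaded, the call would look like:
# outliers <- check_outlier(anno, cluster, norm.dat, plot = FALSE)
```

Checking the column names and cell names up front makes mismatches between anno, cluster, and norm.dat easier to diagnose than an error raised inside the function.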