Doublet detection in single-cell RNA sequencing data — doubletFinder • scrattch.hicat

doubletFinder(
  data,
  select.genes,
  proportion.artificial = 0.2,
  k = NULL,
  plot = FALSE
)

Arguments

data

gene x sample matrix with counts (non-normalized)

select.genes

list of genes with highest variance between samples

proportion.artificial

The proportion (0 to 1) of the merged real-artificial dataset that is artificial. In other words, this argument determines the total number of artificial doublets. Default is 0.2 (20%).

k

The number of nearest neighbors in the merged real-artificial dataset used to define each cell's neighborhood in PC space. The minimum value is 1.

plot

A list of doublet scores per sample, together with plots depicting the doublet scores for real cells and artificial doublets.

Adapted from DoubletFinder: https://www.cell.com/cell-systems/fulltext/S2405-4712(19)30073-0 (https://github.com/chris-mcginnis-ucsf/DoubletFinder).

This function generates artificial doublets from existing single-cell RNA sequencing data. First, real and artificial data are merged. Second, dimension reduction is performed on the merged real-artificial dataset using PCA. Third, the proportion of artificial nearest neighbors is computed for each real cell. Finally, real cells are rank-ordered by this proportion, and predicted doublets are defined by thresholding based on the expected number of doublets.
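
The four steps described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper name `sketch_doublet_scores`, the `n.pcs` argument, the toy Poisson data, the log-CPM normalization, the choice to build each artificial doublet by summing two randomly chosen real cells, and the 5% expected doublet rate are all hypothetical. The number of artificial doublets follows from the definition of `proportion.artificial`: if n_art / (n_real + n_art) = p, then n_art = n_real * p / (1 - p).

```r
set.seed(1)

# Hypothetical toy data: 200 genes x 100 cells of raw (non-normalized) counts.
n_genes <- 200
n_cells <- 100
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Hypothetical gene selection: the 50 genes with the highest variance.
gene_var <- apply(counts, 1, var)
select.genes <- names(sort(gene_var, decreasing = TRUE))[1:50]

# Illustrative helper (not part of scrattch.hicat).
sketch_doublet_scores <- function(counts, select.genes,
                                  proportion.artificial = 0.2,
                                  k = 10, n.pcs = 10) {
  n_real <- ncol(counts)
  # Solve n_art / (n_real + n_art) = proportion.artificial for n_art.
  n_art <- round(n_real * proportion.artificial / (1 - proportion.artificial))

  # Assumption: an artificial doublet is the sum of two random real cells.
  i1 <- sample(n_real, n_art, replace = TRUE)
  i2 <- sample(n_real, n_art, replace = TRUE)
  artificial <- counts[, i1, drop = FALSE] + counts[, i2, drop = FALSE]

  # Step 1: merge real and artificial data.
  merged <- cbind(counts, artificial)
  is_artificial <- c(rep(FALSE, n_real), rep(TRUE, n_art))

  # Assumption: log2(CPM + 1) normalization, restricted to the selected genes.
  cpm <- t(t(merged) / colSums(merged)) * 1e6
  norm <- log2(cpm + 1)[select.genes, , drop = FALSE]

  # Step 2: PCA on the merged dataset (cells as rows).
  pcs <- prcomp(t(norm), center = TRUE, scale. = FALSE)$x
  pcs <- pcs[, seq_len(min(n.pcs, ncol(pcs))), drop = FALSE]

  # Step 3: for each real cell, the proportion of its k nearest
  # neighbors (Euclidean distance in PC space) that are artificial.
  d <- as.matrix(dist(pcs))
  diag(d) <- Inf  # a cell is not its own neighbor
  score <- vapply(seq_len(n_real), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    mean(is_artificial[nn])
  }, numeric(1))
  names(score) <- colnames(counts)

  list(score = score, n.artificial = n_art)
}

res <- sketch_doublet_scores(counts, select.genes)

# Step 4: rank-order real cells and call the top-scoring ones doublets.
# The 5% expected doublet rate is a placeholder, not a recommendation.
n_expected <- round(0.05 * n_cells)
predicted <- names(sort(res$score, decreasing = TRUE))[seq_len(n_expected)]
```

With 100 real cells and `proportion.artificial = 0.2`, the sketch generates 25 artificial doublets, so the merged dataset holds 125 cells, of which 20% are artificial. On real data, the expected number of doublets would typically come from the known doublet rate of the capture platform rather than a fixed placeholder.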