scrattch

Single-cell RNA-seq analysis for transcriptomic type characterization (or "scrattch") is a suite of R and python scripts from the Allen Institute for Brain Science. The core R libraries are linked in the umbrella package called scrattch, which is modeled after the tidyverse package. You can use scrattch to automatically install or update some of the underlying packages and can run the remaining packages in docker environments. This page describes all core and adjacent scrattch content.

Scrattch packages

Scrattch includes several packages for clustering, mapping, and data formatting and visualization, along with example data for demos. These include:

Data preparation: file formats and schema

scrattch.taxonomy - Taxonomy building scripts for RNA-seq based taxonomies following the Allen Institute (AIT) schema. A table of available AIT-formatted taxonomies can be found here.
scrattch.io - [deprecated]. Library for file handling and data formatting, replaced by scrattch.taxonomy in 2024.

Data analysis: cell clustering and mapping (also called label transfer)

scrattch.hicat - Hierarchical, iterative clustering for analysis of transcriptomics
scrattch.bigcat - Clustering analysis for extremely large single cell datasets
scrattch.mapping - Generalized mapping scripts for single cell RNA-seq, Patch-seq, spatial transcriptomics, or related data types
scrattch.patchseq - Functions for generating additional QC metrics and output files for patch-seq analysis

Data visualization

scrattch.vis - Plotting functions for visualization of single cell RNA-seq data

Example data: small RNA-seq data sets

tasic2016data - Data from Tasic, et al. (2016), which is used for demos
hodge2019data - Data subset from Hodge, et al. (2019), which is used for demos
A table of actual AIT-formatted taxonomies can be found here.
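As a minimal sketch of how the demo data might be pulled into an R session once it is installed: the object names `tasic_2016_anno` and `tasic_2016_counts` are assumptions about what tasic2016data provides, and `load_tasic` is a hypothetical helper written for this example. The code returns early with a message if the package is not available.

```r
# Assumption: tasic2016data lazily loads two objects, `tasic_2016_anno`
# (per-cell annotations) and `tasic_2016_counts` (a count matrix).
# `load_tasic` is a hypothetical helper, not part of the scrattch suite.
load_tasic <- function() {
  if (!requireNamespace("tasic2016data", quietly = TRUE)) {
    message("tasic2016data is not installed; see the Installation section.")
    return(NULL)
  }
  list(
    anno   = tasic2016data::tasic_2016_anno,
    counts = tasic2016data::tasic_2016_counts
  )
}

tasic <- load_tasic()
if (!is.null(tasic)) {
  print(dim(tasic$counts))        # dimensions of the count matrix
  str(tasic$anno, list.len = 5)   # first few annotation columns
}
```

If the object names differ in your installed version, `data(package = "tasic2016data")` lists the data sets the package actually ships.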

If you're interested in only one of these modules, you can install them separately. That said, we recommend using the installation instructions below to install combinations of scrattch packages to ensure they interact properly.

Find a scrattch function

Use this table to search for the exported functions in the core scrattch suite R libraries. While we have attempted to keep this table complete, we would encourage accessing the most up-to-date function descriptions via the '?' call in R. This table does not include python functions.
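For a quick look at what an installed package exports before reaching for `?`, something like the following may help. It uses only base R; `list_exports` is a hypothetical helper defined here, and the package names are taken from the list above.

```r
# Hypothetical helper: return the sorted exported names of a package,
# or an empty character vector if the package is not installed.
list_exports <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  sort(getNamespaceExports(pkg))
}

core_pkgs <- c("scrattch.taxonomy", "scrattch.hicat", "scrattch.bigcat",
               "scrattch.mapping", "scrattch.patchseq", "scrattch.vis")

# Report how many objects each installed package exports.
for (pkg in core_pkgs) {
  cat(sprintf("%s: %d exported objects\n", pkg, length(list_exports(pkg))))
}
```

From there, `help(package = "scrattch.hicat")` opens the package index, and `?` followed by any exported name opens that function's documentation.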

Related content

Several related websites and R and python libraries are outside of the scrattch suite, but are either used as part of scrattch libraries or directly work with scrattch outputs. These include (but are not limited to):

scrattch.example - Collection of notebooks for visualization of cell type taxonomies using constellation, dendrogram, and sunburst plots.
bmark - Standardized strategies for benchmarking clustering and mapping results
ReportCards - Companion GitHub page to scrattch.taxonomy which shows mapping predictions in determining cluster labels in a self-projection evaluation.
transcriptomic_clustering - Python implementation of scrattch.hicat clustering
cell_type_mapper - Python implementation of hierarchical mapping algorithm used in scrattch.mapping and MapMyCells
ACE - Web-based and R Shiny app for comparison of annotations, including clustering and mapping results
mfishtools - Functions for gene selection and analysis of spatial transcriptomics data

Installation

We strongly encourage the use of docker to install the scrattch suite. In particular, several functions in scrattch.taxonomy and scrattch.mapping have known issues in certain R environments. That said, we provide options for installing and running R in both a docker environment and through standard R approaches.

Using docker (RECOMMENDED)

The current docker version is accessible through Docker Hub. As of 26 March 2025 the Docker version is docker://alleninst/scrattch:1.1.2. This corresponds to AIT (v1.1.2) (see the Allen Institute Taxonomy GitHub repository for details).

Docker can be run on some HPC environments that use singularity as follows:

Non-interactive: singularity exec --cleanenv [Docker version] Rscript YOUR_CODE.R
Interactive: singularity shell --cleanenv [Docker version]
To create a sif file for use in other environments: singularity pull scrattch:[#.#.#].sif [Docker version]
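To make the placeholders concrete, the sketch below assembles the three commands for the 1.1.2 image named above and prints them for pasting into a terminal. It is written in R only to stay consistent with the rest of this page, and it runs nothing on its own. It uses `singularity exec` for the non-interactive case, since `exec` is the subcommand that runs a single command inside a container; `YOUR_CODE.R` remains a placeholder for your own script.

```r
# Fill in the [Docker version] and [#.#.#] placeholders for one release.
version    <- "1.1.2"
docker_uri <- sprintf("docker://alleninst/scrattch:%s", version)

cmds <- c(
  non_interactive = sprintf("singularity exec --cleanenv %s Rscript YOUR_CODE.R",
                            docker_uri),
  interactive     = sprintf("singularity shell --cleanenv %s", docker_uri),
  pull_sif        = sprintf("singularity pull scrattch:%s.sif %s",
                            version, docker_uri)
)

# Print the commands, one per line.
writeLines(cmds)
```

Changing `version` is enough to regenerate the commands for a later image.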

If you cannot figure out how to use Docker in your specific environment, please post an issue.

--WARNING-- The 1.1.2 docker listed above provides all the tooling for AIT (v1.1.2) and some functionality for scrattch.mapping, but is broken for hierarchical mapping and all scrattch.patchseq functionality. An update mid-April will bring both of these packages back up to speed with the AIT schema / format.

Installing `scrattch` in R

While we advise using the provided docker, you can install all scrattch packages along with their GitHub and BioConductor dependencies, as follows:

devtools::install_github("AllenInstitute/scrattch")
scrattch::install_scrattch()

Note that doMC may need to be installed manually from the download link at https://r-forge.r-project.org/R/?group_id=947 if you use Windows.
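After installing, it can be useful to confirm which pieces of the suite are actually present. The following is a minimal sketch using only base R; the package names come from the list at the top of this page, and the check makes no assumption about what `install_scrattch()` itself installs.

```r
# Packages named on this page (core suite plus example data).
suite <- c("scrattch.taxonomy", "scrattch.hicat", "scrattch.bigcat",
           "scrattch.mapping", "scrattch.patchseq", "scrattch.vis",
           "tasic2016data", "hodge2019data")

# TRUE for each package that can be loaded in this R library.
installed <- vapply(suite, requireNamespace, logical(1), quietly = TRUE)

# Version strings for the packages that were found.
versions <- vapply(suite[installed],
                   function(pkg) as.character(utils::packageVersion(pkg)),
                   character(1))

print(installed)
print(versions)

missing_pkgs <- suite[!installed]
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

Any package reported as missing can be installed separately or picked up by using the docker image instead.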

Installing previous versions

Two historical versions of scrattch are included in this package. These can be safely run without using docker, but are missing several recent components of the scrattch suite.

scrattch_2023 is the stable version of the package prior to the release of scrattch.mapping, scrattch.taxonomy, scrattch.patchseq, and hodge2019data
archive is the original package from ~2018, and should not be used by most users

Should you need one of these previous versions, they can still be installed using:

devtools::install_github("AllenInstitute/scrattch", ref = "scrattch_2023") # -OR-
devtools::install_github("AllenInstitute/scrattch", ref = "archive")

Use cases

This section collects many existing use cases for all aspects of single cell analysis including clustering, visualization, mapping, and more.

Clustering single cell/nucleus data

Two example workflows for clustering data are provided as part of the scrattch.hicat package. Both are full workflows that cover data QC, iterative analysis, and the use of visualizations and statistics to assess cluster quality. These functions will work for moderately sized data sets (up to a few hundred thousand cells and a few hundred cell types). Clusters are often grouped into hierarchies (e.g., "SST" and "PVALB" cells are both types of "GABAergic interneurons") in a process not discussed here.

Clustering vignette - An overview of the main functions in scrattch.hicat
Clustering tutorial - An interactive walkthrough of the major steps in clustering for scrattch.hicat
[Large data sets] - For data sets with more than a few hundred thousand cells, scrattch.bigcat has similar functionality (but no formal examples at this time).
Running iterative clustering in python - Another option for performing iterative clustering using the scrattch.hicat algorithm is to install the transcriptomic_clustering python library. This example shows how to use this library, and will work for data sets of any size.

Visualizing single cell/nucleus data

The scrattch suite has been used to visualize single cell/nucleus RNA-seq data in publications from the Allen Institute for nearly a decade. The examples below show how you can create similar plots for your data.

scrattch.vis plots - Example code for creating many types of plots, including dot plots, heatmaps, bar plots, violin plots, fire plots, beeswarm plots, and box plots.
Constellation diagram - An example for plotting a constellation diagram (points represent clusters, with links showing cluster connections).
Hierarchical Tree - An example for plotting a hierarchical tree (not dendrogram!) based on results from hierarchical clustering.
Sunburst diagram - An example for plotting a sunburst diagram based on results from hierarchical clustering. (These are the circle plots on the Cell Type Knowledge Explorer.)

Clustering tutorials and vignettes include additional examples for visualizing dendrograms, confusion matrices, tSNE (or UMAP) plots, and heatmaps.

Creating a cell type taxonomy

Clustering provides a critical first step in defining a cell type taxonomy, by defining cell types of the highest resolution in a hierarchy. However, for many downstream use cases (e.g., integration with CELLxGENE) it is critical to have the data and associated metadata in a standard format with a standard schema. These examples describe how to convert your project into the Allen Institute Taxonomy format for use with other scrattch functionalities, such as cell type mapping.

Build a basic AIT file - This example provides the basics for creating a new taxonomy compatible with scrattch.mapping mapping functions using the data from tasic2016data as a starting point.
Create a human MTG taxonomy in AIT format with a neuron only 'child' taxonomy - This example provides a step-by-step process for downloading human MTG data from adult neurotypical humans along with the associated SEA-AD taxonomy (from here), converting it to an AIT file that aligns with the AIT schema, and adding a child taxonomy subsetting to only neuronal types for use with Patch-seq mapping (see scrattch.patchseq library).
Available AIT files - Many cell type taxonomies from the Allen Institute can be found at this single link.
[Large data sets] - Future python scripts will allow creation of AIT files for data sets with more than a few hundred thousand cells. Stay tuned!

Mapping user data to a cell type taxonomy

In many cases it is useful to transfer cell type labels from an existing cell type taxonomy to user data (called "mapping") in addition to or instead of performing de novo clustering. Note that any AIT file created using scrattch.taxonomy should be compatible with these examples.

Run Flat, Tree, and Seurat taxonomy mapping - This example shows how to use scrattch.mapping for standard taxonomy mapping.
Mapping to HMBA Basal Ganglia AIT - This tutorial shows how to map against the HMBA Human and Macaque Basal Ganglia consensus taxonomies.
MapMyCells - A drag and drop GUI for mapping user data to select cell type taxonomies hosted on Allen Brain Map (no code required!).
cell_type_mapper - the backbone of MapMyCells and a preferred scrattch.mapping algorithm. The main page includes multiple detailed use cases in python, and is the recommended mapping strategy for large taxonomies.

QC and mapping of patch-seq data

The scrattch suite provides specialized scripts for analysis of transcriptomics data collected from patch-seq experiments, including extension of reference taxonomies, definition of QC metrics, and cell type mapping. We provide a couple of examples describing how to apply these scripts.

Map against a small mouse PatchSeq taxonomy - This example provides the basics for updating a taxonomy to be compatible with patch-seq style mapping and visualization, and for collecting QC metrics of potential use to everyone. This is a continuation of examples in scrattch.taxonomy and scrattch.mapping, using data from Tasic et al 2016 as reference and inhibitory neurons from Gouwens, Sorensen, et al 2020 as query.
Build and map against a human MTG PatchSeq taxonomy - Similar example using data from the Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) as reference and patch-seq data from layer 2-3, excitatory neurons from Berg et al (2021) as query.

Analysis of spatial transcriptomics data

We encourage the use of the mapping algorithms discussed above for mapping reference cell types to (cell-centric) spatial transcriptomics data sets. In addition, mfishtools includes code for creating marker gene panels and for sanity checking and mapping spatial transcriptomics data sets.

Building a combinatorial marker gene panel for spatial transcriptomics - Example demonstrating how to generate a computationally "optimal" marker gene panel based on reference single cell RNAseq data. Relevant statistics and plots show the predicted success for the panel.
Python version of mfishtools - Includes a couple of Jupyter notebooks reproducing the above example in python, with a few modifications.
Mapping cells from spatial transcriptomics data sets to reference cell types - Example showing a modified correlation-based cell type mapping method optimized for (historic) spatial transcriptomics data. Also includes code for predicting the accuracy of the calls based on reference data. Note that this script is experimental and results should be confirmed.

Scrattch publications

As of August 2025, more than 70 publications from the Allen Institute and external scientists have cited and/or used functions from the scrattch suite and related R and python libraries. If you are aware of any relevant publications not listed in this table, please post an issue on GitHub.

Usage and contribution

scrattch is under active development. Please reach out if you have any challenges or suggestions, and feel free to contribute to the code base!

License

The license for this package is available on GitHub at: https://github.com/AllenInstitute/scrattch/blob/master/LICENSE.

Contribution Agreement

If you contribute code to this repository through pull requests or other mechanisms, you are subject to the Allen Institute Contribution Agreement, which is available in full at: https://github.com/AllenInstitute/scrattch/blob/master/CONTRIBUTION.

Level of Support

We frequently update the child packages of this tool with no fixed schedule. Community involvement is encouraged through both issues and pull requests. We encourage community involvement in child packages directly, rather than through the scrattch umbrella package, when appropriate.

Contributors

The scrattch suite includes code developed by the following individuals (listed alphabetically): Lucas Graybuck, Nelson Johansen, Inkar Kapen, Changkyu Lee, Jeremy Miller, Cindy van Velthoven, and Zizhen Yao. Contributors to related tools can be found on their respective landing pages.
