scrattch
Single-cell RNA-seq analysis for transcriptomic type characterization (or "scrattch") is a suite of R and python scripts from the Allen Institute for Brain Science. The core R libraries are linked in the umbrella package called scrattch, which is modeled after the tidyverse package. You can use scrattch to automatically install or update some of the underlying packages and can run the remaining packages in docker environments. This page describes all core and adjacent scrattch content.
Scrattch packages
Scrattch includes several packages for clustering, mapping, and data formatting and visualization, along with example data for demos. These include:
Data preparation: file formats and schema
- scrattch.taxonomy - Taxonomy building scripts for RNA-seq based taxonomies following the Allen Institute Taxonomy (AIT) schema. A table of available AIT-formatted taxonomies can be found here.
- scrattch.io - [deprecated] Library for file handling and data formatting, replaced by scrattch.taxonomy in 2024.
Data analysis: cell clustering and mapping (also called label transfer)
- scrattch.hicat - Hierarchical, iterative clustering for analysis of transcriptomics
- scrattch.bigcat - Clustering analysis for extremely large single cell data sets
- scrattch.mapping - Generalized mapping scripts for single cell RNA-seq, Patch-seq, spatial transcriptomics, or related data types
- scrattch.patchseq - Functions for generating additional QC metrics and output files for patch-seq analysis
Data visualization
- scrattch.vis - Plotting functions for visualization of single cell RNA-seq data
Example data: small RNA-seq data sets
- tasic2016data - Data from Tasic, et al. (2016), which is used for demos
- hodge2019data - Data subset from Hodge, et al. (2019), which is used for demos
- A table of actual AIT-formatted taxonomies can be found here.
If you're interested in only one of these modules, you can install it separately. That said, we recommend using the installation instructions below to install combinations of scrattch packages to ensure they interact properly.
Find a scrattch function
Use this table to search for the exported functions in the core scrattch suite R libraries. While we have attempted to keep this table complete, we would encourage accessing the most up-to-date function descriptions via the '?' call in R. This table does not include python functions.
Related content
Several related websites and R and python libraries are outside of the scrattch suite, but are either used as part of scrattch libraries or directly work with scrattch outputs. These include (but are not limited to):
- scrattch.example - Collection of notebooks for visualization of cell type taxonomies using constellation, dendrogram, and sunburst plots.
- bmark - Standardized strategies for benchmarking clustering and mapping results
- ReportCards - Companion GitHub page to scrattch.taxonomy which shows mapping predictions in determining cluster labels in a self-projection evaluation.
- transcriptomic_clustering - Python implementation of scrattch.hicat clustering
- cell_type_mapper - Python implementation of the hierarchical mapping algorithm used in scrattch.mapping and MapMyCells
- ACE - Web-based and R Shiny app for comparison of annotations, including clustering and mapping results
- mfishtools - Functions for gene selection and analysis of spatial transcriptomics data
Installation
We strongly encourage the use of docker to install the scrattch suite. In particular, several functions in scrattch.taxonomy and scrattch.mapping have known issues in certain R environments. That said, we provide options for installing and running R in both a docker environment and through standard R approaches.
Using docker (RECOMMENDED)
The current docker version is accessible through Docker Hub. As of 26 March 2025 the Docker version is docker://alleninst/scrattch:1.1.2. This corresponds to AIT (v1.1.2) (see the Allen Institute Taxonomy GitHub repository for details).
Docker can be run on some HPC environments that use singularity as follows:
- Non-interactive:
singularity exec --cleanenv [Docker version] Rscript YOUR_CODE.R
- Interactive:
singularity shell --cleanenv [Docker version]
- To create a sif file for use in other environments:
singularity pull scrattch:[#.#.#].sif [Docker version]
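As a rough illustration, the commands above can be filled in with the image reference quoted earlier (docker://alleninst/scrattch:1.1.2). This is only a sketch under a few assumptions: the local file name scrattch_1.1.2.sif is a hypothetical choice, YOUR_CODE.R is a placeholder for your own script, and `singularity exec` is assumed to be the appropriate subcommand for running a single command non-interactively. By default the script only prints the commands so they can be checked first.

```shell
#!/usr/bin/env bash
# Sketch: pull the scrattch image once, then reuse the local .sif file.

# Docker image reference quoted in the text above (current as of 26 March 2025).
IMAGE="docker://alleninst/scrattch:1.1.2"
# Hypothetical local file name; any name ending in .sif should work.
SIF="scrattch_1.1.2.sif"
# Placeholder from the text above; replace with the path to your own R script.
SCRIPT="YOUR_CODE.R"

# Build each command as a string so it can be reviewed before anything runs.
PULL_CMD="singularity pull ${SIF} ${IMAGE}"
RUN_CMD="singularity exec --cleanenv ${SIF} Rscript ${SCRIPT}"
SHELL_CMD="singularity shell --cleanenv ${SIF}"

# DRY_RUN=1 (the default here) only prints the commands.
# Set DRY_RUN=0 on a machine with singularity installed to execute them.
if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${PULL_CMD}" "${RUN_CMD}" "${SHELL_CMD}"
else
    # Pull only if the image file is not already present, then run the script.
    [ -f "${SIF}" ] || ${PULL_CMD}
    ${RUN_CMD}
fi
```

Pulling the image once and pointing later runs at the local file avoids downloading it again for every job, which can help on shared HPC systems. The interactive command is printed but never executed by the script, since it would open a shell inside the container.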
If you cannot figure out how to use Docker in your specific environment, please post an issue.
--WARNING-- The 1.1.2 docker listed above provides all the tooling for AIT (v1.1.2) and some functionality for scrattch.mapping, but is broken for hierarchical mapping and all scrattch.patchseq functionality. An update mid-April will bring both of these packages back up to speed with the AIT schema / format.
Installing scrattch in R
While we advise using the provided docker, you can install all scrattch packages along with their GitHub and Bioconductor dependencies, as follows:
devtools::install_github("AllenInstitute/scrattch")
scrattch::install_scrattch()
Note that doMC may need to be installed manually from the download link at https://r-forge.r-project.org/R/?group_id=947 if you use Windows.
Installing previous versions
Two historical versions of scrattch are included in this package. These can be safely run without using docker, but are missing several recent components of the scrattch suite.
- scrattch_2023 is the stable version of the package prior to the release of scrattch.mapping, scrattch.taxonomy, scrattch.patchseq, and hodge2019data
- archive is the original package from ~2018, and should not be used by most users
Should you need one of these previous versions, they can still be installed using:
devtools::install_github("AllenInstitute/scrattch", ref = "scrattch_2023") # -OR-
devtools::install_github("AllenInstitute/scrattch", ref = "archive")
Use cases
This section collects many existing use cases for all aspects of single cell analysis including clustering, visualization, mapping, and more.
Clustering single cell/nucleus data
Two example workflows for clustering data are provided as part of the scrattch.hicat package. Both are full workflows covering data QC, iterative analysis, and the use of visualizations and statistics to assess cluster quality. These functions will work for moderately-sized data sets (up to a few hundred thousand cells and a few hundred cell types). Clusters are often grouped into hierarchies (e.g., "SST" and "PVALB" cells are both types of "GABAergic interneurons") in a process not discussed here.
- Clustering vignette - An overview of the main functions in scrattch.hicat
- Clustering tutorial - An interactive walkthrough of the major steps in clustering for scrattch.hicat
- [Large data sets] - For data sets with more than a few hundred thousand cells, scrattch.bigcat has similar functionality (but no formal examples at this time).
- Running iterative clustering in python - Another option for performing iterative clustering using the scrattch.hicat algorithm is to install the transcriptomic_clustering python library. This example shows how to use this library, and will work for data sets of any size.
Visualizing single cell/nucleus data
The scrattch suite has been used to visualize single cell/nucleus RNA-seq data in publications from the Allen Institute for nearly a decade. The examples below show how you can create similar plots for your data.
- scrattch.vis plots - Example code for creating many types of plots, including dot plots, heatmaps, bar plots, violin plots, fire plots, beeswarm plots, and box plots.
- Constellation diagram - An example for plotting a constellation diagram (points represent clusters, with links showing cluster connections).
- Hierarchical Tree - An example for plotting a hierarchical tree (not dendrogram!) based on results from hierarchical clustering.
- Sunburst diagram - An example for plotting a sunburst diagram based on results from hierarchical clustering. (These are the circle plots on the Cell Type Knowledge Explorer.)
- Clustering tutorials and vignettes include additional examples for visualizing dendrograms, confusion matrices, tSNE (or UMAP) plots, and heatmaps.
Creating a cell type taxonomy
Clustering provides a critical first step in defining a cell type taxonomy, by defining the cell types at the highest resolution in a hierarchy. However, for many downstream use cases (e.g., integration with CELLxGENE) it is critical to have the data and associated metadata in a standard format with a standard schema. These examples describe how to convert your project into the Allen Institute Taxonomy format for use with other scrattch functionalities, such as cell type mapping.
- Build a basic AIT file - This example provides the basics for creating a new taxonomy compatible with scrattch.mapping mapping functions using the data from tasic2016data as a starting point.
- Create a human MTG taxonomy in AIT format with a neuron only 'child' taxonomy - This example provides a step-by-step process for downloading human MTG data from adult neurotypical humans along with the associated SEA-AD taxonomy (from here), converting it to an AIT file that aligns with the AIT schema, and adding a child taxonomy subsetting to only neuronal types for use with Patch-seq mapping (see scrattch.patchseq library).
- Available AIT files - Many cell type taxonomies from the Allen Institute can be found at this single link.
- [Large data sets] - Future python scripts will allow creation of AIT files for data sets with more than a few hundred thousand cells. Stay tuned!
Mapping user data to a cell type taxonomy
In many cases it is useful to transfer cell type labels from an existing cell type taxonomy to user data (called "mapping") in addition to or instead of performing de novo clustering. Note that any AIT file created using scrattch.taxonomy should be compatible with these examples.
- Run Flat, Tree, and Seurat taxonomy mapping - This example shows how to use scrattch.mapping for standard taxonomy mapping.
- Mapping to HMBA Basal Ganglia AIT - This tutorial shows how to map against the HMBA Human and Macaque Basal Ganglia consensus taxonomies.
- MapMyCells - A drag and drop GUI for mapping user data to select cell type taxonomies hosted on Allen Brain Map (no code required!).
- cell_type_mapper - The backbone of MapMyCells and a preferred scrattch.mapping algorithm. The main page includes multiple detailed use cases in python, and is the recommended mapping strategy for large taxonomies.
QC and mapping of patch-seq data
The scrattch suite provides specialized scripts for analysis of transcriptomics data collected from patch-seq experiments, including extension of reference taxonomies, definition of QC metrics, and cell type mapping. We provide a couple of examples describing how to apply these scripts.
- Map against a small mouse PatchSeq taxonomy - This example provides the basics for updating a taxonomy to be compatible with patch-seq style mapping and visualization, and for collecting QC metrics of potential use to everyone. This is a continuation of examples in scrattch.taxonomy and scrattch.mapping, using data from Tasic et al 2016 as reference and inhibitory neurons from Gouwens, Sorensen, et al 2020 as query.
- Build and map against a human MTG PatchSeq taxonomy - Similar example using data from the Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) as reference and patch-seq data from layer 2-3 excitatory neurons from Berg et al (2021) as query.
Analysis of spatial transcriptomics data
We encourage the use of the mapping algorithms discussed above for mapping reference cell types to (cell-centric) spatial transcriptomics data sets. In addition, mfishtools includes code for creating marker gene panels, and for sanity checking and mapping spatial transcriptomics data sets.
- Building a combinatorial marker gene panel for spatial transcriptomics - Example demonstrating how to generate a computationally "optimal" marker gene panel based on reference single cell RNAseq data. Relevant statistics and plots show the predicted success for the panel.
- Python version of mfishtools - Includes a couple of Jupyter notebooks reproducing the above example in python, with a few modifications.
- Mapping cells from spatial transcriptomics data sets to reference cell types - Example showing a modified correlation-based cell type mapping method optimized for (historic) spatial transcriptomics data. Also includes code for predicting the accuracy of the calls based on reference data. Note that this script is experimental and results should be confirmed.
Scrattch publications
As of August 2025, more than 70 publications from the Allen Institute and external scientists have cited and/or used functions from the scrattch suite and related R and python libraries. If you are aware of any relevant publications not listed in this table, please post an issue on GitHub.
Usage and contribution
scrattch is under active development. Please reach out if you have any challenges or suggestions, and feel free to contribute to the code base!
License
The license for this package is available on GitHub at: https://github.com/AllenInstitute/scrattch/blob/master/LICENSE.
Contribution Agreement
If you contribute code to this repository through pull requests or other mechanisms, you are subject to the Allen Institute Contribution Agreement, which is available in full at: https://github.com/AllenInstitute/scrattch/blob/master/CONTRIBUTION.
Level of Support
We frequently update the child packages of this tool with no fixed schedule. Community involvement is encouraged through both issues and pull requests. We encourage community involvement in child packages directly, rather than through the scrattch umbrella package, when appropriate.
Contributors
The scrattch suite includes code developed by the following individuals (listed alphabetically): Lucas Graybuck, Nelson Johansen, Inkar Kapen, Changkyu Lee, Jeremy Miller, Cindy van Velthoven, and Zizhen Yao. Contributors to related tools can be found on their respective landing pages.