scrattch

Single-cell RNA-seq analysis for transcriptomic type characterization (or "scrattch") is a suite of R and Python scripts from the Allen Institute for Brain Science. The core R libraries are linked in an umbrella package called scrattch, which is modeled after the tidyverse package. You can use scrattch to automatically install or update some of the underlying packages, and you can run the remaining packages in Docker environments. This page describes all core and adjacent scrattch content.



Scrattch packages

Scrattch includes several packages for clustering, mapping, and data formatting and visualization, along with example data for demos. These include:

Data preparation: file formats and schema

Data analysis: cell clustering and mapping (also called label transfer)

Data visualization

Example data: small RNA-seq data sets

If you're interested in only one of these modules, you can install it separately. That said, we recommend using the installation instructions below to install combinations of scrattch packages to ensure they interact properly.

Find a scrattch function

Use this table to search for the exported functions in the core R libraries of the scrattch suite. While we have attempted to keep this table complete, we encourage accessing the most up-to-date function descriptions via the '?' call in R. This table does not include Python functions.
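As a complement to the table, the exported objects of an installed package can be listed directly from R. The sketch below uses base R only; `list_exports` is a hypothetical helper name (not part of scrattch), and `scrattch.hicat` is used purely as an example package.

```r
# Hypothetical helper (not part of scrattch): list the exported objects of
# an installed package, or return an empty vector if it is not installed.
list_exports <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(character(0))
  }
  sort(getNamespaceExports(pkg))
}

# Example: show the first few exports of scrattch.hicat, if it is installed.
head(list_exports("scrattch.hicat"))
```

For any function name returned, `?function_name` opens its help page once the package is attached.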

Several related websites and R and python libraries are outside of the scrattch suite, but are either used as part of scrattch libraries or directly work with scrattch outputs. These include (but are not limited to):



Installation

We strongly encourage the use of docker to install the scrattch suite. In particular, several functions in scrattch.taxonomy and scrattch.mapping have known issues in certain R environments. That said, we provide options for installing and running R in both a docker environment and through standard R approaches.

Using docker (RECOMMENDED)

The current Docker version is accessible through Docker Hub. As of 26 March 2025 the Docker version is docker://alleninst/scrattch:1.1.2. This corresponds to AIT (v1.1.2) (see the Allen Institute Taxonomy GitHub repository for details).

Docker can be run on some HPC environments that use singularity as follows:

If you cannot figure out how to use Docker in your specific environment, please post an issue.

--WARNING-- The 1.1.2 docker listed above provides all the tooling for AIT (v1.1.2) and some functionality for scrattch.mapping, but is broken for hierarchical mapping and all scrattch.patchseq functionality. An update mid-April will bring both of these packages back up to speed with the AIT schema / format.

Installing scrattch in R

While we advise using the provided Docker image, you can install all scrattch packages along with their GitHub and Bioconductor dependencies as follows:

devtools::install_github("AllenInstitute/scrattch")
scrattch::install_scrattch()

Note that doMC may need to be installed manually from the download link at https://r-forge.r-project.org/R/?group_id=947 if you use Windows.
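After installation, it can be useful to confirm which packages are actually available in your R library. The following is a minimal sketch using base R only; `missing_packages` is a hypothetical helper (not part of scrattch), and the package list is assembled from names mentioned on this page rather than from the definitive set that `install_scrattch()` handles.

```r
# Hypothetical helper (not part of scrattch): report which of the given
# packages are not currently installed. Note that requireNamespace() loads
# the namespace of each package it finds as a side effect.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Package names taken from this page (an assumption); the full set installed
# by install_scrattch() may differ.
suite <- c("scrattch.hicat", "scrattch.taxonomy", "scrattch.mapping",
           "scrattch.patchseq", "doMC")
missing_packages(suite)
```

An empty result means every listed package could be loaded; any names returned are candidates for manual installation.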

Installing previous versions

Two historical versions of scrattch are included in this package. These can be safely run without using docker, but are missing several recent components of the scrattch suite.

Should you need one of these previous versions, it can still be installed using:

devtools::install_github("AllenInstitute/scrattch", ref = "scrattch_2023") # -OR-
devtools::install_github("AllenInstitute/scrattch", ref = "archive")


Use cases

This section collects many existing use cases for all aspects of single cell analysis including clustering, visualization, mapping, and more.

Clustering single cell/nucleus data

Two example workflows for clustering data are provided as part of the scrattch.hicat package. Both cover the full process: data QC, iterative analysis, and the use of visualizations and statistics to assess cluster quality. These functions will work for moderately sized data sets (up to a few hundred thousand cells and a few hundred cell types). Clusters are often grouped into hierarchies (e.g., "SST" and "PVALB" cells are both types of "GABAergic interneurons") in a process not discussed here.

Visualizing single cell/nucleus data

The scrattch suite has been used to visualize single cell/nucleus RNA-seq data in publications from the Allen Institute for nearly a decade. The examples below show how you can create similar plots for your data.

Creating a cell type taxonomy

Clustering provides a critical first step in defining a cell type taxonomy by defining the highest-resolution cell types in a hierarchy. However, for many downstream use cases (e.g., integration with CELLxGENE) it is critical to have the data and associated metadata in a standard format with a standard schema. These examples describe how to convert your project into the Allen Institute Taxonomy (AIT) format for use with other scrattch functionalities, such as cell type mapping.

Mapping user data to a cell type taxonomy

In many cases it is useful to transfer cell type labels from an existing cell type taxonomy to user data (called "mapping") in addition to or instead of performing de novo clustering. Note that any AIT file created using scrattch.taxonomy should be compatible with these examples.

QC and mapping of patch-seq data

The scrattch suite provides specialized scripts for analysis of transcriptomics data collected from patch-seq experiments, including extension of reference taxonomies, definition of QC metrics, and cell type mapping. We provide a couple of examples describing how to apply these scripts.

Analysis of spatial transcriptomics data

We encourage the use of the mapping algorithms discussed above for mapping reference cell types to (cell-centric) spatial transcriptomics data sets. In addition, mfishtools includes code for creating marker gene panels and for sanity checking and mapping spatial transcriptomics data sets.



Scrattch publications

As of August 2025, more than 70 publications from the Allen Institute and external scientists have cited and/or used functions from the scrattch suite and related R and python libraries. If you are aware of any relevant publications not listed in this table, please post an issue on GitHub.



Usage and contribution

scrattch is under active development. Please reach out if you have any challenges or suggestions, and feel free to contribute to the code base!

License

The license for this package is available on GitHub at: https://github.com/AllenInstitute/scrattch/blob/master/LICENSE.

Contribution Agreement

If you contribute code to this repository through pull requests or other mechanisms, you are subject to the Allen Institute Contribution Agreement, which is available in full at: https://github.com/AllenInstitute/scrattch/blob/master/CONTRIBUTION.

Level of Support

We frequently update the child packages of this tool with no fixed schedule. Community involvement is encouraged through both issues and pull requests. We encourage community involvement in child packages directly, rather than through the scrattch umbrella package, when appropriate.

Contributors

The scrattch suite includes code developed by the following individuals (listed alphabetically): Lucas Graybuck, Nelson Johansen, Inkar Kapen, Changkyu Lee, Jeremy Miller, Cindy van Velthoven, and Zizhen Yao. Contributors to related tools can be found on their respective landing pages.