Getting started#

Data associated with the Allen Brain Cell Atlas is hosted on Amazon Web Services (AWS) in an S3 bucket as an AWS Public Dataset. No account or login is required. The S3 bucket is located at arn:aws:s3:::allen-brain-cell-atlas. You will need to be connected to the internet to run this notebook.

Each release has an associated manifest.json that lists the specific versions of all directories and files that are part of the release. We recommend using the manifest as the starting point for data download and usage.

Expression matrices are stored in the anndata h5ad format and need to be downloaded to a local file system before use.

The AWS Command Line Interface (AWS CLI) is a simple option for downloading specific directories and files from S3. Download and installation instructions can be found here: https://aws.amazon.com/cli/.

This notebook shows how to format AWS CLI commands to download the data required for the tutorials. You can copy those commands into a terminal shell, or optionally run them directly in this notebook by uncommenting the “subprocess.run” lines in the code.

import requests
import json
import os
import pathlib
import subprocess
import time

Using the file manifest#

Let’s open the manifest.json file associated with the current release.

version = '20230830'
url = 'https://allen-brain-cell-atlas.s3-us-west-2.amazonaws.com/releases/%s/manifest.json' % version
manifest = json.loads(requests.get(url).text)
print("version: ", manifest['version'])
version:  20230830

At the top level, the manifest consists of the release version tag, the S3 resource_uri, and two dictionaries: directory_listing and file_listing. A simple option for downloading data is to use the AWS CLI to fetch specific directories or files. All the example notebooks in this repository assume that data has been downloaded locally with the same file organization as specified by the “relative_path” field in the manifest.

print("version:", manifest['version'])
print("resource_uri:", manifest['resource_uri'])
version: 20230830
resource_uri: s3://allen-brain-cell-atlas/
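Since the manifest is small, one optional convenience (not part of the release tooling) is to cache it locally so later sessions can skip the network fetch. A minimal sketch, using a toy dictionary standing in for the real manifest fetched above:

```python
import json
import pathlib
import tempfile

def cache_manifest(manifest, cache_dir):
    """Write the manifest to cache_dir, keyed by its version tag."""
    path = pathlib.Path(cache_dir) / ('manifest_%s.json' % manifest['version'])
    path.write_text(json.dumps(manifest))
    return path

# toy manifest standing in for the real one
toy = {'version': '20230830', 'resource_uri': 's3://allen-brain-cell-atlas/'}
path = cache_manifest(toy, tempfile.mkdtemp())
print(json.loads(path.read_text())['version'])  # 20230830
```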

Let’s look at the information associated with the spatial transcriptomics dataset MERFISH-C57BL6J-638850. This dataset has two related directories: expression_matrices, containing a set of h5ad files, and metadata, containing a set of csv files. Use the view_link URL to browse the directories in a web browser.

expression_matrices = manifest['directory_listing']['MERFISH-C57BL6J-638850']['directories']['expression_matrices']
print(expression_matrices)
print(expression_matrices['view_link'])
{'version': '20230830', 'relative_path': 'expression_matrices/MERFISH-C57BL6J-638850/20230830', 'url': 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/expression_matrices/MERFISH-C57BL6J-638850/20230830/', 'view_link': 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#expression_matrices/MERFISH-C57BL6J-638850/20230830/', 'total_size': 15255179148}
https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#expression_matrices/MERFISH-C57BL6J-638850/20230830/
metadata = manifest['directory_listing']['MERFISH-C57BL6J-638850']['directories']['metadata']
print(metadata)
print(metadata['view_link'])
{'version': '20230830', 'relative_path': 'metadata/MERFISH-C57BL6J-638850/20230830', 'url': 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/metadata/MERFISH-C57BL6J-638850/20230830/', 'view_link': 'https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#metadata/MERFISH-C57BL6J-638850/20230830/', 'total_size': 1942603772}
https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/index.html#metadata/MERFISH-C57BL6J-638850/20230830/
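The manifest already provides ready-made HTTPS url fields for each directory. If you instead start from the s3:// resource_uri plus a relative_path (for example, to fetch a single file over plain HTTPS with requests rather than the AWS CLI), the mapping can be sketched as follows; the us-west-2 region is an assumption taken from the URLs shown above:

```python
def to_https(resource_uri, relative_path, region='us-west-2'):
    """Map an s3://bucket/ URI plus a relative path to its public HTTPS URL.

    The region default is an assumption copied from the manifest's url fields.
    """
    bucket = resource_uri[len('s3://'):].strip('/')
    return 'https://%s.s3.%s.amazonaws.com/%s' % (bucket, region, relative_path)

print(to_https('s3://allen-brain-cell-atlas/',
               'metadata/MERFISH-C57BL6J-638850/20230830/'))
# https://allen-brain-cell-atlas.s3.us-west-2.amazonaws.com/metadata/MERFISH-C57BL6J-638850/20230830/
```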

Directory sizes are also reported as part of the manifest.json. WARNING: the expression matrices directories can get very large (> 100 GB).

GB = 1024.0 ** 3

for r in manifest['directory_listing']:
    r_dict = manifest['directory_listing'][r]
    for d in r_dict['directories']:
        d_dict = r_dict['directories'][d]
        print(d_dict['relative_path'], ":", '%0.2f GB' % (d_dict['total_size'] / GB))
        
expression_matrices/MERFISH-C57BL6J-638850/20230830 : 14.21 GB
metadata/MERFISH-C57BL6J-638850/20230830 : 1.81 GB
expression_matrices/MERFISH-C57BL6J-638850-sections/20230630 : 14.31 GB
expression_matrices/WMB-10Xv2/20230630 : 104.16 GB
expression_matrices/WMB-10Xv3/20230630 : 176.41 GB
expression_matrices/WMB-10XMulti/20230830 : 0.21 GB
metadata/WMB-10X/20230830 : 2.39 GB
metadata/WMB-taxonomy/20230830 : 0.00 GB
metadata/WMB-neighborhoods/20230830 : 3.00 GB
image_volumes/Allen-CCF-2020/20230630 : 0.37 GB
metadata/Allen-CCF-2020/20230630 : 0.00 GB
image_volumes/MERFISH-C57BL6J-638850-CCF/20230630 : 0.11 GB
metadata/MERFISH-C57BL6J-638850-CCF/20230830 : 0.59 GB
expression_matrices/Zhuang-ABCA-1/20230830 : 3.09 GB
metadata/Zhuang-ABCA-1/20230830 : 1.33 GB
metadata/Zhuang-ABCA-1-CCF/20230830 : 0.21 GB
expression_matrices/Zhuang-ABCA-2/20230830 : 1.30 GB
metadata/Zhuang-ABCA-2/20230830 : 0.57 GB
metadata/Zhuang-ABCA-2-CCF/20230830 : 0.08 GB
expression_matrices/Zhuang-ABCA-3/20230830 : 1.69 GB
metadata/Zhuang-ABCA-3/20230830 : 0.74 GB
metadata/Zhuang-ABCA-3-CCF/20230830 : 0.12 GB
expression_matrices/Zhuang-ABCA-4/20230830 : 0.16 GB
metadata/Zhuang-ABCA-4/20230830 : 0.08 GB
metadata/Zhuang-ABCA-4-CCF/20230830 : 0.01 GB
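The per-directory sizes can also be summed to estimate the total footprint before downloading anything. A minimal sketch, using a toy two-entry excerpt of directory_listing with sizes copied from the printed output above:

```python
# toy excerpt of manifest['directory_listing']; sizes copied from the output above
directory_listing = {
    'MERFISH-C57BL6J-638850': {'directories': {
        'expression_matrices': {'total_size': 15255179148},
        'metadata': {'total_size': 1942603772},
    }},
}

GB = 1024.0 ** 3
total = sum(d['total_size']
            for r in directory_listing.values()
            for d in r['directories'].values())
print('total: %0.2f GB' % (total / GB))  # total: 16.02 GB
```

Run against the full manifest, the same two-line sum reports the size of an entire release.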

Downloading files for the tutorial notebooks#

Suppose you would like to download data to the local path ../../abc_download_root.

download_base = '../../abc_download_root'

Downloading all metadata directories#

Since the metadata directories are relatively small, we will download all of them. We loop through the manifest and download each metadata directory using the AWS CLI sync command. This should take < 5 minutes.

for r in manifest['directory_listing'] :
    
    r_dict =  manifest['directory_listing'][r]
    
    for d in r_dict['directories'] :
        
        if d != 'metadata' :
            continue
        d_dict = r_dict['directories'][d]
        local_path = os.path.join( download_base, d_dict['relative_path'])
        local_path = pathlib.Path( local_path )
        remote_path = manifest['resource_uri'] + d_dict['relative_path']
        
        command = "aws s3 sync --no-sign-request %s %s" % (remote_path, local_path)
        print(command)
        
        start = time.perf_counter()
        # Uncomment to download directories
        #result = subprocess.run(command.split(),stdout=subprocess.PIPE)
        #print("time taken: ", time.perf_counter() - start)
  
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/MERFISH-C57BL6J-638850/20230830 ../../abc_download_root/metadata/MERFISH-C57BL6J-638850/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/WMB-10X/20230830 ../../abc_download_root/metadata/WMB-10X/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/WMB-taxonomy/20230830 ../../abc_download_root/metadata/WMB-taxonomy/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/WMB-neighborhoods/20230830 ../../abc_download_root/metadata/WMB-neighborhoods/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Allen-CCF-2020/20230630 ../../abc_download_root/metadata/Allen-CCF-2020/20230630
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/MERFISH-C57BL6J-638850-CCF/20230830 ../../abc_download_root/metadata/MERFISH-C57BL6J-638850-CCF/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Zhuang-ABCA-1/20230830 ../../abc_download_root/metadata/Zhuang-ABCA-1/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Zhuang-ABCA-1-CCF/20230830 ../../abc_download_root/metadata/Zhuang-ABCA-1-CCF/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Zhuang-ABCA-2/20230830 ../../abc_download_root/metadata/Zhuang-ABCA-2/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Zhuang-ABCA-2-CCF/20230830 ../../abc_download_root/metadata/Zhuang-ABCA-2-CCF/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Zhuang-ABCA-3/20230830 ../../abc_download_root/metadata/Zhuang-ABCA-3/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Zhuang-ABCA-3-CCF/20230830 ../../abc_download_root/metadata/Zhuang-ABCA-3-CCF/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Zhuang-ABCA-4/20230830 ../../abc_download_root/metadata/Zhuang-ABCA-4/20230830
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/metadata/Zhuang-ABCA-4-CCF/20230830 ../../abc_download_root/metadata/Zhuang-ABCA-4-CCF/20230830
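If you do run the commands from Python rather than a terminal, a small wrapper that checks the return code makes failures visible; shlex.split is also safer than str.split when a local path contains spaces. A sketch, exercised here with a harmless echo stand-in rather than a real aws s3 sync:

```python
import shlex
import subprocess

def run_command(command):
    """Run one shell command, raising with stderr attached if it fails."""
    result = subprocess.run(shlex.split(command), capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr)
    return result.stdout

# harmless stand-in; substitute one of the printed aws s3 sync commands
print(run_command('echo sync complete'))  # sync complete
```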

Downloading one 10x expression matrix#

The prerequisite for running the 10x part 1 notebook is to have downloaded the log2 version of the “WMB-10Xv2-TH” matrix (4 GB). The download takes ~1 minute depending on your network speed.

We define a simple helper function to create the required AWS command. You can copy the command into a terminal shell to run it, or optionally run it inside this notebook by uncommenting the “subprocess.run” line of code.

def download_file( file_dict ) :
    
    print(file_dict['relative_path'],file_dict['size'])
    local_path = os.path.join( download_base, file_dict['relative_path'] )
    local_path = pathlib.Path( local_path )
    remote_path = manifest['resource_uri'] + file_dict['relative_path']

    command = "aws s3 cp --no-sign-request %s %s" % (remote_path, local_path)
    print(command)

    start = time.perf_counter()
    # Uncomment to download file
    #result = subprocess.run(command.split(),stdout=subprocess.PIPE)
    #print("time taken: ", time.perf_counter() - start)
expression_matrices = manifest['file_listing']['WMB-10Xv2']['expression_matrices']
file_dict = expression_matrices['WMB-10Xv2-TH']['log2']['files']['h5ad']
print('size:',file_dict['size'])
download_file( file_dict )
size: 4038679930
expression_matrices/WMB-10Xv2/20230630/WMB-10Xv2-TH-log2.h5ad 4038679930
aws s3 cp --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/WMB-10Xv2/20230630/WMB-10Xv2-TH-log2.h5ad ../../abc_download_root/expression_matrices/WMB-10Xv2/20230630/WMB-10Xv2-TH-log2.h5ad
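After a large transfer, it is worth checking the local file against the size recorded in the manifest (the file_dict['size'] printed above). A sketch of such a check, demonstrated on a throwaway temp file standing in for the real h5ad:

```python
import os
import pathlib
import tempfile

def verify_download(local_path, expected_size):
    """True if the file exists and its size matches the manifest entry."""
    p = pathlib.Path(local_path)
    return p.exists() and p.stat().st_size == expected_size

# throwaway stand-in for e.g. WMB-10Xv2-TH-log2.h5ad and its manifest size
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'x' * 1024)
ok = verify_download(f.name, 1024)
os.unlink(f.name)
print(ok)  # True
```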

Downloading the MERFISH expression matrix#

The prerequisite for running the MERFISH part 1 notebook is to have downloaded the log2 version of the “C57BL6J-638850” matrix (7 GB). The download takes ~3 minutes depending on your network speed.

datasets = ['MERFISH-C57BL6J-638850']
for d in datasets :
    expression_matrices = manifest['file_listing'][d]['expression_matrices']
    file_dict = expression_matrices['C57BL6J-638850']['log2']['files']['h5ad']
    print('size:',file_dict['size'])
    download_file( file_dict )
size: 7627589574
expression_matrices/MERFISH-C57BL6J-638850/20230830/C57BL6J-638850-log2.h5ad 7627589574
aws s3 cp --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/MERFISH-C57BL6J-638850/20230830/C57BL6J-638850-log2.h5ad ../../abc_download_root/expression_matrices/MERFISH-C57BL6J-638850/20230830/C57BL6J-638850-log2.h5ad

The prerequisite for running the Zhuang MERFISH notebook is to have downloaded the log2 version of the expression matrices for all 4 brain specimens.

datasets = ['Zhuang-ABCA-1','Zhuang-ABCA-2','Zhuang-ABCA-3','Zhuang-ABCA-4']
for d in datasets :
    expression_matrices = manifest['file_listing'][d]['expression_matrices']
    file_dict = expression_matrices[d]['log2']['files']['h5ad']
    print('size:',file_dict['size'])
    download_file( file_dict )
size: 2128478610
expression_matrices/Zhuang-ABCA-1/20230830/Zhuang-ABCA-1-log2.h5ad 2128478610
aws s3 cp --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/Zhuang-ABCA-1/20230830/Zhuang-ABCA-1-log2.h5ad ../../abc_download_root/expression_matrices/Zhuang-ABCA-1/20230830/Zhuang-ABCA-1-log2.h5ad
size: 871420938
expression_matrices/Zhuang-ABCA-2/20230830/Zhuang-ABCA-2-log2.h5ad 871420938
aws s3 cp --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/Zhuang-ABCA-2/20230830/Zhuang-ABCA-2-log2.h5ad ../../abc_download_root/expression_matrices/Zhuang-ABCA-2/20230830/Zhuang-ABCA-2-log2.h5ad
size: 1160586154
expression_matrices/Zhuang-ABCA-3/20230830/Zhuang-ABCA-3-log2.h5ad 1160586154
aws s3 cp --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/Zhuang-ABCA-3/20230830/Zhuang-ABCA-3-log2.h5ad ../../abc_download_root/expression_matrices/Zhuang-ABCA-3/20230830/Zhuang-ABCA-3-log2.h5ad
size: 106739752
expression_matrices/Zhuang-ABCA-4/20230830/Zhuang-ABCA-4-log2.h5ad 106739752
aws s3 cp --no-sign-request s3://allen-brain-cell-atlas/expression_matrices/Zhuang-ABCA-4/20230830/Zhuang-ABCA-4-log2.h5ad ../../abc_download_root/expression_matrices/Zhuang-ABCA-4/20230830/Zhuang-ABCA-4-log2.h5ad
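Before starting a tutorial, you can check which of its prerequisite files are still missing locally. A sketch, with the relative paths copied from the printed commands above:

```python
import pathlib

def missing_files(download_base, relative_paths):
    """Return the manifest-relative paths not yet present under download_base."""
    base = pathlib.Path(download_base)
    return [p for p in relative_paths if not (base / p).exists()]

# relative paths copied from the printed aws s3 cp commands
paths = [
    'expression_matrices/Zhuang-ABCA-1/20230830/Zhuang-ABCA-1-log2.h5ad',
    'expression_matrices/Zhuang-ABCA-2/20230830/Zhuang-ABCA-2-log2.h5ad',
    'expression_matrices/Zhuang-ABCA-3/20230830/Zhuang-ABCA-3-log2.h5ad',
    'expression_matrices/Zhuang-ABCA-4/20230830/Zhuang-ABCA-4-log2.h5ad',
]
print(missing_files('../../abc_download_root', paths))
```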

Downloading all image volumes#

The prerequisite for running the CCF and MERFISH-to-CCF registration notebooks is to have downloaded the two sets of image volumes.

for r in manifest['directory_listing'] :
    
    r_dict =  manifest['directory_listing'][r]
    
    for d in r_dict['directories'] :
        
        if d != 'image_volumes' :
            continue
        d_dict = r_dict['directories'][d]
        local_path = os.path.join( download_base, d_dict['relative_path'])
        local_path = pathlib.Path( local_path )
        remote_path = manifest['resource_uri'] + d_dict['relative_path']
        
        command = "aws s3 sync --no-sign-request %s %s" % (remote_path, local_path)
        print(command)
        
        start = time.perf_counter()
        # Uncomment to download directories
        #result = subprocess.run(command.split(),stdout=subprocess.PIPE)
        #print("time taken: ", time.perf_counter() - start)
  
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/image_volumes/Allen-CCF-2020/20230630 ../../abc_download_root/image_volumes/Allen-CCF-2020/20230630
aws s3 sync --no-sign-request s3://allen-brain-cell-atlas/image_volumes/MERFISH-C57BL6J-638850-CCF/20230630 ../../abc_download_root/image_volumes/MERFISH-C57BL6J-638850-CCF/20230630