Getting started#

Data associated with the Allen Brain Cell Atlas is hosted on Amazon Web Services (AWS) in an S3 bucket as an AWS Public Dataset. No account or login is required. The S3 bucket is located at arn:aws:s3:::allen-brain-cell-atlas. You will need an internet connection to run this notebook.

Each data release has an associated manifest.json that lists the specific versions of all directories and files that are part of the release. We recommend using the AbcProjectCache to download the data.

Expression matrices are stored in the anndata h5ad format and must be downloaded to a local file system before use.
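As a minimal sketch of what working with a local h5ad file can look like, the helper below opens a downloaded file with anndata in backed mode, so the matrix stays on disk until it is sliced. The function name open_expression_matrix is hypothetical and not part of abc_atlas_access; the path would come from the cache once the file has been downloaded.

```python
from pathlib import Path


def open_expression_matrix(h5ad_path):
    """Open a locally downloaded .h5ad file without loading the full matrix.

    Hypothetical helper for illustration; not part of abc_atlas_access.
    """
    h5ad_path = Path(h5ad_path)
    if not h5ad_path.is_file():
        raise FileNotFoundError(f"Download the file first: {h5ad_path}")
    # Imported lazily so the path check above works even without anndata.
    import anndata
    # backed="r" keeps the expression matrix on disk and reads it on demand.
    return anndata.read_h5ad(h5ad_path, backed="r")
```

Backed mode is a reasonable default here because some of the matrices are many gigabytes in size; drop the backed argument if the file comfortably fits in memory.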

This notebook shows how to use the AbcProjectCache to download the data required for the tutorials.

Below we install the Python library used throughout these tutorials into the current Python environment.

pip install -U git+https://github.com/alleninstitute/abc_atlas_access
Collecting git+https://github.com/alleninstitute/abc_atlas_access@u/morriscb/updateJupyterBook
  Cloning https://github.com/alleninstitute/abc_atlas_access (to revision u/morriscb/updateJupyterBook) to /private/var/folders/kc/7glrmt5n67x16yj_tg86t49c0000gp/T/pip-req-build-k_k2xa_5
  Running command git clone --filter=blob:none --quiet https://github.com/alleninstitute/abc_atlas_access /private/var/folders/kc/7glrmt5n67x16yj_tg86t49c0000gp/T/pip-req-build-k_k2xa_5
  Running command git checkout -b u/morriscb/updateJupyterBook --track origin/u/morriscb/updateJupyterBook
  Switched to a new branch 'u/morriscb/updateJupyterBook'
  branch 'u/morriscb/updateJupyterBook' set up to track 'origin/u/morriscb/updateJupyterBook'.
  Resolved https://github.com/alleninstitute/abc_atlas_access to commit 5cbeb4e1fe7ebb6492696b6dae9a76697b8c4cd0
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: anndata in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (0.10.5.post1)
Requirement already satisfied: boto3 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (1.34.55)
Requirement already satisfied: ghp-import in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (2.1.0)
Requirement already satisfied: matplotlib in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (3.8.3)
Requirement already satisfied: moto in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (5.0.2)
Requirement already satisfied: numpy in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (1.26.4)
Requirement already satisfied: pandas in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (2.2.1)
Requirement already satisfied: pydantic in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (2.6.3)
Requirement already satisfied: pytest in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (8.0.2)
Requirement already satisfied: requests in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (2.31.0)
Requirement already satisfied: scipy in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (1.12.0)
Requirement already satisfied: simpleitk in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (2.3.1)
Requirement already satisfied: tqdm in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from abc_atlas_access==0.0.1) (4.66.2)
Requirement already satisfied: array-api-compat in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from anndata->abc_atlas_access==0.0.1) (1.4.1)
Requirement already satisfied: h5py>=3 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from anndata->abc_atlas_access==0.0.1) (3.10.0)
Requirement already satisfied: natsort in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from anndata->abc_atlas_access==0.0.1) (8.4.0)
Requirement already satisfied: packaging>=20 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from anndata->abc_atlas_access==0.0.1) (23.2)
Requirement already satisfied: python-dateutil>=2.8.2 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from pandas->abc_atlas_access==0.0.1) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from pandas->abc_atlas_access==0.0.1) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from pandas->abc_atlas_access==0.0.1) (2024.1)
Requirement already satisfied: botocore<1.35.0,>=1.34.55 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from boto3->abc_atlas_access==0.0.1) (1.34.55)
Requirement already satisfied: jmespath<2.0.0,>=0.7.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from boto3->abc_atlas_access==0.0.1) (1.0.1)
Requirement already satisfied: s3transfer<0.11.0,>=0.10.0 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from boto3->abc_atlas_access==0.0.1) (0.10.0)
Requirement already satisfied: contourpy>=1.0.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from matplotlib->abc_atlas_access==0.0.1) (1.2.0)
Requirement already satisfied: cycler>=0.10 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from matplotlib->abc_atlas_access==0.0.1) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from matplotlib->abc_atlas_access==0.0.1) (4.49.0)
Requirement already satisfied: kiwisolver>=1.3.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from matplotlib->abc_atlas_access==0.0.1) (1.4.5)
Requirement already satisfied: pillow>=8 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from matplotlib->abc_atlas_access==0.0.1) (10.2.0)
Requirement already satisfied: pyparsing>=2.3.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from matplotlib->abc_atlas_access==0.0.1) (3.1.1)
Requirement already satisfied: cryptography>=3.3.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from moto->abc_atlas_access==0.0.1) (42.0.5)
Requirement already satisfied: xmltodict in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from moto->abc_atlas_access==0.0.1) (0.13.0)
Requirement already satisfied: werkzeug!=2.2.0,!=2.2.1,>=0.5 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from moto->abc_atlas_access==0.0.1) (3.0.1)
Requirement already satisfied: responses>=0.15.0 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from moto->abc_atlas_access==0.0.1) (0.25.0)
Requirement already satisfied: Jinja2>=2.10.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from moto->abc_atlas_access==0.0.1) (3.1.3)
Requirement already satisfied: charset-normalizer<4,>=2 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from requests->abc_atlas_access==0.0.1) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from requests->abc_atlas_access==0.0.1) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from requests->abc_atlas_access==0.0.1) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from requests->abc_atlas_access==0.0.1) (2024.2.2)
Requirement already satisfied: annotated-types>=0.4.0 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from pydantic->abc_atlas_access==0.0.1) (0.6.0)
Requirement already satisfied: pydantic-core==2.16.3 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from pydantic->abc_atlas_access==0.0.1) (2.16.3)
Requirement already satisfied: typing-extensions>=4.6.1 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from pydantic->abc_atlas_access==0.0.1) (4.10.0)
Requirement already satisfied: iniconfig in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from pytest->abc_atlas_access==0.0.1) (2.0.0)
Requirement already satisfied: pluggy<2.0,>=1.3.0 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from pytest->abc_atlas_access==0.0.1) (1.4.0)
Requirement already satisfied: cffi>=1.12 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from cryptography>=3.3.1->moto->abc_atlas_access==0.0.1) (1.16.0)
Requirement already satisfied: MarkupSafe>=2.0 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from Jinja2>=2.10.1->moto->abc_atlas_access==0.0.1) (2.1.5)
Requirement already satisfied: six>=1.5 in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from python-dateutil>=2.8.2->pandas->abc_atlas_access==0.0.1) (1.16.0)
Requirement already satisfied: pyyaml in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from responses>=0.15.0->moto->abc_atlas_access==0.0.1) (6.0.1)
Requirement already satisfied: pycparser in /Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages (from cffi>=1.12->cryptography>=3.3.1->moto->abc_atlas_access==0.0.1) (2.21)
Building wheels for collected packages: abc_atlas_access
  Building wheel for abc_atlas_access (pyproject.toml) ... done
  Created wheel for abc_atlas_access: filename=abc_atlas_access-0.0.1-py3-none-any.whl size=20811 sha256=d8098f32f446e44b53e54ad633cdea61188d55b3cc708d45cb702c309ee9224c
  Stored in directory: /private/var/folders/kc/7glrmt5n67x16yj_tg86t49c0000gp/T/pip-ephem-wheel-cache-c8464kd3/wheels/d8/36/c5/498927e9ff1fdc24cca5fd8a35f6daeb0d2b61623f3ee82a07
Successfully built abc_atlas_access
Installing collected packages: abc_atlas_access
  Attempting uninstall: abc_atlas_access
    Found existing installation: abc_atlas_access 0.0.1
    Uninstalling abc_atlas_access-0.0.1:
      Successfully uninstalled abc_atlas_access-0.0.1
Successfully installed abc_atlas_access-0.0.1
Note: you may need to restart the kernel to use updated packages.

After installing new packages, we need to restart the Python kernel in this notebook. This can be done either by selecting Restart Kernel... under the Kernel drop-down menu above or by uncommenting and running the cell below.

# get_ipython().kernel.do_shutdown()
from pathlib import Path
from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache
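If the import above fails after a kernel restart, it can help to confirm that the package is installed in the environment the kernel is actually using. The sketch below uses only the standard library; installed_version is a hypothetical helper name, and the distribution name abc_atlas_access is taken from the pip output above.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints None when the package is missing from the active environment.
print("abc_atlas_access:", installed_version("abc_atlas_access"))
```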

Using the cache#

Below we show how to set up the cache to download from S3, how to list the available data releases and switch between them, and how to list the directories in a release, their sizes, and the files they contain.

Set up the AbcProjectCache object by specifying a download directory and calling from_s3_cache as shown below. We also print which version of the manifest the cache currently has loaded.

download_base = Path('../../abc_download_root') # Path to where you would like to write the downloaded data.
abc_cache = AbcProjectCache.from_s3_cache(download_base)
abc_cache.current_manifest
'releases/20240330/manifest.json'
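Because the download path above is relative, it is resolved against whatever working directory the notebook happens to run in. The sketch below shows one way to pin it down to an absolute path and create it up front. The helper name prepare_download_root is hypothetical, and whether from_s3_cache creates the directory on its own is not stated here, so creating it explicitly is a precaution rather than a requirement.

```python
from pathlib import Path


def prepare_download_root(path):
    """Resolve a download location to an absolute path and make sure it exists."""
    root = Path(path).expanduser().resolve()
    # parents=True creates intermediate folders; exist_ok=True makes reruns safe.
    root.mkdir(parents=True, exist_ok=True)
    return root


# Usage (assumption: the resolved path is then passed to from_s3_cache):
# download_base = prepare_download_root('../../abc_download_root')
```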

List all of the releases available to the cache object we have just created.

abc_cache.list_manifest_file_names
['releases/20230630/manifest.json',
 'releases/20230830/manifest.json',
 'releases/20231215/manifest.json',
 'releases/20240330/manifest.json']
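The manifest names above embed the release date, which makes them easy to sort or compare in code. The following is a small sketch that assumes the naming pattern releases/YYYYMMDD/manifest.json seen in this output; release_date and newest_manifest are hypothetical helpers, not part of the library.

```python
from datetime import datetime


def release_date(manifest_name):
    """Parse the date from a name like 'releases/20240330/manifest.json'."""
    date_part = manifest_name.split("/")[1]
    return datetime.strptime(date_part, "%Y%m%d").date()


def newest_manifest(manifest_names):
    """Return the manifest name with the most recent release date."""
    return max(manifest_names, key=release_date)


manifests = [
    'releases/20230630/manifest.json',
    'releases/20230830/manifest.json',
    'releases/20231215/manifest.json',
    'releases/20240330/manifest.json',
]
print(newest_manifest(manifests))  # releases/20240330/manifest.json
```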

We can switch to a specific manifest, and thus a specific release of the data, using the load_manifest method. The loaded manifest determines which version of the released data the cache will download and return to the user. The cache keeps track of which version was last used across sessions. After instantiating a cache, the current manifest can be viewed with the current_manifest property. Note that the cache issues a warning if the loaded manifest is older than the most recent manifest available.

Below we show an example of loading an older manifest. Any of the strings returned by list_manifest_file_names is a valid manifest. For the rest of this tutorial, however, we’ll stick to the latest manifest to avoid confusion.

abc_cache.load_manifest('releases/20230630/manifest.json')
print("old manifest loaded:", abc_cache.current_manifest)

# Return to the latest manifest
abc_cache.load_latest_manifest()
print("after latest manifest loaded:", abc_cache.current_manifest)
/Users/chris.morrison/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages/abc_atlas_access/abc_atlas_cache/cloud_cache.py:567: OutdatedManifestWarning: 

The manifest file you are loading is not the most up to date manifest file available for this dataset. The most up to data manifest file available for this dataset is 

releases/20240330/manifest.json

To see the differences between these manifests,run

type.compare_manifests('releases/20240330/manifest.json', 'releases/20230630/manifest.json')

To see all of the manifest files currently downloaded onto your local system, run

self.list_all_downloaded_manifests()

If you just want to load the latest manifest, run

self.load_latest_manifest()


  warnings.warn(msg, OutdatedManifestWarning)
old manifest loaded: releases/20230630/manifest.json
after latest manifest loaded: releases/20240330/manifest.json
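Since the cache remembers the last manifest used across sessions, it is easy to leave it pointing at an older release by accident. One possible pattern is a context manager that switches manifests for a block of code and then restores the previous one. This is a sketch: temporary_manifest is a hypothetical helper that relies only on the load_manifest method and current_manifest property shown above.

```python
from contextlib import contextmanager


@contextmanager
def temporary_manifest(cache, manifest_name):
    """Load manifest_name for the duration of a with-block, then restore."""
    previous = cache.current_manifest
    cache.load_manifest(manifest_name)
    try:
        yield cache
    finally:
        # Restore the original manifest even if the block raised an error.
        cache.load_manifest(previous)


# Usage (assumes abc_cache was created as shown earlier):
# with temporary_manifest(abc_cache, 'releases/20230630/manifest.json'):
#     print(abc_cache.current_manifest)
```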

We can list all available directories in the loaded release using the list_directories property shown below. We can then list the data and metadata files available in those directories. Note that the cache will raise an exception if the requested kind of file (data files [e.g. h5ad expression_matrices, nii.gz image_volumes] or metadata files [e.g. csv files]) is not available in the directory.

abc_cache.list_directories
['Allen-CCF-2020',
 'MERFISH-C57BL6J-638850',
 'MERFISH-C57BL6J-638850-CCF',
 'MERFISH-C57BL6J-638850-sections',
 'WHB-10Xv3',
 'WHB-taxonomy',
 'WMB-10X',
 'WMB-10XMulti',
 'WMB-10Xv2',
 'WMB-10Xv3',
 'WMB-neighborhoods',
 'WMB-taxonomy',
 'Zhuang-ABCA-1',
 'Zhuang-ABCA-1-CCF',
 'Zhuang-ABCA-2',
 'Zhuang-ABCA-2-CCF',
 'Zhuang-ABCA-3',
 'Zhuang-ABCA-3-CCF',
 'Zhuang-ABCA-4',
 'Zhuang-ABCA-4-CCF']
abc_cache.list_data_files('WMB-10Xv2')
['WMB-10Xv2-CTXsp/log2',
 'WMB-10Xv2-CTXsp/raw',
 'WMB-10Xv2-HPF/log2',
 'WMB-10Xv2-HPF/raw',
 'WMB-10Xv2-HY/log2',
 'WMB-10Xv2-HY/raw',
 'WMB-10Xv2-Isocortex-1/log2',
 'WMB-10Xv2-Isocortex-1/raw',
 'WMB-10Xv2-Isocortex-2/log2',
 'WMB-10Xv2-Isocortex-2/raw',
 'WMB-10Xv2-Isocortex-3/log2',
 'WMB-10Xv2-Isocortex-3/raw',
 'WMB-10Xv2-Isocortex-4/log2',
 'WMB-10Xv2-Isocortex-4/raw',
 'WMB-10Xv2-MB/log2',
 'WMB-10Xv2-MB/raw',
 'WMB-10Xv2-OLF/log2',
 'WMB-10Xv2-OLF/raw',
 'WMB-10Xv2-TH/log2',
 'WMB-10Xv2-TH/raw']
abc_cache.list_metadata_files('WMB-taxonomy')
['cluster',
 'cluster_annotation_term',
 'cluster_annotation_term_set',
 'cluster_annotation_term_with_counts',
 'cluster_to_cluster_annotation_membership',
 'cluster_to_cluster_annotation_membership_color',
 'cluster_to_cluster_annotation_membership_pivoted']
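Because a listing call raises when a directory has no files of the requested kind, looping over every directory needs some error handling. The sketch below builds a simple inventory of a release using the three listing calls shown above. build_inventory is a hypothetical helper, and it catches a broad Exception only because the exact exception type is not stated in this notebook.

```python
def build_inventory(cache):
    """Map each directory to the data and metadata file names it offers."""
    inventory = {}
    for directory in cache.list_directories:
        entry = {}
        listers = (
            ("data", cache.list_data_files),
            ("metadata", cache.list_metadata_files),
        )
        for kind, lister in listers:
            try:
                entry[kind] = list(lister(directory))
            except Exception:
                # Assumption: an error here means "no files of this kind".
                entry[kind] = []
        inventory[directory] = entry
    return inventory


# Usage (assumes abc_cache was created as shown earlier):
# inventory = build_inventory(abc_cache)
# print(inventory['WMB-taxonomy']['metadata'])
```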

Before we start downloading data, we can check how much total data is in a given directory for both data files and metadata files.

abc_cache.get_directory_data_size('WMB-10Xv2')
'104.16 GB'
abc_cache.get_directory_metadata_size('WMB-taxonomy')
'4.65 MB'
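The sizes above are returned as human-readable strings, so comparing them with free disk space takes a small conversion step. The following sketch parses such a string and checks it against the free space reported by shutil.disk_usage. Both helpers are hypothetical, and the decimal interpretation of the units (1 GB = 10^9 bytes) is an assumption, since the notebook does not say which convention the cache uses.

```python
import shutil

# Assumption: decimal units. Swap in powers of 1024 if the cache uses binary units.
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def size_to_bytes(size_text):
    """Convert a string such as '104.16 GB' into an integer number of bytes."""
    value, unit = size_text.split()
    return round(float(value) * _UNITS[unit.upper()])


def fits_on_disk(size_text, target_dir="."):
    """Return True if the reported size is no larger than the free space."""
    free_bytes = shutil.disk_usage(target_dir).free
    return size_to_bytes(size_text) <= free_bytes


print(size_to_bytes('104.16 GB'))  # 104160000000
```

A check like fits_on_disk(abc_cache.get_directory_data_size('WMB-10Xv2'), download_base) could then be run before requesting a full directory.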

Downloading files#

The next set of examples shows how to download data to the directory you specified when setting up the cache object. There are two main ways of downloading the data: individually by file or by full directory.

Downloading all data files or metadata files in a directory.#

Here we show how to download the full set of data files or metadata files contained in a directory of the release. Use list_directories as a guide to what data is available. Below we download all the files in two directories we know to be small. Once all files have been downloaded, a list of Paths to the downloaded files is returned.

Be aware that several directories are large (over 100 GB). If a directory totals more than 10 GB, the cache will warn the user when a download of that directory is requested.

allen_ccf_list = abc_cache.get_directory_data('Allen-CCF-2020')
print("Allen-CCF-2020 data files:\n\t", allen_ccf_list)
annotation_10.nii.gz: 100%|██████████| 27.5M/27.5M [00:01<00:00, 22.8MMB/s]  
annotation_boundary_10.nii.gz: 100%|██████████| 27.4M/27.4M [00:01<00:00, 19.3MMB/s]  
average_template_10.nii.gz: 100%|██████████| 343M/343M [00:11<00:00, 29.2MMB/s]    
Allen-CCF-2020 data files:
	 [PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/annotation_10.nii.gz'), PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/annotation_boundary_10.nii.gz'), PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/average_template_10.nii.gz')]

wmb_taxonomy_list = abc_cache.get_directory_metadata('WMB-taxonomy')
print("WMB-taxonomy metadata files:\n\t", wmb_taxonomy_list)
cluster.csv: 100%|██████████| 131k/131k [00:00<00:00, 838kMB/s]  
cluster_annotation_term.csv: 100%|██████████| 861k/861k [00:00<00:00, 5.60MMB/s] 
cluster_annotation_term_set.csv: 100%|██████████| 1.11k/1.11k [00:00<00:00, 13.7kMB/s]
cluster_annotation_term_with_counts.csv: 100%|██████████| 902k/902k [00:00<00:00, 5.30MMB/s] 
cluster_to_cluster_annotation_membership.csv: 100%|██████████| 2.21M/2.21M [00:00<00:00, 15.0MMB/s]
cluster_to_cluster_annotation_membership_color.csv: 100%|██████████| 239k/239k [00:00<00:00, 1.99MMB/s]
cluster_to_cluster_annotation_membership_pivoted.csv: 100%|██████████| 531k/531k [00:00<00:00, 3.41MMB/s] 
WMB-taxonomy metadata files:
	 [PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/cluster.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/cluster_annotation_term.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/cluster_annotation_term_set.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/views/cluster_annotation_term_with_counts.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/cluster_to_cluster_annotation_membership.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/views/cluster_to_cluster_annotation_membership_color.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/views/cluster_to_cluster_annotation_membership_pivoted.csv')]

Note that once the files have been downloaded successfully, calling get_directory_data or get_directory_metadata again returns the list of local paths without re-downloading the files.

allen_ccf_list = abc_cache.get_directory_data('Allen-CCF-2020')
print("Allen-CCF-2020 data files:\n\t", allen_ccf_list, "\n\n")
wmb_taxonomy_list = abc_cache.get_directory_metadata('WMB-taxonomy')
print("WMB-taxonomy metadata files:\n\t", wmb_taxonomy_list)
Allen-CCF-2020 data files:
	 [PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/annotation_10.nii.gz'), PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/annotation_boundary_10.nii.gz'), PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/average_template_10.nii.gz')] 


WMB-taxonomy metadata files:
	 [PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/cluster.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/cluster_annotation_term.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/cluster_annotation_term_set.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/views/cluster_annotation_term_with_counts.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/cluster_to_cluster_annotation_membership.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/views/cluster_to_cluster_annotation_membership_color.csv'), PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-taxonomy/20231215/views/cluster_to_cluster_annotation_membership_pivoted.csv')]
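The returned lists are plain Path objects, so they can be reorganized with ordinary Python. As an illustration, the sketch below turns such a list into a dictionary keyed by file name without extensions, which makes it easier to pick out one file. index_by_name is a hypothetical helper, and it assumes the base names contain no dots other than those in the extension.

```python
from pathlib import Path


def index_by_name(paths):
    """Map names such as 'annotation_10' or 'cluster' to their local paths."""
    index = {}
    for path in paths:
        path = Path(path)
        # Keep the text before the first dot: 'annotation_10.nii.gz' -> 'annotation_10'.
        index[path.name.split(".", 1)[0]] = path
    return index


# Usage (assumes the list returned by get_directory_metadata above):
# cluster_path = index_by_name(wmb_taxonomy_list)['cluster']
```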

Downloading individual files.#

Files can also be downloaded individually. We can use list_directories together with the list_data_files and list_metadata_files methods to see what is available to download. Below we download one metadata file from the WMB-10X directory/dataset and one expression matrix data file from the WMB-10XMulti directory/dataset.

Downloading individual metadata files#

abc_cache.list_metadata_files('WMB-10X')
['cell_metadata',
 'cell_metadata_with_cluster_annotation',
 'example_genes_all_cells_expression',
 'gene',
 'region_of_interest_metadata']
abc_cache.get_metadata_path(directory='WMB-10X', file_name='gene')
gene.csv: 100%|██████████| 2.30M/2.30M [00:00<00:00, 4.95MMB/s]
PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/WMB-10X/20231215/gene.csv')

The cache can also return metadata files as pandas DataFrames, which are loaded with a generic index. This method accepts additional keyword arguments, which are passed on to pandas.read_csv. Examples of this are used throughout the notebooks in this repo.

abc_cache.get_metadata_dataframe(directory='WMB-10X', file_name='gene')
       gene_identifier     gene_symbol  name                               mapped_ncbi_identifier  comment
0      ENSMUSG00000051951  Xkr4         X-linked Kx blood group related 4  NCBIGene:497097         NaN
1      ENSMUSG00000089699  Gm1992       predicted gene 1992                NaN                     NaN
2      ENSMUSG00000102331  Gm19938      predicted gene, 19938              NaN                     NaN
3      ENSMUSG00000102343  Gm37381      predicted gene, 37381              NaN                     NaN
4      ENSMUSG00000025900  Rp1          retinitis pigmentosa 1 (human)     NCBIGene:19888          NaN
...    ...                 ...          ...                                ...                     ...
32280  ENSMUSG00000095523  AC124606.1   PRAME family member 8-like         NCBIGene:100038995      no expression
32281  ENSMUSG00000095475  AC133095.2   uncharacterized LOC545763          NCBIGene:545763         no expression
32282  ENSMUSG00000094855  AC133095.1   uncharacterized LOC620639          NCBIGene:620639         no expression
32283  ENSMUSG00000095019  AC234645.1   NaN                                NaN                     no expression
32284  ENSMUSG00000095041  AC149090.1   NaN                                NaN                     NaN

32285 rows × 5 columns
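To illustrate the kind of keyword arguments that can be forwarded to pandas.read_csv, the sketch below applies index_col and usecols to a few rows copied from the table above, read from an in-memory string so it runs without any download. That the same keywords can be handed to get_metadata_dataframe is an assumption based on the pass-through described above; the variable names are illustrative.

```python
import io

import pandas as pd

# A few rows of gene.csv reproduced by hand for illustration.
SAMPLE_CSV = """gene_identifier,gene_symbol,name,mapped_ncbi_identifier,comment
ENSMUSG00000051951,Xkr4,X-linked Kx blood group related 4,NCBIGene:497097,
ENSMUSG00000089699,Gm1992,predicted gene 1992,,
ENSMUSG00000025900,Rp1,retinitis pigmentosa 1 (human),NCBIGene:19888,
"""

read_csv_kwargs = {
    "index_col": "gene_identifier",  # use the gene ID instead of a generic index
    "usecols": ["gene_identifier", "gene_symbol", "name"],  # skip unused columns
}
gene = pd.read_csv(io.StringIO(SAMPLE_CSV), **read_csv_kwargs)
print(gene.loc["ENSMUSG00000025900", "gene_symbol"])  # Rp1

# Assumed equivalent with the cache:
# abc_cache.get_metadata_dataframe(directory='WMB-10X', file_name='gene', **read_csv_kwargs)
```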

Downloading individual data files#

abc_cache.list_data_files('WMB-10XMulti')
['WMB-10XMulti/log2', 'WMB-10XMulti/raw']

Note how log2 or raw is appended to the file name returned by the function above and used below. If we omit this suffix, the code raises an error describing the ambiguity.

abc_cache.get_data_path(directory='WMB-10XMulti', file_name='WMB-10XMulti/log2')
WMB-10XMulti-log2.h5ad: 100%|██████████| 89.3M/89.3M [00:03<00:00, 24.0MMB/s]  
PosixPath('/Users/chris.morrison/src/abc_download_root/expression_matrices/WMB-10XMulti/20230830/WMB-10XMulti-log2.h5ad')
abc_cache.get_data_path(directory='WMB-10XMulti', file_name='WMB-10XMulti')
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[20], line 1
----> 1 abc_cache.get_data_path(directory='WMB-10XMulti', file_name='WMB-10XMulti')

File ~/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages/abc_atlas_access/abc_atlas_cache/abc_project_cache.py:477, in AbcProjectCache.get_data_path(self, directory, file_name, force_download, skip_hash_check)
    472     data_path = self._get_local_path(
    473         directory=directory,
    474         file_name=file_name
    475     )
    476 else:
--> 477     data_path = self.cache.download_data(
    478         directory=directory,
    479         file_name=file_name,
    480         force_download=force_download,
    481         skip_hash_check=skip_hash_check
    482     )
    483 return data_path

File ~/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages/abc_atlas_access/abc_atlas_cache/cloud_cache.py:822, in CloudCacheBase.download_data(self, directory, file_name, force_download, skip_hash_check)
    788 def download_data(
    789     self,
    790     directory: str,
   (...)
    793     skip_hash_check: bool = False
    794 ) -> Path:
    795     """
    796     Return the local path to a data file, downloading the file
    797     if necessary
   (...)
    820         If the file cannot be downloaded
    821     """
--> 822     super_attributes = self.data_path(directory=directory,
    823                                       file_name=file_name)
    824     file_attributes = super_attributes['file_attributes']
    825     # If the file exists, check that it was downloaded successfully.

File ~/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages/abc_atlas_access/abc_atlas_cache/cloud_cache.py:395, in BasicLocalCache.data_path(self, directory, file_name)
    366 def data_path(self, directory: str, file_name: str) -> dict:
    367     """
    368     Return the local path to a data file, and test for the
    369     file's existence
   (...)
    393         If the file cannot be downloaded
    394     """
--> 395     output = self._get_file_path(directory=directory, file_name=file_name)
    397     return output

File ~/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages/abc_atlas_access/abc_atlas_cache/cloud_cache.py:321, in BasicLocalCache._get_file_path(self, directory, file_name)
    292 def _get_file_path(self, directory: str, file_name: str) -> dict:
    293     """
    294     Return the local path to a data file, and test for the
    295     file's existence.
   (...)
    319         If the file cannot be downloaded
    320     """
--> 321     file_attributes = self._manifest.get_file_attributes(
    322         directory=directory,
    323         file_name=file_name
    324     )
    325     exists = self._file_exists(file_attributes)
    326     local_path = file_attributes.local_path

File ~/src/miniconda3/envs/abc_atlas_access/lib/python3.11/site-packages/abc_atlas_access/abc_atlas_cache/manifest.py:238, in Manifest.get_file_attributes(self, directory, file_name)
    226             file_attributes = self._create_file_attributes(
    227                 remote_path=files_data[kind]["files"][file_type][
    228                     'url'],
   (...)
    235                 file_hash=files_data[kind]["files"][file_type]['file_hash']  # noqa: E501
    236             )
    237         elif kind is None and "files" not in files_data.keys():
--> 238             raise KeyError(
    239                 f"File {file_name} found in directory but multiple "
    240                 f"files found: {list(files_data.keys())}. Please "
    241                 "specify the file name as one of "
    242                 f"{['%s/%s' % (file_name, key) for key in files_data.keys()]}"  # noqa: E501
    243             )
    244 if file_attributes is None:
    245     raise KeyError(
    246         f"File {file_name} not found in directory {directory}."
    247     )

KeyError: "File WMB-10XMulti found in directory but multiple files found: ['log2', 'raw']. Please specify the file name as one of ['WMB-10XMulti/log2', 'WMB-10XMulti/raw']"
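The KeyError above can be avoided by checking a name against the output of list_data_files before requesting it. The sketch below is one way to do that with plain string matching. matching_file_names is a hypothetical helper, and the "base/kind" naming pattern is taken from the listing shown earlier.

```python
def matching_file_names(available, requested):
    """Return the entries that equal `requested` or extend it with '/<kind>'."""
    return [
        name for name in available
        if name == requested or name.startswith(requested + "/")
    ]


available = ['WMB-10XMulti/log2', 'WMB-10XMulti/raw']
print(matching_file_names(available, 'WMB-10XMulti'))
# ['WMB-10XMulti/log2', 'WMB-10XMulti/raw']
```

A result with more than one entry means the request is ambiguous and a full name such as 'WMB-10XMulti/raw' should be passed instead.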

Advanced Options#

Forcing the cache to redownload data#

All methods that download files provide an option to force the cache to re-download the file(s). This can be useful if a downloaded file has become corrupted or has been accidentally deleted or changed. Below are examples of using this option while downloading an individual file or a full directory of files.

abc_cache.get_metadata_dataframe(directory='WMB-10X', file_name='gene', force_download=True)
gene.csv: 100%|██████████| 2.30M/2.30M [00:00<00:00, 5.64MMB/s]
       gene_identifier     gene_symbol  name                               mapped_ncbi_identifier  comment
0      ENSMUSG00000051951  Xkr4         X-linked Kx blood group related 4  NCBIGene:497097         NaN
1      ENSMUSG00000089699  Gm1992       predicted gene 1992                NaN                     NaN
2      ENSMUSG00000102331  Gm19938      predicted gene, 19938              NaN                     NaN
3      ENSMUSG00000102343  Gm37381      predicted gene, 37381              NaN                     NaN
4      ENSMUSG00000025900  Rp1          retinitis pigmentosa 1 (human)     NCBIGene:19888          NaN
...    ...                 ...          ...                                ...                     ...
32280  ENSMUSG00000095523  AC124606.1   PRAME family member 8-like         NCBIGene:100038995      no expression
32281  ENSMUSG00000095475  AC133095.2   uncharacterized LOC545763          NCBIGene:545763         no expression
32282  ENSMUSG00000094855  AC133095.1   uncharacterized LOC620639          NCBIGene:620639         no expression
32283  ENSMUSG00000095019  AC234645.1   NaN                                NaN                     no expression
32284  ENSMUSG00000095041  AC149090.1   NaN                                NaN                     NaN

32285 rows × 5 columns

allen_ccf_list = abc_cache.get_directory_data(directory='Allen-CCF-2020', force_download=True)
print("Allen-CCF-2020 data files:\n\t", allen_ccf_list)
annotation_10.nii.gz: 100%|██████████| 27.5M/27.5M [00:00<00:00, 42.1MMB/s]
annotation_boundary_10.nii.gz: 100%|██████████| 27.4M/27.4M [00:00<00:00, 38.3MMB/s]
average_template_10.nii.gz: 100%|██████████| 343M/343M [00:07<00:00, 48.1MMB/s] 
Allen-CCF-2020 data files:
	 [PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/annotation_10.nii.gz'), PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/annotation_boundary_10.nii.gz'), PosixPath('/Users/chris.morrison/src/abc_download_root/image_volumes/Allen-CCF-2020/20230630/average_template_10.nii.gz')]
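Rather than always forcing a re-download, one option is to try the cached copy first and only fetch a fresh one if loading it fails. The sketch below wraps that pattern around any path-returning cache method. load_with_redownload is a hypothetical helper that assumes the wrapped method accepts the force_download keyword shown above.

```python
def load_with_redownload(get_path, loader, **path_kwargs):
    """Load a cached file, re-downloading it once if loading fails."""
    path = get_path(**path_kwargs)
    try:
        return loader(path)
    except Exception:
        # Assume the local copy is damaged; fetch a fresh one and retry once.
        path = get_path(force_download=True, **path_kwargs)
        return loader(path)


# Usage (assumes abc_cache from earlier and pandas imported as pd):
# gene = load_with_redownload(
#     abc_cache.get_metadata_path, pd.read_csv,
#     directory='WMB-10X', file_name='gene',
# )
```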

Skipping the file hashing check#

When a download completes, a hash of the downloaded file is computed and checked against the expected hash in the manifest. While this check is recommended, it can add overhead to the download process. The skip_hash_check option allows the user to skip computing the hash and assume the download completed successfully.

abc_cache.get_metadata_dataframe(directory='WMB-neighborhoods', file_name='UMAP20230830-TH-EPI-Glut', skip_hash_check=True)
UMAP20230830-TH-EPI-Glut.csv: 100%|██████████| 6.46M/6.46M [00:00<00:00, 16.1MMB/s]
        cell_label                x          y
0       CTCACACTCGTAGATC-044_D01  -4.603476  -6.148670
1       CCGTACTCATCCAACA-036_D01  -4.817812  -6.366151
2       AGCGGTCCATGGGAAC-037_A01  -4.798783  -6.577992
3       GATGAGGCATGTTCCC-037_B01  -5.188138  -5.892220
4       TAGTGGTAGGCGACAT-037_B01  -4.715829  -6.606307
...     ...                       ...        ...
126166  TTTGTTGTCCGACATA-290_B01  10.042111  10.349521
126167  TTTGTTGTCGTCTACC-294_B05  -1.630137  9.033476
126168  TTTGTTGTCGTTCCTG-463_A05  -6.848272  12.645908
126169  TTTGTTGTCGTTGCCT-621_A02  -6.982306  14.718120
126170  TTTGTTGTCTTTCGAT-574_A02  -5.292696  6.804039

126171 rows × 3 columns

abc_cache.get_directory_metadata(directory='Allen-CCF-2020', skip_hash_check=True)
parcellation.csv: 100%|██████████| 41.2k/41.2k [00:00<00:00, 606kMB/s]
parcellation_term.csv: 100%|██████████| 177k/177k [00:00<00:00, 1.27MMB/s] 
parcellation_term_set.csv: 100%|██████████| 628/628 [00:00<00:00, 8.04kMB/s]
parcellation_term_set_membership.csv: 100%|██████████| 114k/114k [00:00<00:00, 956kMB/s]  
parcellation_term_with_counts.csv: 100%|██████████| 137k/137k [00:00<00:00, 908kMB/s]  
parcellation_to_parcellation_term_membership.csv: 100%|██████████| 680k/680k [00:00<00:00, 3.71MMB/s] 
parcellation_to_parcellation_term_membership_acronym.csv: 100%|██████████| 22.3k/22.3k [00:00<00:00, 96.8kMB/s]
parcellation_to_parcellation_term_membership_blue.csv: 100%|██████████| 16.4k/16.4k [00:00<00:00, 232kMB/s]
parcellation_to_parcellation_term_membership_color.csv: 100%|██████████| 30.5k/30.5k [00:00<00:00, 432kMB/s]
parcellation_to_parcellation_term_membership_green.csv: 100%|██████████| 16.5k/16.5k [00:00<00:00, 234kMB/s]
parcellation_to_parcellation_term_membership_name.csv: 100%|██████████| 75.8k/75.8k [00:00<00:00, 315kMB/s] 
parcellation_to_parcellation_term_membership_red.csv: 100%|██████████| 16.0k/16.0k [00:00<00:00, 215kMB/s]
[PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/parcellation.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/parcellation_term.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/parcellation_term_set.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/parcellation_term_set_membership.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/views/parcellation_term_with_counts.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/parcellation_to_parcellation_term_membership.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/views/parcellation_to_parcellation_term_membership_acronym.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/views/parcellation_to_parcellation_term_membership_blue.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/views/parcellation_to_parcellation_term_membership_color.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/views/parcellation_to_parcellation_term_membership_green.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/views/parcellation_to_parcellation_term_membership_name.csv'),
 PosixPath('/Users/chris.morrison/src/abc_download_root/metadata/Allen-CCF-2020/20230630/views/parcellation_to_parcellation_term_membership_red.csv')]
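For readers who skip the hash check during download and want to verify a file later, the sketch below shows the general idea of hashing a file in chunks and comparing the result with an expected value. The helper names are hypothetical, and the sha256 default is an assumption: this notebook does not state which algorithm the manifest hashes use, so the algorithm argument may need to be changed.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches_expected(path, expected_hash, algorithm="sha256"):
    """Return True if the file's digest equals the expected hex string."""
    return file_digest(path, algorithm) == expected_hash
```

Reading in chunks is what makes this practical for multi-gigabyte expression matrices, and it is also why the check adds noticeable time to large downloads.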