MERFISH whole mouse brain spatial transcriptomics (Xiaowei Zhuang)


A collection of in situ, spatially resolved transcriptomic profiles of individual cells in the whole mouse brain, acquired by multiplexed error-robust fluorescence in situ hybridization (MERFISH) and consisting of ~9 million cells measured with a 1122-gene panel. We performed MERFISH imaging on 245 coronal and sagittal sections from four animals, obtained 9.3 million segmented cells that passed quality control, and integrated the MERFISH data from the four animals with the scRNA-seq data from the Allen Institute to classify cells.

We applied a series of filters to select the subset of cells to be visualized on the ABC Atlas. We first removed six fractured tissue slices; 9.1 million cells remained after this step. We then aligned the spatial coordinates of the cells to the Allen-CCF-2020. For coronal slices that can be registered to the CCF, we used the CCF coordinates to define the coordinates of the center point of the midline and removed cells that substantially passed the midline into the other hemisphere (which has not been registered to the CCF). For sagittal slices that can be registered to the CCF, we used the CCF coordinates to define the coordinates of the center point of the tissue and removed cells that substantially passed the posterior edge (which has not been registered to the CCF). For the 31 anterior and posterior coronal slices and 3 lateral sagittal slices that cannot be registered to the CCF, we manually aligned and oriented the slices.

The x, y coordinates are the experimentally measured coordinates after rotating and aligning the tissue slices to the CCF. The z coordinate is the estimated position of each tissue slice in the 3D Allen-CCF-2020 space along the slicing axis, based on either the registration results (for slices that can be registered to the CCF) or the positions of the slices measured during tissue sectioning (for slices that cannot be registered). The z position is set to zero when the estimated position is zero or negative. 8.4 million cells remained after this step. The cell-by-gene matrix of these 8.4 million cells can be downloaded from the AWS bucket.

We then filtered the cells by the cell-classification (label-transfer) confidence scores calculated during MERFISH-scRNA-seq data integration. 7.0 million cells passed the confidence score threshold for cell subclass label transfer, and 5.8 million of those also passed the confidence score threshold for cell cluster label transfer. These 5.8 million cells are included in the cell metadata file that can be downloaded from the AWS bucket and are displayed on the ABC Atlas. The CCF coordinates of the 5.4 million cells that were registered to the 3D Allen-CCF can be downloaded from the CCF coordinate files in the AWS bucket.

The collection spans four mouse specimens (2 coronal sets and 2 sagittal sets). Cells are mapped to the whole mouse brain taxonomy (WMB-taxonomy) and the Allen Common Coordinate Framework (Allen-CCF-2020). Refer to Zhang et al., 2023 for more details.
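
The filtering cascade described above can be tallied in a few lines. The following is a minimal sketch that uses only the rounded counts quoted in this section (in millions of cells); the step names are illustrative labels of our own, not identifiers from the dataset.

```python
# Rounded cell counts (in millions) quoted in the text above.
# The step names are illustrative labels, not dataset identifiers.
filter_steps = [
    ("segmented, passed QC", 9.3),
    ("fractured slices removed", 9.1),
    ("aligned to CCF, stray cells removed", 8.4),
    ("subclass confidence threshold", 7.0),
    ("cluster confidence threshold", 5.8),
]


def retention_table(steps):
    """Return (name, count, fraction kept vs. previous step, fraction kept vs. start)."""
    rows = []
    start = steps[0][1]
    previous = start
    for name, count in steps:
        rows.append((name, count, count / previous, count / start))
        previous = count
    return rows


for name, count, step_frac, total_frac in retention_table(filter_steps):
    print(f"{name:38s} {count:4.1f}M  step: {step_frac:6.1%}  overall: {total_frac:6.1%}")
```

With these rounded figures, roughly 62% of the segmented cells end up in the cell metadata file and on the ABC Atlas.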

import pandas as pd
from pathlib import Path
import numpy as np
import anndata
import time
import matplotlib.pyplot as plt

from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache

We will interact with the data using the AbcProjectCache. This cache object tracks which data has been downloaded and serves the path to the requested data on disk. For metadata, the cache can also directly serve up a pandas DataFrame. See the getting_started notebook for more details on using the cache, including how to install it if it has not already been installed.

Change the download_base variable to the location where you have downloaded the data on your system.

download_base = Path('../../data/abc_atlas')
abc_cache = AbcProjectCache.from_cache_dir(download_base)

abc_cache.current_manifest

'releases/20241130/manifest.json'

datasets = ['Zhuang-ABCA-1', 'Zhuang-ABCA-2', 'Zhuang-ABCA-3', 'Zhuang-ABCA-4']
example_section = {'Zhuang-ABCA-1': 'Zhuang-ABCA-1.079',
                   'Zhuang-ABCA-2': 'Zhuang-ABCA-2.037',
                   'Zhuang-ABCA-3': 'Zhuang-ABCA-3.010',
                   'Zhuang-ABCA-4': 'Zhuang-ABCA-4.002'}

Data overview#

Cell metadata#

Essential cell metadata is stored as a dataframe. Each row represents one cell indexed by a cell label.

Each cell is associated with a brain section label, donor label, donor genotype, donor sex and a feature_matrix_label identifying which data package the cell is part of. Each cell also has a set of x, y, z coordinates generated by rotating each section so that it is upright, with the midline approximately in the middle of the frame.

Each cell is mapped to the whole mouse brain taxonomy resulting in the assignment of a cluster alias and confidence scores.

cell = {}

for d in datasets:

    # Load the cell metadata for this dataset, keeping cell labels as strings.
    cell[d] = abc_cache.get_metadata_dataframe(
        directory=d,
        file_name='cell_metadata',
        dtype={"cell_label": str}
    )
    cell[d].set_index('cell_label', inplace=True)

    # Group by section so that len(sdf) is the number of sections.
    sdf = cell[d].groupby('brain_section_label')

    print(d, ":", "Number of cells = ", len(cell[d]), ", ", "Number of sections =", len(sdf))

Zhuang-ABCA-1 : Number of cells =  2846908 ,  Number of sections = 147
Zhuang-ABCA-2 : Number of cells =  1227408 ,  Number of sections = 66
Zhuang-ABCA-3 : Number of cells =  1585843 ,  Number of sections = 23
Zhuang-ABCA-4 : Number of cells =  162578 ,  Number of sections = 3

cell[datasets[0]]

	brain_section_label	feature_matrix_label	donor_label	donor_genotype	donor_sex	cluster_alias	x	y	z	subclass_confidence_score	cluster_confidence_score	high_quality_transfer	abc_sample_id
cell_label
182941331246012878296807398333956011710	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	704	0.682522	3.366483	7.829530	0.969933	0.718088	True	79bda012-4dd4-43d7-8f66-1f29997f6780
221260934538535633595532020856387724686	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5243	0.667690	3.442241	7.829530	0.850554	0.850554	True	2f0b3159-2766-4f9e-a8cd-8dd16bae05fa
22228792606814781533240955623030943708	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	14939	0.638731	3.474328	7.829530	0.888285	0.649581	True	fe1f5f73-5afb-4e51-b4f0-cf6690257086
272043042552227961220474294517855477150	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	14939	0.653425	3.433218	7.829530	0.900000	0.607080	True	a13e1c1c-9828-4d3b-9aae-e2ab055a39ad
110116287883089187971185374239350249328	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5254	0.623896	3.513574	7.829530	0.999978	0.689511	True	add02bc8-456b-486c-9f13-db578c62cc5a
...	...	...	...	...	...	...	...	...	...	...	...	...	...
94310525370042131911495836073267655162	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5255	0.997247	3.823236	9.717769	0.971385	0.943456	True	3e178fe9-3440-4cf1-ab79-2192d25e3d02
298798481479578578007190103666214714353	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	14939	1.043871	3.706231	9.717769	0.956377	0.905493	True	22cfcaf3-0c26-41b7-ab14-f76398fd18a4
330756942354980576352210203729462562749	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5261	1.037680	3.759890	9.717769	0.811520	0.788901	True	bc4eb252-c3fd-4ba0-9739-70eff08b29fe
47305871059582831548494138048361484565	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5252	1.044169	3.758463	9.717769	0.991924	0.634152	True	f450f6fa-667b-40bf-a865-000131b57d2c
64578198410898899234789748167671783948	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5263	1.042301	3.589421	9.717769	0.889413	0.645770	True	11b45e69-80fe-4cd7-9480-ce3a04a9b1c5

2846908 rows × 13 columns
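
Once loaded, the metadata can be summarized with ordinary pandas operations. The sketch below runs on a toy dataframe that reuses column names from the table above; every value in it is made up, and the 0.9 cut-off is an arbitrary number chosen for illustration, not the threshold used to build the dataset.

```python
import pandas as pd

# Toy stand-in for one cell metadata table; column names follow the
# dataframe shown above, but every value here is made up.
toy_cell = pd.DataFrame(
    {
        "brain_section_label": ["S.001", "S.001", "S.002", "S.002", "S.002"],
        "cluster_alias": [704, 5243, 14939, 14939, 5254],
        "subclass_confidence_score": [0.97, 0.85, 0.89, 0.90, 0.99],
        "cluster_confidence_score": [0.72, 0.85, 0.65, 0.61, 0.69],
    },
    index=pd.Index(["c1", "c2", "c3", "c4", "c5"], name="cell_label"),
)

# Number of cells in each section.
cells_per_section = toy_cell.groupby("brain_section_label").size()

# Keep only cells whose subclass score clears a hypothetical cut-off.
# 0.9 is an arbitrary value for illustration only.
example_threshold = 0.9
confident = toy_cell[toy_cell["subclass_confidence_score"] >= example_threshold]

print(cells_per_section)
print(confident.index.tolist())
```

The same two operations apply unchanged to any of the real `cell[d]` dataframes, since they only rely on the column names.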

Cluster annotation#

Read in the pivot table from the “cluster annotation tutorial” to associate each cell with terms at each cell type classification level and the corresponding color.

cluster_details = abc_cache.get_metadata_dataframe(
    directory='WMB-taxonomy',
    file_name='cluster_to_cluster_annotation_membership_pivoted',
    keep_default_na=False
)
cluster_details.set_index('cluster_alias', inplace=True)

cluster_colors = abc_cache.get_metadata_dataframe(
    directory='WMB-taxonomy',
    file_name='cluster_to_cluster_annotation_membership_color',
)
cluster_colors.set_index('cluster_alias', inplace=True)

cluster_details

	neurotransmitter	class	subclass	supertype	cluster
cluster_alias
1	Glut	01 IT-ET Glut	018 L2 IT PPP-APr Glut	0082 L2 IT PPP-APr Glut_3	0326 L2 IT PPP-APr Glut_3
2	Glut	01 IT-ET Glut	018 L2 IT PPP-APr Glut	0082 L2 IT PPP-APr Glut_3	0327 L2 IT PPP-APr Glut_3
3	Glut	01 IT-ET Glut	018 L2 IT PPP-APr Glut	0081 L2 IT PPP-APr Glut_2	0322 L2 IT PPP-APr Glut_2
4	Glut	01 IT-ET Glut	018 L2 IT PPP-APr Glut	0081 L2 IT PPP-APr Glut_2	0323 L2 IT PPP-APr Glut_2
5	Glut	01 IT-ET Glut	018 L2 IT PPP-APr Glut	0081 L2 IT PPP-APr Glut_2	0325 L2 IT PPP-APr Glut_2
...	...	...	...	...	...
34368	GABA-Glyc	27 MY GABA	288 MDRN Hoxb5 Ebf2 Gly-Gaba	1102 MDRN Hoxb5 Ebf2 Gly-Gaba_1	4955 MDRN Hoxb5 Ebf2 Gly-Gaba_1
34372	GABA-Glyc	27 MY GABA	285 MY Lhx1 Gly-Gaba	1091 MY Lhx1 Gly-Gaba_3	4901 MY Lhx1 Gly-Gaba_3
34374	GABA-Glyc	27 MY GABA	285 MY Lhx1 Gly-Gaba	1091 MY Lhx1 Gly-Gaba_3	4902 MY Lhx1 Gly-Gaba_3
34376	GABA-Glyc	27 MY GABA	285 MY Lhx1 Gly-Gaba	1091 MY Lhx1 Gly-Gaba_3	4903 MY Lhx1 Gly-Gaba_3
34380	GABA-Glyc	27 MY GABA	285 MY Lhx1 Gly-Gaba	1095 MY Lhx1 Gly-Gaba_7	4924 MY Lhx1 Gly-Gaba_7

5322 rows × 5 columns

cluster_colors

	neurotransmitter_color	class_color	subclass_color	supertype_color	cluster_color
cluster_alias
1	#2B93DF	#FA0087	#0F6632	#266DFF	#64661F
2	#2B93DF	#FA0087	#0F6632	#266DFF	#CCA73D
3	#2B93DF	#FA0087	#0F6632	#002BCC	#99000D
4	#2B93DF	#FA0087	#0F6632	#002BCC	#5C8899
5	#2B93DF	#FA0087	#0F6632	#002BCC	#473D66
...	...	...	...	...	...
34368	#820e57	#0096C7	#660038	#5CCCA4	#500099
34372	#820e57	#0096C7	#f20985	#976df9	#0F6627
34374	#820e57	#0096C7	#f20985	#976df9	#2E4799
34376	#820e57	#0096C7	#f20985	#976df9	#15FF00
34380	#820e57	#0096C7	#f20985	#FF2B26	#459988

5322 rows × 5 columns

cell_extended = {}

for d in datasets :
    cell_extended[d] = cell[d].join(cluster_details, on='cluster_alias')
    cell_extended[d] = cell_extended[d].join(cluster_colors, on='cluster_alias')

cell_extended[datasets[0]]

	brain_section_label	feature_matrix_label	donor_label	donor_genotype	donor_sex	cluster_alias	x	y	z	subclass_confidence_score	...	neurotransmitter	class	subclass	supertype	cluster	neurotransmitter_color	class_color	subclass_color	supertype_color	cluster_color
cell_label
182941331246012878296807398333956011710	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	704	0.682522	3.366483	7.829530	0.969933	...	GABA	06 CTX-CGE GABA	049 Lamp5 Gaba	0199 Lamp5 Gaba_1	0709 Lamp5 Gaba_1	#FF3358	#CCFF33	#FF764D	#DC00FF	#998900
221260934538535633595532020856387724686	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5243	0.667690	3.442241	7.829530	0.850554	...		33 Vascular	331 Peri NN	1191 Peri NN_1	5304 Peri NN_1	#666666	#858881	#82992E	#2F00CC	#BB1FCC
22228792606814781533240955623030943708	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	14939	0.638731	3.474328	7.829530	0.888285	...		30 Astro-Epen	319 Astro-TE NN	1163 Astro-TE NN_3	5225 Astro-TE NN_3	#666666	#594a26	#3DCCB1	#a8afa5	#551799
272043042552227961220474294517855477150	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	14939	0.653425	3.433218	7.829530	0.900000	...		30 Astro-Epen	319 Astro-TE NN	1163 Astro-TE NN_3	5225 Astro-TE NN_3	#666666	#594a26	#3DCCB1	#a8afa5	#551799
110116287883089187971185374239350249328	Zhuang-ABCA-1.089	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5254	0.623896	3.513574	7.829530	0.999978	...		33 Vascular	333 Endo NN	1193 Endo NN_1	5310 Endo NN_1	#666666	#858881	#994567	#00992A	#FFB473
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
94310525370042131911495836073267655162	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5255	0.997247	3.823236	9.717769	0.971385	...		33 Vascular	333 Endo NN	1193 Endo NN_1	5311 Endo NN_1	#666666	#858881	#994567	#00992A	#CC3D76
298798481479578578007190103666214714353	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	14939	1.043871	3.706231	9.717769	0.956377	...		30 Astro-Epen	319 Astro-TE NN	1163 Astro-TE NN_3	5225 Astro-TE NN_3	#666666	#594a26	#3DCCB1	#a8afa5	#551799
330756942354980576352210203729462562749	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5261	1.037680	3.759890	9.717769	0.811520	...		33 Vascular	330 VLMC NN	1188 VLMC NN_2	5301 VLMC NN_2	#666666	#858881	#653D66	#4D5CFF	#79CC5C
47305871059582831548494138048361484565	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5252	1.044169	3.758463	9.717769	0.991924	...		33 Vascular	333 Endo NN	1193 Endo NN_1	5309 Endo NN_1	#666666	#858881	#994567	#00992A	#5C9994
64578198410898899234789748167671783948	Zhuang-ABCA-1.110	Zhuang-ABCA-1	Zhuang-ABCA-1	wt/wt	F	5263	1.042301	3.589421	9.717769	0.889413	...		33 Vascular	330 VLMC NN	1187 VLMC NN_1	5298 VLMC NN_1	#666666	#858881	#653D66	#66391F	#3B9900

2846908 rows × 23 columns

The cell_extended dataframes are available in their respective Zhuang-ABCA-[1,2,3,4] directories as cell_metadata_with_cluster_annotation.
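
To make the join semantics concrete, here is a small sketch on toy data. The frame and variable names prefixed with `toy_` are hypothetical; the point is only that `join(on=...)` matches a column of the calling frame against the index of the other frame and keeps every cell row.

```python
import pandas as pd

# Toy cell table: two cells share a cluster, and one has an alias (999)
# that is deliberately absent from the annotation table.
toy_cell = pd.DataFrame(
    {"cluster_alias": [1, 1, 3, 999]},
    index=pd.Index(["c1", "c2", "c3", "c4"], name="cell_label"),
)

# Toy annotation table indexed by cluster_alias, like cluster_details.
toy_details = pd.DataFrame(
    {
        "class": ["01 IT-ET Glut", "01 IT-ET Glut"],
        "subclass": ["018 L2 IT PPP-APr Glut", "018 L2 IT PPP-APr Glut"],
    },
    index=pd.Index([1, 3], name="cluster_alias"),
)

# join(on=...) matches the caller's column against the other frame's index
# and is a left join by default, so every cell row is kept.
toy_extended = toy_cell.join(toy_details, on="cluster_alias")

# Cells without a matching annotation show up as NaN in the joined columns.
unmatched = toy_extended[toy_extended["class"].isna()]
print(toy_extended)
print("unmatched cells:", unmatched.index.tolist())
```

A check like `unmatched` is a cheap way to confirm that every cluster alias in a metadata table has an entry in the taxonomy tables.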

Gene panel#

All 4 datasets share the same 1122-gene panel, selected to facilitate mapping to transcriptomically defined cell type taxonomies. Each gene is uniquely identified by an Ensembl ID. It is best practice to use the gene identifier for tracking and data interchange, as gene symbols are not unique and can change over time.

Each row of the gene dataframe has an Ensembl gene identifier, a gene symbol, a name and a mapped NCBI identifier.

gene = abc_cache.get_metadata_dataframe(directory=datasets[0],
                                        file_name='gene')
gene.set_index('gene_identifier', inplace=True)
print("Number of genes = ", len(gene))
gene.head(5)

Number of genes =  1122

	gene_symbol	name	mapped_ncbi_identifier
gene_identifier
ENSMUSG00000024798	Htr7	5-hydroxytryptamine (serotonin) receptor 7	NCBIGene:15566
ENSMUSG00000042385	Gzmk	granzyme K	NCBIGene:14945
ENSMUSG00000036198	Arhgap36	Rho GTPase activating protein 36	NCBIGene:75404
ENSMUSG00000028780	Sema3c	sema domain, immunoglobulin domain (Ig), short...	NCBIGene:20348
ENSMUSG00000015843	Rxrg	retinoid X receptor gamma	NCBIGene:20183
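
Because symbols are not guaranteed to be unique, it can help to translate them to identifiers explicitly and fail loudly on ambiguity. The helper below, `symbols_to_identifiers`, is a hypothetical function written for this sketch and is not part of abc_atlas_access; it runs on a toy table built from three rows of the panel shown above.

```python
import pandas as pd

# Toy gene table using three rows from the panel shown above.
toy_gene = pd.DataFrame(
    {"gene_symbol": ["Htr7", "Gzmk", "Rxrg"]},
    index=pd.Index(
        ["ENSMUSG00000024798", "ENSMUSG00000042385", "ENSMUSG00000015843"],
        name="gene_identifier",
    ),
)


def symbols_to_identifiers(gene_table, symbols):
    """Map gene symbols to identifiers, refusing ambiguous or missing symbols."""
    lookup = {}
    for symbol in symbols:
        matches = gene_table.index[gene_table["gene_symbol"] == symbol]
        if len(matches) != 1:
            raise ValueError(f"{symbol!r} matched {len(matches)} identifiers")
        lookup[symbol] = matches[0]
    return lookup


print(symbols_to_identifiers(toy_gene, ["Rxrg", "Htr7"]))
```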

Gene expression matrix#

Expression values for all sections are stored in anndata h5ad format, one set of files per brain, with minimal metadata. There are two h5ad files: one stores the raw counts and the other stores their log-normalized values. In this next section, we provide example code showing how to open the file and connect it with the rich cell-level metadata discussed above.

We define a small helper function, plot_sections, to visualize the cells in anatomical context, colored by neurotransmitter identity, class and subclass.

def subplot_section(ax, xx, yy, cc=None, val=None, cmap=None):
    """Scatter the cells of one section on ax, colored by cc or by val through cmap."""

    if cmap is not None:
        # Continuous values (e.g. gene expression) mapped through a colormap.
        ax.scatter(xx, yy, s=0.5, c=val, marker='.', cmap=cmap)
    elif cc is not None:
        # One explicit color per cell (e.g. a *_color metadata column).
        ax.scatter(xx, yy, s=0.5, color=cc, marker='.')
    ax.set_ylim(11, 0)  # inverted y axis
    ax.set_xlim(0, 11)
    ax.axis('equal')
    ax.set_xticks([])
    ax.set_yticks([])


def plot_sections(cell_extended, example_section, cc=None, val=None,
                  fig_width=10, fig_height=10, cmap=None):
    """Plot the example section of each dataset on a 2 x 2 grid of axes."""

    fig, ax = plt.subplots(2, 2)
    fig.set_size_inches(fig_width, fig_height)

    for i, d in enumerate(cell_extended):

        # Select the cells that belong to this dataset's example section.
        pred = (cell_extended[d]['brain_section_label'] == example_section[d])
        section = cell_extended[d][pred]

        if cmap is not None:
            subplot_section(ax.flat[i], section['x'], section['y'], val=section[val], cmap=cmap)
        elif cc is not None:
            subplot_section(ax.flat[i], section['x'], section['y'], section[cc])

        ax.flat[i].set_title(d)

    return fig, ax

fig, ax = plot_sections(cell_extended, example_section, 'neurotransmitter_color')
res = fig.suptitle('Neurotransmitter Identity', fontsize=14)
plt.show()

../_images/ad11a718de488467ebc989f84521b65c7b6b6b81deb3bd41fe802527cd0819e6.png

fig, ax = plot_sections(cell_extended, example_section, 'class_color')
res = fig.suptitle('Cell Type Classes', fontsize=14)
plt.show()

../_images/1ed1b3f1aa99f850d061f5d579c423641fce4e29a3aedb3b9dd191d0e8de491e.png

fig, ax = plot_sections(cell_extended, example_section, 'subclass_color')
res = fig.suptitle('Cell Type Subclasses', fontsize=14)
plt.show()

../_images/bd501a7a2bc84b9601a136ab00e681aa17214542e5cb89c3b474379fcd8aa064.png

Example use case#

In this section, we visualize the expression of seven canonical neurotransmitter transporter genes. To support these use cases, we will create a smaller submatrix (all cells and 7 genes) and read it into a dataframe. Note that this operation takes around 2-5 minutes.

gnames = ['Slc17a7', 'Slc17a6', 'Slc17a8', 'Slc32a1', 'Slc6a5', 'Slc6a3', 'Slc6a4']
pred = [x in gnames for x in gene.gene_symbol]
gene_filtered = gene[pred]
gene_filtered

	gene_symbol	name	mapped_ncbi_identifier
gene_identifier
ENSMUSG00000019935	Slc17a8	solute carrier family 17 (sodium-dependent ino...	NCBIGene:216227
ENSMUSG00000021609	Slc6a3	solute carrier family 6 (neurotransmitter tran...	NCBIGene:13162
ENSMUSG00000037771	Slc32a1	solute carrier family 32 (GABA vesicular trans...	NCBIGene:22348
ENSMUSG00000039728	Slc6a5	solute carrier family 6 (neurotransmitter tran...	NCBIGene:104245
ENSMUSG00000070570	Slc17a7	solute carrier family 17 (sodium-dependent ino...	NCBIGene:72961
ENSMUSG00000020838	Slc6a4	solute carrier family 6 (neurotransmitter tran...	NCBIGene:15567
ENSMUSG00000030500	Slc17a6	solute carrier family 17 (sodium-dependent ino...	NCBIGene:140919

abc_cache.list_data_files('Zhuang-ABCA-2')

['Zhuang-ABCA-2/log2', 'Zhuang-ABCA-2/raw']

cell_expression = {}

for d in datasets:    
    file = abc_cache.get_data_path(directory=d, file_name=f"{d}/log2")
    
    adata = anndata.read_h5ad(file, backed='r')
    
    start = time.process_time()
    gdata = adata[:, gene_filtered.index].to_df()
    gdata.columns = gene_filtered.gene_symbol
    cell_expression[d] = cell_extended[d].join(gdata)
    
    print(d,"-","time taken: ", time.process_time() - start)
    
    adata.file.close()
    del adata

Zhuang-ABCA-1 - time taken:  26.463893791000004
Zhuang-ABCA-2 - time taken:  11.567487181999994
Zhuang-ABCA-3 - time taken:  12.91951303800002
Zhuang-ABCA-4 - time taken:  1.3839122699999962
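
The loop above does three things for each dataset: select gene columns by identifier, relabel them with gene symbols, and join the result to the metadata on the shared cell_label index. The sketch below reproduces those steps with plain pandas on toy data, so it runs without downloading any h5ad file. All `toy_` names and expression values are made up; the three identifiers are the ones listed above for Slc17a7, Slc17a6 and Slc32a1.

```python
import pandas as pd

# Toy expression table: rows are cells, columns are gene identifiers.
# All values are made up.
toy_expr = pd.DataFrame(
    [[0.0, 2.1, 0.3], [1.5, 0.0, 0.0], [0.2, 0.4, 3.0]],
    index=pd.Index(["c1", "c2", "c3"], name="cell_label"),
    columns=["ENSMUSG00000070570", "ENSMUSG00000030500", "ENSMUSG00000037771"],
)

# Toy gene table restricted to the genes of interest.
toy_gene_filtered = pd.DataFrame(
    {"gene_symbol": ["Slc17a7", "Slc32a1"]},
    index=pd.Index(
        ["ENSMUSG00000070570", "ENSMUSG00000037771"], name="gene_identifier"
    ),
)

# Toy metadata; cell c3 is absent from it on purpose.
toy_meta = pd.DataFrame(
    {"brain_section_label": ["S.001", "S.001"]},
    index=pd.Index(["c1", "c2"], name="cell_label"),
)

# Select columns by identifier, then relabel them with gene symbols.
# Selecting with the gene table's index keeps both in the same order.
sub = toy_expr.loc[:, toy_gene_filtered.index].copy()
sub.columns = toy_gene_filtered["gene_symbol"].to_numpy()

# Left join on the shared cell_label index: only cells present in the
# metadata are kept, each gaining one column per selected gene.
toy_cell_expression = toy_meta.join(sub)
print(toy_cell_expression)
```

Selecting columns with the filtered gene table's own index is what keeps the symbol relabeling safe: the column order and the symbol order come from the same table.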

Visualize genes Slc17a7, Slc17a6 and Slc32a1 for an example section from each brain.

fig, ax = plot_sections(cell_expression, example_section, val='Slc17a7', cmap=plt.cm.magma_r)
res = fig.suptitle('Gene Slc17a7', fontsize=14)
plt.show()

../_images/0f06f1de9b86153a5882f49515c0da494117fac2180c98cf5f305a19b54fac83.png

fig, ax = plot_sections(cell_expression, example_section, val='Slc17a6', cmap=plt.cm.magma_r)
res = fig.suptitle('Gene Slc17a6', fontsize=14)
plt.show()

../_images/6a60a011ac3d9990e0669f74982c1e397c79a1f585e735b9f12db25c23c9164a.png

fig, ax = plot_sections(cell_expression, example_section, val='Slc32a1', cmap=plt.cm.magma_r)
res = fig.suptitle('Gene Slc32a1', fontsize=14)
plt.show()

../_images/3e7a476546fdb272a9c75105237fc51f14aace4111ddbc99d06983f580423428.png
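
Beyond plotting, a joined table of metadata and expression also supports simple numerical summaries. As one possible illustration (an assumption about how a reader might continue, not a step from the original analysis), the sketch below averages expression per neurotransmitter type on a toy dataframe with made-up values.

```python
import pandas as pd

# Toy table shaped like cell_expression[d]: metadata plus gene columns.
# All values are made up.
toy = pd.DataFrame(
    {
        "neurotransmitter": ["Glut", "Glut", "GABA", "GABA"],
        "Slc17a7": [2.0, 4.0, 0.0, 0.5],
        "Slc32a1": [0.0, 0.5, 3.0, 5.0],
    },
    index=pd.Index(["c1", "c2", "c3", "c4"], name="cell_label"),
)

# Mean expression of each gene within each neurotransmitter type.
mean_by_type = toy.groupby("neurotransmitter")[["Slc17a7", "Slc32a1"]].mean()
print(mean_by_type)
```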

CCF registration and parcellation annotation#

Each brain specimen has been registered to the Allen CCFv3 atlas, resulting in x, y, z coordinates and a parcellation_index for each registered cell.

ccf_coordinates = {}

for d in datasets :

    ccf_coordinates[d] = abc_cache.get_metadata_dataframe(directory=f"{d}-CCF", file_name='ccf_coordinates')
    ccf_coordinates[d].set_index('cell_label', inplace=True)
    ccf_coordinates[d].rename(columns={'x': 'x_ccf',
                                       'y': 'y_ccf',
                                       'z': 'z_ccf'},
                              inplace=True)
    
    cell_extended[d] = cell_extended[d].join(ccf_coordinates[d], how='inner')

ccf_coordinates[datasets[0]]

	x_ccf	y_ccf	z_ccf	parcellation_index
cell_label
182941331246012878296807398333956011710	7.902190	3.048426	0.582962	0
221260934538535633595532020856387724686	7.906513	3.145200	0.577602	0
22228792606814781533240955623030943708	7.906110	3.182761	0.553731	0
272043042552227961220474294517855477150	7.904627	3.131808	0.563525	0
110116287883089187971185374239350249328	7.907236	3.230647	0.543048	0
...	...	...	...	...
94310525370042131911495836073267655162	9.681244	4.453979	0.852027	0
298798481479578578007190103666214714353	9.676999	4.291647	0.899531	1109
330756942354980576352210203729462562749	9.678760	4.363282	0.894082	1109
47305871059582831548494138048361484565	9.678641	4.360346	0.901195	1109
64578198410898899234789748167671783948	9.673530	4.138034	0.897311	1109

2616328 rows × 4 columns
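
Note that the join above uses how='inner', so cells without CCF coordinates are dropped from cell_extended. The sketch below shows that effect on toy data (all `toy_` names and values are made up) and then computes the registered fraction for Zhuang-ABCA-1 from the two row counts printed in this notebook.

```python
import pandas as pd

# Toy cell table with three cells.
toy_cells = pd.DataFrame(
    {"x": [0.68, 0.66, 0.63]},
    index=pd.Index(["c1", "c2", "c3"], name="cell_label"),
)

# Only c1 and c3 have CCF coordinates in this toy example.
toy_ccf = pd.DataFrame(
    {"x_ccf": [7.90, 7.91], "parcellation_index": [0, 1109]},
    index=pd.Index(["c1", "c3"], name="cell_label"),
)

# An inner join keeps only the cells present in both tables.
registered = toy_cells.join(toy_ccf, how="inner")
print(registered.index.tolist())

# Row counts printed earlier in this notebook for Zhuang-ABCA-1.
n_cells = 2846908
n_registered = 2616328
fraction_registered = n_registered / n_cells
print(f"{fraction_registered:.1%} of Zhuang-ABCA-1 cells have CCF coordinates")
```

If the unregistered cells are still needed downstream, a left join would keep them with missing values in the CCF columns instead.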

Read in the pivot table from the “parcellation annotation tutorial” to associate each cell with terms at each anatomical parcellation level and the corresponding color.

parcellation_annotation = abc_cache.get_metadata_dataframe(directory="Allen-CCF-2020",
                                                           file_name='parcellation_to_parcellation_term_membership_acronym')
parcellation_annotation.set_index('parcellation_index', inplace=True)
parcellation_annotation.columns = ['parcellation_%s'% x for x in  parcellation_annotation.columns]

parcellation_color = abc_cache.get_metadata_dataframe(directory="Allen-CCF-2020",
                                                      file_name='parcellation_to_parcellation_term_membership_color')
parcellation_color.set_index('parcellation_index', inplace=True)
parcellation_color.columns = ['parcellation_%s'% x for x in  parcellation_color.columns]

for d in datasets :
    cell_extended[d] = cell_extended[d].join(parcellation_annotation, on='parcellation_index')
    cell_extended[d] = cell_extended[d].join(parcellation_color, on='parcellation_index')   
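
After this join, each cell carries its anatomical terms, so per-region tallies become a one-liner. The following is a sketch on toy data: the unprefixed column names (`division`, `structure`) and the acronyms are assumptions for illustration, not values read from the Allen-CCF-2020 files.

```python
import pandas as pd

# Toy parcellation table; the column names and acronyms here are
# assumptions for illustration only.
toy_parcellation = pd.DataFrame(
    {
        "division": ["unassigned", "Isocortex", "HPF"],
        "structure": ["unassigned", "MOp", "CA1"],
    },
    index=pd.Index([0, 10, 20], name="parcellation_index"),
)

# Same prefixing step as above, so the new column names cannot collide
# with existing cell metadata columns.
toy_parcellation.columns = [f"parcellation_{c}" for c in toy_parcellation.columns]

toy_cells = pd.DataFrame(
    {"parcellation_index": [10, 10, 20, 0]},
    index=pd.Index(["c1", "c2", "c3", "c4"], name="cell_label"),
)
annotated = toy_cells.join(toy_parcellation, on="parcellation_index")

# Number of cells assigned to each division.
cells_per_division = annotated["parcellation_division"].value_counts()
print(cells_per_division)
```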

fig, ax = plot_sections(cell_extended, example_section, 'parcellation_division_color')
res = fig.suptitle('Parcellation - division', fontsize=14)
plt.show()

../_images/78eb7d54e767ec71232cb24f443170b8872747d5adcc7c7eea19a8c1b2f9886a.png

fig, ax = plot_sections(cell_extended, example_section, 'parcellation_structure_color')
res = fig.suptitle('Parcellation - structure', fontsize=14)
plt.show()

../_images/c77d24d333f361acdf51b441eb6242c61d7ddadf1ae0a82615bf72e518790aa4.png

fig, ax = plot_sections(cell_extended, example_section, 'parcellation_substructure_color')
res = fig.suptitle('Parcellation - substructure', fontsize=14)
plt.show()

../_images/5a93b6ce889d13e09c2159f16ea16c0b424ebece00ba4a8c660e601132bfa923.png