ASAP Human Postmortem-Derived Brain Sequencing Collection (PMDBS): Data Overview

ASAP Human Postmortem-Derived Brain Sequencing Collection (PMDBS): Data Overview#

Aligning Science Across Parkinson’s (ASAP), The Michael J. Fox Foundation for Parkinson’s Research (MJFF), and the Allen Institute for Brain Science (AIBS) are teaming up to further the mission of the ASAP Collaborative Research Network (CRN) program, to accelerate discoveries in the Parkinson’s disease (PD) and neurodegenerative disease research communities. Together we will annotate, enhance and add knowledge to the growing data catalog in the ASAP CRN Cloud through integration of cell type taxonomies using the Allen Institute’s MapMyCells tool and visualization through the Allen Brain Cell (ABC) Atlas web application. This integration of data and knowledge will allow users to visualize and explore the changes in gene expression of specific, highly resolved brain cell types in the context of a large PD cohort of donors.

This initial collaboration focuses on the Human Postmortem-derived Brain Sequencing Collection (PMDBS), a harmonized repository comprised of single nucleus and PolyA RNA-seq data contributed by five ASAP CRN teams (Hafler, Lee, Jakobsson, Scherzer, Hardy). Sequencing data were uniformly aligned to the GRCh38.p13 reference genome (Gencode V32), quality control was performed and low-quality cells were filtered out. A set of highly variable genes were identified and the scVI workflow resulted in an integrated latent variable representation, 2D UMAP coordinates and a set of 30 clusters. Currently, the repository spans roughly 3 millions cells obtained from 9 brain regions and 211 donors with various pathologies (including healthy control). For more details on this dataset and to access the raw data used in its preparation, please visit the ASAP CRN Cloud webpage.

In this notebook we assemble the metadata associated with individual cells derived from donor, sample and data integration process information.

You need to be connected to the internet to run this notebook and should have run through the getting started notebook.

import anndata
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pathlib import Path

from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache

We will interact with the data using the AbcProjectCache. This cache object tracks which data has been downloaded and serves the path to the requested data on disk. For metadata, the cache can also directly serve up a Pandas DataFrame. See the getting started notebook notebook for more details on using the cache including installing it if it has not already been.

Change the download_base variable to where you have downloaded the data in your system.

download_base = Path('../../data/abc_atlas')
abc_cache = AbcProjectCache.from_cache_dir(download_base)

abc_cache.current_manifest

'releases/20250331/manifest.json'

Data Overview#

ASAP Data#

We simplify the ASAP-PMDB metadata into a single cell_metadata.csv file and companion tables describing the donor (including sex, age, race, disease state etc.), sample (including data processing metadata, team information, and region of interest from which the sample was dissected from), and value_sets.csv which provides a mapping from unique terms to their plotting color and term order.

We’ll first load the cell metadata. This provides the UMAP x, y coordinates and clusters provided by ASAP. Refer to the collection documentation for description of the data processing and source datasets.

cell_metadata = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-10X',
    file_name='cell_metadata'
).set_index('cell_label')
print("Number of cells = ", len(cell_metadata))
cell_metadata.head()

cell_metadata.csv: 100%|███████████████████████████████████████████████████████████████████████| 555M/555M [01:43<00:00, 5.37MMB/s]

Number of cells =  2796736

	cell_barcode	sample_label	x	y	cluster_label	cluster_label_order	cluster_label_color	dataset_label	feature_matrix_label	abc_sample_id
cell_label
AAACCCAAGAAACCAT-1_ASAP_PMBDS_000060_s002_Rep1	AAACCCAAGAAACCAT-1	ASAP_PMBDS_000060_s002	0.016373	1.403025	cluster_000	1	#171c97	ASAP-PMDBS-10X	ASAP-PMDBS-10X	cc633260-614c-4e5c-aa18-d3e3a0ef81bb
AAACCCAAGAAACTCA-1_ASAP_PMBDS_000088_s001_Rep1	AAACCCAAGAAACTCA-1	ASAP_PMBDS_000088_s001	10.029368	1.342043	cluster_013	14	#aa5ed0	ASAP-PMDBS-10X	ASAP-PMDBS-10X	4b4f4355-6463-4bf2-9544-3f6d493de741
AAACCCAAGAAAGCGA-1_ASAP_PMBDS_000177_s001_rep1	AAACCCAAGAAAGCGA-1	ASAP_PMBDS_000177_s001	-9.057206	-1.032092	cluster_006	7	#f591a5	ASAP-PMDBS-10X	ASAP-PMDBS-10X	0ea675f3-4b39-4c16-9518-66ddd9c409a1
AAACCCAAGAAAGCGA-1_ASAP_PMBDS_000185_s001_rep1	AAACCCAAGAAAGCGA-1	ASAP_PMBDS_000185_s001	-0.685645	-4.943793	cluster_001	2	#e4a9ba	ASAP-PMDBS-10X	ASAP-PMDBS-10X	0eb0506b-e76b-4f9a-a3b6-bc6bbccfdd32
AAACCCAAGAAATCCA-1_ASAP_PMBDS_000122_s002_1	AAACCCAAGAAATCCA-1	ASAP_PMBDS_000122_s002	-3.507261	13.855699	cluster_002	3	#a02226	ASAP-PMDBS-10X	ASAP-PMDBS-10X	1f67fc2a-7e81-4448-8437-e2c6ef8b4b1b

We can use pandas groupby function to see how many unique items are associated for each field and list them out if the number of items is small.

def print_column_info(df):
    
    for c in df.columns:
        grouped = df[[c]].groupby(c).count()
        members = ''
        if len(grouped) < 30:
            members = str(list(grouped.index))
        print("Number of unique %s = %d %s" % (c, len(grouped), members))

print_column_info(cell_metadata)

Number of unique cell_barcode = 2040051 
Number of unique sample_label = 383 
Number of unique x = 2735345 
Number of unique y = 2711023 
Number of unique cluster_label = 30 
Number of unique cluster_label_order = 30 
Number of unique cluster_label_color = 30 
Number of unique dataset_label = 1 ['ASAP-PMDBS-10X']
Number of unique feature_matrix_label = 1 ['ASAP-PMDBS-10X']
Number of unique abc_sample_id = 2796736 

Nex we’ll load and join the sample metadata. This contains information about the individual tissue samples that make up the combined dataset. This includes which ASAP Team collected and processed the cells, the technique used to measure the gene expression, and the region of interest in the brain the sample came from. The sample table is a version of the Sample metadata shown here with some post processing to disambiguate the data.

sample = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-10X',
    file_name='sample'
).set_index('sample_label')
sample.head()

sample.csv: 100%|█████████████████████████████████████████████████████████████████████████████| 50.1k/50.1k [00:00<00:00, 695kMB/s]

	donor_label	source_dataset_label	source_dataset_label_order	source_dataset_label_color	technique	technique_order	technique_color	region_of_interest_label	region_of_interest_label_order	region_of_interest_label_color
sample_label
ASAP_PMBDS_000001_s001	ASAP_PMBDS_000001	team_lee_sn_rnaseq	1	#e41a1c	v1	1	#03738C	hippocampus	6	#bfb5d5
ASAP_PMBDS_000001_s002	ASAP_PMBDS_000001	team_lee_sn_rnaseq	1	#e41a1c	v1	1	#03738C	middle frontal gyrus	1	#d4b235
ASAP_PMBDS_000001_s003	ASAP_PMBDS_000001	team_lee_sn_rnaseq	1	#e41a1c	v1	1	#03738C	substantia nigra	9	#6ca9bf
ASAP_PMBDS_000002_s001	ASAP_PMBDS_000002	team_lee_sn_rnaseq	1	#e41a1c	v1	1	#03738C	hippocampus	6	#bfb5d5
ASAP_PMBDS_000002_s002	ASAP_PMBDS_000002	team_lee_sn_rnaseq	1	#e41a1c	v1	1	#03738C	middle frontal gyrus	1	#d4b235

We join the sample information into the full cell metadata table.

cell_metadata = cell_metadata.join(sample, on='sample_label')

Next, we load the donor table. This contains information on the donors (age, sex, race, pathology, etc.) that make up the study and is a processed copy of the Donor table from the same ASAP data description page linked above.

donor = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-10X',
    file_name='donor'
).set_index('donor_label')
donor.head()

donor.csv: 100%|██████████████████████████████████████████████████████████████████████████████| 50.3k/50.3k [00:00<00:00, 557kMB/s]

	donor_race	donor_race_order	donor_race_color	donor_sex	donor_sex_order	donor_sex_color	primary_diagnosis	primary_diagnosis_order	primary_diagnosis_color	age_at_death	...	cerad_score_color	cognitive_status	cognitive_status_order	cognitive_status_color	lewy_body_disease_pathology	lewy_body_disease_pathology_order	lewy_body_disease_pathology_color	thal_phase	thal_phase_order	thal_phase_color
donor_label
ASAP_PMBDS_000001	White	2	#38CAE8	Male	2	#ADC4C3	No PD nor other neurological disorder	2	#99d594	78 - 89 yrs	...	#e8e8e8	Normal	1	#4daf4a	Olfactory bulb only	2	#edf8b1	Unknown	8	#e8e8e8
ASAP_PMBDS_000002	White	2	#38CAE8	Male	2	#ADC4C3	Other neurological disorder	3	#e6f598	78 - 89 yrs	...	#e8e8e8	Normal	1	#4daf4a	Brainstem predominant	4	#7fcdbb	Thal 0	1	#f1eef6
ASAP_PMBDS_000003	White	2	#38CAE8	Male	2	#ADC4C3	Idiopathic Parkinson's disease	7	#d53e4f	< 65 yrs	...	#e8e8e8	Mild Cognitive Impairment	2	#377eb8	Limbic (transitional)	7	#225ea8	Thal 0	1	#f1eef6
ASAP_PMBDS_000004	White	2	#38CAE8	Male	2	#ADC4C3	Idiopathic Parkinson's disease	7	#d53e4f	78 - 89 yrs	...	#e8e8e8	Mild Cognitive Impairment	2	#377eb8	Neocortical	8	#253494	Unknown	8	#e8e8e8
ASAP_PMBDS_000005	White	2	#38CAE8	Male	2	#ADC4C3	Idiopathic Parkinson's disease	7	#d53e4f	78 - 89 yrs	...	#4daf4a	Normal	1	#4daf4a	Neocortical	8	#253494	Unknown	8	#e8e8e8

5 rows × 30 columns

cell_metadata = cell_metadata.join(donor, on='donor_label')

Let’s print out information on our final table.

print_column_info(cell_metadata)

Number of unique cell_barcode = 2040051 
Number of unique sample_label = 383 
Number of unique x = 2735345 
Number of unique y = 2711023 
Number of unique cluster_label = 30 
Number of unique cluster_label_order = 30 
Number of unique cluster_label_color = 30 
Number of unique dataset_label = 1 ['ASAP-PMDBS-10X']
Number of unique feature_matrix_label = 1 ['ASAP-PMDBS-10X']
Number of unique abc_sample_id = 2796736 
Number of unique donor_label = 211 
Number of unique source_dataset_label = 5 ['team_hafler_sn_rnaseq_pfc', 'team_hardy_sn_rnaseq', 'team_jakobsson_sn_rnaseq', 'team_lee_sn_rnaseq', 'team_scherzer_sn_rnaseq_mtg']
Number of unique source_dataset_label_order = 5 [1, 2, 3, 4, 5]
Number of unique source_dataset_label_color = 5 ['#377eb8', '#4daf4a', '#984ea3', '#e41a1c', '#ff7f00']
Number of unique technique = 3 ['v1', 'v3.1 - Dual Index', 'v3.1 - Single Index']
Number of unique technique_order = 3 [1, 2, 3]
Number of unique technique_color = 3 ['#03738C', '#F27329', '#F2B591']
Number of unique region_of_interest_label = 9 ['amygdaloid complex', 'anterior cingulate gyrus', 'hippocampus', 'inferior parietal lobule', 'middle frontal gyrus', 'middle temporal gyrus', 'prefrontal cortex', 'putamen', 'substantia nigra']
Number of unique region_of_interest_label_order = 9 [1, 2, 3, 4, 5, 6, 7, 8, 9]
Number of unique region_of_interest_label_color = 9 ['#6ca9bf', '#a8b485', '#bfb5d5', '#c9e2b1', '#d38e32', '#d4b235', '#d55c92', '#eee0a8', '#efb9be']
Number of unique donor_race = 3 ['Black or African American', 'Unknown', 'White']
Number of unique donor_race_order = 3 [1, 2, 3]
Number of unique donor_race_color = 3 ['#38CAE8', '#8F3A5C', '#e8e8e8']
Number of unique donor_sex = 2 ['Female', 'Male']
Number of unique donor_sex_order = 2 [1, 2]
Number of unique donor_sex_color = 2 ['#565353', '#ADC4C3']
Number of unique primary_diagnosis = 7 ["Alzheimer's disease", 'Healthy Control', "Idiopathic Parkinson's disease", 'No PD nor other neurological disorder', 'Other neurological disorder', "Parkinson's disease", "Prodromal motor Parkinson's disease"]
Number of unique primary_diagnosis_order = 7 [1, 2, 3, 4, 5, 6, 7]
Number of unique primary_diagnosis_color = 7 ['#3288bd', '#99d594', '#d53e4f', '#e6f598', '#fc8d59', '#fee08b', '#ffffbf']
Number of unique age_at_death = 4 ['65 - 77 yrs', '78 - 89 yrs', '90+ yrs', '< 65 yrs']
Number of unique age_at_death_order = 4 [1, 2, 3, 4]
Number of unique age_at_death_color = 4 ['#2b8cbe', '#7bccc4', '#bae4bc', '#f0f9e8']
Number of unique apoe4_status = 7 ['Unknown', 'e2/e2', 'e2/e3', 'e2/e4', 'e3/e3', 'e3/e4', 'e4/e4']
Number of unique apoe4_status_order = 7 [1, 2, 3, 4, 5, 6, 7]
Number of unique apoe4_status_color = 7 ['#1b9e77', '#66a61e', '#7570b3', '#d95f02', '#e6ab02', '#e7298a', '#e8e8e8']
Number of unique braak_stage = 7 ['Braak 0', 'Braak I', 'Braak II', 'Braak III', 'Braak IV', 'Braak VI', 'Unknown']
Number of unique braak_stage_order = 6 [1, 2, 3, 4, 5, 7]
Number of unique braak_stage_color = 7 ['#91003f', '#c994c7', '#d4b9da', '#df65b0', '#e7298a', '#e8e8e8', '#f1eef6']
Number of unique cerad_score = 4 ['Frequent', 'Moderate', 'Sparse', 'Unknown']
Number of unique cerad_score_order = 4 [1, 2, 3, 4]
Number of unique cerad_score_color = 4 ['#377eb8', '#4daf4a', '#e41a1c', '#e8e8e8']
Number of unique cognitive_status = 4 ['Dementia', 'Mild Cognitive Impairment', 'Normal', 'Unknown']
Number of unique cognitive_status_order = 4 [1, 2, 3, 4]
Number of unique cognitive_status_color = 4 ['#377eb8', '#4daf4a', '#e41a1c', '#e8e8e8']
Number of unique lewy_body_disease_pathology = 10 ['Absent', 'Amygdala predominant', 'Brainstem predominant', 'Brainstem/Limbic', 'Diffuse, neocortical (brainstem, limbic and neocortical involvement)', 'Limbic (transitional)', 'Limbic predominant', 'Neocortical', 'Olfactory bulb only', 'Unknown']
Number of unique lewy_body_disease_pathology_order = 10 [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Number of unique lewy_body_disease_pathology_color = 10 ['#081d58', '#1d91c0', '#225ea8', '#253494', '#41b6c4', '#7fcdbb', '#c7e9b4', '#e8e8e8', '#edf8b1', '#ffffd9']
Number of unique thal_phase = 8 ['Thal 0', 'Thal 1', 'Thal 2', 'Thal 3', 'Thal 4', 'Thal 4/5', 'Thal 5', 'Unknown']
Number of unique thal_phase_order = 8 [1, 2, 3, 4, 5, 6, 7, 8]
Number of unique thal_phase_color = 8 ['#91003f', '#c994c7', '#ce1256', '#d4b9da', '#df65b0', '#e7298a', '#e8e8e8', '#f1eef6']

Finally, we’ll load the value_sets table. This is mainly a mapping between the different sets of terms (e.g. regions of interest, cell types, age ranges), providing unique colors for each term in a set and an order of the set. We’ll primarily use it as a lookup table to order or different terms in plot legends.

value_sets = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-10X',
    file_name='value_sets'
).set_index('label')
value_sets.head()

value_sets.csv: 100%|████████████████████████████████████████████████████████████████████████| 4.25k/4.25k [00:00<00:00, 71.3kMB/s]

	field	order	color_hex_triplet
label
v1	technique	1	#03738C
v3.1 - Dual Index	technique	3	#F27329
v3.1 - Single Index	technique	2	#F2B591
cluster_000	cluster_label	1	#171c97
cluster_001	cluster_label	2	#e4a9ba

UMAP Plotting#

We define convenience function to plot the Uniform Manifold Approximation and Projection (UMAP) of the ASAP PMDB data.

def plot_umap(
    xx,
    yy,
    cc=None,
    val=None,
    fig_width=8,
    fig_height=8,
    cmap=None,
    labels=None,
    term_order_lookup=None,
    colorbar=False,
    sizes=None
):
    """
    """
    if sizes is None:
        sizes = 1
    fig, ax = plt.subplots()
    fig.set_size_inches(fig_width, fig_height)

    if cmap is not None:
        scatt = ax.scatter(xx, yy, c=val, s=0.5, marker='.', cmap=cmap, alpha=sizes)
    elif cc is not None:
        scatt = ax.scatter(xx, yy, c=cc, s=0.5, marker='.', alpha=sizes)

    if labels is not None:
        from matplotlib.patches import Rectangle
        unique_labels = labels.unique()
        unique_colors = cc.unique()

        if term_order_lookup is not None:
            term_order = np.argsort(term_order_lookup.loc[unique_labels, 'order'])
            unique_labels = unique_labels[term_order]
            unique_colors = unique_colors[term_order]
            
        rects = []
        for color in unique_colors:
            rects.append(Rectangle((0, 0), 1, 1, fc=color))

        legend = ax.legend(rects, unique_labels, loc=0)
        # ax.add_artist(legend)

    if colorbar:
        fig.colorbar(scatt, ax=ax)
    
    return fig, ax

Data Acquisition Metadata#

Below we plot various information associated with the collection of the data. This includes which of the five teams collected the data, the 10X chemistry version used, and the region of interest the tissue was extracted from. Note, throughout the presentation of these data, that some values are missing for collections of cells. This is due to the value not being present for these cells and samples in the data. These are marked as Unknown.

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['source_dataset_label_color'],
    labels=cell_metadata['source_dataset_label'],
    term_order_lookup=value_sets,
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Source Dataset")
plt.show()

../_images/5f11a1f5ee6445c398f4017e4732f8fce6ac68139c06a3bd8389b96484ea6853.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['technique_color'],
    labels=cell_metadata['technique'],
    term_order_lookup=value_sets,
    fig_width=12,
    fig_height=12
)
res = ax.set_title("10X Chemistry/Technique")
plt.show()

../_images/f943275078aa1547b8be84c6dcc45a396734485a1f2aeea0b117755a7fc5c674.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['region_of_interest_label_color'],
    labels=cell_metadata['region_of_interest_label'],
    term_order_lookup=value_sets,
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Region of Interest")
plt.show()

../_images/08b732e9d3e70bf6fc499deb7a04eaeb7284271a2bf89d258d5797f0c21a89e3.png

Donor Metadata#

Below we plot various information on the donor where these samples/cells originated.

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['donor_race_color'],
    labels=cell_metadata['donor_race'],
    term_order_lookup=value_sets[value_sets['field'] == 'donor_race'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Donor Race")
plt.show()

../_images/90016a9ebd63abd1c8f1ec865ffa26fa1a2d47e6c1dbcac1090c3ae4bfceb9af.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['donor_sex_color'],
    labels=cell_metadata['donor_sex'],
    term_order_lookup=value_sets[value_sets['field'] == 'donor_sex'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Donor Sex")
plt.show()

../_images/2bde98c236d3a3311ba247bc415e9a7ac4b0e5dd5edffbe1e287eeb16bbe4164.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['age_at_death_color'],
    labels=cell_metadata['age_at_death'],
    term_order_lookup=value_sets[value_sets['field'] == 'age_at_death'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Donor Age At Death")
plt.show()

../_images/ee800add4b3b252c4f04ceebc7468e2336aa1b33109a8c1ec40d1f6bb204468c.png

Clinical Pathology#

Below are various diagnoses and pathologies identified in the donor including healthy controls.

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['primary_diagnosis_color'],
    labels=cell_metadata['primary_diagnosis'],
    term_order_lookup=value_sets[value_sets['field'] == 'primary_diagnosis'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Primary Diagnosis")
plt.show()

../_images/d2ba2b024b12008d3d7d9d4611f961cd5a2518e366792a9430f76f9526bf5c84.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['cognitive_status_color'],
    labels=cell_metadata['cognitive_status'],
    term_order_lookup=value_sets[value_sets['field'] == 'cognitive_status'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Cognitive Status")
plt.show()

../_images/d7f03a2f72dfce089c1d3bf975d2af876746c40483c9bcdc4bbf3e9bbff56851.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['lewy_body_disease_pathology_color'],
    labels=cell_metadata['lewy_body_disease_pathology'],
    term_order_lookup=value_sets[value_sets['field'] == 'lewy_body_disease_pathology'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Lewy Body Disease Pathology")
plt.show()

../_images/13610cf75896b3166e47cc5034ffb6e7263f9ec18aab3c24e5810ddaab7b2411.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['apoe4_status_color'],
    labels=cell_metadata['apoe4_status'],
    term_order_lookup=value_sets[value_sets['field'] == 'apoe4_status'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Apoe4 Status")
plt.show()

../_images/346f4a0f4b4a7db8bfe4c6a015a9757b91c99523973a5f36ed85ef35866f59dd.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['braak_stage_color'],
    labels=cell_metadata['braak_stage'],
    term_order_lookup=value_sets[value_sets['field'] == 'braak_stage'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Braak Stage")
plt.show()

../_images/63c497e418f9da6ab03144374ee35b138c362e2b4ddedfeafad9fb9d5d82db6c.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['cerad_score_color'],
    labels=cell_metadata['cerad_score'],
    term_order_lookup=value_sets[value_sets['field'] == 'cerad_score'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Cerad Score")
plt.show()

../_images/e3939feb6faa74d91f14e0d0716c1419d6e65e206c1b2af7f5f1b43b62483543.png

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['thal_phase_color'],
    labels=cell_metadata['thal_phase'],
    term_order_lookup=value_sets[value_sets['field'] == 'thal_phase'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Thal Phase")
plt.show()

../_images/3b3368ff297b95f1a243aea4a1ed9eb5cf29e054092e677cbdb4e8e994ab4b3c.png

Leiden scVI Clusters#

As part of the ASAP harmonization, scVI was used to generate a latent space representation, 2D UMAP coordinates and set of 30 clusters using their implementation of the Leiden algorithm.

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['cluster_label_color'],
    fig_width=12,
    fig_height=12
)
res = ax.set_title("Cluster Label")
plt.show()

../_images/64952c086dd3af6c4aa109c073395eee1a13c681404520662458c6dfd8362793.png

In the next notebook, we demonstrate the usage of Allen Institute’s MapMyCells tool to map cells from the ASAP PMDBS dataset to the whole human brain taxonomy (Siletti et al.) to provide cell type annotation and insights to the Leiden scVI clusters above.

We’ll also explore gene expression in the UMAP in a later notebook.