ASAP Human Postmortem-Derived Brain Sequencing Collection (PMDBS): Mapping to the SEA-AD Taxonomy#

In the previous notebooks, we provided an overview of the ASAP-PMDBS dataset and demonstrated using MapMyCells to provide cell type insights to the dataset by mapping to the 10X Whole Human Brain Taxonomy. In this notebook, we will additionally map the PMDBS cells to the 10X MTG SEA-AD taxonomy using hierarchical correlation mapping method and walkthrough additional cell types insights that can be glean.

The Seattle Alzheimer’s Disease Brain Cell Atlas (SEA-AD) is a consortium focused on gaining a deep molecular and cellular understanding of the early pathogenesis of Alzheimer’s Disease. The SEA-AD taxonomy is a high-resolution transcriptomic cell type atlas from middle temporal gyrus (MTG) from the SEA-AD aged human cohort that spans the spectrum of Alzheimer’s disease. The atlas was created from a single nucleus RNA-sequencing (snRNAseq) and multiomics dataset of roughly 2.0 million cells profiled (~1.4 million cells after QC) from 84 aged donors. The atlas is hierarchically organized into nested levels of classification: 3 classes, 24 subclasses, and 139 supertypes.

You need to be connected to the internet to run this notebook and should have run through the getting started notebook.

import anndata
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pathlib import Path

from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache

We will interact with the data using the AbcProjectCache. This cache object tracks which data has been downloaded and serves the path to the requested data on disk. For metadata, the cache can also directly serve a up a Pandas DataFrame. See the getting started notebook notebook for more details on using the cache including installing it if it has not already been.

Change the download_base variable to where you have downloaded the data in your system.

download_base = Path('../../data/abc_atlas')
abc_cache = AbcProjectCache.from_cache_dir(download_base)

abc_cache.current_manifest
'releases/20250331/manifest.json'

Data Overview#

ASAP Data#

We simplify the ASAP PMDB metadata into a single cell_metadata, csv file and there companion tables describing the donor (including sex, age, race, disease state etc.), sample (including data processing metadata, team information, and dissection or region of interest the sample is from), and value_sets which provides a mapping from unique term name (cell type, region of interest etc.) to a plotting color and order.

cell_metadata = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-10X',
    file_name='cell_metadata'
).set_index('cell_label')
donor = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-10X',
    file_name='donor'
).set_index('donor_label')
sample = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-10X',
    file_name='sample'
).set_index('sample_label')
value_sets = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-10X',
    file_name='value_sets'
).set_index('label')

As with our previous tutorials, we’ll first use a Pandas join to merge the sample data into the cell metadata and then merge the donor information. We’ll use the value_sets as a lookup.

cell_metadata = cell_metadata.join(sample, on='sample_label')
cell_metadata = cell_metadata.join(donor, on='donor_label')
cell_metadata.head()
cell_barcode sample_label x y cluster_label cluster_label_order cluster_label_color dataset_label feature_matrix_label abc_sample_id ... cerad_score_color cognitive_status cognitive_status_order cognitive_status_color lewy_body_disease_pathology lewy_body_disease_pathology_order lewy_body_disease_pathology_color thal_phase thal_phase_order thal_phase_color
cell_label
AAACCCAAGAAACCAT-1_ASAP_PMBDS_000060_s002_Rep1 AAACCCAAGAAACCAT-1 ASAP_PMBDS_000060_s002 0.016373 1.403025 cluster_000 1 #171c97 ASAP-PMDBS-10X ASAP-PMDBS-10X cc633260-614c-4e5c-aa18-d3e3a0ef81bb ... #4daf4a Unknown 4 #e8e8e8 Absent 1 #ffffd9 Thal 3 4 #df65b0
AAACCCAAGAAACTCA-1_ASAP_PMBDS_000088_s001_Rep1 AAACCCAAGAAACTCA-1 ASAP_PMBDS_000088_s001 10.029368 1.342043 cluster_013 14 #aa5ed0 ASAP-PMDBS-10X ASAP-PMDBS-10X 4b4f4355-6463-4bf2-9544-3f6d493de741 ... #4daf4a Unknown 4 #e8e8e8 Absent 1 #ffffd9 Thal 2 3 #c994c7
AAACCCAAGAAAGCGA-1_ASAP_PMBDS_000177_s001_rep1 AAACCCAAGAAAGCGA-1 ASAP_PMBDS_000177_s001 -9.057206 -1.032092 cluster_006 7 #f591a5 ASAP-PMDBS-10X ASAP-PMDBS-10X 0ea675f3-4b39-4c16-9518-66ddd9c409a1 ... #e41a1c Mild Cognitive Impairment 2 #377eb8 Brainstem/Limbic 5 #41b6c4 Thal 3 4 #df65b0
AAACCCAAGAAAGCGA-1_ASAP_PMBDS_000185_s001_rep1 AAACCCAAGAAAGCGA-1 ASAP_PMBDS_000185_s001 -0.685645 -4.943793 cluster_001 2 #e4a9ba ASAP-PMDBS-10X ASAP-PMDBS-10X 0eb0506b-e76b-4f9a-a3b6-bc6bbccfdd32 ... #e41a1c Mild Cognitive Impairment 2 #377eb8 Brainstem predominant 4 #7fcdbb Thal 3 4 #df65b0
AAACCCAAGAAATCCA-1_ASAP_PMBDS_000122_s002_1 AAACCCAAGAAATCCA-1 ASAP_PMBDS_000122_s002 -3.507261 13.855699 cluster_002 3 #a02226 ASAP-PMDBS-10X ASAP-PMDBS-10X 1f67fc2a-7e81-4448-8437-e2c6ef8b4b1b ... #e8e8e8 Unknown 4 #e8e8e8 Unknown 10 #e8e8e8 Unknown 8 #e8e8e8

5 rows × 50 columns

Loading MapMyCells SEA-AD Taxonomy Results#

We load the MapMyCells hierarchical correlation mapping to 10X SEA-AD taxonomy results. These are calculated for each cell and can be merged into our cell_metadata table on cell_label index. For more information on MapMyCells see here.

Note that the SEA-AD taxonomy was created using data from the middle temporal gyrus (MTG) of the cortex. This means the model used in MapMyCells for this taxonomy has no knowledge of cell types specific to non-cortical regions. For most of this section we plot all cells and plot data from cortical regions only later in this notebook.

seaad_mmc_results = abc_cache.get_metadata_dataframe(
    directory='ASAP-PMDBS-taxonomy',
    file_name='mmc_results_seaad',
    comment='#'
).set_index('cell_label')
seaad_mmc_results.head()
mmc_results_seaad.csv: 100%|█████████████████████████████████████████████████████████████████| 1.25G/1.25G [03:51<00:00, 5.40MMB/s]
class_label class_name class_bootstrapping_probability class_correlation_coefficient class_aggregate_probability subclass_label subclass_name subclass_bootstrapping_probability subclass_correlation_coefficient subclass_aggregate_probability supertype_label supertype_name cluster_alias supertype_bootstrapping_probability supertype_correlation_coefficient supertype_aggregate_probability
cell_label
AGCGATTCAGAATGTA-1_ASAP_PMBDS_000105_s001_1 CS20230505_CLAS_0003 Non-neuronal and Non-neural 1.0 0.8412 1.0 CS20230505_SUBC_0020 OPC 1.00 0.9091 1.00 CS20230505_SUPT_0117 OPC_2 OPC_2 0.89 0.5850 0.890
GGTGAAGCAGACAATA-1_ASAP_PMBDS_000105_s001_1 CS20230505_CLAS_0003 Non-neuronal and Non-neural 1.0 0.8666 1.0 CS20230505_SUBC_0021 Oligodendrocyte 1.00 0.9192 1.00 CS20230505_SUPT_0120 Oligo_1 Oligo_1 1.00 0.6472 1.000
ATTTACCGTTCTAACG-1_ASAP_PMBDS_000105_s001_1 CS20230505_CLAS_0003 Non-neuronal and Non-neural 1.0 0.9235 1.0 CS20230505_SUBC_0021 Oligodendrocyte 1.00 0.9607 1.00 CS20230505_SUPT_0120 Oligo_1 Oligo_1 1.00 0.6456 1.000
CTCAAGACACGCTTAA-1_ASAP_PMBDS_000105_s001_1 CS20230505_CLAS_0003 Non-neuronal and Non-neural 1.0 0.7778 1.0 CS20230505_SUBC_0019 Astrocyte 1.00 0.6605 1.00 CS20230505_SUPT_0114 Astro_3 Astro_3 0.96 0.4966 0.960
TCGCTTGGTGCGACAA-1_ASAP_PMBDS_000105_s001_1 CS20230505_CLAS_0003 Non-neuronal and Non-neural 1.0 0.7626 1.0 CS20230505_SUBC_0020 OPC 0.89 0.7246 0.89 CS20230505_SUPT_0118 OPC_2_1-SEAAD OPC_2_1-SEAAD 0.70 0.3631 0.623

SEA-AD Taxonomy Metadata.#

Below we load the various metadata we need to map and plot our taxonomy in the UMAP. We then merge these data into the main cell metadata table. For full details on these tables, see the previous ASAP-PMDBS tutorial on the Siletti taxonomy

cluster_annotation_term = abc_cache.get_metadata_dataframe(
    directory='SEAAD-taxonomy',
    file_name='cluster_annotation_term',
).set_index('name')
membership = abc_cache.get_metadata_dataframe(
    directory='SEAAD-taxonomy',
    file_name='cluster_to_cluster_annotation_membership',
)
cluster_annotation_term_set = abc_cache.get_metadata_dataframe(
    directory='SEAAD-taxonomy',
    file_name='cluster_annotation_term_set',
)

membership_groupby = membership.groupby(['cluster_alias', 'cluster_annotation_term_set_name'])
cluster_details = membership_groupby['cluster_annotation_term_name'].first().unstack()
cluster_details = cluster_details[cluster_annotation_term_set['name']] # order columns
cluster_details.fillna('Other', inplace=True)
cluster_details.sort_values(['class', 'subclass', 'supertype'], inplace=True)

cluster_colors = membership_groupby['color_hex_triplet'].first().unstack()
cluster_colors = cluster_colors[cluster_annotation_term_set['name']]
cluster_colors.sort_values(['class', 'subclass', 'supertype'], inplace=True)

cell_metadata = cell_metadata.join(seaad_mmc_results)
cell_metadata = cell_metadata.join(cluster_details[['abundancechangecps']], on='cluster_alias')
cell_metadata = cell_metadata.join(cluster_colors, on='cluster_alias', rsuffix='_color')
cell_metadata.head()
cluster_annotation_term.csv: 100%|████████████████████████████████████████████████████████████| 19.4k/19.4k [00:00<00:00, 345kMB/s]
cluster_to_cluster_annotation_membership.csv: 100%|███████████████████████████████████████████| 43.4k/43.4k [00:00<00:00, 339kMB/s]
cluster_annotation_term_set.csv: 100%|███████████████████████████████████████████████████████████| 204/204 [00:00<00:00, 2.48kMB/s]
cell_barcode sample_label x y cluster_label cluster_label_order cluster_label_color dataset_label feature_matrix_label abc_sample_id ... supertype_name cluster_alias supertype_bootstrapping_probability supertype_correlation_coefficient supertype_aggregate_probability abundancechangecps class subclass supertype abundancechangecps_color
cell_label
AAACCCAAGAAACCAT-1_ASAP_PMBDS_000060_s002_Rep1 AAACCCAAGAAACCAT-1 ASAP_PMBDS_000060_s002 0.016373 1.403025 cluster_000 1 #171c97 ASAP-PMDBS-10X ASAP-PMDBS-10X cc633260-614c-4e5c-aa18-d3e3a0ef81bb ... L6 IT_2 L6 IT_2 0.78 0.0007 0.277 unchanged #00ADF8 #A19922 #716C18 #F7F7F7
AAACCCAAGAAACTCA-1_ASAP_PMBDS_000088_s001_Rep1 AAACCCAAGAAACTCA-1 ASAP_PMBDS_000088_s001 10.029368 1.342043 cluster_013 14 #aa5ed0 ASAP-PMDBS-10X ASAP-PMDBS-10X 4b4f4355-6463-4bf2-9544-3f6d493de741 ... L6 CT_3 L6 CT_3 0.74 0.3515 0.740 unchanged #00ADF8 #2D8CB8 #3D7A91 #F7F7F7
AAACCCAAGAAAGCGA-1_ASAP_PMBDS_000177_s001_rep1 AAACCCAAGAAAGCGA-1 ASAP_PMBDS_000177_s001 -9.057206 -1.032092 cluster_006 7 #f591a5 ASAP-PMDBS-10X ASAP-PMDBS-10X 0ea675f3-4b39-4c16-9518-66ddd9c409a1 ... L4 IT_2 L4 IT_2 0.95 0.4353 0.893 unchanged #00ADF8 #00E5E5 #72C3C7 #F7F7F7
AAACCCAAGAAAGCGA-1_ASAP_PMBDS_000185_s001_rep1 AAACCCAAGAAAGCGA-1 ASAP_PMBDS_000185_s001 -0.685645 -4.943793 cluster_001 2 #e4a9ba ASAP-PMDBS-10X ASAP-PMDBS-10X 0eb0506b-e76b-4f9a-a3b6-bc6bbccfdd32 ... L2/3 IT_1 L2/3 IT_1 0.98 0.5391 0.980 decrease #00ADF8 #B1EC30 #EEF987 #2166AC
AAACCCAAGAAATCCA-1_ASAP_PMBDS_000122_s002_1 AAACCCAAGAAATCCA-1 ASAP_PMBDS_000122_s002 -3.507261 13.855699 cluster_002 3 #a02226 ASAP-PMDBS-10X ASAP-PMDBS-10X 1f67fc2a-7e81-4448-8437-e2c6ef8b4b1b ... Oligo_2_1-SEAAD Oligo_2_1-SEAAD 0.96 0.3915 0.960 unchanged #808080 #53776C #5A9B82 #F7F7F7

5 rows × 71 columns

UMAP Plotting#

We define convenience function to plot the Uniform Manifold Approximation and Projection (UMAP) of the ASAP PMDB data.

def plot_umap(
    xx,
    yy,
    cc=None,
    val=None,
    fig_width=8,
    fig_height=8,
    cmap=None,
    labels=None,
    term_order_lookup=None,
    colorbar=False,
    sizes=None
):
    """
    """
    if sizes is None:
        sizes = 1
    fig, ax = plt.subplots()
    fig.set_size_inches(fig_width, fig_height)

    if cmap is not None:
        scatt = ax.scatter(xx, yy, c=val, s=0.5, marker='.', cmap=cmap, alpha=sizes)
    elif cc is not None:
        scatt = ax.scatter(xx, yy, c=cc, s=0.5, marker='.', alpha=sizes)

    if labels is not None:
        from matplotlib.patches import Rectangle
        unique_labels = labels.unique()
        unique_colors = cc.unique()

        if term_order_lookup is not None:
            term_order = np.argsort(term_order_lookup.loc[unique_labels, 'term_order'])
            unique_labels = unique_labels[term_order]
            unique_colors = unique_colors[term_order]
            
        rects = []
        for color in unique_colors:
            rects.append(Rectangle((0, 0), 1, 1, fc=color))

        legend = ax.legend(rects, unique_labels, loc=2)
        # ax.add_artist(legend)

    if colorbar:
        fig.colorbar(scatt, ax=ax)
    
    return fig, ax

Transferred Whole Human Brain Taxonomy Types#

The next set of plots colorized the PMDBS cells by SEA-AD taxonomy at the class, subclass and supertype levels. At the class level, the SEA-AD divides cells into broad GABAergic, Glutamatergic, non-neuronal and non-neural classes. Compared to Siletti superclusters, colorization by SEA-AD subclasses provides next level annotation to cortical Glut IT (intratelencephalic), CT (corticothalamic) and NP (near-projecting) types with laminar specificity. As well as division of CGE and MGE GABA types into major subclasses: Sst, Pvalb, Sst Chodl, Chandelier, Lamp5 Lhx6, Lamp5, Vip, Sncg and Pax6. Much like the results with WHB taxonomy mapping, we see that the center of the UMAP contains an ambiguous region of mixed assignments.

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['class'],
    labels=cell_metadata['class_name'],
    term_order_lookup=cluster_annotation_term,
    fig_width=16,
    fig_height=16
)
res = ax.set_title("Class")
plt.show()
../_images/d5a9c036e6e90dbe536db363da45ce5522791bb09a5011335422b0e2b1407a54.png
fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['subclass'],
    labels=cell_metadata['subclass_name'],
    term_order_lookup=cluster_annotation_term,
    fig_width=16,
    fig_height=16
)
res = ax.set_title("Subclass")
plt.show()
../_images/5c6da6fc82c1187b706b1ff085f872bd8b0fcd9c5bb3d47b313991f914b268aa.png
fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['supertype'],
    fig_width=16,
    fig_height=16
)
res = ax.set_title("Supertype")
plt.show()
../_images/b393d9a3e5d4cf24197417a94d914430becd7de20c94923f178395aeff75b97e.png

Assignment Correlation and Probabilities#

The next three plots colorized PDMBS cells by the boostrapping probability at each level. Here we see that cells in the center part of the UMAP have the lowest probability mapping at every level and corresponding to where we also see highly mixed supercluster assignments. Since the SEA-AD taxonomy is based on cortical data, we also see low probability on additional cluster islands (compared to WHB mapping results) derived from regions outside of the isocortex such as the hippocampus, amygdala and putamen. We refer to these as “non-cortical” in the remainder of the notebook.

The final two plots colorized the cells by the aggregated probability and correlation coefficient at the subclass level.

fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    val=cell_metadata['class_bootstrapping_probability'],
    cmap=plt.cm.magma_r,
    fig_width=16,
    fig_height=12,
    colorbar=True
)
res = ax.set_title("Class Bootstrap Prob")
plt.show()
../_images/3aa94d2d2788f193edd3bc0ea3b989c45becfc7b060718177f53fcfffd15b56b.png
fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    val=cell_metadata['subclass_bootstrapping_probability'],
    cmap=plt.cm.magma_r,
    fig_width=16,
    fig_height=12,
    colorbar=True
)
res = ax.set_title("Subclass Bootstrap Prob")
plt.show()
../_images/94706b74c905e24be1ea43382b6df74a23ee778305a2ee3f6ca69851f42507e9.png
fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    val=cell_metadata['supertype_bootstrapping_probability'],
    cmap=plt.cm.magma_r,
    fig_width=16,
    fig_height=12,
    colorbar=True
)
res = ax.set_title("Supertype Bootstrap Prob")
plt.show()
../_images/d03cea6268e8615e9831ced3fc4e22ef14d4b24b963bf4bc48b05b576f573947.png
fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    val=cell_metadata['supertype_aggregate_probability'],
    cmap=plt.cm.magma_r,
    fig_width=16,
    fig_height=12,
    colorbar=True
)
res = ax.set_title("Joint Bootstrap Prob")
plt.show()
../_images/9ab027d4f4c9e11182b73370455f34030ba03ef0140c9b9cc1113598e5e8cf6d.png
fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    val=cell_metadata['subclass_correlation_coefficient'],
    cmap=plt.cm.magma_r,
    fig_width=16,
    fig_height=12,
    colorbar=True
)
res = ax.set_title("Subclass Correlation Coefficient")
plt.show()
../_images/791ae2340e27d6d26486c58f12c8e2a8dc86840fb7187046f1192f95112a4dab.png

Removing cells with low quality mapping and from non-cortical samples#

Since we know that the SEA-AD taxonomy was created using only cortical regions, we will remove this before making our cut on subclass, mapping probability and correlation.

cortical_mask = cell_metadata['region_of_interest_label'].isin([
    'middle frontal gyrus', 'prefrontal cortex', 'inferior parietal lobule',
    'middle temporal gyrus', 'anterior cingulate gyrus'
])
# We make a simple label to tell use which cells are cortical based on the mask above.
cell_metadata['cortex'] = np.where(cortical_mask, 'cortical', 'non-cortical')
cell_metadata['cortex_color'] = np.where(cortical_mask, '#FF0000', '#808080')

cell_seaad_cortex_only = cell_metadata[cortical_mask]
cell_seaad_noncortex = cell_metadata[~cortical_mask]
fig, ax = plot_umap(
    cell_metadata['x'],
    cell_metadata['y'],
    cc=cell_metadata['cortex_color'],
    labels=cell_metadata['cortex'],
    fig_width=16,
    fig_height=16
)
res = ax.set_title("Cortext Cells")
plt.show()
../_images/176408e14b75c94bcb76c6422d897570b443d953d94d316c0c2c3027740bf228.png

Again, like the Siletti tutorial, we make a simple cut of probability and correlation coefficient of > 0.5. We combine this with our cortical mask to get the highest quality cells. These cuts combine to remove roughly half of the cells and most of the cells from the central region of the UMAP. These cuts are not definitive, and users should play with them to achieve a cell mapping quality to their liking.

prob_05_cut = np.logical_and(
    cell_metadata['subclass_aggregate_probability'] > 0.5,
    cell_metadata['subclass_correlation_coefficient'] > 0.5
)
cell_seaad_prob_high = cell_metadata[prob_05_cut & cortical_mask]
cell_seaad_prob_low = cell_metadata[~(prob_05_cut & cortical_mask)]
fig, ax = plot_umap(
    cell_seaad_prob_high['x'],
    cell_seaad_prob_high['y'],
    cc=cell_seaad_prob_high['subclass'],
    labels=cell_seaad_prob_high['subclass_name'],
    term_order_lookup=cluster_annotation_term,
    fig_width=16,
    fig_height=16
)
res = ax.set_title("Subclass (cortical and prob/coeff > 0.5)")
plt.show()
../_images/fc96906e92042f2abda9b043aebe46d40d9d759449881337fa40a63590893cc8.png
fig, ax = plot_umap(
    cell_seaad_prob_low['x'],
    cell_seaad_prob_low['y'],
    cc=cell_seaad_prob_low['subclass'],
    labels=cell_seaad_prob_low['subclass_name'],
    term_order_lookup=cluster_annotation_term,
    fig_width=16,
    fig_height=16
)
res = ax.set_title("Subclass Subclass (non-cortical and prob/coeff < 0.5)")
plt.show()
../_images/304d66c97e67badabdebf3e436e8eb7219460f8b7501f6a3c2c58cf71574ad9a.png

Mapping the SEA-AD subclasses on to the Leiden Clusters#

Finally, we’ll use the mapping of the cells into the SEA-AD taxonomy to infer mappings of the Leiden clusters that the ASAP PMDBS data provide. To do this, we’ll compute the fraction of cells that are in each Leiden cluster that are mapped into a SEA-AD subclass.

group = cell_seaad_prob_high.groupby(['cluster_label', 'subclass_name'])[['subclass']].count()
assignment_df = pd.DataFrame(
    index=cell_seaad_prob_high['subclass_name'].unique(),
    columns=cell_seaad_prob_high['cluster_label'].unique()
).fillna(0.0)

for clust_lab in cell_seaad_prob_high['cluster_label'].unique():
    subgroup = group.loc[clust_lab]
    assignment_df.loc[subgroup.index, clust_lab] = (subgroup / subgroup.sum()).to_numpy()

assignment_df = assignment_df.reindex(
    sorted(assignment_df.columns), axis=1
).loc[
    cluster_annotation_term[cluster_annotation_term['cluster_annotation_term_set_name'] == 'subclass'].sort_values('term_order').index
]

column_order = []
for row_idx, supertype in enumerate(assignment_df.index):
    saved_cols = []
    col_maxes = []
    for col in assignment_df.columns:
        max_arg = np.argmax(assignment_df[col])
        if max_arg == row_idx:
            saved_cols.append(col)
            col_maxes.append(assignment_df[col][max_arg])
    saved_cols = np.array(saved_cols)
    column_order.extend(saved_cols)

assignment_df = assignment_df[column_order]
assignment_df.head()
/var/folders/kc/7glrmt5n67x16yj_tg86t49c0000gp/T/ipykernel_7801/2341538799.py:5: FutureWarning: Downcasting object dtype arrays on .fillna, .ffill, .bfill is deprecated and will change in a future version. Call result.infer_objects(copy=False) instead. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  ).fillna(0.0)
/var/folders/kc/7glrmt5n67x16yj_tg86t49c0000gp/T/ipykernel_7801/2341538799.py:25: FutureWarning: Series.__getitem__ treating keys as positions is deprecated. In a future version, integer keys will always be treated as labels (consistent with DataFrame behavior). To access a value by position, use `ser.iloc[pos]`
  col_maxes.append(assignment_df[col][max_arg])
cluster_018 cluster_012 cluster_025 cluster_009 cluster_004 cluster_020 cluster_001 cluster_014 cluster_021 cluster_028 ... cluster_024 cluster_027 cluster_008 cluster_000 cluster_002 cluster_015 cluster_022 cluster_010 cluster_007 cluster_026
name
Lamp5 Lhx6 0.983864 0.000589 0.142857 0.000151 0.000174 0.000208 0.000013 0.036299 0.000103 0.0 ... 0.0 0.0 0.000017 0.002577 0.000000 0.0 0.0 0.000028 0.0 0.0
Lamp5 0.012955 0.573468 0.285714 0.000786 0.000348 0.000934 0.000035 0.018505 0.000207 0.0 ... 0.0 0.0 0.000679 0.006559 0.000004 0.0 0.0 0.000197 0.0 0.0
Pax6 0.000000 0.165833 0.000000 0.002025 0.000035 0.000000 0.000019 0.007829 0.000103 0.0 ... 0.0 0.0 0.000000 0.000703 0.000004 0.0 0.0 0.000028 0.0 0.0
Sncg 0.000000 0.248611 0.000000 0.004518 0.000148 0.000208 0.000016 0.005694 0.000103 0.0 ... 0.0 0.0 0.000296 0.001874 0.000000 0.0 0.0 0.000056 0.0 0.0
Vip 0.000694 0.009752 0.000000 0.989771 0.000321 0.001038 0.000131 0.043416 0.000207 0.0 ... 0.0 0.0 0.000609 0.006090 0.000004 0.0 0.0 0.000366 0.0 0.0

5 rows × 30 columns

def heatmap(data, row_labels, col_labels, ax=None,
            cbar_kw=None, cbarlabel="", xtick_colors=None, **kwargs):
    """
    from: https://matplotlib.org/stable/gallery/images_contours_and_fields/image_annotated_heatmap.html

    Create a heatmap from a numpy array and two lists of labels.

    Parameters
    ----------
    data
        A 2D numpy array of shape (M, N).
    row_labels
        A list or array of length M with the labels for the rows.
    col_labels
        A list or array of length N with the labels for the columns.
    ax
        A `matplotlib.axes.Axes` instance to which the heatmap is plotted.  If
        not provided, use current Axes or create a new one.  Optional.
    cbar_kw
        A dictionary with arguments to `matplotlib.Figure.colorbar`.  Optional.
    cbarlabel
        The label for the colorbar.  Optional.
    xtick_colors
        Set of colors to label the x ticklabels by. Must be the same length as
        col_labels. Optional.
    **kwargs
        All other arguments are forwarded to `imshow`.
    """

    if ax is None:
        ax = plt.gca()

    if cbar_kw is None:
        cbar_kw = {}

    # Plot the heatmap
    im = ax.imshow(data, **kwargs)

    # Create colorbar
    cbar = ax.figure.colorbar(im, ax=ax, **cbar_kw)
    cbar.ax.set_ylabel(cbarlabel, rotation=-90, va="bottom")

    # Show all ticks and label them with the respective list entries.
    ax.set_xticks(range(data.shape[1]), labels=col_labels,
                  rotation=-30, ha="right", rotation_mode="anchor")
    ax.set_yticks(range(data.shape[0]), labels=row_labels)

    # Let the horizontal axes labeling appear on top.
    ax.tick_params(top=True, bottom=False,
                   labeltop=True, labelbottom=False)

    # Turn spines off and create white grid.
    ax.spines[:].set_visible(False)

    ax.set_xticks(np.arange(data.shape[1]+1)-.5, minor=True)
    ax.set_yticks(np.arange(data.shape[0]+1)-.5, minor=True)
    ax.grid(which="minor", color="w", linestyle='-', linewidth=3)
    ax.tick_params(which="minor", bottom=False, left=False)

    if xtick_colors is not None:
        xticks = ax.get_xticklabels()

        for idx, tick in enumerate(xticks):
            tick.set_color(xtick_colors[idx])

        ax.set_xticklabels(xticks)

    return im, cbar

Below we plot the heatmap showing the fraction of cells in a Leiden Cluster assigned to a given SEA-AD subclass. Note the Leiden cluster names are colored with the colors they were assigned with and shown in the ASAP Data Overview tutorial.

We see that some clusters have clear assignments to SEA-AD. The ordering we produced above shows a step like structure in the mapping. Horizontal sets show multiple Leiden clusters being assigned to a single SEA-AD subclasses while vertical sets show Leiden clusters that have been split across multiple supertypes that are similar ontologically. We can see that there are also a few Leiden clusters that have ambiguous mappings as well, even after our cut on MapMyCells probability.

fig, ax = plt.subplots()
fig.set_size_inches(16, 8)

heatmap(
    data=assignment_df,
    row_labels=assignment_df.index.to_numpy(),
    col_labels=assignment_df.columns,
    cmap=plt.cm.magma_r,
    vmin=0,
    vmax=1,
    cbarlabel="Fraction assigned to subclass",
    xtick_colors=value_sets.loc[assignment_df.columns]['color_hex_triplet'].to_numpy()
)

fig.tight_layout()
../_images/913166c29e3e22188c57b526cce09d432647e3fdb6fbc108757c6632435fe87c.png

In a related notebook, we explore a similar MapMyCells mapping with the Siletti taxonomy. You can also explore gene expressions in the following notebook.