Human-Mammalian Brain - Basal Ganglia Patchseq#

The basal ganglia (BG) constitute a system of interconnected brain structures that play a crucial role in motor control, learning, behavior, and emotion. These structures are implicated in numerous disorders affecting human health, including Parkinson’s disease, Huntington’s disease, and substance abuse disorders. Despite their importance, little is known about the cellular-level functional properties of the BG in humans and translationally relevant primate species. To address this gap, we performed Patch-seq experiments – integrating patch clamp electrophysiology with single cell RNA-sequencing anchored in the high-resolution, cross-species HMBA consensus basal ganglia cell type taxonomy. This approach enables us to reveal the morpho-electric properties of transcriptomically-defined striatal cell types in primates. Following electrophysiological characterization, nuclei were aspirated for RNA sequencing (SMART-Seq v4), and transcriptomic profiles were mapped to the consensus taxonomy. Each sample was assigned spatial coordinates within an anatomical reference, and cells with high-quality fills and mappings were subsequently reconstructed to obtain morphological features. The data derive from macaque (Macaca mulatta and Macaca nemestrina) striatal tissue obtained through a Tissue Distribution Program. These data are directly comparable to mouse basal ganglia datasets collected using the same protocols.

Patch-seq samples were mapped to striatal neuronal types using the Hierarchical Approximate Nearest Neighbor (HANN) method implemented in the Allen Institute MapMyCells package, excluding non-neuronal types. After applying quality control to the mapping results, 717 macaque samples were retained. For more information on the taxonomy, please refer to the following webpage: Mammalian Basal Ganglia Consensus Cell Type Atlas. Additionally, you can see the associated notebooks linked at the end of this page.

In this notebook, we’ll look at gene expression for these 717 Macaque cells for different regions and taxonomy assignments. We’ll also look at the electrophysiology and morphology summary stats as a function of the taxonomy.

You need to be connected to the internet to run this notebook and have run through the getting started notebook.

from typing import List, Tuple, Optional

import pandas as pd
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache
from abc_atlas_access.abc_atlas_cache.anndata_utils import get_gene_data

We will interact with the data using the AbcProjectCache. This cache object tracks which data has been downloaded and serves the path to the requested data on disk. For metadata, the cache can also directly serve a up a Pandas DataFrame. See the getting_started notebook for more details on using the cache including installing the cache if it has not already been.

Change the download_base variable to where you would like to download the data in your system.

download_base = Path('../../data/abc_atlas')
abc_cache = AbcProjectCache.from_cache_dir(
    download_base,
)

abc_cache.current_manifest
'releases/20260228/manifest.json'

Cell metadata#

Below we load the metadata associated with the individual patchseq cells and donor and library metadata. We join the donor and library data into the cell_metadata, this gives us e.g. donor_sex, donor_age information and region of interest labels from the library information.

macaque_cell_metadata = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='cell_metadata',
    dtype={"cell_label": str}
).set_index('cell_label')

macaque_donor = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='donor'
).set_index('donor_label')

macaque_library = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='library'
).set_index('library_label')

macaque_cell_metadata = macaque_cell_metadata.join(macaque_donor, on='donor_label', rsuffix='_donor')
macaque_cell_metadata = macaque_cell_metadata.join(macaque_library, on='library_label', rsuffix='_library')
macaque_cell_metadata.head()
cell_metadata.csv: 100%|██████████| 109k/109k [00:00<00:00, 998kMB/s]
donor.csv: 100%|██████████| 6.56k/6.56k [00:00<00:00, 77.2kMB/s]
library.csv: 100%|██████████| 142k/142k [00:00<00:00, 919kMB/s]  
library_label library_aliquot_label barcoded_cell_sample_label donor_label dataset_label feature_matrix_label alignment_id donor_age_value donor_age_unit age_at_death_reference_point ... library_aliquot_label_library barcoded_cell_sample_label_library barcoded_cell_sample_technique amplified_cdna_label tissue_sample_local_name donor_label_library region_of_interest_label region_of_interest_id library_aliquot_nhash_id library_nhash_id
cell_label
QN22.26.002.14.05.04 L8S4_220222_04_H05 AB-S40302_S424_E1-50 P4S4_220131_155_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710839 15 years birth ... AB-S40302_S424_E1-50 P4S4_220131_155_A01 PatchSeq-SmartSeq A8S4_220215_06_H05 QN22.26.002.14.05 QN22.26.002 Ca HOMBA:10334 LA-JEACHN795730XCYBFE242803 LI-JEACHN795730
QN22.26.002.14.05.05 L8S4_220222_04_A06 AB-S40302_S425_E1-50 P4S4_220131_156_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710837 15 years birth ... AB-S40302_S425_E1-50 P4S4_220131_156_A01 PatchSeq-SmartSeq A8S4_220215_06_A06 QN22.26.002.14.05 QN22.26.002 Ca HOMBA:10334 LA-WCXYWT453000XCYBFE242803 LI-WCXYWT453000
QN22.26.002.14.05.06 L8S4_220222_04_B06 AB-S40302_S426_E1-50 P4S4_220131_157_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710836 15 years birth ... AB-S40302_S426_E1-50 P4S4_220131_157_A01 PatchSeq-SmartSeq A8S4_220215_06_B06 QN22.26.002.14.05 QN22.26.002 Ca HOMBA:10334 LA-JMJMOH539975XCYBFE242803 LI-JMJMOH539975
QN22.26.002.14.05.08 L8S4_220222_04_C06 AB-S40302_S427_E1-50 P4S4_220131_158_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710835 15 years birth ... AB-S40302_S427_E1-50 P4S4_220131_158_A01 PatchSeq-SmartSeq A8S4_220215_06_C06 QN22.26.002.14.05 QN22.26.002 Ca HOMBA:10334 LA-NJMQPI138328XCYBFE242803 LI-NJMQPI138328
QN22.26.002.14.05.09 L8S4_220222_04_D06 AB-S40302_S428_E1-50 P4S4_220131_159_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710834 15 years birth ... AB-S40302_S428_E1-50 P4S4_220131_159_A01 PatchSeq-SmartSeq A8S4_220215_06_D06 QN22.26.002.14.05 QN22.26.002 Ca HOMBA:10334 LA-HIEYXR972206XCYBFE242803 LI-HIEYXR972206

5 rows × 27 columns

We load a value set table that defines colors for various features in the dataset as well as orders for those feature terms. We use these orders later in this notebook.

value_sets = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='value_sets'
).set_index('label')
value_sets
value_sets.csv: 100%|██████████| 1.68k/1.68k [00:00<00:00, 26.1kMB/s]
table field description order external_identifier parent_label color_hex_triplet
label
Female donor donor_sex Female 1 NaN NaN #565353
Male donor donor_sex Male 2 NaN NaN #ADC4C3
Human donor species_genus Human 1 NCBITaxon:9605 NaN #377eb8
Macaque donor species_genus Macaque 2 NCBITaxon:9539 NaN #4daf4a
Marmoset donor species_genus Marmoset 3 NCBITaxon:9481 NaN #FF5F5D
Squirrel monkey donor species_genus Squirrel monkey 4 NCBITaxon:9520 NaN #ffa300
Mouse donor species_genus Mouse 5 NCBITaxon:10088 NaN #c941a7
Homo sapiens donor species_scientific_name Homo sapiens 1 NCBITaxon:9606 NaN #377eb8
Macaca mulatta donor species_scientific_name Macaca mulatta 2 NCBITaxon:9544 NaN #4daf4a
Macaca nemestrina donor species_scientific_name Macaca nemestrina 3 NCBITaxon:9545 NaN #b2df8a
Callithrix jacchus donor species_scientific_name Callithrix jacchus 4 NCBITaxon:9483 NaN #FF5F5D
Saimiri sciureus donor species_scientific_name Saimiri sciureus 5 NCBITaxon:9521 NaN #FF5F5D
Mus musculus donor species_scientific_name Mus musculus 6 NCBITaxon:10090 NaN #ffa300
STR library region_of_interest_label striatum (neostriatum) 1 HOMBA:10333 NaN #048359
DS library region_of_interest_label dorsal striatum (caudoputamen complex-CP, STRd) 2 HOMBA:AA30192 STR #ff399a
Ca library region_of_interest_label caudate nucleus (mediodorsal division of the CP) 3 HOMBA:10334 DS #ff6b9a
Pu library region_of_interest_label putamen (lateroventral division of the CP) 4 HOMBA:10338 DS #9ba2ff
VS library region_of_interest_label ventral striatum (STRv) 5 HOMBA:AA30195 STR #ffe119
NAC library region_of_interest_label nucleus accumbens 6 HOMBA:10339 VS #c29515
ic library region_of_interest_label internal capsule 7 HOMBA:10581 NaN #ffd7b4

We define a convenience function to add colors for the various values in the data (e.g. unique region of interest or donor sex values).

def extract_value_set(cell_metadata_df: pd.DataFrame, input_value_set: pd.DataFrame, input_value_set_label: str):
    """Add color and order columns to the cell metadata dataframe based on the input
    value set.

    Columns are added as {input_value_set_label}_color and {input_value_set_label}_order.

    Parameters
    ----------
    cell_metadata_df : pd.DataFrame
        DataFrame containing cell metadata.
    input_value_set : pd.DataFrame
        DataFrame containing the value set information.
    input_value_set_label : str
        The column name to extract color and order information for. The information will then be added to
        the cell metadata.
    """
    cell_metadata_df[f'{input_value_set_label}_color'] = input_value_set[
        input_value_set['field'] == input_value_set_label
    ].loc[cell_metadata_df[input_value_set_label]]['color_hex_triplet'].values
    cell_metadata_df[f'{input_value_set_label}_order'] = input_value_set[
        input_value_set['field'] == input_value_set_label
    ].loc[cell_metadata_df[input_value_set_label]]['order'].values

Use our function to add the relevant color and order columns to our cell_metadata table.

# Add region of interest color and order
extract_value_set(macaque_cell_metadata, value_sets, 'region_of_interest_label')

Gene expression data#

Now we extract expression for specific genes from the h5ad files and pair it with our gene metadata. In general, we can use the function get_gene_data to extract the expression of specific genes for all the cells in a given dataset or a subset of cells. For more details on using this convenience function, see the Accessing 10x RNA-seq gene expression data tutorial notebook.

For this tutorial we will use precomputed tables of the expression for specific genes to compare expression by spatial location or by transcriptomic group.

To use the gene expression data, we first load the set of genes for the patch-seq data.

macaque_genes = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='gene',
).set_index('gene_identifier')
macaque_genes.head()
gene.csv: 100%|██████████| 3.33M/3.33M [00:00<00:00, 9.36MMB/s]
gene_symbol description gene_id molecular_type synonym
gene_identifier
NCBIGene:712737 A1BG alpha-1-B glycoprotein 712737 protein_coding []
NCBIGene:703806 A1CF APOBEC1 complementation factor 703806 protein_coding []
NCBIGene:716616 A2ML1 alpha-2-macroglobulin like 1 716616 protein_coding []
NCBIGene:711045 A3GALT2 alpha 1,3-galactosyltransferase 2 711045 protein_coding []
NCBIGene:710998 A4GALT alpha 1,4-galactosyltransferase 710998 protein_coding []

Below we list the genes we will use in this notebook and the example method used to load the expression for these specific genes from the h5ad file. More details on how to extract specific genes from the data see our accessing gene expression data tutorial

gene_names = ["WFS1", "CNR1", "CHRM3", "MYO16", "PTHLH", "DRD1", "DRD2", "TAC3", "SST"] 

macaque_gene_data = get_gene_data(
    abc_atlas_cache=abc_cache,
    all_cells=macaque_cell_metadata,
    all_genes=macaque_genes,
    selected_genes=gene_names,
    data_type='log2'
)
macaque_gene_data.head()
loading file: HMBA-Macaque-PatchSeq-BG
HMBA-Macaque-PatchSeq-BG-log2.h5ad: 100%|██████████| 63.8M/63.8M [00:02<00:00, 23.9MMB/s]  
 - time taken:  0.115348
total time taken: 0.905653
	total cells: 717 processed cells: 717
gene_symbol CHRM3 CNR1 DRD1 DRD2 MYO16 PTHLH SST TAC3 WFS1
cell_label
QN22.26.002.14.05.04 10.811937 0.000000 0.000000 8.153246 4.933148 3.680369 6.044661 0.0 5.300721
QN22.26.002.14.05.05 2.088328 5.323032 6.759345 0.000000 0.000000 4.358629 0.000000 0.0 0.000000
QN22.26.002.14.05.06 7.193662 0.000000 0.000000 7.467582 4.027162 0.000000 0.000000 0.0 6.536435
QN22.26.002.14.05.08 7.406147 0.000000 0.000000 7.258967 6.859893 2.354068 0.000000 0.0 3.205592
QN22.26.002.14.05.09 9.808509 0.000000 8.067217 0.000000 0.000000 0.000000 0.000000 0.0 6.322538

Now that we’ve loaded our expression matrix tables, indexed by cell_label, we can merge them into our cell metadata tables.

macaque_cell_metadata_with_genes = macaque_cell_metadata.join(macaque_gene_data, on='cell_label')

Plotting heatmap expression data#

We define a helper functions aggregate_by_metadata to compute the average expression for a given category. Note that this is a simple wrapper of the Pandas GroupBy functionality. Other summary statistics beyond just the mean are listed here: https://pandas.pydata.org/docs/reference/groupby.html#dataframegroupby-computations-descriptive-stats

We also define a function to plot the resultant averaged data in a heatmap.

def aggregate_by_metadata(df: pd.DataFrame, gnames: List[str], value: str, sort: bool = False) -> pd.DataFrame:
    """Aggregate gene expression data by metadata.

    Parameters
    ----------
    df: pandas.DataFrame
        DataFrame containing gene expression and metadata.
    gnames: List[str]
        List of gene names to aggregate.
    value: str
        Metadata column to group by.
    sort: bool
        If True, sort the output by the first gene in gnames.

    Returns
    -------
    pandas.DataFrame
        DataFrame with aggregated gene expression values.
    """
    grouped = df.groupby(value)[gnames].mean()
    if sort:
        grouped = grouped.sort_values(by=gnames[0], ascending=False)
    if f'{value}_order' in df.columns:
        grouped = grouped.loc[df[value].unique()[np.argsort(df[value + '_order'].unique())]]
    return grouped


def plot_heatmap(
        df: pd.DataFrame,
        fig_width: int = 8,
        fig_height: int = 4,
        cmap: mpl.colormaps = plt.cm.magma,
        vmin: Optional[float] = None,
        vmax: Optional[float] = None,
        unit: str = '[log2(CPM + 1)]'
    ) -> Tuple[plt.Figure, plt.Axes]:
    """Plot a heatmap from a DataFrame.

    Parameters
    ----------
    df: pandas.DataFrame
        DataFrame to plot as a heatmap.
    fig_width: int
        Width of the figure (inches).
    fig_height: int
        Height of the figure (inches).
    cmap: matplotlib.colors.Colormap
        Colormap to use for the heatmap.

    Returns
    -------
    fig: matplotlib.pyplot.Figure
        Figure object containing the heatmap.
    ax: matplotlib.pyplot.Axes
        Axes object containing the heatmap.
    """

    arr = df.to_numpy().astype('float')

    if vmin is None:
        vmin = arr.min()
    if vmax is None:
        vmax = arr.max()
    norm = mpl.colors.Normalize(vmin=vmin, vmax=vmax)
    sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
    cmap = sm.get_cmap()

    fig, ax = plt.subplots()
    fig.set_size_inches(fig_width, fig_height)

    ax.imshow(arr, cmap=cmap, aspect='auto', vmin=vmin, vmax=vmax)
    xlabs = df.columns.values
    ylabs = df.index.values

    ax.set_xticks(range(len(xlabs)))
    ax.set_xticklabels(xlabs)

    ax.set_yticks(range(len(ylabs)))
    ax.set_yticklabels(ylabs)

    cbar = fig.colorbar(sm, ax=ax, orientation='vertical', fraction=0.1, pad=0.01)
    cbar.set_label(unit)

    return fig, ax

Visualize gene expression for selected genes by ROI#

We plot the gene expression as a heat map showing the average expression as a function of region of interest (ROI) label. The full names for the region of interest acronyms can be found in the values_sets table we loaded earlier. Note that samples that were not precisely pinned anatomically were assigned to dorsal or ventral striatum based on the patch-clamp experimentalist’s original notes.

macaque_gene_names = ['DRD1', 'DRD2', 'PTHLH', 'SST', 'TAC3', 'CHRM3', 'CNR1', 'MYO16', 'WFS1']

feature_name = "region_of_interest_label"
agg = aggregate_by_metadata(macaque_cell_metadata_with_genes, macaque_gene_names, feature_name)
# We drop ic here as it has a very small number or cells.
plot_heatmap(agg.drop('ic'), 8, 8, vmin=0, vmax=7) # vmin and vmax set based on observed expression ranges
plt.title(f'Macaque - {feature_name}')
plt.show()
../_images/b80f14abe1e036f198a5bbb8a62e23113f11fa4112b0d85098a1fd783ee92841.png

Taxonomy mapping#

Finally, let’s load the taxonomy information so that we can plot gene expression versus various taxons in a heatmap for each species. Below, we load and merge all the information we need to plot cell colors and taxonomy as done in the previous HMBA tutorial.

cluster = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster',
    dtype={'number_of_cells': 'Int64'}
).rename(columns={'label': 'cluster_annotation_term_label'}).set_index('cluster_annotation_term_label')
cluster_annotation_term_set = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster_annotation_term_set'
).rename(columns={'label': 'cluster_annotation_term_label'})

cluster_annotation_term = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster_annotation_term',
).rename(columns={'label': 'cluster_annotation_term_label'}).set_index('cluster_annotation_term_label')

cluster_to_cluster_annotation_membership = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster_to_cluster_annotation_membership',
).set_index('cluster_annotation_term_label')
membership_with_cluster_info = cluster_to_cluster_annotation_membership.join(
    cluster.reset_index().set_index('cluster_alias')[['number_of_cells']],
    on='cluster_alias'
)
membership_with_cluster_info = membership_with_cluster_info.join(cluster_annotation_term, rsuffix='_anno_term').reset_index()
membership_groupby = membership_with_cluster_info.groupby(
    ['cluster_alias', 'cluster_annotation_term_set_name']
)

cluster_annotation_term_set = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster_annotation_term_set'
).rename(columns={'label': 'cluster_annotation_term_label'})

# term_sets = abc_cache.get_metadata_dataframe(directory='WHB-taxonomy', file_name='cluster_annotation_term_set').set_index('label')
cluster_details = membership_groupby['cluster_annotation_term_name'].first().unstack()
cluster_order = membership_groupby['term_order'].first().unstack()
cluster_order.sort_values(['Neighborhood', 'Class', 'Subclass', 'Group', 'Cluster'], inplace=True)
cluster_order.rename(
    columns={'Neighborhood': 'Neighborhood_order',
             'Class': 'Class_order',
             'Subclass': 'Subclass_order',
             'Group': 'Group_order',
             'Cluster': 'Cluster_order'},
    inplace=True
)

cluster_colors = membership_groupby['color_hex_triplet'].first().unstack()
cluster_colors = cluster_colors[cluster_annotation_term_set['name']]
cluster_colors.sort_values(
    ['Neighborhood', 'Class', 'Subclass', 'Group', 'Cluster'],
    inplace=True
)

The macaque patch-seq gene expression has been mapped to the HMBA-BG taxonomy using MapMyCells, excluding non-neuronal nodes. We load this table below. A more lightweight version of this table with only mappings between cells to cluster_alias. The table is named cell_to_cluster_membership.

Below we load the complete MMC results with probabilities and correlation coefficients for all assignments at all levels.

macaque_mmc = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='mmc_results',
).set_index('cell_label')
macaque_mmc.head()
mmc_results.csv: 100%|██████████| 201k/201k [00:00<00:00, 1.42MMB/s] 
Neighborhood_label Neighborhood_name Neighborhood_bootstrapping_probability Neighborhood_aggregate_probability Neighborhood_correlation_coefficient Class_label Class_name Class_bootstrapping_probability Class_aggregate_probability Class_correlation_coefficient ... Group_label Group_name Group_bootstrapping_probability Group_aggregate_probability Group_correlation_coefficient Cluster_label cluster_alias Cluster_bootstrapping_probability Cluster_aggregate_probability Cluster_correlation_coefficient
cell_label
QN22.26.002.14.05.04 CS20250428_NEIGH_0002 Subpallium GABA 1.0 1.0 0.7190 CS20250428_CLASS_0003 CN LGE GABA 1.0 1.0 0.7093 ... CS20250428_GROUP_0054 STRd D2 Striosome MSN 1.00 1.00 0.5715 CS20250428_CLUST_0986 Macaque-429 1.00 1.0000 0.5615
QN22.26.002.14.05.05 CS20250428_NEIGH_0002 Subpallium GABA 1.0 1.0 0.7634 CS20250428_CLASS_0003 CN LGE GABA 1.0 1.0 0.7466 ... CS20250428_GROUP_0049 STRd D1 Matrix MSN 1.00 1.00 0.6477 CS20250428_CLUST_0949 Macaque-467 0.99 0.9900 0.5517
QN22.26.002.14.05.06 CS20250428_NEIGH_0002 Subpallium GABA 1.0 1.0 0.6603 CS20250428_CLASS_0003 CN LGE GABA 1.0 1.0 0.6376 ... CS20250428_GROUP_0054 STRd D2 Striosome MSN 1.00 1.00 0.4564 CS20250428_CLUST_0986 Macaque-429 0.62 0.6200 0.4341
QN22.26.002.14.05.08 CS20250428_NEIGH_0002 Subpallium GABA 1.0 1.0 0.6162 CS20250428_CLASS_0003 CN LGE GABA 1.0 1.0 0.5980 ... CS20250428_GROUP_0052 STRd D2 Matrix MSN 0.88 0.88 0.4317 CS20250428_CLUST_0973 Macaque-465 1.00 0.8800 0.4515
QN22.26.002.14.05.09 CS20250428_NEIGH_0002 Subpallium GABA 1.0 1.0 0.6987 CS20250428_CLASS_0003 CN LGE GABA 1.0 1.0 0.6847 ... CS20250428_GROUP_0049 STRd D1 Matrix MSN 0.99 0.99 0.5025 CS20250428_CLUST_0949 Macaque-467 0.57 0.5643 0.3415

5 rows × 25 columns

macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(macaque_mmc)
macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(cluster_details, on='cluster_alias')
macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(cluster_colors, on='cluster_alias', rsuffix='_color')
macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(cluster_order, on='cluster_alias')

macaque_cell_metadata_with_genes.head()
library_label library_aliquot_label barcoded_cell_sample_label donor_label dataset_label feature_matrix_label alignment_id donor_age_value donor_age_unit age_at_death_reference_point ... Neighborhood_color Class_color Subclass_color Group_color Cluster_color Class_order Cluster_order Group_order Neighborhood_order Subclass_order
cell_label
QN22.26.002.14.05.04 L8S4_220222_04_H05 AB-S40302_S424_E1-50 P4S4_220131_155_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710839 15 years birth ... #19613b #d0b83c #253c8c #ff9896 #561430 10 1290 55 3 32
QN22.26.002.14.05.05 L8S4_220222_04_A06 AB-S40302_S425_E1-50 P4S4_220131_156_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710837 15 years birth ... #19613b #d0b83c #1655f2 #1f77b4 #b83642 10 1059 50 3 31
QN22.26.002.14.05.06 L8S4_220222_04_B06 AB-S40302_S426_E1-50 P4S4_220131_157_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710836 15 years birth ... #19613b #d0b83c #253c8c #ff9896 #561430 10 1290 55 3 32
QN22.26.002.14.05.08 L8S4_220222_04_C06 AB-S40302_S427_E1-50 P4S4_220131_158_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710835 15 years birth ... #19613b #d0b83c #253c8c #aec7e8 #d71b39 10 1194 53 3 32
QN22.26.002.14.05.09 L8S4_220222_04_D06 AB-S40302_S428_E1-50 P4S4_220131_159_A01 QN22.26.002 HMBA-Macaque-PatchSeq HMBA-Macaque-PatchSeq-BG 1200710834 15 years birth ... #19613b #d0b83c #1655f2 #1f77b4 #b83642 10 1059 50 3 31

5 rows × 78 columns

Visualize gene expression for selected genes by the HMBA-BG taxonomy#

Below we plot the gene expression for the cells as a function the Group level in the taxonomy they were associated with. We plot the MSNs cell-types only. We’ll also use a subset of the genes we extracted from the gene expression matrix.

macaque_gene_names = ['DRD1', 'DRD2', 'PTHLH', 'SST', 'TAC3', 'CHRM3', 'CNR1', 'MYO16', 'WFS1']

feature_name = "Group"
agg = aggregate_by_metadata(
    macaque_cell_metadata_with_genes,
    macaque_gene_names,
    feature_name
)


plot_heatmap(agg, 8, 8, vmin=0, vmax=7.5)
plt.title(f'Macaque - {feature_name}')
plt.show()
../_images/7477300dc4b16b6e2773126a78550661216dc5283a02e97913c7ea82ebf7fced.png

Other Features#

In addition to gene expression, we also look at electrophysiology and morphology features.

Electrophysiological features#

The electrophysiological features are across 4 tables: ap_features, chirp_features, cross_sweep_long_square_features, and sweep_features. In the below example, we load the ap_features table and merge it into our cell_metadata table.

A full list of the electrophysiological features with some short descriptions can be downloaded here

ap_features = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq', file_name='ap_features'
).set_index('cell_label')

# most cells have no measurement for these features so we filter those with values just to see what the data looks like. 
ap_features[pd.notna(ap_features['adp_v_last_rheo'])].head()
ap_features.csv: 100%|██████████| 496k/496k [00:00<00:00, 3.00MMB/s] 
adp_v_last_rheo ahp_delay_5spike ahp_delay_hero ahp_delay_ratio_5spike ahp_delay_ratio_hero downstroke_adapt_ratio downstroke_hero downstroke_ramp downstroke_rheo downstroke_short_square ... width_adapt_ratio width_hero width_ramp width_rheo width_rheo_ms width_short_square width_suprathresh_hero width_suprathresh_ramp width_suprathresh_rheo width_suprathresh_short_square
cell_label
Q21.26.003.12.03.01 -55.000004 0.021870 0.000980 0.191146 0.034789 0.879896 -267.810016 NaN -256.109021 -277.295319 ... 1.090909 0.000375 NaN 0.000385 0.385 0.000370 0.000345 NaN 0.000355 0.000335
Q21.26.006.14.01.04 -50.093750 0.043635 0.043635 0.237018 0.237018 0.879694 -87.819662 NaN -88.143416 -99.554269 ... 1.086124 0.001045 NaN 0.001055 1.055 0.000935 0.000945 NaN 0.000955 0.000850
Q21.26.009.12.01.03 -51.000004 0.155950 0.001390 0.483124 0.066763 0.900912 -179.379158 NaN -169.956722 -191.220558 ... 1.070707 0.000475 NaN 0.000485 0.485 0.000460 0.000430 NaN 0.000455 0.000415
Q21.26.009.12.02.03 -61.062504 0.030185 0.026850 0.120473 0.188131 0.789810 -72.779128 NaN -69.267846 -80.596543 ... 1.152610 0.001270 NaN 0.001305 1.305 0.001220 0.001175 NaN 0.001185 0.001125
Q21.26.009.12.01.04 -52.812500 0.037275 0.026275 0.250311 0.255842 0.982069 -73.422130 NaN -71.413402 -79.772097 ... 1.101770 0.001120 NaN 0.001140 1.140 0.001085 0.001045 NaN 0.001050 0.000990

5 rows × 57 columns

macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(ap_features)

Below, we create a function to create a box plot for a given feature by e.g. a taxonomy level, ROI structure, etc.

def plot_boxplot_by_column(
        dataframe, col_name, value_col
    ):
    """
    Plots a boxplot of the specified value column grouped by the specified column.

    Parameters
    ----------
    dataframe : pd.DataFrame
        The input dataframe containing the data to plot.
    col_name : str
        The name of the column to group by.
    value_col : str
        The name of the column containing the values to plot.
    """
    box_values = {}
    for group in dataframe[col_name].unique()[
            np.argsort(dataframe[f"{col_name}_order"].unique()) # Order by the specified order column
    ]:
        col_values = dataframe.loc[dataframe[col_name] == group, value_col]
        if pd.notna(col_values).sum() > 0:
            box_values[group] = col_values[~col_values.isna()].values # Use non-NA values for the boxplot

    plt.boxplot(
        box_values.values()
    )
    plt.xticks(
        ticks=np.arange(1, len(box_values) + 1),
        labels=box_values.keys(),
        rotation=90
    )
    plt.ylabel(value_col)
    plt.title(f'{value_col} by {col_name}')
    plt.show()

We’ll plot the ephys feature, width_rheo in a boxplot against the Groups in this dataset.

plot_boxplot_by_column(dataframe=macaque_cell_metadata_with_genes, col_name='Group', value_col='width_rheo')
../_images/a0926e597d265f8c1687facbce71b649c82c7d045ab8201c55dd621b7d198e01.png

Morphological Features#

We now load our morphological features and merge them into the cell metadata. A full list of the morphological features with short descriptions can be downloaded here.

morphology_features = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq', file_name='morphology_features'
).set_index('cell_label')
morphology_features.head()
morphology_features.csv: 100%|██████████| 97.4k/97.4k [00:00<00:00, 642kMB/s] 
23_Sholl_PC0 23_Sholl_PC1 2_Sholl_PC0 2_Sholl_PC1 3_Sholl_PC0 3_Sholl_PC1 axon_bias_dorsal axon_bias_medial axon_exit_distance axon_exit_theta_coronal ... basal_dendrite_mean_diameter basal_dendrite_num_branches basal_dendrite_soma_percentile_dorsal basal_dendrite_soma_percentile_medial basal_dendrite_stem_exit_MedialLateral basal_dendrite_stem_exit_dorsal basal_dendrite_stem_exit_ventral basal_dendrite_total_length basal_dendrite_total_surface_area soma_surface_area
cell_label
Q19.26.014.11.04.01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Q19.26.014.11.01.01 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
Q19.26.014.11.01.04 NaN NaN NaN NaN 168.820400 29.150249 NaN NaN NaN 0.667150 ... 0.573221 56.0 0.468694 0.334369 0.5 0.125 0.375 5241.027500 9442.031612 548.486434
Q19.26.014.11.01.05 NaN NaN NaN NaN 282.741349 -36.067186 NaN NaN NaN 0.875211 ... 0.565709 69.0 0.489853 0.428372 0.4 0.200 0.400 5909.886270 10526.672804 359.438955
Q19.26.014.11.05.01 NaN NaN NaN NaN 276.829831 -84.814559 NaN NaN NaN 0.447964 ... 0.609324 77.0 0.576005 0.407282 0.8 0.200 0.000 5946.308781 11421.095519 502.026310

5 rows × 39 columns

macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(morphology_features)

Finally, we plot the soma surface area feature against the taxonomy Group.

plot_boxplot_by_column(dataframe=macaque_cell_metadata_with_genes, col_name='Group', value_col='soma_surface_area')
../_images/04263ddc8191f30ebbd97707c0fcc1109e1ab90ad0705d25e33b6882ecc8affb.png

To learn more about the spatial transcriptomic data see the HMBA-BG Spatial Slabs and Taxonomy notebook.

To learn more about the taxonomy used in this notebook and the data it is derived from, see the HMBA-BG Clustering Analysis and Taxonomy notebook.