Human-Mammalian Brain - Basal Ganglia Patchseq

Human-Mammalian Brain - Basal Ganglia Patchseq#

The basal ganglia (BG) constitute a system of interconnected brain structures that play a crucial role in motor control, learning, behavior, and emotion. These structures are implicated in numerous disorders affecting human health, including Parkinson’s disease, Huntington’s disease, and substance abuse disorders. Despite their importance, little is known about the cellular-level functional properties of the BG in humans and translationally relevant primate species. To address this gap, we performed Patch-seq experiments – integrating patch clamp electrophysiology with single cell RNA-sequencing anchored in the high-resolution, cross-species HMBA consensus basal ganglia cell type taxonomy. This approach enables us to reveal the morpho-electric properties of transcriptomically-defined striatal cell types in primates. Following electrophysiological characterization, nuclei were aspirated for RNA sequencing (SMART-Seq v4), and transcriptomic profiles were mapped to the consensus taxonomy. Each sample was assigned spatial coordinates within an anatomical reference, and cells with high-quality fills and mappings were subsequently reconstructed to obtain morphological features. The data derive from macaque (Macaca mulatta and Macaca nemestrina) striatal tissue obtained through a Tissue Distribution Program. These data are directly comparable to mouse basal ganglia datasets collected using the same protocols.

Patch-seq samples were mapped to striatal neuronal types using the Hierarchical Approximate Nearest Neighbor (HANN) method implemented in the Allen Institute MapMyCells package, excluding non-neuronal types. After applying quality control to the mapping results, 717 macaque samples were retained. For more information on the taxonomy, please refer to the following webpage: Mammalian Basal Ganglia Consensus Cell Type Atlas. Additionally, you can see the associated notebooks linked at the end of this page.

In this notebook, we’ll look at gene expression for these 717 Macaque cells for different regions and taxonomy assignments. We’ll also look at the electrophysiology and morphology summary stats as a function of the taxonomy.

You need to be connected to the internet to run this notebook and have run through the getting started notebook.

from typing import List, Tuple, Optional

import pandas as pd
from pathlib import Path
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl

from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache
from abc_atlas_access.abc_atlas_cache.anndata_utils import get_gene_data

We will interact with the data using the AbcProjectCache. This cache object tracks which data has been downloaded and serves the path to the requested data on disk. For metadata, the cache can also directly serve a up a Pandas DataFrame. See the getting_started notebook for more details on using the cache including installing the cache if it has not already been.

Change the download_base variable to where you would like to download the data in your system.

download_base = Path('../../data/abc_atlas')
abc_cache = AbcProjectCache.from_cache_dir(
    download_base,
)

abc_cache.current_manifest

'releases/20260228/manifest.json'

Cell metadata#

Below we load the metadata associated with the individual patchseq cells and donor and library metadata. We join the donor and library data into the cell_metadata, this gives us e.g. donor_sex, donor_age information and region of interest labels from the library information.

macaque_cell_metadata = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='cell_metadata',
    dtype={"cell_label": str}
).set_index('cell_label')

macaque_donor = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='donor'
).set_index('donor_label')

macaque_library = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='library'
).set_index('library_label')

macaque_cell_metadata = macaque_cell_metadata.join(macaque_donor, on='donor_label', rsuffix='_donor')
macaque_cell_metadata = macaque_cell_metadata.join(macaque_library, on='library_label', rsuffix='_library')
macaque_cell_metadata.head()

cell_metadata.csv: 100%|██████████| 109k/109k [00:00<00:00, 998kMB/s]
donor.csv: 100%|██████████| 6.56k/6.56k [00:00<00:00, 77.2kMB/s]
library.csv: 100%|██████████| 142k/142k [00:00<00:00, 919kMB/s]  

	library_label	library_aliquot_label	barcoded_cell_sample_label	donor_label	dataset_label	feature_matrix_label	alignment_id	donor_age_value	donor_age_unit	age_at_death_reference_point	...	library_aliquot_label_library	barcoded_cell_sample_label_library	barcoded_cell_sample_technique	amplified_cdna_label	tissue_sample_local_name	donor_label_library	region_of_interest_label	region_of_interest_id	library_aliquot_nhash_id	library_nhash_id
cell_label
QN22.26.002.14.05.04	L8S4_220222_04_H05	AB-S40302_S424_E1-50	P4S4_220131_155_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710839	15	years	birth	...	AB-S40302_S424_E1-50	P4S4_220131_155_A01	PatchSeq-SmartSeq	A8S4_220215_06_H05	QN22.26.002.14.05	QN22.26.002	Ca	HOMBA:10334	LA-JEACHN795730XCYBFE242803	LI-JEACHN795730
QN22.26.002.14.05.05	L8S4_220222_04_A06	AB-S40302_S425_E1-50	P4S4_220131_156_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710837	15	years	birth	...	AB-S40302_S425_E1-50	P4S4_220131_156_A01	PatchSeq-SmartSeq	A8S4_220215_06_A06	QN22.26.002.14.05	QN22.26.002	Ca	HOMBA:10334	LA-WCXYWT453000XCYBFE242803	LI-WCXYWT453000
QN22.26.002.14.05.06	L8S4_220222_04_B06	AB-S40302_S426_E1-50	P4S4_220131_157_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710836	15	years	birth	...	AB-S40302_S426_E1-50	P4S4_220131_157_A01	PatchSeq-SmartSeq	A8S4_220215_06_B06	QN22.26.002.14.05	QN22.26.002	Ca	HOMBA:10334	LA-JMJMOH539975XCYBFE242803	LI-JMJMOH539975
QN22.26.002.14.05.08	L8S4_220222_04_C06	AB-S40302_S427_E1-50	P4S4_220131_158_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710835	15	years	birth	...	AB-S40302_S427_E1-50	P4S4_220131_158_A01	PatchSeq-SmartSeq	A8S4_220215_06_C06	QN22.26.002.14.05	QN22.26.002	Ca	HOMBA:10334	LA-NJMQPI138328XCYBFE242803	LI-NJMQPI138328
QN22.26.002.14.05.09	L8S4_220222_04_D06	AB-S40302_S428_E1-50	P4S4_220131_159_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710834	15	years	birth	...	AB-S40302_S428_E1-50	P4S4_220131_159_A01	PatchSeq-SmartSeq	A8S4_220215_06_D06	QN22.26.002.14.05	QN22.26.002	Ca	HOMBA:10334	LA-HIEYXR972206XCYBFE242803	LI-HIEYXR972206

5 rows × 27 columns

We load a value set table that defines colors for various features in the dataset as well as orders for those feature terms. We use these orders later in this notebook.

value_sets = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='value_sets'
).set_index('label')
value_sets

value_sets.csv: 100%|██████████| 1.68k/1.68k [00:00<00:00, 26.1kMB/s]

	table	field	description	order	external_identifier	parent_label	color_hex_triplet
label
Female	donor	donor_sex	Female	1	NaN	NaN	#565353
Male	donor	donor_sex	Male	2	NaN	NaN	#ADC4C3
Human	donor	species_genus	Human	1	NCBITaxon:9605	NaN	#377eb8
Macaque	donor	species_genus	Macaque	2	NCBITaxon:9539	NaN	#4daf4a
Marmoset	donor	species_genus	Marmoset	3	NCBITaxon:9481	NaN	#FF5F5D
Squirrel monkey	donor	species_genus	Squirrel monkey	4	NCBITaxon:9520	NaN	#ffa300
Mouse	donor	species_genus	Mouse	5	NCBITaxon:10088	NaN	#c941a7
Homo sapiens	donor	species_scientific_name	Homo sapiens	1	NCBITaxon:9606	NaN	#377eb8
Macaca mulatta	donor	species_scientific_name	Macaca mulatta	2	NCBITaxon:9544	NaN	#4daf4a
Macaca nemestrina	donor	species_scientific_name	Macaca nemestrina	3	NCBITaxon:9545	NaN	#b2df8a
Callithrix jacchus	donor	species_scientific_name	Callithrix jacchus	4	NCBITaxon:9483	NaN	#FF5F5D
Saimiri sciureus	donor	species_scientific_name	Saimiri sciureus	5	NCBITaxon:9521	NaN	#FF5F5D
Mus musculus	donor	species_scientific_name	Mus musculus	6	NCBITaxon:10090	NaN	#ffa300
STR	library	region_of_interest_label	striatum (neostriatum)	1	HOMBA:10333	NaN	#048359
DS	library	region_of_interest_label	dorsal striatum (caudoputamen complex-CP, STRd)	2	HOMBA:AA30192	STR	#ff399a
Ca	library	region_of_interest_label	caudate nucleus (mediodorsal division of the CP)	3	HOMBA:10334	DS	#ff6b9a
Pu	library	region_of_interest_label	putamen (lateroventral division of the CP)	4	HOMBA:10338	DS	#9ba2ff
VS	library	region_of_interest_label	ventral striatum (STRv)	5	HOMBA:AA30195	STR	#ffe119
NAC	library	region_of_interest_label	nucleus accumbens	6	HOMBA:10339	VS	#c29515
ic	library	region_of_interest_label	internal capsule	7	HOMBA:10581	NaN	#ffd7b4

We define a convenience function to add colors for the various values in the data (e.g. unique region of interest or donor sex values).

def extract_value_set(cell_metadata_df: pd.DataFrame, input_value_set: pd.DataFrame, input_value_set_label: str):
    """Add color and order columns to the cell metadata dataframe based on the input
    value set.

    Columns are added as {input_value_set_label}_color and {input_value_set_label}_order.

    Parameters
    ----------
    cell_metadata_df : pd.DataFrame
        DataFrame containing cell metadata.
    input_value_set : pd.DataFrame
        DataFrame containing the value set information.
    input_value_set_label : str
        The column name to extract color and order information for. The information will then be added to
        the cell metadata.
    """
    cell_metadata_df[f'{input_value_set_label}_color'] = input_value_set[
        input_value_set['field'] == input_value_set_label
    ].loc[cell_metadata_df[input_value_set_label]]['color_hex_triplet'].values
    cell_metadata_df[f'{input_value_set_label}_order'] = input_value_set[
        input_value_set['field'] == input_value_set_label
    ].loc[cell_metadata_df[input_value_set_label]]['order'].values

Use our function to add the relevant color and order columns to our cell_metadata table.

# Add region of interest color and order
extract_value_set(macaque_cell_metadata, value_sets, 'region_of_interest_label')

Gene expression data#

Now we extract expression for specific genes from the h5ad files and pair it with our gene metadata. In general, we can use the function get_gene_data to extract the expression of specific genes for all the cells in a given dataset or a subset of cells. For more details on using this convenience function, see the Accessing 10x RNA-seq gene expression data tutorial notebook.

For this tutorial we will use precomputed tables of the expression for specific genes to compare expression by spatial location or by transcriptomic group.

To use the gene expression data, we first load the set of genes for the patch-seq data.

macaque_genes = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='gene',
).set_index('gene_identifier')
macaque_genes.head()

gene.csv: 100%|██████████| 3.33M/3.33M [00:00<00:00, 9.36MMB/s]

	gene_symbol	description	gene_id	molecular_type	synonym
gene_identifier
NCBIGene:712737	A1BG	alpha-1-B glycoprotein	712737	protein_coding	[]
NCBIGene:703806	A1CF	APOBEC1 complementation factor	703806	protein_coding	[]
NCBIGene:716616	A2ML1	alpha-2-macroglobulin like 1	716616	protein_coding	[]
NCBIGene:711045	A3GALT2	alpha 1,3-galactosyltransferase 2	711045	protein_coding	[]
NCBIGene:710998	A4GALT	alpha 1,4-galactosyltransferase	710998	protein_coding	[]

Below we list the genes we will use in this notebook and the example method used to load the expression for these specific genes from the h5ad file. More details on how to extract specific genes from the data see our accessing gene expression data tutorial

gene_names = ["WFS1", "CNR1", "CHRM3", "MYO16", "PTHLH", "DRD1", "DRD2", "TAC3", "SST"] 

macaque_gene_data = get_gene_data(
    abc_atlas_cache=abc_cache,
    all_cells=macaque_cell_metadata,
    all_genes=macaque_genes,
    selected_genes=gene_names,
    data_type='log2'
)
macaque_gene_data.head()

loading file: HMBA-Macaque-PatchSeq-BG

HMBA-Macaque-PatchSeq-BG-log2.h5ad: 100%|██████████| 63.8M/63.8M [00:02<00:00, 23.9MMB/s]  

 - time taken:  0.115348
total time taken: 0.905653
	total cells: 717 processed cells: 717

gene_symbol	CHRM3	CNR1	DRD1	DRD2	MYO16	PTHLH	SST	TAC3	WFS1
cell_label
QN22.26.002.14.05.04	10.811937	0.000000	0.000000	8.153246	4.933148	3.680369	6.044661	0.0	5.300721
QN22.26.002.14.05.05	2.088328	5.323032	6.759345	0.000000	0.000000	4.358629	0.000000	0.0	0.000000
QN22.26.002.14.05.06	7.193662	0.000000	0.000000	7.467582	4.027162	0.000000	0.000000	0.0	6.536435
QN22.26.002.14.05.08	7.406147	0.000000	0.000000	7.258967	6.859893	2.354068	0.000000	0.0	3.205592
QN22.26.002.14.05.09	9.808509	0.000000	8.067217	0.000000	0.000000	0.000000	0.000000	0.0	6.322538

Now that we’ve loaded our expression matrix tables, indexed by cell_label, we can merge them into our cell metadata tables.

macaque_cell_metadata_with_genes = macaque_cell_metadata.join(macaque_gene_data, on='cell_label')

Plotting heatmap expression data#

We define a helper functions aggregate_by_metadata to compute the average expression for a given category. Note that this is a simple wrapper of the Pandas GroupBy functionality. Other summary statistics beyond just the mean are listed here: https://pandas.pydata.org/docs/reference/groupby.html#dataframegroupby-computations-descriptive-stats

We also define a function to plot the resultant averaged data in a heatmap.

def aggregate_by_metadata(df: pd.DataFrame, gnames: List[str], value: str, sort: bool = False) -> pd.DataFrame:
    """Aggregate gene expression data by metadata.

    Parameters
    ----------
    df: pandas.DataFrame
        DataFrame containing gene expression and metadata.
    gnames: List[str]
        List of gene names to aggregate.
    value: str
        Metadata column to group by.
    sort: bool
        If True, sort the output by the first gene in gnames.

    Returns
    -------
    pandas.DataFrame
        DataFrame with aggregated gene expression values.
    """
    grouped = df.groupby(value)[gnames].mean()
    if sort:
        grouped = grouped.sort_values(by=gnames[0], ascending=False)
    if f'{value}_order' in df.columns:
        grouped = grouped.loc[df[value].unique()[np.argsort(df[value + '_order'].unique())]]
    return grouped


def plot_heatmap(
        df: pd.DataFrame,
        fig_width: int = 8,
        fig_height: int = 4,
        cmap: mpl.colormaps = plt.cm.magma,
        vmin: Optional[float] = None,
        vmax: Optional[float] = None,
        unit: str = '[log2(CPM + 1)]'
    ) -> Tuple[plt.Figure, plt.Axes]:
    """Plot a heatmap from a DataFrame.

    Parameters
    ----------
    df: pandas.DataFrame
        DataFrame to plot as a heatmap.
    fig_width: int
        Width of the figure (inches).
    fig_height: int
        Height of the figure (inches).
    cmap: matplotlib.colors.Colormap
        Colormap to use for the heatmap.

    Returns
    -------
    fig: matplotlib.pyplot.Figure
        Figure object containing the heatmap.
    ax: matplotlib.pyplot.Axes
        Axes object containing the heatmap.
    """

    arr = df.to_numpy().astype('float')

    if vmin is None:
        vmin = arr.min()
    if vmax is None:
        vmax = arr.max()
    norm = mpl.colors.Normalize(vmin=vmin, vmax=vmax)
    sm = plt.cm.ScalarMappable(cmap=cmap, norm=norm)
    cmap = sm.get_cmap()

    fig, ax = plt.subplots()
    fig.set_size_inches(fig_width, fig_height)

    ax.imshow(arr, cmap=cmap, aspect='auto', vmin=vmin, vmax=vmax)
    xlabs = df.columns.values
    ylabs = df.index.values

    ax.set_xticks(range(len(xlabs)))
    ax.set_xticklabels(xlabs)

    ax.set_yticks(range(len(ylabs)))
    ax.set_yticklabels(ylabs)

    cbar = fig.colorbar(sm, ax=ax, orientation='vertical', fraction=0.1, pad=0.01)
    cbar.set_label(unit)

    return fig, ax

Visualize gene expression for selected genes by ROI#

We plot the gene expression as a heat map showing the average expression as a function of region of interest (ROI) label. The full names for the region of interest acronyms can be found in the values_sets table we loaded earlier. Note that samples that were not precisely pinned anatomically were assigned to dorsal or ventral striatum based on the patch-clamp experimentalist’s original notes.

macaque_gene_names = ['DRD1', 'DRD2', 'PTHLH', 'SST', 'TAC3', 'CHRM3', 'CNR1', 'MYO16', 'WFS1']

feature_name = "region_of_interest_label"
agg = aggregate_by_metadata(macaque_cell_metadata_with_genes, macaque_gene_names, feature_name)
# We drop ic here as it has a very small number or cells.
plot_heatmap(agg.drop('ic'), 8, 8, vmin=0, vmax=7) # vmin and vmax set based on observed expression ranges
plt.title(f'Macaque - {feature_name}')
plt.show()

../_images/b80f14abe1e036f198a5bbb8a62e23113f11fa4112b0d85098a1fd783ee92841.png

Taxonomy mapping#

Finally, let’s load the taxonomy information so that we can plot gene expression versus various taxons in a heatmap for each species. Below, we load and merge all the information we need to plot cell colors and taxonomy as done in the previous HMBA tutorial.

cluster = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster',
    dtype={'number_of_cells': 'Int64'}
).rename(columns={'label': 'cluster_annotation_term_label'}).set_index('cluster_annotation_term_label')
cluster_annotation_term_set = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster_annotation_term_set'
).rename(columns={'label': 'cluster_annotation_term_label'})

cluster_annotation_term = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster_annotation_term',
).rename(columns={'label': 'cluster_annotation_term_label'}).set_index('cluster_annotation_term_label')

cluster_to_cluster_annotation_membership = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster_to_cluster_annotation_membership',
).set_index('cluster_annotation_term_label')
membership_with_cluster_info = cluster_to_cluster_annotation_membership.join(
    cluster.reset_index().set_index('cluster_alias')[['number_of_cells']],
    on='cluster_alias'
)
membership_with_cluster_info = membership_with_cluster_info.join(cluster_annotation_term, rsuffix='_anno_term').reset_index()
membership_groupby = membership_with_cluster_info.groupby(
    ['cluster_alias', 'cluster_annotation_term_set_name']
)

cluster_annotation_term_set = abc_cache.get_metadata_dataframe(
    directory='HMBA-BG-taxonomy-CCN20250428',
    file_name='cluster_annotation_term_set'
).rename(columns={'label': 'cluster_annotation_term_label'})

# term_sets = abc_cache.get_metadata_dataframe(directory='WHB-taxonomy', file_name='cluster_annotation_term_set').set_index('label')
cluster_details = membership_groupby['cluster_annotation_term_name'].first().unstack()
cluster_order = membership_groupby['term_order'].first().unstack()
cluster_order.sort_values(['Neighborhood', 'Class', 'Subclass', 'Group', 'Cluster'], inplace=True)
cluster_order.rename(
    columns={'Neighborhood': 'Neighborhood_order',
             'Class': 'Class_order',
             'Subclass': 'Subclass_order',
             'Group': 'Group_order',
             'Cluster': 'Cluster_order'},
    inplace=True
)

cluster_colors = membership_groupby['color_hex_triplet'].first().unstack()
cluster_colors = cluster_colors[cluster_annotation_term_set['name']]
cluster_colors.sort_values(
    ['Neighborhood', 'Class', 'Subclass', 'Group', 'Cluster'],
    inplace=True
)

The macaque patch-seq gene expression has been mapped to the HMBA-BG taxonomy using MapMyCells, excluding non-neuronal nodes. We load this table below. A more lightweight version of this table with only mappings between cells to cluster_alias. The table is named cell_to_cluster_membership.

Below we load the complete MMC results with probabilities and correlation coefficients for all assignments at all levels.

macaque_mmc = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq',
    file_name='mmc_results',
).set_index('cell_label')
macaque_mmc.head()

mmc_results.csv: 100%|██████████| 201k/201k [00:00<00:00, 1.42MMB/s] 

	Neighborhood_label	Neighborhood_name	Neighborhood_bootstrapping_probability	Neighborhood_aggregate_probability	Neighborhood_correlation_coefficient	Class_label	Class_name	Class_bootstrapping_probability	Class_aggregate_probability	Class_correlation_coefficient	...	Group_label	Group_name	Group_bootstrapping_probability	Group_aggregate_probability	Group_correlation_coefficient	Cluster_label	cluster_alias	Cluster_bootstrapping_probability	Cluster_aggregate_probability	Cluster_correlation_coefficient
cell_label
QN22.26.002.14.05.04	CS20250428_NEIGH_0002	Subpallium GABA	1.0	1.0	0.7190	CS20250428_CLASS_0003	CN LGE GABA	1.0	1.0	0.7093	...	CS20250428_GROUP_0054	STRd D2 Striosome MSN	1.00	1.00	0.5715	CS20250428_CLUST_0986	Macaque-429	1.00	1.0000	0.5615
QN22.26.002.14.05.05	CS20250428_NEIGH_0002	Subpallium GABA	1.0	1.0	0.7634	CS20250428_CLASS_0003	CN LGE GABA	1.0	1.0	0.7466	...	CS20250428_GROUP_0049	STRd D1 Matrix MSN	1.00	1.00	0.6477	CS20250428_CLUST_0949	Macaque-467	0.99	0.9900	0.5517
QN22.26.002.14.05.06	CS20250428_NEIGH_0002	Subpallium GABA	1.0	1.0	0.6603	CS20250428_CLASS_0003	CN LGE GABA	1.0	1.0	0.6376	...	CS20250428_GROUP_0054	STRd D2 Striosome MSN	1.00	1.00	0.4564	CS20250428_CLUST_0986	Macaque-429	0.62	0.6200	0.4341
QN22.26.002.14.05.08	CS20250428_NEIGH_0002	Subpallium GABA	1.0	1.0	0.6162	CS20250428_CLASS_0003	CN LGE GABA	1.0	1.0	0.5980	...	CS20250428_GROUP_0052	STRd D2 Matrix MSN	0.88	0.88	0.4317	CS20250428_CLUST_0973	Macaque-465	1.00	0.8800	0.4515
QN22.26.002.14.05.09	CS20250428_NEIGH_0002	Subpallium GABA	1.0	1.0	0.6987	CS20250428_CLASS_0003	CN LGE GABA	1.0	1.0	0.6847	...	CS20250428_GROUP_0049	STRd D1 Matrix MSN	0.99	0.99	0.5025	CS20250428_CLUST_0949	Macaque-467	0.57	0.5643	0.3415

5 rows × 25 columns

macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(macaque_mmc)
macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(cluster_details, on='cluster_alias')
macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(cluster_colors, on='cluster_alias', rsuffix='_color')
macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(cluster_order, on='cluster_alias')

macaque_cell_metadata_with_genes.head()

	library_label	library_aliquot_label	barcoded_cell_sample_label	donor_label	dataset_label	feature_matrix_label	alignment_id	donor_age_value	donor_age_unit	age_at_death_reference_point	...	Neighborhood_color	Class_color	Subclass_color	Group_color	Cluster_color	Class_order	Cluster_order	Group_order	Neighborhood_order	Subclass_order
cell_label
QN22.26.002.14.05.04	L8S4_220222_04_H05	AB-S40302_S424_E1-50	P4S4_220131_155_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710839	15	years	birth	...	#19613b	#d0b83c	#253c8c	#ff9896	#561430	10	1290	55	3	32
QN22.26.002.14.05.05	L8S4_220222_04_A06	AB-S40302_S425_E1-50	P4S4_220131_156_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710837	15	years	birth	...	#19613b	#d0b83c	#1655f2	#1f77b4	#b83642	10	1059	50	3	31
QN22.26.002.14.05.06	L8S4_220222_04_B06	AB-S40302_S426_E1-50	P4S4_220131_157_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710836	15	years	birth	...	#19613b	#d0b83c	#253c8c	#ff9896	#561430	10	1290	55	3	32
QN22.26.002.14.05.08	L8S4_220222_04_C06	AB-S40302_S427_E1-50	P4S4_220131_158_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710835	15	years	birth	...	#19613b	#d0b83c	#253c8c	#aec7e8	#d71b39	10	1194	53	3	32
QN22.26.002.14.05.09	L8S4_220222_04_D06	AB-S40302_S428_E1-50	P4S4_220131_159_A01	QN22.26.002	HMBA-Macaque-PatchSeq	HMBA-Macaque-PatchSeq-BG	1200710834	15	years	birth	...	#19613b	#d0b83c	#1655f2	#1f77b4	#b83642	10	1059	50	3	31

5 rows × 78 columns

Visualize gene expression for selected genes by the HMBA-BG taxonomy#

Below we plot the gene expression for the cells as a function the Group level in the taxonomy they were associated with. We plot the MSNs cell-types only. We’ll also use a subset of the genes we extracted from the gene expression matrix.

macaque_gene_names = ['DRD1', 'DRD2', 'PTHLH', 'SST', 'TAC3', 'CHRM3', 'CNR1', 'MYO16', 'WFS1']

feature_name = "Group"
agg = aggregate_by_metadata(
    macaque_cell_metadata_with_genes,
    macaque_gene_names,
    feature_name
)


plot_heatmap(agg, 8, 8, vmin=0, vmax=7.5)
plt.title(f'Macaque - {feature_name}')
plt.show()

../_images/7477300dc4b16b6e2773126a78550661216dc5283a02e97913c7ea82ebf7fced.png

Other Features#

In addition to gene expression, we also look at electrophysiology and morphology features.

Electrophysiological features#

The electrophysiological features are across 4 tables: ap_features, chirp_features, cross_sweep_long_square_features, and sweep_features. In the below example, we load the ap_features table and merge it into our cell_metadata table.

A full list of the electrophysiological features with some short descriptions can be downloaded here

ap_features = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq', file_name='ap_features'
).set_index('cell_label')

# most cells have no measurement for these features so we filter those with values just to see what the data looks like. 
ap_features[pd.notna(ap_features['adp_v_last_rheo'])].head()

ap_features.csv: 100%|██████████| 496k/496k [00:00<00:00, 3.00MMB/s] 

	adp_v_last_rheo	ahp_delay_5spike	ahp_delay_hero	ahp_delay_ratio_5spike	ahp_delay_ratio_hero	downstroke_adapt_ratio	downstroke_hero	downstroke_ramp	downstroke_rheo	downstroke_short_square	...	width_adapt_ratio	width_hero	width_ramp	width_rheo	width_rheo_ms	width_short_square	width_suprathresh_hero	width_suprathresh_ramp	width_suprathresh_rheo	width_suprathresh_short_square
cell_label
Q21.26.003.12.03.01	-55.000004	0.021870	0.000980	0.191146	0.034789	0.879896	-267.810016	NaN	-256.109021	-277.295319	...	1.090909	0.000375	NaN	0.000385	0.385	0.000370	0.000345	NaN	0.000355	0.000335
Q21.26.006.14.01.04	-50.093750	0.043635	0.043635	0.237018	0.237018	0.879694	-87.819662	NaN	-88.143416	-99.554269	...	1.086124	0.001045	NaN	0.001055	1.055	0.000935	0.000945	NaN	0.000955	0.000850
Q21.26.009.12.01.03	-51.000004	0.155950	0.001390	0.483124	0.066763	0.900912	-179.379158	NaN	-169.956722	-191.220558	...	1.070707	0.000475	NaN	0.000485	0.485	0.000460	0.000430	NaN	0.000455	0.000415
Q21.26.009.12.02.03	-61.062504	0.030185	0.026850	0.120473	0.188131	0.789810	-72.779128	NaN	-69.267846	-80.596543	...	1.152610	0.001270	NaN	0.001305	1.305	0.001220	0.001175	NaN	0.001185	0.001125
Q21.26.009.12.01.04	-52.812500	0.037275	0.026275	0.250311	0.255842	0.982069	-73.422130	NaN	-71.413402	-79.772097	...	1.101770	0.001120	NaN	0.001140	1.140	0.001085	0.001045	NaN	0.001050	0.000990

5 rows × 57 columns

macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(ap_features)

Below, we create a function to create a box plot for a given feature by e.g. a taxonomy level, ROI structure, etc.

def plot_boxplot_by_column(
        dataframe, col_name, value_col
    ):
    """
    Plots a boxplot of the specified value column grouped by the specified column.

    Parameters
    ----------
    dataframe : pd.DataFrame
        The input dataframe containing the data to plot.
    col_name : str
        The name of the column to group by.
    value_col : str
        The name of the column containing the values to plot.
    """
    box_values = {}
    for group in dataframe[col_name].unique()[
            np.argsort(dataframe[f"{col_name}_order"].unique()) # Order by the specified order column
    ]:
        col_values = dataframe.loc[dataframe[col_name] == group, value_col]
        if pd.notna(col_values).sum() > 0:
            box_values[group] = col_values[~col_values.isna()].values # Use non-NA values for the boxplot

    plt.boxplot(
        box_values.values()
    )
    plt.xticks(
        ticks=np.arange(1, len(box_values) + 1),
        labels=box_values.keys(),
        rotation=90
    )
    plt.ylabel(value_col)
    plt.title(f'{value_col} by {col_name}')
    plt.show()

We’ll plot the ephys feature, width_rheo in a boxplot against the Groups in this dataset.

plot_boxplot_by_column(dataframe=macaque_cell_metadata_with_genes, col_name='Group', value_col='width_rheo')

../_images/a0926e597d265f8c1687facbce71b649c82c7d045ab8201c55dd621b7d198e01.png

Morphological Features#

We now load our morphological features and merge them into the cell metadata. A full list of the morphological features with short descriptions can be downloaded here.

morphology_features = abc_cache.get_metadata_dataframe(
    directory='HMBA-Macaque-PatchSeq', file_name='morphology_features'
).set_index('cell_label')
morphology_features.head()

morphology_features.csv: 100%|██████████| 97.4k/97.4k [00:00<00:00, 642kMB/s] 

	23_Sholl_PC0	23_Sholl_PC1	2_Sholl_PC0	2_Sholl_PC1	3_Sholl_PC0	3_Sholl_PC1	axon_bias_dorsal	axon_bias_medial	axon_exit_distance	axon_exit_theta_coronal	...	basal_dendrite_mean_diameter	basal_dendrite_num_branches	basal_dendrite_soma_percentile_dorsal	basal_dendrite_soma_percentile_medial	basal_dendrite_stem_exit_MedialLateral	basal_dendrite_stem_exit_dorsal	basal_dendrite_stem_exit_ventral	basal_dendrite_total_length	basal_dendrite_total_surface_area	soma_surface_area
cell_label
Q19.26.014.11.04.01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Q19.26.014.11.01.01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	...	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
Q19.26.014.11.01.04	NaN	NaN	NaN	NaN	168.820400	29.150249	NaN	NaN	NaN	0.667150	...	0.573221	56.0	0.468694	0.334369	0.5	0.125	0.375	5241.027500	9442.031612	548.486434
Q19.26.014.11.01.05	NaN	NaN	NaN	NaN	282.741349	-36.067186	NaN	NaN	NaN	0.875211	...	0.565709	69.0	0.489853	0.428372	0.4	0.200	0.400	5909.886270	10526.672804	359.438955
Q19.26.014.11.05.01	NaN	NaN	NaN	NaN	276.829831	-84.814559	NaN	NaN	NaN	0.447964	...	0.609324	77.0	0.576005	0.407282	0.8	0.200	0.000	5946.308781	11421.095519	502.026310

5 rows × 39 columns

macaque_cell_metadata_with_genes = macaque_cell_metadata_with_genes.join(morphology_features)

Finally, we plot the soma surface area feature against the taxonomy Group.

plot_boxplot_by_column(dataframe=macaque_cell_metadata_with_genes, col_name='Group', value_col='soma_surface_area')

../_images/04263ddc8191f30ebbd97707c0fcc1109e1ab90ad0705d25e33b6882ecc8affb.png

To learn more about the spatial transcriptomic data see the HMBA-BG Spatial Slabs and Taxonomy notebook.

To learn more about the taxonomy used in this notebook and the data it is derived from, see the HMBA-BG Clustering Analysis and Taxonomy notebook.