MERFISH whole brain spatial transcriptomics (part 2a)

In part 1, we explored two examples looking at the expression of canonical neurotransmitter transporter genes and the gene Tac2 in one coronal section. In this notebook, we prepare the data so that we can repeat those examples for all cells spanning the whole brain. This notebook takes ~10 seconds to run.

The results from this notebook have already been cached and saved, so you can skip this notebook and continue with part 2b if needed.

To run this notebook, you need to be connected to the internet and to have run through the getting started notebook.

import pandas as pd
from pathlib import Path
import numpy as np
import anndata
import time

from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache

We will interact with the data using the AbcProjectCache. This cache object tracks which data has been downloaded and serves the path to the requested data on disk. For metadata, the cache can also directly serve up a Pandas DataFrame. See the getting_started notebook for more details on using the cache, including how to install it if you have not already done so.

Change the download_base variable to the location where you have downloaded the data on your system.

download_base = Path('../../data/abc_atlas')
abc_cache = AbcProjectCache.from_cache_dir(download_base)

abc_cache.current_manifest

'releases/20241130/manifest.json'

cell = abc_cache.get_metadata_dataframe(
    directory='MERFISH-C57BL6J-638850',
    file_name='cell_metadata',
    dtype={'cell_label': str}
)
cell.set_index('cell_label', inplace=True)
print(len(cell))
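
Passing dtype={'cell_label': str} asks for the labels to be read as text rather than inferred as numbers. The sketch below shows why that can matter, using plain pandas.read_csv on a tiny made-up CSV. The labels and the column x are invented for illustration, and it assumes the cache's dtype argument behaves like the one in pandas.read_csv.

```python
import io

import pandas as pd

# Hypothetical CSV with made-up labels; this is not real atlas data.
csv_text = "cell_label,x\n007,1.0\n042,2.0\n"

# Without a dtype, pandas infers integers and the leading zeros are lost.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype=str, the labels are kept exactly as written in the file.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"cell_label": str})
as_text = as_text.set_index("cell_label")

print(inferred["cell_label"].tolist())
print(as_text.index.tolist())
```

Keeping the labels as strings also means they can be matched exactly against the string index of the expression matrix later on.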

file = abc_cache.get_data_path(
    directory='MERFISH-C57BL6J-638850',
    file_name='C57BL6J-638850/log2'
)
print(file)

/Users/chris.morrison/src/data/abc_atlas/expression_matrices/MERFISH-C57BL6J-638850/20230830/C57BL6J-638850-log2.h5ad

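# backed='r' opens the file read-only without loading the full matrix into memory
adata = anndata.read_h5ad(file, backed='r')
# adata.var holds the gene (column) annotations
gene = adata.var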

ntgenes = ['Slc17a7', 'Slc17a6', 'Slc17a8', 'Slc32a1', 'Slc6a5', 'Slc18a3', 'Slc6a3', 'Slc6a4', 'Slc6a2']
exgenes = ['Tac2']
gnames = ntgenes + exgenes
pred = [x in gnames for x in gene.gene_symbol]
gene_filtered = gene[pred]
gene_filtered

gene_identifier	gene_symbol	transcript_identifier
ENSMUSG00000030500	Slc17a6	ENSMUST00000032710
ENSMUSG00000037771	Slc32a1	ENSMUST00000045738
ENSMUSG00000025400	Tac2	ENSMUST00000026466
ENSMUSG00000039728	Slc6a5	ENSMUST00000056442
ENSMUSG00000070570	Slc17a7	ENSMUST00000085374
ENSMUSG00000019935	Slc17a8	ENSMUST00000020102
ENSMUSG00000021609	Slc6a3	ENSMUST00000022100
ENSMUSG00000020838	Slc6a4	ENSMUST00000021195
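
The filtered table keeps the row order of the gene table, not the order of gnames, and it can contain fewer rows than gnames has entries if a requested symbol is not in the gene table. The following is a minimal sketch of the same filter on a toy table, with an added check for missing symbols. The names toy_gene, wanted and missing, and all identifiers and symbols in it, are made up for illustration.

```python
import pandas as pd

# Toy stand-in for adata.var; identifiers and symbols are made up.
toy_gene = pd.DataFrame(
    {"gene_symbol": ["GeneA", "GeneB", "GeneC"]},
    index=pd.Index(["ID1", "ID2", "ID3"], name="gene_identifier"),
)
wanted = ["GeneC", "GeneA", "GeneX"]

# Vectorized equivalent of [x in wanted for x in toy_gene.gene_symbol]
mask = toy_gene["gene_symbol"].isin(wanted)
toy_filtered = toy_gene[mask]

# Requested symbols that do not appear in the table
missing = sorted(set(wanted) - set(toy_filtered["gene_symbol"]))

print(toy_filtered)
print("missing:", missing)
```

A check like this makes it explicit which requested genes will not be available in the later steps.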

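# time.process_time() reports CPU time used by this process, not wall-clock time
start = time.process_time()
# read only the selected gene columns from the backed file into a DataFrame
gdata = adata[:, gene_filtered.index].to_df()
print("time taken: ", time.process_time() - start)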

time taken:  7.865238999999999
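
The figure above comes from time.process_time(), which counts CPU time and excludes time the process spends sleeping. If elapsed wall-clock time is of interest, time.perf_counter() is one alternative. The sketch below contrasts the two, using time.sleep as a stand-in for waiting; it is an illustration only and is not part of the original notebook.

```python
import time

cpu_start = time.process_time()
wall_start = time.perf_counter()

# Stand-in for time spent waiting rather than computing
time.sleep(0.2)

cpu_elapsed = time.process_time() - cpu_start
wall_elapsed = time.perf_counter() - wall_start

print(f"CPU time:  {cpu_elapsed:.3f} s")
print(f"wall time: {wall_elapsed:.3f} s")
```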

# change columns from gene identifier to gene symbol
gdata.columns = gene_filtered.gene_symbol
# keep only cells with a non-missing value in the first gene column
pred = pd.notna(gdata[gdata.columns[0]])
gdata = gdata[pred].copy(deep=True)
print(len(gdata))
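
The relabel-and-filter step can be sketched on a toy table as follows. All names and values here (toy_gene, toy_data, the cell labels c1 to c3) are made up for illustration. Note that assigning new column labels relies on the columns being in the same order as the rows of the gene table, which holds here because the columns were selected using that table's index.

```python
import numpy as np
import pandas as pd

# Toy gene table: identifier index, symbol column (made-up values)
toy_gene = pd.DataFrame(
    {"gene_symbol": ["GeneA", "GeneB"]},
    index=pd.Index(["ID1", "ID2"], name="gene_identifier"),
)

# Toy expression table: one row per cell, one column per gene identifier
toy_data = pd.DataFrame(
    [[1.0, 0.0], [np.nan, np.nan], [2.5, 3.0]],
    index=pd.Index(["c1", "c2", "c3"], name="cell_label"),
    columns=toy_gene.index,
)

# Relabel columns by symbol; positions must match toy_gene's row order
toy_data.columns = toy_gene["gene_symbol"]

# Keep only rows where the first gene column is not NaN
keep = pd.notna(toy_data[toy_data.columns[0]])
toy_clean = toy_data[keep].copy()

print(toy_clean)
```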

Close the h5ad file and clean up.

adata.file.close()
del adata