{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "IPython magic command to render matplotlib plots." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# MERFISH whole brain spatial transcriptomics (part 2a)\n", "\n", "In part 1, we explored two examples looking at the expression of canonical neurotransmitter transporter genes and gene Tac2 in the one coronal section. In this notebook, we will prepare data so that we can repeat the examples for all cells spanning the whole brain. This notebook takes ~10 seconds to run.\n", "\n", "The results from this notebook has already been cached and saved. As such, if needed you can skip this notebook and continue with part 2b.\n", "\n", "You need to be connected to the internet to run this notebook and have run through the [getting started notebook](https://alleninstitute.github.io/abc_atlas_access/notebooks/getting_started.html)." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "from pathlib import Path\n", "import numpy as np\n", "import anndata\n", "import time\n", "\n", "from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will interact with the data using the **AbcProjectCache**. This cache object tracks which data has been downloaded and serves the path to the requsted data on disk. For metadata, the cache can also directly serve a up a Pandas Dataframe. See the ``getting_started`` notebook for more details on using the cache including installing it if it has not already been.\n", "\n", "**Change the download_base variable to where you have downloaded the data in your system.**" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'releases/20240831/manifest.json'" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "download_base = Path('../../data/abc_atlas')\n", "abc_cache = AbcProjectCache.from_cache_dir(download_base)\n", "\n", "abc_cache.current_manifest" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3938808\n" ] } ], "source": [ "cell = abc_cache.get_metadata_dataframe(\n", " directory='MERFISH-C57BL6J-638850',\n", " file_name='cell_metadata',\n", " dtype={'cell_label': str}\n", ")\n", "cell.set_index('cell_label', inplace=True)\n", "print(len(cell))" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/Users/chris.morrison/src/data/abc_atlas/expression_matrices/MERFISH-C57BL6J-638850/20230830/C57BL6J-638850-log2.h5ad\n" ] } ], "source": [ "file = abc_cache.get_data_path(\n", " directory='MERFISH-C57BL6J-638850',\n", " file_name='C57BL6J-638850/log2'\n", ")\n", "print(file)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "adata = anndata.read_h5ad(file, backed='r')\n", "gene = adata.var" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
gene_symboltranscript_identifier
gene_identifier
ENSMUSG00000030500Slc17a6ENSMUST00000032710
ENSMUSG00000037771Slc32a1ENSMUST00000045738
ENSMUSG00000025400Tac2ENSMUST00000026466
ENSMUSG00000039728Slc6a5ENSMUST00000056442
ENSMUSG00000070570Slc17a7ENSMUST00000085374
ENSMUSG00000019935Slc17a8ENSMUST00000020102
ENSMUSG00000021609Slc6a3ENSMUST00000022100
ENSMUSG00000020838Slc6a4ENSMUST00000021195
\n", "
" ], "text/plain": [ " gene_symbol transcript_identifier\n", "gene_identifier \n", "ENSMUSG00000030500 Slc17a6 ENSMUST00000032710\n", "ENSMUSG00000037771 Slc32a1 ENSMUST00000045738\n", "ENSMUSG00000025400 Tac2 ENSMUST00000026466\n", "ENSMUSG00000039728 Slc6a5 ENSMUST00000056442\n", "ENSMUSG00000070570 Slc17a7 ENSMUST00000085374\n", "ENSMUSG00000019935 Slc17a8 ENSMUST00000020102\n", "ENSMUSG00000021609 Slc6a3 ENSMUST00000022100\n", "ENSMUSG00000020838 Slc6a4 ENSMUST00000021195" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ntgenes = ['Slc17a7', 'Slc17a6', 'Slc17a8', 'Slc32a1', 'Slc6a5', 'Slc18a3', 'Slc6a3', 'Slc6a4', 'Slc6a2']\n", "exgenes = ['Tac2']\n", "gnames = ntgenes + exgenes\n", "pred = [x in gnames for x in gene.gene_symbol]\n", "gene_filtered = gene[pred]\n", "gene_filtered" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "time taken: 7.690492000000001\n" ] } ], "source": [ "start = time.process_time()\n", "gdata = adata[:, gene_filtered.index].to_df()\n", "print(\"time taken: \", time.process_time() - start)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4334174\n" ] } ], "source": [ "# change columns from index to gene symbol\n", "gdata.columns = gene_filtered.gene_symbol\n", "pred = pd.notna(gdata[gdata.columns[0]])\n", "gdata = gdata[pred].copy(deep=True)\n", "print(len(gdata))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Close h5ad file and clean up" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "adata.file.close()\n", "del adata" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 4 }