{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MERFISH whole brain spatial transcriptomics (part 2a)\n",
    "\n",
    "In part 1, we explored two examples looking at the expression of canonical neurotransmitter transporter genes and gene Tac2 in the one coronal section. In this notebook, we will prepare data so that we can repeat the examples for all cells spanning the whole brain. This notebook takes ~10 seconds to run.\n",
    "\n",
    "The results from this notebook has already been cached and saved. As such, if needed you can skip this notebook and continue with part 2b.\n",
    "\n",
    "You need to be connected to the internet to run this notebook and have run through the [getting started notebook](https://alleninstitute.github.io/abc_atlas_access/notebooks/getting_started.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from pathlib import Path\n",
    "import numpy as np\n",
    "import anndata\n",
    "import time\n",
    "\n",
    "from abc_atlas_access.abc_atlas_cache.abc_project_cache import AbcProjectCache"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will interact with the data using the **AbcProjectCache**. This cache object tracks which data has been downloaded and serves the path to the requsted data on disk. For metadata, the cache can also directly serve a up a Pandas Dataframe. See the ``getting_started`` notebook for more details on using the cache including installing it if it has not already been.\n",
    "\n",
    "**Change the download_base variable to where you have downloaded the data in your system.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'releases/20241115/manifest.json'"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "download_base = Path('../../data/abc_atlas')\n",
    "abc_cache = AbcProjectCache.from_cache_dir(download_base)\n",
    "\n",
    "abc_cache.current_manifest"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "3938808\n"
     ]
    }
   ],
   "source": [
    "cell = abc_cache.get_metadata_dataframe(\n",
    "    directory='MERFISH-C57BL6J-638850',\n",
    "    file_name='cell_metadata',\n",
    "    dtype={'cell_label': str}\n",
    ")\n",
    "cell.set_index('cell_label', inplace=True)\n",
    "print(len(cell))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "/allen/scratch/aibstemp/chris.morrison/src/data/abc_atlas/expression_matrices/MERFISH-C57BL6J-638850/20230830/C57BL6J-638850-log2.h5ad\n"
     ]
    }
   ],
   "source": [
    "file = abc_cache.get_data_path(\n",
    "    directory='MERFISH-C57BL6J-638850',\n",
    "    file_name='C57BL6J-638850/log2'\n",
    ")\n",
    "print(file)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata = anndata.read_h5ad(file, backed='r')\n",
    "gene = adata.var"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>gene_symbol</th>\n",
       "      <th>transcript_identifier</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>gene_identifier</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>ENSMUSG00000030500</th>\n",
       "      <td>Slc17a6</td>\n",
       "      <td>ENSMUST00000032710</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ENSMUSG00000037771</th>\n",
       "      <td>Slc32a1</td>\n",
       "      <td>ENSMUST00000045738</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ENSMUSG00000025400</th>\n",
       "      <td>Tac2</td>\n",
       "      <td>ENSMUST00000026466</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ENSMUSG00000039728</th>\n",
       "      <td>Slc6a5</td>\n",
       "      <td>ENSMUST00000056442</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ENSMUSG00000070570</th>\n",
       "      <td>Slc17a7</td>\n",
       "      <td>ENSMUST00000085374</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ENSMUSG00000019935</th>\n",
       "      <td>Slc17a8</td>\n",
       "      <td>ENSMUST00000020102</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ENSMUSG00000021609</th>\n",
       "      <td>Slc6a3</td>\n",
       "      <td>ENSMUST00000022100</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>ENSMUSG00000020838</th>\n",
       "      <td>Slc6a4</td>\n",
       "      <td>ENSMUST00000021195</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   gene_symbol transcript_identifier\n",
       "gene_identifier                                     \n",
       "ENSMUSG00000030500     Slc17a6    ENSMUST00000032710\n",
       "ENSMUSG00000037771     Slc32a1    ENSMUST00000045738\n",
       "ENSMUSG00000025400        Tac2    ENSMUST00000026466\n",
       "ENSMUSG00000039728      Slc6a5    ENSMUST00000056442\n",
       "ENSMUSG00000070570     Slc17a7    ENSMUST00000085374\n",
       "ENSMUSG00000019935     Slc17a8    ENSMUST00000020102\n",
       "ENSMUSG00000021609      Slc6a3    ENSMUST00000022100\n",
       "ENSMUSG00000020838      Slc6a4    ENSMUST00000021195"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ntgenes = ['Slc17a7', 'Slc17a6', 'Slc17a8', 'Slc32a1', 'Slc6a5', 'Slc18a3', 'Slc6a3', 'Slc6a4', 'Slc6a2']\n",
    "exgenes = ['Tac2']\n",
    "gnames = ntgenes + exgenes\n",
    "pred = [x in gnames for x in gene.gene_symbol]\n",
    "gene_filtered = gene[pred]\n",
    "gene_filtered"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "time taken:  6.8215717090000005\n"
     ]
    }
   ],
   "source": [
    "start = time.process_time()\n",
    "gdata = adata[:, gene_filtered.index].to_df()\n",
    "print(\"time taken: \", time.process_time() - start)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "4334174\n"
     ]
    }
   ],
   "source": [
    "# change columns from index to gene symbol\n",
    "gdata.columns = gene_filtered.gene_symbol\n",
    "pred = pd.notna(gdata[gdata.columns[0]])\n",
    "gdata = gdata[pred].copy(deep=True)\n",
    "print(len(gdata))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Close h5ad file and clean up"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "adata.file.close()\n",
    "del adata"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}