Cell Types
The Allen Cell Types data set is a database of mouse and human neuronal cell types, built from multimodal characterization of single cells to enable data-driven approaches to classification. It is fully integrated with other Allen Brain Atlas resources. The database currently includes:
- electrophysiology: whole-cell current-clamp recordings made from Cre-positive neurons
- morphology: 3D bright-field images of the complete structure of neurons from the visual cortex
This page describes how the SDK can be used to access data in the Cell Types Database. For more information, please visit the Cell Types Database home page and the API documentation.
Cell Types Cache
The CellTypesCache class provides a Python interface for downloading data in the Allen Cell Types Database into well-known locations so that you don’t have to think about file names and directories. The following example demonstrates how to download metadata for all cells with 3D reconstructions, then download the reconstruction and electrophysiology recordings for one of those cells:
from allensdk.core.cell_types_cache import CellTypesCache
ctc = CellTypesCache(manifest_file='cell_types/manifest.json')
# a list of cell metadata for cells with reconstructions, download if necessary
cells = ctc.get_cells(require_reconstruction=True)
# open the electrophysiology data of one cell, download if necessary
data_set = ctc.get_ephys_data(cells[0]['id'])
# read the reconstruction, download if necessary
reconstruction = ctc.get_reconstruction(cells[0]['id'])
CellTypesCache keeps track of which files you have already downloaded and reads them from disk instead of downloading them again. All data is stored in the same directory as the file named by the manifest_file argument to the constructor.
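The caching behavior described above follows a common check-then-fetch pattern. The sketch below illustrates the idea only; it is not the SDK's implementation, and `cached_fetch` and `fake_download` are hypothetical names introduced for this example.

```python
import os
import tempfile


def cached_fetch(path, download):
    """Return the contents of path, calling download() only if the file is missing.

    `download` is a hypothetical zero-argument callable that returns bytes.
    """
    if not os.path.exists(path):
        # make sure the target directory exists before writing
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(download())
    with open(path, "rb") as f:
        return f.read()


# record how often the (fake) download actually runs
calls = []


def fake_download():
    calls.append(1)
    return b"example payload"


cache_dir = tempfile.mkdtemp()
target = os.path.join(cache_dir, "specimen_0", "ephys.nwb")

first = cached_fetch(target, fake_download)   # file is missing, so it is fetched
second = cached_fetch(target, fake_download)  # file exists, so it is read from disk
```

After both calls, `calls` holds a single entry, which shows that the second request was served from disk.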
Feature Extraction
The EphysFeatureExtractor class calculates electrophysiology features from cell recordings. extract_cell_features() can be used to extract the precise feature values available in the Cell Types Database:
from allensdk.core.cell_types_cache import CellTypesCache
from allensdk.ephys.extract_cell_features import extract_cell_features
from collections import defaultdict
# initialize the cache
ctc = CellTypesCache(manifest_file='cell_types/manifest.json')
# pick a cell to analyze
specimen_id = 324257146
# download the ephys data and sweep metadata
data_set = ctc.get_ephys_data(specimen_id)
sweeps = ctc.get_ephys_sweeps(specimen_id)
# group the sweeps by stimulus
sweep_numbers = defaultdict(list)
for sweep in sweeps:
    sweep_numbers[sweep['stimulus_name']].append(sweep['sweep_number'])
# calculate features
cell_features = extract_cell_features(data_set,
                                      sweep_numbers['Ramp'],
                                      sweep_numbers['Short Square'],
                                      sweep_numbers['Long Square'])
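To see what the grouping step produces without downloading anything, the sketch below runs the same loop on a few made-up sweep records. The records are hypothetical and carry only the two keys used above; real sweep metadata contains more fields.

```python
from collections import defaultdict

# hypothetical sweep metadata, reduced to the two keys the grouping needs
sweeps = [
    {"sweep_number": 4, "stimulus_name": "Ramp"},
    {"sweep_number": 7, "stimulus_name": "Long Square"},
    {"sweep_number": 8, "stimulus_name": "Long Square"},
]

# map each stimulus name to the list of sweep numbers that used it
sweep_numbers = defaultdict(list)
for sweep in sweeps:
    sweep_numbers[sweep["stimulus_name"]].append(sweep["sweep_number"])

long_square = sweep_numbers["Long Square"]
# a stimulus with no sweeps yields an empty list rather than a KeyError
short_square = sweep_numbers["Short Square"]
```

Because `defaultdict(list)` returns an empty list for a missing key, it can be worth checking that each stimulus group is non-empty before passing it on for feature extraction.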
File Formats
This section provides a short description of the file formats used for Allen Cell Types data.
Morphology SWC Files
Morphological neuron reconstructions are available for download as SWC files. The SWC file format is a white-space delimited text file with a standard set of headers. The file lists a set of 3D neuronal compartments, each of which has:
Column | Data Type | Description |
---|---|---|
id | string | compartment ID |
type | integer | compartment type |
x | float | 3D compartment position (x) |
y | float | 3D compartment position (y) |
z | float | 3D compartment position (z) |
radius | float | compartment radius |
parent | string | parent compartment ID |
Comment lines begin with a ‘#’. Reconstructions in the Allen Cell Types Database can contain the following compartment types:
Type | Description |
---|---|
0 | unknown |
1 | soma |
2 | axon |
3 | basal dendrite |
4 | apical dendrite |
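To make the column layout concrete, here is a minimal pure-Python parsing sketch. It is not the SDK's reader: `parse_swc` and `TYPE_NAMES` are hypothetical names, the three-compartment sample is made up, and the use of `-1` as the parent of the root follows the usual SWC convention.

```python
def parse_swc(text):
    """Parse SWC-formatted text into a list of compartment dictionaries."""
    compartments = []
    for line in text.splitlines():
        line = line.strip()
        # skip blank lines and comment lines
        if not line or line.startswith("#"):
            continue
        cid, ctype, x, y, z, radius, parent = line.split()
        compartments.append({
            "id": cid,
            "type": int(ctype),
            "x": float(x),
            "y": float(y),
            "z": float(z),
            "radius": float(radius),
            "parent": parent,
        })
    return compartments


# made-up example: a soma with one basal dendrite and one axon compartment
example = """\
# id type x y z radius parent
1 1 0.0 0.0 0.0 5.0 -1
2 3 0.0 10.0 0.0 0.5 1
3 2 0.0 -10.0 0.0 0.3 1
"""

# compartment type codes from the table above
TYPE_NAMES = {0: "unknown", 1: "soma", 2: "axon",
              3: "basal dendrite", 4: "apical dendrite"}

compartments = parse_swc(example)
for c in compartments:
    print(c["id"], TYPE_NAMES[c["type"]], "parent:", c["parent"])
```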
The Allen SDK comes with a swc Python module that provides helper functions and classes for manipulating SWC files. Consider the following example:
import allensdk.core.swc as swc
# if you ran the examples above, you will have a reconstruction here
file_name = 'cell_types/specimen_485909730/reconstruction.swc'
morphology = swc.read_swc(file_name)
# subsample the morphology 3x. root, soma, junctions, and the first child of the root are preserved.
sparse_morphology = morphology.sparsify(3)
# compartments in the order that they were specified in the file
compartment_list = sparse_morphology.compartment_list
# a dictionary of compartments indexed by compartment id
compartments_by_id = sparse_morphology.compartment_index
# the root soma compartment
soma = morphology.soma
# all compartments are dictionaries of compartment properties
# compartments also keep track of ids of their children
for child in morphology.children_of(soma):
    print(child['x'], child['y'], child['z'], child['radius'])
Neurodata Without Borders
The electrophysiology data collected in the Allen Cell Types Database is stored in the Neurodata Without Borders (NWB) file format. This format, created as part of the NWB initiative, is designed to store a variety of neurophysiology data, including data from intra- and extracellular electrophysiology experiments, optophysiology experiments, as well as tracking and stimulus data. It has a defined schema and metadata labeling system designed so software tools can easily access contained data.
The Allen SDK provides a basic Python class for extracting data from Allen Cell Types Database NWB files. These files store data from intracellular patch-clamp recordings. A stimulus current is presented to the cell and the cell’s voltage response is recorded. The file stores both stimulus and response for several experimental trials, here called “sweeps.” The following code snippet demonstrates how to extract a sweep’s stimulus, response, sampling rate, and estimated spike times:
from allensdk.core.nwb_data_set import NwbDataSet
# if you ran the examples above, you will have an NWB file here
file_name = 'cell_types/specimen_485909730/ephys.nwb'
data_set = NwbDataSet(file_name)
sweep_numbers = data_set.get_sweep_numbers()
sweep_number = sweep_numbers[0]
sweep_data = data_set.get_sweep(sweep_number)
# spike times are in seconds relative to the start of the sweep
spike_times = data_set.get_spike_times(sweep_number)
# stimulus is a numpy array in amps
stimulus = sweep_data['stimulus']
# response is a numpy array in volts
response = sweep_data['response']
# sampling rate is in Hz
sampling_rate = sweep_data['sampling_rate']
# start/stop indices that exclude the experimental test pulse (if applicable)
index_range = sweep_data['index_range']
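Since the stimulus is in amps, the response in volts, and the sampling rate in Hz, a common next step is to convert to picoamps and millivolts and build a time axis. The sketch below does this on a made-up sweep dictionary with the same keys as above. It assumes that index_range holds inclusive start and stop sample indices, and it uses plain lists to stay dependency-free, whereas real sweeps hold numpy arrays.

```python
# hypothetical stand-in for the dictionary returned by get_sweep()
sweep_data = {
    "stimulus": [0.0, 0.0, 5e-11, 5e-11, 0.0, 0.0],                 # amps
    "response": [-0.070, -0.070, -0.065, -0.060, -0.068, -0.070],   # volts
    "sampling_rate": 200000.0,                                      # Hz
    "index_range": (1, 4),                                          # assumed inclusive
}

start, stop = sweep_data["index_range"]
rate = sweep_data["sampling_rate"]

# keep only the samples inside the index range (stop treated as inclusive)
stimulus_pa = [i * 1e12 for i in sweep_data["stimulus"][start:stop + 1]]
response_mv = [v * 1e3 for v in sweep_data["response"][start:stop + 1]]

# time of each kept sample in seconds, relative to the start of the sweep
time_s = [n / rate for n in range(start, stop + 1)]
```

The three resulting lists have the same length, so they can be passed directly to a plotting routine as time versus stimulus or time versus response.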
HDF5 Overview
NWB is implemented in HDF5. HDF5 files provide hierarchical data storage that mirrors the organization of a file system. Just as a file system has directories and files, an HDF5 file has groups and datasets. The best way to understand an HDF5 (and NWB) file is to open a data file in an HDF5 browser. HDFView is the recommended browser from the makers of HDF5.
There are HDF5 manipulation libraries for many languages and platforms. MATLAB and Python in particular have strong HDF5 support.