API Access
The allensdk.api package is designed to help retrieve data from the Allen Brain Atlas API. Its Api class contains methods that help formulate API queries and parse the returned results, and several pre-made subclasses provide queries specific to certain data sets. The following subclasses are currently available in the Allen SDK:
CellTypesApi: data related to the Allen Cell Types Database
BiophysicalApi: data related to biophysical models
GlifApi: data related to GLIF models
AnnotatedSectionDataSetsApi: search for experiments by intensity, density, pattern, and age
GridDataApi: used to download 3-D expression grid data
ImageDownloadApi: download whole or partial two-dimensional images
MouseConnectivityApi: common operations for accessing the Allen Mouse Brain Connectivity Atlas
OntologiesApi: data about neuroanatomical regions of interest
ConnectedServices: schema of Allen Institute Informatics Pipeline services available through the RmaApi
RmaApi: general-purpose HTTP interface to the Allen Institute API data model and services
SvgApi: annotations associated with images as scalable vector graphics (SVG)
SynchronizationApi: data about image alignment
TreeSearchApi: list ancestors or descendants of structure and specimen trees
RMA Database and Service API
One API subclass is the RmaApi class, which is intended to simplify constructing an RMA (RESTful Model Access) query. The RmaApi is a base class for much of the allensdk.api.queries package, but it may also be used directly to customize queries or to build queries from scratch.
Often a query will simply request a table of data of one type:
from allensdk.api.queries.rma_api import RmaApi

rma = RmaApi()
# $il is a case-insensitive "like" match; * acts as a wildcard
data = rma.model_query('Atlas',
                       criteria="[name$il'*Mouse*']")
This constructs the RMA query URL, makes the query, and parses the resulting JSON into a list of Python dicts containing the names, ids, and other information about the atlases whose names contain "Mouse".
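For a rough picture of what such a URL looks like, the sketch below assembles one by hand. It is not RmaApi's own implementation: build_model_query_url is a hypothetical helper, and the base URL http://api.brain-map.org/api/v2/data/query.json together with the model:: / rma::criteria / rma::include / rma::options stage syntax are assumptions based on the public RMA query format.

```python
from urllib.parse import quote

# Assumed RMA endpoint returning JSON.
RMA_URL = "http://api.brain-map.org/api/v2/data/query.json"


def build_model_query_url(model, criteria=None, include=None,
                          only=None, num_rows=None):
    """Hypothetical helper: approximate an RMA model query URL by hand."""
    stages = ["model::%s" % model]
    if criteria:
        stages.append("rma::criteria,%s" % criteria)
    if include:
        stages.append("rma::include,%s" % include)
    options = ""
    if only:
        # 'only' takes table-qualified field names, e.g. 'atlases.id'
        options += "[only$eq'%s']" % ",".join(only)
    if num_rows is not None:
        options += "[num_rows$eq%s]" % num_rows
    if options:
        stages.append("rma::options%s" % options)
    # Keep RMA punctuation readable; spaces and other unsafe
    # characters are percent-encoded.
    return RMA_URL + "?criteria=" + quote(",".join(stages),
                                          safe="$:,[]()'*=")


print(build_model_query_url('Atlas', criteria="[name$il'*Mouse*']"))
```

The options are appended as bracketed filters on a single rma::options stage, mirroring how the only and num_rows arguments appear in the examples in this section.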
The criteria, include, and other parameters can be used to request more specific data:
associations = ''.join(['[id$eq1]',
                        'structure_graph(ontology),',
                        'graphic_group_labels'])
atlas_data = rma.model_query('Atlas',
                             include=associations,
                             criteria=associations,
                             only=['atlases.id',
                                   'atlases.name',
                                   'atlases.image_type',
                                   'ontologies.id',
                                   'ontologies.name',
                                   'structure_graphs.id',
                                   'structure_graphs.name',
                                   'graphic_group_labels.id',
                                   'graphic_group_labels.name'])
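The association part of the string above (structure_graph(ontology),graphic_group_labels) is assembled by hand. When the nesting gets deeper, a small recursive helper can produce the same parentheses-and-commas syntax from a nested dict. build_include is a hypothetical name, not part of the SDK, and it relies on Python 3.7+ dicts preserving insertion order.

```python
def build_include(tree):
    """Hypothetical helper: turn {association: {child: ...}} into RMA
    include syntax, nesting children in parentheses and joining
    siblings with commas."""
    parts = []
    for name, children in tree.items():
        if children:
            parts.append("%s(%s)" % (name, build_include(children)))
        else:
            parts.append(name)
    return ",".join(parts)


include = build_include({"structure_graph": {"ontology": {}},
                         "graphic_group_labels": {}})
print(include)  # structure_graph(ontology),graphic_group_labels
```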
Note that a ‘class’ name is used for the first parameter. ‘Association’ names are used to construct the include and criteria parameters nested using parentheses and commas. In the only clause, the ‘table’ form is used, which is generally a plural lower-case version of the class name. The only clause selects specific ‘fields’ to be returned. The schema that includes the classes, fields, associations and tables can be accessed in JSON form using:
# The schema is served from http://api.brain-map.org/api/v2/data.json
schema = rma.get_schema()
for entry in schema:
    data_description = entry['DataDescription']
    # Each entry describes a single class as {class_name: info}
    clz, info = next(iter(data_description.items()))
    fields = info['fields']
    associations = info['associations']
    table = info['table']
    print("class: %s" % (clz))
    print("fields: %s" % (','.join(f['name'] for f in fields)))
    print("associations: %s" % (','.join(a['name'] for a in associations)))
    print("table: %s\n" % (table))
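Since the only clause needs table names while model_query takes class names, it can help to turn the schema into a lookup. The sketch below assumes the entry shape used in the loop above (a 'DataDescription' dict holding one class) and runs on a tiny made-up fragment, not real API output; table_lookup is a hypothetical helper.

```python
def table_lookup(schema):
    """Hypothetical helper: map each class name to its table name and
    association names, assuming each entry holds a 'DataDescription'
    dict of the form {class_name: info}."""
    lookup = {}
    for entry in schema:
        for class_name, info in entry['DataDescription'].items():
            lookup[class_name] = {
                'table': info['table'],
                'associations': [a['name'] for a in info['associations']],
            }
    return lookup


# A tiny, made-up schema fragment in the same shape (not real API output).
sample_schema = [
    {'DataDescription': {'Atlas': {
        'table': 'atlases',
        'fields': [{'name': 'id'}, {'name': 'name'}],
        'associations': [{'name': 'structure_graph'},
                         {'name': 'graphic_group_labels'}]}}},
]
print(table_lookup(sample_schema)['Atlas']['table'])  # atlases
```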
Using Pandas to Process Query Results
When it is difficult to get data in exactly the required form using only an RMA query, it may be helpful to perform additional operations on the client side. The pandas library can be useful for this.
Data from the API can be read directly into a pandas DataFrame object:
import pandas as pd

structures = pd.DataFrame(
    rma.model_query('Structure',
                    criteria='[graph_id$eq1]',
                    num_rows='all'))
One use of pandas is indexing subsets of the data (certain columns, certain rows), either with .loc:
names_and_acronyms = structures.loc[:,['name', 'acronym']]
or with Boolean indexing:
mea = structures[structures.acronym == 'MEA']
mea_id = mea.iloc[0,:].id
mea_children = structures[structures.parent_structure_id == mea_id]
print(mea_children['name'])
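The Boolean indexing above finds only the direct children of MEA. To collect every structure below it, the parent_structure_id links can be followed repeatedly. The TreeSearchApi listed earlier offers ancestor and descendant lookups on the server side; the sketch below is a client-side alternative in plain Python over a list of dicts like those model_query returns. descendants is a hypothetical helper and the rows are made up.

```python
from collections import defaultdict


def descendants(rows, root_id):
    """Hypothetical helper: collect ids of all structures below root_id
    by following parent_structure_id links (children, grandchildren, ...).
    Assumes the links form a tree, i.e. contain no cycles."""
    children = defaultdict(list)
    for row in rows:
        children[row['parent_structure_id']].append(row['id'])
    found, stack = [], [root_id]
    while stack:
        for child in children[stack.pop()]:
            found.append(child)
            stack.append(child)
    return found


# Made-up rows with the same keys as the Structure records used above.
rows = [
    {'id': 1, 'parent_structure_id': None},
    {'id': 2, 'parent_structure_id': 1},
    {'id': 3, 'parent_structure_id': 1},
    {'id': 4, 'parent_structure_id': 2},
]
print(sorted(descendants(rows, 1)))  # [2, 3, 4]
```

Building the parent-to-children map once keeps the walk linear in the number of rows, instead of re-filtering the whole table at every level.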
pandas also provides concat, merge, and join for adding rows or columns.
When an RMA call contains an include clause, the associated data for each row is returned as a Python dict in a single column. That column can be converted into a separate DataFrame and then optionally dropped from the original:
criteria_string = "structure_sets[name$eq'Mouse Connectivity - Summary']"
include_string = "ontology"
summary_structures = \
    pd.DataFrame(
        rma.model_query('Structure',
                        criteria=criteria_string,
                        include=include_string,
                        num_rows='all'))
# Expand the dicts in the 'ontology' column into their own DataFrame
ontologies = \
    pd.DataFrame(
        list(summary_structures.ontology)).drop_duplicates()
# Drop the nested column to leave a flat table of structures
flat_structures_dataframe = summary_structures.drop(['ontology'], axis=1)
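The same split, plus a join back on a key, can be sketched without pandas to show what these DataFrame operations do. split_include is a hypothetical helper; it de-duplicates the nested records by their 'id', whereas drop_duplicates compares whole rows. The ontology_id key and all values are made up for illustration.

```python
def split_include(rows, key):
    """Hypothetical helper: split an included association out of
    model_query-style rows (a list of dicts).

    Returns (flat_rows, related): flat_rows are copies of the rows without
    `key`; related holds each distinct nested record once, keyed on 'id'.
    """
    flat_rows, related = [], {}
    for row in rows:
        nested = row.get(key)
        if nested is not None:
            related.setdefault(nested['id'], nested)
        flat_rows.append({k: v for k, v in row.items() if k != key})
    return flat_rows, list(related.values())


# Made-up records shaped like Structure rows returned with include='ontology'.
rows = [
    {'id': 10, 'acronym': 'A', 'ontology_id': 1,
     'ontology': {'id': 1, 'name': 'example ontology'}},
    {'id': 11, 'acronym': 'B', 'ontology_id': 1,
     'ontology': {'id': 1, 'name': 'example ontology'}},
]
flat, ontologies = split_include(rows, 'ontology')

# Join the ontology name back onto each flat row through ontology_id,
# the plain-Python counterpart of a pandas merge on that key.
by_id = {o['id']: o for o in ontologies}
joined = [dict(r, ontology_name=by_id[r['ontology_id']]['name'])
          for r in flat]
```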
Alternatively, the nested data can be accessed with ordinary Python dict and list operations:
print(summary_structures.ontology[0]['name'])
Pandas DataFrames can be written to a CSV file using to_csv and read back using read_csv:
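# Index by structure id so that index_label names a meaningful column
(summary_structures
    .set_index('id')[['parent_structure_id', 'acronym']]
    .to_csv('summary_structures.csv', index_label='structure_id'))
# pd.DataFrame.from_csv was removed from pandas; read_csv replaces it
reread = pd.read_csv('summary_structures.csv', index_col='structure_id')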
A DataFrame of API data can be iterated over in several ways; the .itertuples method is one of them:
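# index=False yields only the selected columns, not the row index
for structure_id, name, parent_structure_id in summary_structures[
        ['id', 'name', 'parent_structure_id']].itertuples(index=False):
    # parent_structure_id may be missing (NaN) for a root structure,
    # so it is formatted with %s rather than %d
    print("%d %s %s" % (structure_id, name, parent_structure_id))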
Caching Queries on Disk
The wrap() method of the Cache class has several parameters for querying the API, saving the results as CSV or JSON, and reading the results back as a pandas DataFrame.
from allensdk.api.cache import Cache

cache_writer = Cache()
do_cache = True
# Keyword arguments after cache are passed through to rma.model_query
structures_from_api = \
    cache_writer.wrap(rma.model_query,
                      path='summary.csv',
                      cache=do_cache,
                      model='Structure',
                      criteria='[graph_id$eq1]',
                      num_rows='all')
If you change do_cache to False and run the code again, it will read the data from disk rather than executing the query.
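To illustrate the caching pattern itself, here is a minimal sketch of a wrap()-style function. It is not the SDK's implementation: cached_call and fake_query are hypothetical names, results are always stored as JSON, and only the cache flag behavior described above is mirrored (the real wrap() has more options).

```python
import json
import os
import tempfile


def cached_call(fn, path, cache, **kwargs):
    """Hypothetical wrap()-style cache.

    cache=True  -> call fn(**kwargs) and save the result to path as JSON.
    cache=False -> skip the call and reuse whatever is already at path.
    Either way, the data is read back from path and returned.
    """
    if cache:
        with open(path, 'w') as f:
            json.dump(fn(**kwargs), f)
    with open(path) as f:
        return json.load(f)


calls = []


def fake_query(model):
    # Stand-in for rma.model_query that records each time it runs
    calls.append(model)
    return [{'id': 1, 'model': model}]


path = os.path.join(tempfile.mkdtemp(), 'summary.json')
first = cached_call(fake_query, path, cache=True, model='Structure')
second = cached_call(fake_query, path, cache=False, model='Structure')
print(calls)  # ['Structure']: the second call read from disk
```

Always reading back from the file, even right after writing it, means both paths return data in exactly the same form.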