Streaming an NWB File with remfile

As you might have realized, NWB files are large. They take a long time to download and a lot of space on your drive. A convenient way to avoid this is remfile, a Python library that lets you stream data from a remote file without downloading the whole thing. Streaming is more efficient when you only want to quickly examine a file or need just a portion of its contents. For more extensive analysis, downloading the file is still recommended.
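Here is a toy sketch of the idea behind streaming. It is not remfile's actual implementation: it is a file-like object that fetches fixed-size chunks only when they are read and caches them. Reading 100 bytes from the middle of a 1 MiB file therefore transfers a single chunk. The class name `ChunkedRangeReader` and the chunk size are hypothetical. The "remote" file is an in-memory bytes object standing in for a server, where a real streamer would issue an HTTP range request per missing chunk.

```python
from io import SEEK_CUR, SEEK_END, SEEK_SET, RawIOBase


class ChunkedRangeReader(RawIOBase):
    """Toy file-like object that fetches fixed-size chunks only when read.

    `remote_bytes` stands in for a file on a server. A real streamer would
    issue one HTTP Range request per missing chunk instead of slicing bytes.
    """

    def __init__(self, remote_bytes: bytes, chunk_size: int = 1024):
        super().__init__()
        self._remote = remote_bytes
        self._chunk_size = chunk_size
        self._cache = {}  # chunk index -> chunk bytes
        self._pos = 0
        self.bytes_transferred = 0  # total bytes "downloaded" so far

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def seek(self, offset: int, whence: int = SEEK_SET) -> int:
        if whence == SEEK_SET:
            self._pos = offset
        elif whence == SEEK_CUR:
            self._pos += offset
        elif whence == SEEK_END:
            self._pos = len(self._remote) + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        return self._pos

    def tell(self) -> int:
        return self._pos

    def _fetch_chunk(self, index: int) -> bytes:
        # Only chunks that have never been requested count as transferred
        if index not in self._cache:
            start = index * self._chunk_size
            chunk = self._remote[start:start + self._chunk_size]
            self._cache[index] = chunk
            self.bytes_transferred += len(chunk)
        return self._cache[index]

    def readinto(self, buffer) -> int:
        end = min(self._pos + len(buffer), len(self._remote))
        n = 0
        while self._pos < end:
            index, offset = divmod(self._pos, self._chunk_size)
            chunk = self._fetch_chunk(index)
            piece = chunk[offset:offset + (end - self._pos)]
            buffer[n:n + len(piece)] = piece
            n += len(piece)
            self._pos += len(piece)
        return n


# A 1 MiB "remote file", of which only 100 bytes are read
remote = bytes(range(256)) * 4096
reader = ChunkedRangeReader(remote, chunk_size=4096)
reader.seek(500_000)
data = reader.read(100)
print(len(remote), reader.bytes_transferred)  # 1048576 4096
```

h5py reads an HDF5 file through this same kind of seek-and-read interface. As a result, roughly only the parts of an NWB file you access, plus the metadata needed to locate them, have to be transferred.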

Environment Setup

⚠️Note: If running in a new environment, run this cell once and then restart the kernel⚠️

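try:
    from databook_utils.dandi_utils import dandi_stream_open
except ImportError:
    # databook_utils is not installed yet: clone the databook repo and install it in editable mode
    !git clone https://github.com/AllenInstitute/openscope_databook.git
    %cd openscope_databook
    %pip install -e .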
import remfile
import h5py

from dandi import dandiapi
from nwbwidgets import nwb2widget
from pynwb import NWBHDF5IO
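If the imports above fail, you can quickly see which packages are missing with `importlib.util.find_spec`. This locates a top-level package without importing it. The sketch below is a small standard-library check, not part of databook_utils, and the helper name `missing_packages` is hypothetical.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    # find_spec returns None when no installed package matches the name
    return [name for name in names if importlib.util.find_spec(name) is None]


# The packages this notebook imports directly
required = ["remfile", "h5py", "dandi", "nwbwidgets", "pynwb"]
print("Missing:", missing_packages(required) or "none")
```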

Streaming Configuration

Here you can configure the stream. Browse the DANDI Archive for a dandiset you’re interested in and put its ID in dandiset_id. Set dandi_filepath to the path of the file you want to stream within the dandiset. To find this path, navigate to the file on the DANDI Archive website and click the i icon, then copy the value from the field labeled path. Don’t include a leading /.
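Values copied from the website can pick up stray whitespace or a leading /. A small hypothetical helper, `normalize_dandi_inputs`, can clean them up before they reach the DANDI client. It assumes dandiset IDs are the six-digit, zero-padded strings the archive uses (such as 000871) and that the target is an .nwb file.

```python
import re


def normalize_dandi_inputs(dandiset_id: str, dandi_filepath: str):
    """Strip whitespace and a leading "/" from values pasted from the DANDI website."""
    dandiset_id = dandiset_id.strip()
    # Assumption: dandiset IDs are six-digit, zero-padded strings such as "000871"
    if not re.fullmatch(r"\d{6}", dandiset_id):
        raise ValueError(f"expected a six-digit dandiset ID, got {dandiset_id!r}")
    # Asset paths are relative to the dandiset root, so drop any leading "/"
    dandi_filepath = dandi_filepath.strip().lstrip("/")
    if not dandi_filepath.endswith(".nwb"):
        raise ValueError(f"expected a path to an .nwb file, got {dandi_filepath!r}")
    return dandiset_id, dandi_filepath


print(normalize_dandi_inputs(
    " 000871 ",
    "/sub-644972/sub-644972_ses-1237081845-acq-1237345890-denoised-movies_image+ophys.nwb",
))
```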

If you’re accessing an embargoed dandiset, set authenticate to True and set dandi_api_key to your DANDI API key. You can find your key by clicking your profile icon in the top-right corner of the DANDI Archive website.
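Hard-coding an API key in a notebook makes it easy to leak the key when the notebook is shared. A minimal alternative, assuming you export the key as an environment variable before starting Jupyter, is to read it with `os.environ` and pass `token=` only when it is set. The variable name DANDI_API_KEY is the one the dandi command-line tool reads. The helper `dandi_client_kwargs` is hypothetical.

```python
import os


def dandi_client_kwargs(env_var: str = "DANDI_API_KEY") -> dict:
    """Keyword arguments for dandiapi.DandiAPIClient, read from the environment.

    Returns {"token": key} when the variable is set and non-empty, otherwise {}
    (an anonymous client, which is enough for public dandisets).
    """
    api_key = os.environ.get(env_var, "").strip()
    return {"token": api_key} if api_key else {}


# Usage in the configuration cell (requires the dandi package):
#   client = dandiapi.DandiAPIClient(**dandi_client_kwargs())
```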

dandiset_id = "000871"
dandi_filepath = "sub-644972/sub-644972_ses-1237081845-acq-1237345890-denoised-movies_image+ophys.nwb"
authenticate = False
dandi_api_key = ""
if authenticate:
    client = dandiapi.DandiAPIClient(token=dandi_api_key)
else:
    client = dandiapi.DandiAPIClient()
my_dandiset = client.get_dandiset(dandiset_id)

print(f"Got dandiset {my_dandiset}")
Got dandiset DANDI:000871/draft
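Instead of looking up the path on the website, you can ask the dandiset object to list its assets. `my_dandiset.get_assets()` yields assets whose `.path` attribute is the same relative path used for dandi_filepath. The sketch below filters such a list with a shell-style pattern. So that it runs without network access, a short hard-coded list stands in for the real asset paths: only the first path comes from this dandiset, and the others are hypothetical placeholders. `find_asset_paths` is also hypothetical.

```python
from fnmatch import fnmatchcase


def find_asset_paths(paths, pattern):
    """Return the asset paths matching a shell-style glob pattern, sorted.

    Note that "*" also matches "/" here, so "sub-*" matches files in any
    subject folder.
    """
    return sorted(p for p in paths if fnmatchcase(p, pattern))


# In the notebook the paths would come from the dandiset itself:
#   paths = [asset.path for asset in my_dandiset.get_assets()]
# Stand-in list: the first path is the one used in this notebook,
# the others are hypothetical placeholders.
paths = [
    "sub-644972/sub-644972_ses-1237081845-acq-1237345890-denoised-movies_image+ophys.nwb",
    "sub-EXAMPLE/sub-EXAMPLE_ses-1_ophys.nwb",
    "sub-EXAMPLE/sub-EXAMPLE_ses-1_behavior.nwb",
]
print(find_asset_paths(paths, "*denoised-movies*"))
```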
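file = my_dandiset.get_asset_by_path(dandi_filepath)
# A HEAD request to the asset's download URL is answered with a redirect;
# its Location header holds the direct S3 URL that remfile will stream from
response = file.client.session.head(file.base_download_url)
file_url = response.headers['Location']

print(f"Retrieved file url {file_url}")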
Retrieved file url https://dandiarchive.s3.amazonaws.com/blobs/fe1/358/fe135898-cfa7-4243-b927-e6964c31afee?response-content-disposition=attachment%3B%20filename%3D%22sub-644972_ses-1237081845-acq-1237345890-denoised-movies_image%2Bophys.nwb%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUBRWC5GAEKH3223E%2F20240129%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20240129T214448Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=be815ce352064b62b62804204941b21ecfc96133ea8dc74a8cb4ea86354a98ad
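The URL printed above is an S3 presigned URL. Its X-Amz-Date parameter records when it was signed, and X-Amz-Expires=3600 makes it valid for one hour afterwards. S3 rejects requests made with it after that time, so if you come back to the notebook later, rerun the previous cell to get a fresh URL. The hypothetical helpers below compute the expiry time from those two query parameters. The example URL keeps only the blob path and those two parameters from the output above.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> datetime:
    """Return the UTC time at which an S3 presigned (SigV4) URL stops working."""
    query = parse_qs(urlparse(url).query)
    # X-Amz-Date is the signing time in basic ISO 8601 form, e.g. "20240129T214448Z"
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ")
    signed_at = signed_at.replace(tzinfo=timezone.utc)
    # X-Amz-Expires is the lifetime in seconds
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


def presigned_url_is_expired(url: str, now=None) -> bool:
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= presigned_url_expiry(url)


example_url = (
    "https://dandiarchive.s3.amazonaws.com/blobs/fe1/358/fe135898-cfa7-4243-b927-e6964c31afee"
    "?X-Amz-Date=20240129T214448Z&X-Amz-Expires=3600"
)
print(presigned_url_expiry(example_url))  # 2024-01-29 22:44:48+00:00
```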

Streaming a File

Streaming with remfile is as easy as creating a remote file object from the URL and then opening it with the h5py and pynwb libraries.

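# File-like object that fetches parts of the remote file on demand
rem_file = remfile.File(file_url)
# h5py accepts file-like objects, so the HDF5 file is read through remfile
h5py_file = h5py.File(rem_file, "r")
# load_namespaces=True loads the namespaces cached in the file, including any extensions it uses
io = NWBHDF5IO(file=h5py_file, mode="r", load_namespaces=True)
nwb = io.read()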
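The rem_file object created above is passed to h5py as an ordinary file object. It should therefore also support `tell`, `seek`, and `read` directly; this is an assumption based on the methods h5py requires of file-like objects. That makes a cheap sanity check possible: read only the first 8 bytes and compare them with the HDF5 format signature before involving h5py at all. The helper `looks_like_hdf5` is hypothetical, and it is demonstrated on in-memory buffers so it runs offline.

```python
from io import BytesIO

# The 8-byte signature that begins an HDF5 superblock
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(fileobj) -> bool:
    """Return True if a seekable binary file object starts with the HDF5 signature.

    Only offset 0 is checked; files with a user block place the signature at
    512, 1024, 2048, ... bytes instead. The original position is restored.
    """
    position = fileobj.tell()
    try:
        fileobj.seek(0)
        return fileobj.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE
    finally:
        fileobj.seek(position)


# With the streamed file: looks_like_hdf5(rem_file)
print(looks_like_hdf5(BytesIO(HDF5_SIGNATURE + bytes(16))))  # True
print(looks_like_hdf5(BytesIO(b"not an hdf5 file")))          # False
```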

Interacting with a Remote File

Once the file has been opened remotely, you can explore it with print statements, or view the whole file with NWBWidgets, as we showed in Exploring an NWB File.

### Uncomment these to view aspects of the file.
### Not every NWB file has all of these fields; accessing a missing one may raise a KeyError.

# nwb.identifier
# nwb.processing
# nwb.acquisition["events"]
# nwb.intervals["trials"]
# nwb.stimulus["StimulusPresentation"]
# nwb.electrodes
nwb2widget(nwb)
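Instead of uncommenting lines one at a time and risking a KeyError, you can first list which keys each top-level group actually contains. The hypothetical `summarize_groups` below does this with `getattr` and `.keys()`, so a missing or empty group simply maps to an empty list. It is demonstrated on a stand-in object built from a few acquisition names in this file's printed output.

```python
from types import SimpleNamespace

DEFAULT_GROUPS = ("acquisition", "processing", "intervals", "stimulus", "stimulus_template")


def summarize_groups(nwb, groups=DEFAULT_GROUPS):
    """Map each top-level NWB group name to the sorted keys it contains.

    Groups that are missing or empty map to an empty list, so nothing here
    raises a KeyError or AttributeError.
    """
    summary = {}
    for name in groups:
        container = getattr(nwb, name, None)
        summary[name] = sorted(container.keys()) if container else []
    return summary


# With the streamed file you would call: summarize_groups(nwb)
stand_in = SimpleNamespace(
    acquisition={"v_sig": None, "EyeTracking": None, "v_in": None},
    processing={},
)
print(summarize_groups(stand_in))
```

Once a key shows up in the summary, indexing it, for example nwb.acquisition["EyeTracking"], is safe.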

Using Databook Utils Function

Throughout the remainder of the OpenScope Databook, whenever a file is streamed, we reuse this code in the form of a local package, databook_utils. To stream an NWB file, call the function dandi_stream_open after importing it as shown at the top of this notebook.
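To show how the cells above fit together, here is a hypothetical function that packages them into a single call. It is a reconstruction from this notebook's code, not the databook's actual dandi_stream_open, and its name and signature are assumptions. The imports sit inside the function so the cell can be defined even before the packages are installed.

```python
def stream_nwb(dandiset_id, dandi_filepath, dandi_api_key=None):
    """Hypothetical helper: stream an NWB file from DANDI and return an open NWBHDF5IO.

    Mirrors the steps in this notebook; call .read() on the result.
    """
    # Imported lazily so defining this function does not require the packages
    import h5py
    import remfile
    from dandi import dandiapi
    from pynwb import NWBHDF5IO

    if dandi_api_key:
        client = dandiapi.DandiAPIClient(token=dandi_api_key)
    else:
        client = dandiapi.DandiAPIClient()
    dandiset = client.get_dandiset(dandiset_id)
    asset = dandiset.get_asset_by_path(dandi_filepath)

    # The HEAD response redirects to the S3 URL that remfile streams from
    response = asset.client.session.head(asset.base_download_url)
    file_url = response.headers["Location"]

    rem_file = remfile.File(file_url)
    h5py_file = h5py.File(rem_file, "r")
    return NWBHDF5IO(file=h5py_file, mode="r", load_namespaces=True)


# Usage (requires network access and the packages above):
#   nwb = stream_nwb(dandiset_id, dandi_filepath).read()
```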

io = dandi_stream_open(dandiset_id, dandi_filepath)
nwb = io.read()
print(nwb)
root pynwb.file.NWBFile at 0x2035051275888
Fields:
  acquisition: {
    EyeTracking <class 'abc.EllipseEyeTracking'>,
    denoised_suite2p_motion_corrected <class 'pynwb.ophys.TwoPhotonSeries'>,
    raw_suite2p_motion_corrected <class 'pynwb.ophys.TwoPhotonSeries'>,
    v_in <class 'pynwb.base.TimeSeries'>,
    v_sig <class 'pynwb.base.TimeSeries'>
  }
  devices: {
    MESO.2 <class 'pynwb.device.Device'>
  }
  experiment_description: ophys session
  file_create_date: [datetime.datetime(2024, 1, 21, 18, 31, 22, 932627, tzinfo=tzutc())]
  identifier: 1237345890
  imaging_planes: {
    imaging_plane_1 <class 'pynwb.ophys.ImagingPlane'>
  }
  institution: Allen Institute for Brain Science
  intervals: {
    fixed_gabors_presentations <class 'pynwb.epoch.TimeIntervals'>,
    gratings_presentations <class 'pynwb.epoch.TimeIntervals'>,
    movie_flower_fwd_presentations <class 'pynwb.epoch.TimeIntervals'>,
    movie_touch_of_evil_fwd_presentations <class 'pynwb.epoch.TimeIntervals'>,
    movie_worms_fwd_presentations <class 'pynwb.epoch.TimeIntervals'>,
    rotate_gabors_presentations <class 'pynwb.epoch.TimeIntervals'>,
    spontaneous_presentations <class 'pynwb.epoch.TimeIntervals'>
  }
  keywords: <StrDataset for HDF5 dataset "keywords": shape (5,), type "|O">
  lab_meta_data: {
    metadata <class 'abc.OphysMetadata'>
  }
  processing: {
    ophys <class 'pynwb.base.ProcessingModule'>,
    running <class 'pynwb.base.ProcessingModule'>,
    stimulus <class 'pynwb.base.ProcessingModule'>,
    stimulus_ophys <class 'pynwb.base.ProcessingModule'>
  }
  session_description: Ophys Session
  session_start_time: 2023-01-03 17:03:30.439000+00:00
  stimulus_template: {
    flower_fwd <class 'pynwb.image.ImageSeries'>,
    touch_of_evil_fwd <class 'pynwb.image.ImageSeries'>,
    worms_fwd <class 'pynwb.image.ImageSeries'>
  }
  subject: subject pynwb.file.Subject at 0x2035082136544
Fields:
  age: P154.0D
  description: external: 644972 donor_id: (1214404967,) specimen_id: ['644972']
  genotype: Rbp4-Cre_KL100/wt;Camk2a-tTA/wt;Ai93(TITL-GCaMP6f)/wt
  sex: M
  species: Mus musculus
  subject_id: 644972

  surgery:  Structure: VISp
  timestamps_reference_time: 2023-01-03 17:03:30.439000+00:00
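Fields such as the subject's age come back as strings. P154.0D is an ISO 8601 duration meaning 154 days. If you need the age as a number, the hypothetical `age_in_days` below is a minimal sketch that handles only the days-only form. It raises a ValueError for anything else, such as durations written in years or weeks.

```python
import re

# Days-only ISO 8601 duration, e.g. "P154.0D" or "P90D"
_DAYS_ONLY = re.compile(r"P(\d+(?:\.\d+)?)D")


def age_in_days(iso_age: str) -> float:
    """Convert a days-only ISO 8601 duration string to a number of days."""
    match = _DAYS_ONLY.fullmatch(iso_age.strip())
    if match is None:
        # Other forms (years, weeks, ranges) are out of scope for this sketch
        raise ValueError(f"not a days-only ISO 8601 duration: {iso_age!r}")
    return float(match.group(1))


# With the streamed file: age_in_days(nwb.subject.age)
print(age_in_days("P154.0D"))  # 154.0
```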