As you might have realized, NWB files are large. They take a long time to download and a lot of space on your drive. A convenient tool to mitigate this is remfile, which lets you stream data from a remote file without downloading it. This is more efficient when you only want to quickly examine a file or need just a portion of its contents. For more extensive analysis, it is still recommended that you download the file.
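The general idea behind streaming is that a file object only looks local. When code reads from it, the object fetches the requested bytes from the server in chunks and keeps them for reuse. The sketch below shows that idea with a caller-supplied `fetch_range(start, end)` function and a cache of fixed-size chunks. It assumes a lot and is not remfile's actual implementation. The class name `LazyRangeFile`, the `fetch_range` callback and the 1 KiB chunk size are all hypothetical. Against a real server, `fetch_range` would send an HTTP range request; note that the end offset in the `Range` header is inclusive.

```python
import io


class LazyRangeFile:
    """Hypothetical read-only file object that fetches fixed-size chunks on demand."""

    def __init__(self, size, fetch_range, chunk_size=1024):
        self.size = size                # total length of the remote file in bytes
        self.fetch_range = fetch_range  # callable(start, end) -> bytes for [start, end)
        self.chunk_size = chunk_size
        self.pos = 0
        self.cache = {}                 # chunk index -> bytes already fetched

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        elif whence == io.SEEK_END:
            self.pos = self.size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        return self.pos

    def tell(self):
        return self.pos

    def _chunk(self, index):
        # Each chunk is fetched at most once; repeated reads come from the cache.
        if index not in self.cache:
            start = index * self.chunk_size
            end = min(start + self.chunk_size, self.size)
            self.cache[index] = self.fetch_range(start, end)
        return self.cache[index]

    def read(self, n=-1):
        # Read up to n bytes from the current position (all remaining bytes if n < 0).
        end = self.size if n < 0 else min(self.pos + n, self.size)
        out = bytearray()
        while self.pos < end:
            index, offset = divmod(self.pos, self.chunk_size)
            piece = self._chunk(index)[offset:offset + (end - self.pos)]
            if not piece:  # the fetch returned fewer bytes than expected
                break
            out += piece
            self.pos += len(piece)
        return bytes(out)


# Usage: an in-memory byte string stands in for the remote file.
data = bytes(range(256)) * 40   # 10,240 bytes
requests_made = []


def fake_fetch(start, end):
    requests_made.append((start, end))  # record each simulated range request
    return data[start:end]


f = LazyRangeFile(len(data), fake_fetch)
f.seek(5000)
first = f.read(100)     # needs only chunk 4 (bytes 4096 to 5119)
f.seek(5000)
again = f.read(100)     # served from the cache, no new request
crossing = f.read(40)   # bytes 5100 to 5139 span chunks 4 and 5; only chunk 5 is fetched
```

Only two of the ten chunks are ever requested here. This is why streaming pays off when you need just a small part of a large file.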
Environment Setup
⚠️Note: If running in a new environment, run this cell once and then restart the kernel⚠️
import warnings
warnings.filterwarnings('ignore')
try:
    from databook_utils.dandi_utils import dandi_stream_open
except ImportError:
    # databook_utils is not installed yet: clone the databook and install it as an editable package
    !git clone https://github.com/AllenInstitute/openscope_databook.git
    %cd openscope_databook
    %pip install -e .
import remfile
import h5py
from dandi import dandiapi
from pynwb import NWBHDF5IO
Streaming Configuration
Here you can configure the stream. Browse the DANDI Archive for a dandiset you’re interested in and set dandiset_id to its ID. Set dandi_filepath to the path, within the dandiset, of the file you want to stream. You can find this path by navigating to the file on the DANDI Archive website and clicking the i icon, then copying the value of the field labeled path. Don’t include a leading /.
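The asset path must be relative to the dandiset root. A small check can therefore catch common copy-paste mistakes before any request is sent. The helper `check_stream_config` is hypothetical and is not part of dandi or databook_utils. It assumes dandiset IDs are six-digit strings such as `"000871"` and that you are streaming an `.nwb` file. It strips a leading `/` instead of failing.

```python
import re


def check_stream_config(dandiset_id, dandi_filepath):
    """Hypothetical helper: validate and normalize the streaming configuration."""
    if not re.fullmatch(r"\d{6}", dandiset_id):
        raise ValueError(f"dandiset_id should be six digits, e.g. '000871', got {dandiset_id!r}")
    # The asset path is relative to the dandiset root, so drop any leading slashes.
    path = dandi_filepath.lstrip("/")
    if not path.endswith(".nwb"):
        raise ValueError(f"expected a path to an .nwb file, got {path!r}")
    return dandiset_id, path


# Example: a path pasted with an accidental leading slash.
checked_id, checked_path = check_stream_config(
    "000871",
    "/sub-644972/sub-644972_ses-1237081845-acq-1237345890-denoised-movies_image+ophys.nwb",
)
```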
If you’re accessing an embargoed dandiset, set authenticate to True and set dandi_api_key to your DANDI API key. You can find the key by clicking your profile icon in the top-right corner of the DANDI Archive website.
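If you paste your API key directly into the notebook, it can end up in version control or a shared copy. Reading it from an environment variable avoids this. The sketch below assumes the key is stored in `DANDI_API_KEY`, the variable that dandi-cli itself reads. The helper name `load_auth_settings` and the dummy key are hypothetical.

```python
import os


def load_auth_settings(env=os.environ):
    """Hypothetical helper: derive (authenticate, dandi_api_key) from DANDI_API_KEY."""
    dandi_api_key = env.get("DANDI_API_KEY", "").strip()
    # Only authenticate when a non-empty key is available.
    authenticate = bool(dandi_api_key)
    return authenticate, dandi_api_key


# Explicit mappings stand in for the real environment in this example.
with_key = load_auth_settings({"DANDI_API_KEY": "dummy-key-123"})
without_key = load_auth_settings({})
```

In the notebook you would call `authenticate, dandi_api_key = load_auth_settings()` in place of the two hard-coded assignments.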
dandiset_id = "000871"
dandi_filepath = "sub-644972/sub-644972_ses-1237081845-acq-1237345890-denoised-movies_image+ophys.nwb"
authenticate = False
dandi_api_key = ""
if authenticate:
    client = dandiapi.DandiAPIClient(token=dandi_api_key)
else:
    client = dandiapi.DandiAPIClient()
my_dandiset = client.get_dandiset(dandiset_id)
print(f"Got dandiset {my_dandiset}")
Got dandiset DANDI:000871/draft
file = my_dandiset.get_asset_by_path(dandi_filepath)
# A HEAD request to the asset's download URL returns a redirect (requests does not follow
# redirects for HEAD by default), and its "Location" header holds the file's actual S3 URL
response = file.client.session.head(file.base_download_url)
file_url = response.headers['Location']
print(f"Retrieved file url {file_url}")
Retrieved file url https://dandiarchive.s3.amazonaws.com/blobs/fe1/358/fe135898-cfa7-4243-b927-e6964c31afee?response-content-disposition=attachment%3B%20filename%3D%22sub-644972_ses-1237081845-acq-1237345890-denoised-movies_image%2Bophys.nwb%22&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUBRWC5GAEKH3223E%2F20260501%2Fus-east-2%2Fs3%2Faws4_request&X-Amz-Date=20260501T230301Z&X-Amz-Expires=21600&X-Amz-SignedHeaders=host&X-Amz-Signature=5cf50ba725c64434491b1a6bd496619f867ce0a034c39b938301306c9e5dd817
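The retrieved URL is a pre-signed S3 link, so it only works for a limited time. Under AWS Signature Version 4, its query string records the signing time in UTC (`X-Amz-Date`) and the lifetime in seconds (`X-Amz-Expires`). The hypothetical helper below uses only the standard library to compute when the link expires. For the URL printed above, 21600 seconds is six hours.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Hypothetical helper: return the UTC time at which a SigV4 pre-signed URL expires."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


# Query parameters copied from the URL printed above (credential and signature omitted).
example_url = (
    "https://dandiarchive.s3.amazonaws.com/blobs/fe1/358/fe135898-cfa7-4243-b927-e6964c31afee"
    "?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20260501T230301Z&X-Amz-Expires=21600"
)
expires_at = presigned_url_expiry(example_url)
print(expires_at.isoformat())  # → 2026-05-02T05:03:01+00:00
```

If `datetime.now(timezone.utc)` is past `expires_at`, rerun the cell above to request a fresh URL before streaming.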
Streaming a File
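Streaming with remfile is as easy as creating a remote file object from the URL and then opening it with the h5py and pynwb libraries. h5py treats the remote file object like a local file, and remfile downloads only the parts of the file that are actually read.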
rem_file = remfile.File(file_url)
h5py_file = h5py.File(rem_file, "r")
io = NWBHDF5IO(file=h5py_file, mode="r", load_namespaces=True)
nwb = io.read()
nwb.processing
Interacting with a Remote File
Once the file has been opened remotely, you can explore it as you wish with print statements, or you can view the whole file as shown in Exploring an NWB File.
### uncomment these to view aspects of the file
### not all of these exist in every NWB file (a KeyError is raised if a field doesn't exist in this file)
# nwb.identifier
# nwb.processing
# nwb.acquisition["events"]
# nwb.intervals["trials"]
# nwb.stimulus["StimulusPresentation"]
# nwb.electrodes
Using the Databook Utils Function
Throughout the remainder of the OpenScope Databook, whenever a file is streamed, we reuse this code in the form of a local package, databook_utils. To stream an NWB file, call dandi_stream_open, which is imported at the top of this notebook. It returns an IO object, so call read() on the result to get the NWB file.
io = dandi_stream_open(dandiset_id, dandi_filepath)
nwb = io.read()
nwb
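Not every NWB file contains the same groups. Indexing a missing key, such as `nwb.intervals["trials"]`, therefore raises a `KeyError`. The small helper below is hypothetical: it follows a chain of keys and returns `None` as soon as one is missing. The example uses plain dictionaries in place of pynwb containers such as `nwb.processing` and `nwb.intervals`, which are dictionary-like. The sample data is made up for illustration and does not come from the file streamed above.

```python
def get_optional(container, *keys):
    """Hypothetical helper: follow a chain of keys, returning None if any is missing."""
    current = container
    for key in keys:
        try:
            current = current[key]
        except (KeyError, TypeError):
            return None
    return current


# Plain dicts standing in for NWB containers (illustrative data only).
fake_nwb = {
    "processing": {"ophys": {"ImageSegmentation": "segmentation table"}},
    "intervals": {},
}
segmentation = get_optional(fake_nwb, "processing", "ophys", "ImageSegmentation")
trials = get_optional(fake_nwb, "intervals", "trials")
events = get_optional(fake_nwb, "acquisition", "events")
```

With a real file you would pass a container directly, for example `get_optional(nwb.intervals, "trials")`. You can also list what is available first with `nwb.intervals.keys()`.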