Downloading an NWB File
To analyze data, you first need to get some. The DANDI Archive stores NWB files in datasets called dandisets [Rübel et al., 2022]. Typically, an NWB file contains the data for a single experimental session, while a dandiset contains all of the related data files produced by a project. This notebook downloads files from public dandisets or from private dandisets (called embargoed dandisets) via the DANDI Python API. To download from an embargoed dandiset, you need an account on the DANDI Archive, and the owner of the dandiset must grant you access.
Environment Setup
⚠️ Note: If you are running this in a new environment, run this cell once and then restart the kernel. ⚠️
import warnings
warnings.filterwarnings('ignore')
try:
    from databook_utils.dandi_utils import dandi_stream_open
except ImportError:
    # databook_utils is not installed yet: clone the repository and install it in editable mode
    !git clone https://github.com/AllenInstitute/openscope_databook.git
    %cd openscope_databook
    %pip install -e .
from dandi import dandiapi
Download Configuration
Here you can configure the download. Browse the DANDI Archive for a dandiset you're interested in and set dandiset_id to its ID. Also set download_loc to the relative path of the directory you'd like to download to. If you're accessing an embargoed dandiset, set authenticate to True and set dandi_api_key to your DANDI API key, which you can find by clicking your profile icon in the top-right corner of the DANDI Archive website.
dandiset_id = "000021"
download_loc = "."
authenticate = False
dandi_api_key = ""
if authenticate:
    # embargoed dandisets require an API key
    client = dandiapi.DandiAPIClient(token=dandi_api_key)
else:
    client = dandiapi.DandiAPIClient()
my_dandiset = client.get_dandiset(dandiset_id)
print(f"Got dandiset {my_dandiset}")
Got dandiset DANDI:000021/draft
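Typing the API key directly into dandi_api_key makes it easy to leak the key when a notebook is shared or committed. One alternative, sketched below, is to read the key from an environment variable at runtime. The helper get_api_key is hypothetical (it is not part of the dandi package), and the variable name DANDI_API_KEY is an assumption; export whatever name you choose before starting Jupyter.

```python
import os
from typing import Optional


def get_api_key(authenticate: bool, env_var: str = "DANDI_API_KEY") -> Optional[str]:
    """Return the API key from the environment, or None when not authenticating.

    Hypothetical helper; the environment variable name is an assumption.
    """
    if not authenticate:
        # public dandisets do not need a token
        return None
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"authenticate is True but {env_var} is not set")
    return key
```

With this helper, the if/else in the configuration cell could collapse to client = dandiapi.DandiAPIClient(token=get_api_key(authenticate)), since a token of None gives an unauthenticated client.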
Downloading Just One File
Set dandi_filepath to the path of the file you want to download within the dandiset. To find it, navigate to the file on the DANDI Archive website and click the i icon; you can copy the path from the field labeled path. Don't include a leading /.
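Because a path copied from the website may carry a leading / or stray whitespace, a small helper can normalize it before it is passed to get_asset_by_path, and derive the local filename the same way the download cell does with split("/"). Both functions are hypothetical conveniences, a sketch that assumes forward-slash paths as displayed on the DANDI website.

```python
from pathlib import PurePosixPath


def normalize_dandi_path(raw_path: str) -> str:
    """Strip surrounding whitespace and any leading '/' from a copied DANDI path."""
    return raw_path.strip().lstrip("/")


def local_filename(dandi_path: str) -> str:
    """Return the last component of a dandiset path, e.g. the .nwb filename."""
    return PurePosixPath(normalize_dandi_path(dandi_path)).name


print(normalize_dandi_path("/sub-699733573/sub-699733573_ses-715093703.nwb"))
# prints sub-699733573/sub-699733573_ses-715093703.nwb
```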
# define functions to download files with a progress bar
from typing import Union, Iterator, Callable, Tuple, Dict
import os
from pathlib import Path
from tqdm.notebook import tqdm
# chunk size in bytes; 8 MiB unless DANDI_MAX_CHUNK_SIZE is set
MAX_CHUNK_SIZE = int(os.environ.get("DANDI_MAX_CHUNK_SIZE", 1024 * 1024 * 8))
def get_download_file_iter_with_steps(
    file, chunk_size: int = MAX_CHUNK_SIZE
) -> Tuple[Callable[[int], Iterator[bytes]], Dict[str, int]]:
    url = file.base_download_url
    steps_dict = {"total_steps": None}
    # open a streaming request only to read the file size from the headers, then close it
    with file.client.session.get(url, stream=True) as result:
        result.raise_for_status()
        total_size = int(result.headers.get('content-length', 0))
    # number of full chunks; a final partial chunk is not counted
    steps_dict["total_steps"] = total_size // chunk_size
    print(f"Downloading {total_size} bytes in {steps_dict['total_steps']} steps")
    def downloader(start_at: int = 0) -> Iterator[bytes]:
        # when resuming from an offset, request only the remaining bytes
        headers = None
        if start_at > 0:
            headers = {"Range": f"bytes={start_at}-"}
        result = file.client.session.get(url, stream=True, headers=headers)
        result.raise_for_status()
        for chunk in result.iter_content(chunk_size=chunk_size):
            if chunk:  # skip empty keep-alive chunks
                yield chunk
    return downloader, steps_dict
def download_with_progressbar(
    file, filepath: Union[str, Path], chunk_size: int = MAX_CHUNK_SIZE
) -> None:
    # pass chunk_size through so the step count matches the chunks actually read
    downloader, steps_dict = get_download_file_iter_with_steps(file, chunk_size)
    with open(filepath, "wb") as fp:
        for chunk in tqdm(downloader(0), total=steps_dict["total_steps"], unit="chunk", unit_scale=True, unit_divisor=1024):
            fp.write(chunk)
dandi_filepath = "sub-699733573/sub-699733573_ses-715093703.nwb"
filename = dandi_filepath.split("/")[-1]
filepath = f"{download_loc}/{filename}"
file = my_dandiset.get_asset_by_path(dandi_filepath)
# this may take a while, especially if the file to download is large
download_with_progressbar(file, filepath)
print(f"Downloaded file to {filepath}")
Downloading 2856232912 bytes in 340 steps
Downloaded file to ./sub-699733573_ses-715093703.nwb
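The step count printed above comes from floor division: 2856232912 // (8 * 1024 * 1024) is 340, but 340 full chunks cover only 2852126720 bytes, so the remaining 4106192 bytes arrive in a 341st, partial chunk and the progress bar ends one step past its total. The downloader returned by get_download_file_iter_with_steps also accepts start_at, which the cell above never uses. A minimal sketch, assuming the server honors the HTTP Range header, combines both ideas: it resumes an interrupted download by appending to the partial file and counts the remaining steps with ceiling division. The functions steps_for and resume_download are hypothetical; downloader stands for any callable with the same signature as the one defined above.

```python
import os
from typing import Callable, Iterator


def steps_for(num_bytes: int, chunk_size: int) -> int:
    """Number of chunks needed to cover num_bytes, counting a final partial chunk."""
    return -(-num_bytes // chunk_size)  # ceiling division on integers


def resume_download(
    downloader: Callable[[int], Iterator[bytes]],
    filepath: str,
    total_size: int,
    chunk_size: int,
) -> int:
    """Append the missing tail of a partially downloaded file; return bytes written.

    Hypothetical helper: assumes downloader(start_at) yields the bytes from
    offset start_at onward (for example via an HTTP Range request).
    """
    start_at = os.path.getsize(filepath) if os.path.exists(filepath) else 0
    if start_at >= total_size:
        return 0  # nothing left to fetch
    remaining_steps = steps_for(total_size - start_at, chunk_size)
    print(f"Resuming at byte {start_at}, {remaining_steps} steps left")
    written = 0
    with open(filepath, "ab") as fp:  # append instead of truncating
        for chunk in downloader(start_at):
            fp.write(chunk)
            written += len(chunk)
    return written


print(steps_for(2856232912, 8 * 1024 * 1024))  # prints 341
```

In the notebook, downloader would come from get_download_file_iter_with_steps(file), and total_size could be the byte count that function prints. The sketch does not verify that the partial file on disk is really a prefix of the remote file, so delete the partial file and start over if you suspect it is corrupted.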
Downloading Entire Dandiset
If you plan to do a lot of work with the files in a dandiset, you might want to download all or part of it. Be prepared, though: this can take a significant amount of disk space and time. To download only the files within one directory of the dandiset, set the first argument of download_directory below to that directory's path within the dandiset.
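Before downloading a whole dandiset, it can be worth checking that the destination drive has room for it. A minimal sketch using only the standard library compares a total size in bytes against the free space that shutil.disk_usage reports for the download location. The helper has_room_for and its 10% safety margin are assumptions for illustration; the total itself would have to come from the dandiset's asset sizes, for example by summing the size attribute of each asset yielded by my_dandiset.get_assets() (assumed here, not used elsewhere in this notebook).

```python
import shutil
from pathlib import Path


def has_room_for(total_bytes: int, dest: str, margin: float = 0.1) -> bool:
    """Return True if dest's drive has total_bytes free plus a safety margin.

    Hypothetical helper: margin=0.1 reserves an extra 10% of total_bytes.
    """
    # disk_usage needs an existing path, so walk up to the nearest existing parent
    path = Path(dest).resolve()
    while not path.exists():
        path = path.parent
    free = shutil.disk_usage(path).free
    return free >= total_bytes * (1 + margin)
```

For this notebook, the call would be has_room_for(total, f"{download_loc}/{dandiset_id}"), which works even before the target directory exists.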
# patience isn't just a virtue, it's a requirement
my_dandiset.download_directory("", f"{download_loc}/{dandiset_id}")
print(f"Downloaded directory to {download_loc}/{dandiset_id}")
Downloaded directory to ./000021