8. Metadata
8.1 Metadata Structure
A ZV store carries metadata at five levels:
- Root metadata: under zarr.json["zarr_vectors"] at the store root, plus the NGFF axes / datasets block under zarr.json["multiscales"].
- Level metadata: under each level group’s zarr.json["zarr_vectors_level"].
- Array metadata: under each array’s per-array zarr.json, with a "zv_array" discriminator and a small shape/dtype block.
- Object metadata: values in object_attributes/<name>/data.
- Group metadata: values in group_attributes/<name>/data.
The canonical schema is schema/zarr_vectors.linkml.yaml in the
zarr-vectors-py package; this chapter mirrors it.
8.2 Root-Level Metadata
Root metadata is the "zarr_vectors" block on zarr.json at the
store root. NGFF axes live in the sibling "multiscales" block (the
RFC 4 / 5 layout); they are not duplicated under "zarr_vectors"
(this duplication was dropped in 0.5).
| Field | Type | Description |
|---|---|---|
| | string ( | ZV spec version. Renamed from |
| | | Level-0 default chunk shape (one entry per space axis). Per-level overrides live in |
| | | Global |
| | | Subset of |
| | | OME-Zarr RFC 4 / 5 CRS dict, or |
| | | How intra-chunk links are stored. Default |
| | | |
| | | How cross-chunk connectivity is expressed. Default |
| | int ≥ 2 | Multi-resolution threshold (a new level is emitted only when vertex count drops by ≥ this factor). Default 8. |
| | | Level-0 bin edge lengths. When unset, defaults to |
| | int (default 1) | Max |
| | | Optionality knob for cross-pyramid-level links (§9.6). Default |
| | | Capability tokens this store uses (see below). Empty list when omitted. |
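The multi-resolution threshold row above can be read as a simple ratio test. The sketch below is one interpretation, assuming "drops by ≥ this factor" means the previous level's vertex count is at least the factor times the candidate level's count. The function name `should_emit_level` and the parameter name `reduction_factor` are hypothetical and are not taken from zarr-vectors-py.

```python
def should_emit_level(prev_vertex_count: int,
                      candidate_vertex_count: int,
                      reduction_factor: int = 8) -> bool:
    """Decide whether a candidate coarser level is worth emitting.

    Assumption: the threshold compares the previous level's vertex count
    with the candidate level's vertex count.
    """
    if reduction_factor < 2:
        raise ValueError("reduction_factor must be an int >= 2")
    if candidate_vertex_count <= 0:
        # Assumption: a level with no vertices is never emitted.
        return False
    # Integer form of prev / candidate >= factor, which avoids float rounding.
    return prev_vertex_count >= reduction_factor * candidate_vertex_count
```

With the default factor of 8, a candidate holding 1000 vertices would be emitted after an 8000-vertex level, while one holding 1001 vertices would not.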
NGFF axes (zarr.json["multiscales"][0]["axes"]) are a list of
{"name", "type", "unit"?} descriptors in NGFF order
(time → channel → custom → space). sid_ndim (the number of
spatial index dimensions) is count(type == "space").
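As a minimal sketch of the rule above, sid_ndim can be derived by counting the axes typed "space". The helper name and the example document are illustrative only, and the dict passed in is assumed to be whatever mapping this chapter's zarr.json[...] notation indexes.

```python
def sid_ndim(root_zarr_json: dict) -> int:
    """Count the NGFF axes whose type is "space"."""
    axes = root_zarr_json["multiscales"][0]["axes"]
    return sum(1 for axis in axes if axis.get("type") == "space")


# Illustrative root document; the axis names and units are made up.
example_root = {
    "multiscales": [{
        "axes": [
            {"name": "t", "type": "time", "unit": "second"},
            {"name": "z", "type": "space", "unit": "micrometer"},
            {"name": "y", "type": "space", "unit": "micrometer"},
            {"name": "x", "type": "space", "unit": "micrometer"},
        ]
    }]
}

print(sid_ndim(example_root))  # 3
```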
Capability tokens (format_capabilities)
Capability tokens advertise optional features the store uses; they
match the CAP_* constants in zarr_vectors.constants:
| Token | Meaning |
|---|---|
| | At least one level was written with ID-preserving sparsification ( |
| | At least one level stores per-chunk fragments referenced by multiple objects’ manifests. Successor to the pre-0.6 |
| | The store uses the v0.6 fragment-index encoding for |
| | The store uses the |
There is no shared_vertex_groups token — it was renamed to
shared_fragments in 0.6 along with the underlying sharing primitive.
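A reader might consume the token list roughly as follows. This is a sketch under stated assumptions: the helpers are hypothetical rather than the zarr_vectors API, the input is the root "zarr_vectors" block as a plain dict, and rejecting the legacy token with an error is an illustrative policy rather than documented behavior.

```python
def store_capabilities(zarr_vectors_block: dict) -> frozenset:
    """Return the advertised tokens; an omitted field means an empty list."""
    return frozenset(zarr_vectors_block.get("format_capabilities", []))


def uses_shared_fragments(zarr_vectors_block: dict) -> bool:
    """Check for the shared_fragments token, rejecting the pre-0.6 name."""
    caps = store_capabilities(zarr_vectors_block)
    if "shared_vertex_groups" in caps:
        # Assumption: a reader treats the renamed pre-0.6 token as an error.
        raise ValueError(
            "legacy token 'shared_vertex_groups' found; expected 'shared_fragments'"
        )
    return "shared_fragments" in caps
```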
8.3 Resolution-Level Metadata
Per-level metadata lives in each level group’s
zarr.json["zarr_vectors_level"]:
| Field | Type | Description |
|---|---|---|
| | int ≥ 0 | Level index (0 = full resolution). |
| | int ≥ 0 | Total vertex rows across all |
| | | Subset of canonical array names actually present. |
| | | Per-level bin edge lengths. |
| | | Integer fold-change per axis vs level 0. |
| | | v0.7 per-level chunk-shape override. When set, each axis must be a positive integer multiple of root |
| | float in | Fraction of objects retained at this level vs the source level. |
| | | How this level was generated. |
| | int \| null | Index of the source level ( |
| | | Names of chunk-key axes (leading axis first). Non-null when the store was rechunked along a non-spatial axis. |
| | string \| null | Per-vertex attribute used as the leading chunk axis (single-axis attribute chunking). |
| | | Ordered list mapping leading-axis chunk-coord to attribute value. |
| | bool (default false) | True when this level inherits the parent’s OID space (dropped objects → empty manifest slots). |
| | int \| null | OID-space size inherited from |
| | bool (default false) | True when per-chunk fragments may be referenced by multiple objects’ manifests. Renamed from |
Cross-level invariants (enforced by
validate_level_chunk_shape_against_root):
- A per-level chunk_shape must be a positive integer multiple of root chunk_shape along every axis (nested chunk grids: coarser levels always nest cleanly into the level-0 grid).
- A per-level chunk_shape must be an integer multiple of the per-level bin_shape along every axis (bins still tile chunks cleanly at every level).
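The two invariants can be checked per axis with a tolerance-aware divisibility test. The following is an illustrative sketch, not the library's validate_level_chunk_shape_against_root; the function names, the tolerance, and the choice to return messages instead of raising are all assumptions.

```python
import math


def is_integer_multiple(value: float, base: float, rel_tol: float = 1e-9) -> bool:
    """True when value == k * base for some integer k >= 1, within tolerance."""
    if base <= 0 or value <= 0:
        return False
    ratio = value / base
    k = round(ratio)
    return k >= 1 and math.isclose(ratio, k, rel_tol=rel_tol)


def check_level_chunk_shape(level_chunk_shape, root_chunk_shape, level_bin_shape):
    """Return a list of violation messages; an empty list means both invariants hold."""
    if not (len(level_chunk_shape) == len(root_chunk_shape) == len(level_bin_shape)):
        return ["all shapes must have one entry per space axis"]
    problems = []
    shapes = zip(level_chunk_shape, root_chunk_shape, level_bin_shape)
    for axis, (level_chunk, root_chunk, level_bin) in enumerate(shapes):
        # Invariant 1: level chunks nest into the level-0 chunk grid.
        if not is_integer_multiple(level_chunk, root_chunk):
            problems.append(
                f"axis {axis}: chunk {level_chunk} is not a positive integer "
                f"multiple of root chunk {root_chunk}"
            )
        # Invariant 2: bins tile the level's chunks.
        if not is_integer_multiple(level_chunk, level_bin):
            problems.append(
                f"axis {axis}: chunk {level_chunk} is not an integer "
                f"multiple of bin {level_bin}"
            )
    return problems
```

The tolerance exists because edge lengths may be floats, where an exact modulo test such as `0.3 % 0.1` does not return zero.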
8.4 Array-Level Metadata
Every array group’s zarr.json carries a small ZV-specific block:
{
// Zarr v3 standard fields (shape, dtype, chunk_grid, codecs) live alongside.
"zv_array": "vertices", // discriminator
"dtype": "float32", // duplicated to avoid materializing the
// codec pipeline just to learn the dtype
"shape": [], // optional, when not derivable
"encoding": "raw" // for vertices arrays only
}
Recognized zv_array discriminator values (one per array kind):
| Discriminator | Array |
|---|---|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
The literal "groupings_attribute" discriminator is an on-disk legacy
name from before the groupings → groups rename; current text uses the
term group attribute throughout.
8.5 Object-Level Metadata
Per-object data lives in <level>/object_attributes/<name>/data,
dense (B,) or (B, C) arrays keyed by object ID. The format
imposes no fixed object-attribute schema; common conventions:
- "name": human-readable label per object.
- "type": categorical kind (mesh / skeleton / polyline / …).
- "centroid": (B, sid_ndim) per-object summary point.
- "termination": (B, 2) for streamline endpoints (channel 0 = source, channel 1 = sink region IDs).
Object IDs are dense 0 .. B-1 ints; OID-preserving pyramid levels
may carry empty manifests (objects dropped at this level still own a
row in every object_attributes/<name>/data blob).
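To illustrate why dropped objects keep their rows, the sketch below pairs each retained object ID with its attribute row while leaving the dense indexing intact. It assumes the attribute blob has already been loaded into a length-B sequence; `manifest_lengths` is a hypothetical per-object summary in which 0 stands for an empty manifest, not an on-disk structure defined by the format.

```python
def attributes_for_retained_objects(attribute_rows, manifest_lengths):
    """Map OID -> attribute row for objects that still have geometry at this level.

    attribute_rows: length-B sequence; row i belongs to object ID i.
    manifest_lengths: length-B sequence; 0 marks an object dropped at this
        level (hypothetical stand-in for an empty manifest).
    """
    if len(attribute_rows) != len(manifest_lengths):
        raise ValueError("attribute rows and manifests must both have B entries")
    return {
        oid: attribute_rows[oid]
        for oid, length in enumerate(manifest_lengths)
        if length > 0
    }


# Four objects in the OID space; objects 1 and 3 were dropped at this level.
names = ["axon_a", "axon_b", "axon_c", "axon_d"]
print(attributes_for_retained_objects(names, [3, 0, 5, 0]))
# {0: 'axon_a', 2: 'axon_c'}
```

Because row i always belongs to object ID i, a reader can index attributes directly by OID at any OID-preserving level without a remapping table.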
8.6 Group-Level Metadata
Per-group data lives in <level>/group_attributes/<name>/data, dense
(G,) or (G, C) arrays keyed by group ID. Common conventions:
- "region_name": name per anatomical region group.
- "tract_name": name per fascicle group.
- "super_type": coarser categorical label for hierarchical grouping (the format does not itself require a parent_group pointer; hierarchy is expressed by attributes, not structure).
8.7 Point-Level Metadata
Per-vertex data lives in <level>/vertex_attributes/<name>/<chunk>,
row-aligned to <level>/vertices/<chunk>. Multi-channel attributes
use (N_k, C) shape; channel labels live in the per-array
.zattrs.channel_names. Channel chunking — splitting a single
attribute into multiple per-channel arrays — is a writer-side
decision (e.g. one chunk per gene-block for spatial transcriptomics).
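Selecting one named channel from a multi-channel attribute chunk could look like the sketch below. It assumes the chunk has been read into a list of N_k rows with C values each and that channel_names has been read from the array's attributes; the helper name and the gene labels are illustrative.

```python
def select_channel(chunk_rows, channel_names, wanted):
    """Extract one named channel from an (N_k, C) attribute chunk.

    chunk_rows: list of N_k rows, each holding C values, row-aligned with
        the vertices chunk of the same key.
    channel_names: list of C labels, in channel order.
    """
    channel = channel_names.index(wanted)  # raises ValueError for unknown names
    for row in chunk_rows:
        if len(row) != len(channel_names):
            raise ValueError("every row must have one value per channel")
    return [row[channel] for row in chunk_rows]


# Two vertices, three channels (labels are made up for the example).
rows = [[0.1, 4.0, 7.5], [0.2, 5.0, 8.5]]
print(select_channel(rows, ["gene_a", "gene_b", "gene_c"], "gene_b"))
# [4.0, 5.0]
```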
8.8 Coordinate Reference System (CRS)
The optional crs field on root metadata follows OME-Zarr RFC 4 / 5:
- A crs dict identifies the coordinate reference system (EPSG code, WKT, or transform pipeline).
- Per-axis units are carried on the NGFF axis descriptors as UDUNITS-2 names; the format does not stamp placeholder units.
- Coordinate transforms (scale, translation) per level live in zarr.json["multiscales"][0]["datasets"][i]["coordinateTransformations"] alongside the standard NGFF layout.
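Applying a level's transforms to a single point can be sketched as below. It assumes the NGFF convention of a list of {"type": "scale", "scale": [...]} and {"type": "translation", "translation": [...]} entries applied in listed order, with one value per axis; the function name is hypothetical.

```python
def apply_coordinate_transformations(point, transformations):
    """Apply NGFF scale / translation transforms, in listed order, to one point."""
    out = [float(v) for v in point]
    for transform in transformations:
        kind = transform["type"]
        if kind == "identity":
            continue
        if kind not in ("scale", "translation"):
            raise ValueError(f"unsupported transformation type: {kind!r}")
        values = transform[kind]  # the payload key matches the type name
        if len(values) != len(out):
            raise ValueError("transform must have one entry per axis")
        if kind == "scale":
            out = [v * s for v, s in zip(out, values)]
        else:
            out = [v + t for v, t in zip(out, values)]
    return out


level_transforms = [
    {"type": "scale", "scale": [2.0, 2.0, 2.0]},
    {"type": "translation", "translation": [10.0, 0.0, 0.0]},
]
print(apply_coordinate_transformations([1, 2, 3], level_transforms))
# [12.0, 4.0, 6.0]
```

Order matters here: scaling before translating gives a different result from translating before scaling, so the list is walked exactly as stored.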