15. Appendices
Appendix A: JSON Schema Definitions
The authoritative schema for ZV root, level, and array metadata is
the LinkML model at
zarr_vectors-py/schema/zarr_vectors.linkml.yaml.
Tools may derive JSON Schema, Pydantic models, SQLAlchemy classes,
or OWL ontologies from that file using the standard gen-* LinkML
generators.
The schema covers:
Root metadata fields (RootMetadata).
Per-level metadata (LevelMetadata).
Per-array .zattrs shapes per zv_array discriminator.
Enumerations: LinksConvention, ObjectIndexConvention, CrossChunkStrategy, CrossLevelStorage, GeometryType, Encoding, CoarseningMethod.
Appendix B: Zarr Implementation Details
Zarr version: v3 only.
Per-array layout: every ZV array is a 1-D uint8, single-chunk-per-coord array. Internal record framing (fragment-index, manifest-block, link-record streams) lives in project byte layouts inside each chunk’s payload, not in Zarr’s variable-length-chunk feature.
Group metadata: root, level, and per-array zarr.json carry ZV-specific top-level keys (zarr_vectors, zarr_vectors_level, zv_array) alongside the Zarr v3 standard node_type / zarr_format fields.
Multi-store hosting: stores work on any Zarr v3 store backend (LocalStore, FsspecStore over S3 / GCS / Azure Blob, in-memory, icechunk transactional stores, …).
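As a minimal sketch of the group-metadata rule above, the helper below inspects a parsed zarr.json document and reports which ZV role it carries. The function name classify_node and the example payload under zv_array are hypothetical; only the three key names and the node_type / zarr_format fields come from this appendix.

```python
import json

# ZV-specific top-level keys named in this appendix, mapped to the role they mark.
ZV_KEYS = {
    "zarr_vectors": "root",
    "zarr_vectors_level": "level",
    "zv_array": "array",
}


def classify_node(zarr_json_text: str) -> str:
    """Return "root", "level", or "array" for a ZV zarr.json document."""
    doc = json.loads(zarr_json_text)
    if doc.get("zarr_format") != 3:
        raise ValueError("ZV stores are Zarr v3 only")
    if doc.get("node_type") not in ("group", "array"):
        raise ValueError("missing or invalid node_type")
    present = [role for key, role in ZV_KEYS.items() if key in doc]
    if len(present) != 1:
        raise ValueError(f"expected exactly one ZV key, found {len(present)}")
    return present[0]


# Hypothetical payload: the contents of "zv_array" are made up for illustration.
example = json.dumps({
    "zarr_format": 3,
    "node_type": "array",
    "zv_array": {"kind": "example"},
})
print(classify_node(example))
```

A reader could run a check like this before decoding any chunk payload, so that stores written for another Zarr format are rejected early.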
Appendix C: Coordinate Reference Systems
ZV reuses OME-Zarr RFC 4 axes (name, type, unit) and RFC 5 coordinateTransformations (scale, translation) for per-level coordinate transforms.
The optional crs dict on root metadata is opaque to the format: it carries an EPSG code, WKT string, or transform pipeline as defined by the writer’s CRS conventions.
Per-axis units use UDUNITS-2 names ("micrometer", "second", …). The format does not stamp placeholder unit strings; unknown units must be omitted.
Appendix D: Compression Codec Reference
Default per-array codec pipelines (from
zarr_vectors.encoding.compression):
| Array | Dtype | Compressor | Shuffle |
|---|---|---|---|
| | user-declared (float or integer; see §7.1) | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | opaque | none; opaque bytes (see §11.4) | n/a |
| | opaque | none; opaque bytes (see §11.4) | n/a |
| | user-declared integer (width chosen to fit | Blosc(Zstd, clevel=5) | BITSHUFFLE |
| | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
Mesh stores may use Draco-encoded vertex+face co-encoding instead of
the raw Blosc pipeline; the per-array .zattrs.encoding = "draco"
field flags this. Draco output is already compressed, so the Zarr
codec pipeline is left empty (or wrapped in a minimal pass-through
codec).
The fragment-index byte layout (§7.3) and manifest-block stream
(§7.6) are not Zarr codecs — they are project-internal record
framings carried as the raw bytes inside single-chunk uint8
arrays.
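To make the default Blosc(Zstd, clevel=5) pipeline concrete, the sketch below builds a Zarr v3 codec list as plain dictionaries. The configuration keys (cname, clevel, shuffle, typesize, blocksize) are those of the Zarr v3 blosc codec; the helper name blosc_zstd_codecs and the choice to pass the logical element width as typesize are assumptions, not something taken from zarr_vectors.encoding.compression.

```python
def blosc_zstd_codecs(itemsize: int, bitshuffle: bool = False) -> list:
    """Build a Zarr v3 codec list for Blosc(Zstd, clevel=5) with shuffling.

    itemsize is the width in bytes of the logical element being shuffled
    (an assumption here, since the stored Zarr dtype is uint8).
    """
    if itemsize < 1:
        raise ValueError("itemsize must be a positive number of bytes")
    return [
        # Array-to-bytes step; uint8 needs no endianness setting.
        {"name": "bytes"},
        {
            "name": "blosc",
            "configuration": {
                "cname": "zstd",
                "clevel": 5,
                "shuffle": "bitshuffle" if bitshuffle else "shuffle",
                "typesize": itemsize,
                "blocksize": 0,  # 0 lets Blosc pick the block size
            },
        },
    ]


byte_shuffled = blosc_zstd_codecs(itemsize=4)
bit_shuffled = blosc_zstd_codecs(itemsize=8, bitshuffle=True)
print(byte_shuffled[1]["configuration"]["shuffle"])
print(bit_shuffled[1]["configuration"]["shuffle"])
```

Opaque arrays and Draco-encoded meshes would skip the blosc entry entirely, matching the "none" rows and the Draco note above.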
Appendix E: Downsampling Algorithms
The reference implementation provides one coarsening strategy:
Per-object coarsening (coarsening_method = "per_object"): each level’s vertices are produced by reducing each surviving object’s fragments to one metavertex per coarse bin, with attribute reduction following the per-attribute convention (mean / mode / sum, declared in the array metadata). Object identity is preserved (preserves_object_ids = true); dropped objects leave empty manifest slots.
Future strategies (mesh edge-collapse decimation, skeleton path
simplification, streamline point reduction) plug into the same
coarsening_method slot; see zarr_vectors.multiresolution.
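The per-object reduction can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the code in zarr_vectors.multiresolution: the function name coarsen_per_object is hypothetical, the bin grid is assumed to be anchored at the origin, and the metavertex position is taken to be the mean of the contributing vertices.

```python
from collections import defaultdict
from math import floor


def coarsen_per_object(vertices, bin_shape):
    """Reduce vertices to one metavertex per (object, coarse bin).

    vertices:  iterable of (object_id, position) pairs.
    bin_shape: size of a coarse bin along each axis.
    Returns {(object_id, bin_index): mean position}.
    """
    buckets = defaultdict(list)
    for object_id, position in vertices:
        # Bin index of this vertex on the coarse grid (grid anchored at 0).
        bin_index = tuple(floor(p / s) for p, s in zip(position, bin_shape))
        buckets[(object_id, bin_index)].append(position)
    # Mean position per bucket; objects never merge because the key
    # includes the object id.
    return {
        key: tuple(sum(axis) / len(points) for axis in zip(*points))
        for key, points in buckets.items()
    }


vertices = [
    (7, (0.0, 1.0)),
    (7, (2.0, 3.0)),
    (7, (9.0, 9.0)),
    (8, (1.0, 1.0)),
]
for key, metavertex in coarsen_per_object(vertices, (4.0, 4.0)).items():
    print(key, metavertex)
```

Keying the buckets by object id as well as bin index is what keeps object identity intact: two objects that share a bin still produce two metavertices.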
Appendix F: Query Patterns
Common access patterns and which arrays they touch:
| Pattern | Arrays read |
|---|---|
| Bounding-box at one level | |
| Per-bin sub-chunk filter | |
| Single object reconstruction | |
| All objects in a group | |
| Per-object attribute query | |
| Pyramid drill-down (fine → coarse) | |
| Cross-chunk traversal | |
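For the bounding-box pattern, the first step is working out which chunk coordinates a query box touches. The sketch below shows one way to do that; the function name chunks_for_bbox is hypothetical, and it assumes the chunk grid is anchored at the origin, that chunk_shape is expressed in the same units as the box, and that the box is half-open.

```python
from itertools import product
from math import ceil, floor


def chunks_for_bbox(lower, upper, chunk_shape):
    """List chunk coordinates intersecting the half-open box [lower, upper)."""
    axis_ranges = []
    for lo, hi, size in zip(lower, upper, chunk_shape):
        if hi <= lo:
            return []  # empty box along this axis
        # First chunk containing lo, up to (not including) the first chunk
        # that starts at or beyond hi.
        axis_ranges.append(range(floor(lo / size), ceil(hi / size)))
    return list(product(*axis_ranges))


print(chunks_for_bbox((0.0, 5.0), (10.0, 8.0), (4.0, 4.0)))
```

Only the chunks returned here need to be fetched; any per-bin filtering then happens inside each of those chunks.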
Appendix G: Migration Guide
There is no in-place migration utility across major ZV versions (0.4 → 0.5 → 0.6 → 0.7). Each step changed the on-disk record layout in a way that breaks readers built for the previous version; stores must be rewritten from source.
See Appendix K below for the per-version summary.
Appendix H: Performance Considerations
Chunk size: larger chunks amortise per-chunk overhead at the cost of larger minimum-read units. v0.7 lets coarser pyramid levels override chunk_shape with a positive-integer multiple of the level-0 shape.
Bin grid: enabling a per-chunk bin grid (base_bin_shape) lets point-cloud queries narrow to individual bins without decoding the whole chunk, which matters when vertex_count_per_chunk exceeds 100k.
Fragment vs object granularity: fragments are the unit of pyramid coarsening and re-use. When objects are very small (one fragment apiece, like nuclei), fragments and objects correspond 1-to-1; when objects are very large (a neuron with 10⁵ vertices), fragments naturally chunk the object’s vertices.
Per-link attributes: link_attributes/<name>/<delta>/<chunk> read cost scales with the number of link rows in links/<delta>/<chunk>; the same fragment index that partitions vertices partitions these attributes.
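For the chunk-size point, the relationship between a level's chunk_shape override and the level-0 default can be checked with a small helper. The sketch below is a stand-in written for illustration, not the library's chunk_scale_factor(root, level): the name chunk_scale_factor_sketch and the plain-tuple inputs are assumptions.

```python
from math import isclose


def chunk_scale_factor_sketch(root_chunk_shape, level_chunk_shape=None):
    """Per-axis integer multiplier between a level's chunks and level 0.

    A missing override means the level inherits the root chunk shape.
    """
    if level_chunk_shape is None:
        return tuple(1 for _ in root_chunk_shape)
    if len(level_chunk_shape) != len(root_chunk_shape):
        raise ValueError("chunk shapes must have the same number of axes")
    factors = []
    for base, override in zip(root_chunk_shape, level_chunk_shape):
        ratio = override / base
        nearest = round(ratio)
        # The override must be a positive-integer multiple of the default.
        if nearest < 1 or not isclose(ratio, nearest, rel_tol=1e-9):
            raise ValueError(f"{override} is not an integer multiple of {base}")
        factors.append(nearest)
    return tuple(factors)


print(chunk_scale_factor_sketch((100.0, 100.0, 50.0), (200.0, 200.0, 50.0)))
print(chunk_scale_factor_sketch((100.0, 100.0, 50.0)))
```

Rejecting non-integer ratios up front keeps every coarse chunk aligned with a whole block of level-0 chunks.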
Appendix I: Extensibility
Custom geometry types: writers may add a new value to geometry_types outside the canonical set (point_cloud, line, polyline, streamline, skeleton, graph, mesh), at the cost of dropping conformance-level checks for that geometry.
Custom metadata: arbitrary keys may be added under the zarr_vectors or zarr_vectors_level namespaces; readers should ignore unknown keys.
Capability tokens: stores advertise optional features via format_capabilities. Readers that don’t recognize a token must either treat the corresponding feature as absent or refuse to open the store.
Version evolution: hard-break version bumps (0.4 → 0.5 → 0.6 → 0.7) are the project’s only versioning mechanism; no shim layer ships with the implementation.
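The capability-token rule gives a reader two acceptable behaviours for an unknown token. A minimal sketch of both follows; the function name usable_capabilities, the strict flag, and the token future_feature are hypothetical, while fragment_index and shared_fragments are tokens named in the change log.

```python
def usable_capabilities(advertised, understood, strict=False):
    """Filter a store's format_capabilities against what a reader supports.

    strict=False: unknown tokens are treated as absent features.
    strict=True:  any unknown token makes the reader refuse the store.
    """
    unknown = [token for token in advertised if token not in understood]
    if unknown and strict:
        raise ValueError(f"unrecognized capability tokens: {sorted(unknown)}")
    return [token for token in advertised if token in understood]


understood = {"fragment_index", "shared_fragments"}
advertised = ["fragment_index", "shared_fragments", "future_feature"]
print(usable_capabilities(advertised, understood))
```

Which mode is appropriate depends on the token: a reader can safely ignore a feature it never touches, but should refuse a store whose unknown token changes how existing arrays are decoded.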
Appendix J: References
TRX format specification: https://tee-ar-ex.github.io/trx-python/stable/trx_specifications.html
Zarr v3 specification: https://zarr-specs.readthedocs.io/
OME-Zarr (NGFF) specification: https://ngff.openmicroscopy.org/
LinkML: https://linkml.io/
Neuroglancer precomputed mesh format: https://github.com/google/neuroglancer/blob/master/src/datasource/precomputed/meshes.md
Neuroglancer precomputed annotation format: https://github.com/google/neuroglancer/blob/master/src/datasource/precomputed/annotations.md (see Appendix L for the mapping to zarr-vectors).
Appendix K: Change Log
Each entry is a hard break — no migration utility ships; rewrite stores from source between major versions.
0.7.0: per-level chunk_shape override on LevelMetadata. RootMetadata.chunk_shape remains the level-0 default; pyramid levels may carry a positive-integer-multiple chunk-shape override so coarser levels can grow chunks the way OME-Zarr image pyramids do via voxel-size scaling. Cross-pyramid-level link arrays already carry both endpoints’ chunk coords inline; the per-axis multiplier is exposed by chunk_scale_factor(root, level).
0.6.0: fragment-index schema. vertex_group_offsets was replaced by vertex_fragments (a v1 byte layout with header, range bitmap, range table, and CSR explicit list; see §7.3). Inline self-describing link blobs at delta = 0 were split into links/0/<chunk> (flat payload) + link_fragments/<chunk> (fragment index). object_index/data adopted the manifest-block encoding (modes 0 / 1 / 2: single / range / explicit) with chunk-local fragment references. The shared_vertex_groups capability token was renamed to shared_fragments and the new fragment_index capability is mandatory.
0.5.0: NGFF alignment + format simplification. Several on-disk simplifications shipped without a version bump (consumers should pin to a specific point release): renamed format_version → zv_version; moved axes to NGFF multiscales[0].axes; dropped per-array dtype duplication; removed vertex_counts/, metanode_children/, cross_chunk_faces/, attributes/<name>/<key>_offsets, and object_index/pending/; replaced the (K, 2) paired offset layout with a flat (K,) int64 array of vertex offsets (later replaced again by the fragment index in 0.6).
0.4.1: bare-integer resolution-level group names (0/, 1/). Renamed from resolution_0/, resolution_1/, … to mirror OME-Zarr.
0.4: multiscale-link arrays. Introduced the <delta> sub-folder layout for links/, cross_chunk_links/, link_attributes/, and cross_chunk_link_attributes/ and the cross_level_depth / cross_level_storage writer knobs. This is the version where cross-pyramid-level links became expressible.
Appendix L: Mapping from Neuroglancer Precomputed Annotations
The Neuroglancer precomputed annotation format stores small geometric primitives — points, lines, axis-aligned bounding boxes, and ellipsoids — with per-annotation properties, per-segment relationships, and a multi-resolution random-subsample spatial index. It serves a similar purpose to zarr-vectors but with narrower geometry semantics and a different multi-resolution model. This appendix maps the two layouts so authors can pick the right target and converters can translate between them.
L.1 Conceptual mapping
| Neuroglancer Precomputed Annotations | Zarr-vectors equivalent |
|---|---|
| | |
| | |
| | NGFF |
| | |
| | |
| | |
| | Pyramid levels |
| | Per-level effective |
| | Implicit via vertex count / bin_shape; pyramid coarsening controlled by |
| Sharded vs unsharded | Zarr v3 chunk-key encoding handles both transparently; sharding is a backend concern, not a schema choice |
| Random subsampling between levels | |
L.2 Mapping the four geometry primitives
Precomputed annotations are zero-dimensional primitives, each carrying
a fixed positional record. Zarr-vectors expresses them via the
geometry_types list plus the appropriate per-record layout:
| Precomputed | Positional record | Zarr-vectors representation |
|---|---|---|
| | 1 vector | |
| | 2 vectors (endpoint A, B) | |
| | 2 vectors (min, max) | Two options. (a) |
| | 2 vectors (center, radii) | |
| | uint32 count + N vectors | |
Multi-geometry stores are natural in zarr-vectors (just list every
geometry in geometry_types); the precomputed format restricts a
single store to one annotation_type and would require co-locating
several stores to mix kinds.
L.3 Properties and relationships
Per-annotation properties in precomputed correspond directly to per-vertex (or per-object) attributes:
Numeric properties (uint8, int8, …, float32) → typed vertex_attributes/<name>/<chunk> or object_attributes/<name>/data. Zarr-vectors uses the array’s dtype directly; no per-property enum table is needed at the schema level.
rgb / rgba → either an (N, 3) / (N, 4) uint8 attribute, or three / four channels of a multi-channel attribute with declared channel_names.
enum_values / enum_labels → carried as channel_names / per-attribute side metadata. Zarr-vectors does not currently reserve a top-level enum-mapping slot; writers stamp the mapping as a JSON dict in the attribute’s .zattrs.
Relationships (each annotation linked to a list of segment IDs) map onto zarr-vectors groups:
The precomputed relationships[<rel_name>] becomes a per-relationship groups/data array (or, if multiple relationships, one groups-like structure per relationship name, typically expressed today by rechunking by relationship; see §8).
Each group corresponds to one segment id. Its membership list is the annotation IDs (= object IDs in zarr-vectors).
The inverse (annotation → list of segments) is then the group-membership matrix (which annotations belong to which groups); in zarr-vectors this is recovered by iterating groups/data, the same operation that powers “show all annotations on segment X” in precomputed.
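The relationship-to-groups step amounts to inverting a mapping. The sketch below uses plain dictionaries to show the idea; the function name relationship_to_groups and the in-memory input shape are assumptions, and no precomputed or zarr-vectors I/O is involved.

```python
from collections import defaultdict


def relationship_to_groups(annotation_segments):
    """Invert {annotation_id: [segment_id, ...]} into one group per segment.

    Returns {segment_id: sorted list of annotation (object) ids}.
    """
    groups = defaultdict(set)
    for annotation_id, segment_ids in annotation_segments.items():
        for segment_id in segment_ids:
            groups[segment_id].add(annotation_id)
    return {segment_id: sorted(members) for segment_id, members in sorted(groups.items())}


# Annotation 2 is related to no segment, so it joins no group.
relationship = {0: [10, 11], 1: [10], 2: []}
print(relationship_to_groups(relationship))
```

Each key of the result corresponds to one group, and each value is that group's membership list of object IDs.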
L.4 The spatial index
Both formats use a multi-resolution spatial grid to keep query cost bounded, but the selection rule differs:
Precomputed: each level has a per-cell limit. A cell with more annotations than the limit gets a random subsample at this level; the rest propagate to finer children. Cell coverage is controlled by grid_shape × chunk_size, anchored at lower_bound. Levels coarsen by integer division of cells.
Zarr-vectors: each level has a fixed bin_shape (and optionally an overridden chunk_shape, v0.7). Coarsening is per-object: each surviving object’s vertices are aggregated into metavertices at the coarser bin grid. object_sparsity ∈ (0, 1] in level metadata records what fraction of objects survived.
Equivalents:
| Concept | Precomputed | Zarr-vectors |
|---|---|---|
| Grid cells per axis at level L | | |
| Cell size at level L | | |
| LOD selection knob | | |
| Cross-level identity | annotation ID is shared across levels | OID-preserving pyramid ( |
| Drillable parent→child mapping | implicit (random subsample) | optional |
Precomputed’s “drill the visible cells until you’ve returned at most
limit annotations per cell” maps to zarr-vectors’ “ask each level
for the object set; stop when object_sparsity * source_count fits
your budget.” The precomputed model is simpler and gets random
sampling for free; the zarr-vectors model is more general (handles
extended objects, not just points) at the cost of an explicit
coarsener.
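The zarr-vectors side of that comparison can be sketched as a short loop over levels. This is an illustration only: the function name pick_level is hypothetical, level 0 is assumed to be the finest level, and the search is assumed to run from fine to coarse so that the most detailed level within budget wins.

```python
def pick_level(source_count, sparsity_per_level, budget):
    """Return the finest level whose expected object count fits the budget.

    sparsity_per_level[i] is the object_sparsity of level i, in (0, 1].
    Falls back to the coarsest level when nothing fits.
    """
    if not sparsity_per_level:
        raise ValueError("at least one level is required")
    for level, sparsity in enumerate(sparsity_per_level):
        if not 0 < sparsity <= 1:
            raise ValueError(f"object_sparsity out of range at level {level}")
        if sparsity * source_count <= budget:
            return level
    return len(sparsity_per_level) - 1


sparsities = [1.0, 0.25, 0.05, 0.01]
print(pick_level(1_000_000, sparsities, budget=60_000))
```

Because object_sparsity is stored in level metadata, this choice needs no chunk reads at all.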
L.5 Practical conversion notes
Single annotation type → single zarr-vectors store is a straightforward 1:1 transcode. Pick the geometry mapping from §L.2; emit one fragment per annotation; populate object_index/ in dense OID order from the original by_id listing.
Properties transcode field-for-field; preserve the original uint8 / int8 / … types rather than upcasting.
Relationships transcode to groups keyed by segment id. In practice, very large segment-id spaces (10⁸+ segments) suggest using object_attributes/segment_id rather than groups/data if most segments touch zero or one annotation.
Spatial index does NOT transcode 1:1. The pyramid is rebuilt using the per-object coarsener; the precomputed limit rule has no direct zarr-vectors equivalent. A reasonable default is reduction_factor = 8, object_sparsity ≈ limit / max_cell_count per level, and cross_level_storage = "implicit" if downstream readers need to drill from coarse to fine.
Sharded vs unsharded is invisible to the schema mapping: zarr-vectors writes per-chunk blobs into a Zarr v3 store regardless of the backend’s sharding.
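The object_sparsity ≈ limit / max_cell_count default is easy to work through numerically. In the sketch below the function name suggested_object_sparsity and the example counts are made up for illustration; the clamp to 1.0 is an assumption that follows from object_sparsity living in (0, 1].

```python
def suggested_object_sparsity(limit, max_cell_count):
    """Approximate object_sparsity for one level from a precomputed limit.

    limit:          per-cell annotation limit of the precomputed level.
    max_cell_count: largest number of annotations found in any cell.
    """
    if limit <= 0 or max_cell_count <= 0:
        raise ValueError("limit and max_cell_count must be positive")
    # Cells already under the limit keep everything, hence the clamp.
    return min(1.0, limit / max_cell_count)


# Hypothetical per-level maxima, from the sparsest level to the densest.
for max_cell_count in (500, 4000, 32000):
    print(max_cell_count, suggested_object_sparsity(1000, max_cell_count))
```

The value is only a starting point: the precomputed rule is applied per cell, while object_sparsity is a single fraction for the whole level.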
L.6 What zarr-vectors adds over precomputed annotations
Connected geometries (polylines with branches, skeletons, meshes) live in the same store, not just zero-dimensional primitives. link_width = 1, 2, 3, ... and the fragment-index partitioning carry the connectivity.
Fragment / vertex re-use: explicit fragment indices let multiple objects reference the same per-chunk vertex rows (shared_fragments), saving storage on overlapping geometries (streamline bundles, mesh rims).
Per-link attributes: edges / faces carry their own attribute arrays (link_attributes/<name>/<delta>/<chunk>), not just the per-annotation properties precomputed supports.
Cross-pyramid-level links (§9.6) make the pyramid drillable: given a coarse metavertex, walk back to the fine-level vertices that produced it. Precomputed’s random-subsample model loses this mapping at construction time.
Pluggable coarsening: coarsening_method = "per_object" is the only strategy the reference implementation ships today, but future strategies such as mesh decimation and streamline point reduction slot into the same level structure.
L.7 What precomputed annotations preserve that zarr-vectors does not (yet)
Sharded back-end as a first-class schema concern. Zarr-vectors delegates sharding to the underlying Zarr v3 store; it does not expose sharding blocks per index.
Per-property enum tables as a typed schema field (enum_values / enum_labels). Zarr-vectors carries this as free-form .zattrs metadata.
The “limit” LOD heuristic: automatic, occupancy-driven downsampling without writing a coarsening pass. In zarr-vectors the writer must pick bin_ratio / chunk_scale_factor / object_sparsity deliberately.