15. Appendices

Appendix A: JSON Schema Definitions

The authoritative schema for ZV root, level, and array metadata is the LinkML model at zarr_vectors-py/schema/zarr_vectors.linkml.yaml. Tools may derive JSON Schema, Pydantic models, SQLAlchemy classes, or OWL ontologies from that file using the standard gen-* LinkML generators.

The schema covers:

  • Root metadata fields (RootMetadata).

  • Per-level metadata (LevelMetadata).

  • Per-array .zattrs shapes, one for each value of the zv_array discriminator.

  • Enumerations: LinksConvention, ObjectIndexConvention, CrossChunkStrategy, CrossLevelStorage, GeometryType, Encoding, CoarseningMethod.

Appendix B: Zarr Implementation Details

  • Zarr version: v3 only.

  • Per-array layout: every ZV array is a 1-D uint8, single-chunk-per-coord array. Internal record framing (fragment-index, manifest-block, and link-record streams) lives in project-defined byte layouts inside each chunk’s payload, not in Zarr’s variable-length-chunk feature.

  • Group metadata: root, level, and per-array zarr.json carry ZV-specific top-level keys (zarr_vectors, zarr_vectors_level, zv_array) alongside the Zarr v3 standard node_type / zarr_format fields.

  • Multi-store hosting: stores work on any Zarr v3 store backend (LocalStore, FsspecStore over S3 / GCS / Azure Blob, in-memory, icechunk transactional stores, …).

Appendix C: Coordinate Reference Systems

  • ZV reuses OME-Zarr RFC 4 axes (name, type, unit) and RFC 5 coordinateTransformations (scale, translation) for per-level coordinate transforms.

  • The optional crs dict on root metadata is opaque to the format — it carries an EPSG code, WKT string, or transform pipeline as defined by the writer’s CRS conventions.

  • Per-axis units use UDUNITS-2 names ("micrometer", "second", …). The format does not stamp placeholder unit strings — unknown units must be omitted.
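As an illustration of how the RFC 5 transforms are applied, the sketch below maps a stored vertex position to physical coordinates by applying a level's scale and then its translation, in list order. It assumes vertices are stored in level-local units; apply_transforms and the example values are hypothetical, not part of the zarr_vectors API.

```python
from typing import Sequence


def apply_transforms(point: Sequence[float], transforms: list[dict]) -> list[float]:
    """Apply NGFF-style scale / translation transforms in list order (hypothetical helper)."""
    out = [float(x) for x in point]
    for t in transforms:
        if t["type"] == "scale":
            out = [x * s for x, s in zip(out, t["scale"])]
        elif t["type"] == "translation":
            out = [x + d for x, d in zip(out, t["translation"])]
        else:
            raise ValueError(f"unsupported transform type: {t['type']!r}")
    return out


# Example level: 2x coarser than level 0 on every axis, shifted by 10 units on the first axis.
level_transforms = [
    {"type": "scale", "scale": [2.0, 2.0, 2.0]},
    {"type": "translation", "translation": [10.0, 0.0, 0.0]},
]
print(apply_transforms([1.0, 2.0, 3.0], level_transforms))  # [12.0, 4.0, 6.0]
```

Units are not touched here: they come from the axes entries and, per the rule above, are simply absent when unknown.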

Appendix D: Compression Codec Reference

Default per-array codec pipelines (from zarr_vectors.encoding.compression):

| Array | Dtype | Compressor | Shuffle |
| --- | --- | --- | --- |
| vertices | user-declared (float or integer; see §7.1) | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| vertex_attributes/<name> | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| fragment_attributes/<name> | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| vertex_fragments | opaque uint8 (§7.3) | none (opaque bytes; see §11.4) | none |
| link_fragments | opaque uint8 (§7.5) | none (opaque bytes; see §11.4) | none |
| links/<delta> | user-declared integer (width chosen to fit n_vertices_in_chunk; see §7.5) | Blosc(Zstd, clevel=5) | BITSHUFFLE |
| link_attributes/<name>/<delta> | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| object_index | object (vlen-bytes; opaque manifest blob, §7.6) | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| object_attributes/<name> | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| groups | int64 (ragged CSR of object IDs + offsets) | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| group_attributes/<name> | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| cross_chunk_links/<delta> | int64 (endpoint records; see §11.3) | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |
| cross_chunk_link_attributes/<name>/<delta> | user-declared | Blosc(Zstd, clevel=5) | BYTE-SHUFFLE |

Mesh stores may use Draco-encoded vertex+face co-encoding instead of the raw Blosc pipeline; the per-array .zattrs.encoding = "draco" field flags this. Draco output is already compressed, so the Zarr codec pipeline is left empty (or wrapped in a minimal pass-through codec).

The fragment-index byte layout (§7.3) and manifest-block stream (§7.6) are not Zarr codecs — they are project-internal record framings carried as the raw bytes inside single-chunk uint8 arrays.
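The Shuffle column in the codec table matters because most numeric arrays hold multi-byte values whose high-order bytes change slowly. The stdlib-only sketch below implements a whole-buffer byte shuffle (conceptually the transform Blosc's BYTE-SHUFFLE filter applies per block; BITSHUFFLE is the bit-level analogue) on little-endian float32 data and compares compressed sizes with and without it. zlib stands in for Zstd only to avoid a third-party dependency, and the data are synthetic.

```python
import struct
import zlib


def byte_shuffle(buf: bytes, itemsize: int) -> bytes:
    """Regroup bytes so all byte-0s come first, then all byte-1s, and so on."""
    if len(buf) % itemsize:
        raise ValueError("buffer length must be a multiple of itemsize")
    return b"".join(buf[k::itemsize] for k in range(itemsize))


def byte_unshuffle(buf: bytes, itemsize: int) -> bytes:
    """Invert byte_shuffle."""
    n = len(buf) // itemsize
    out = bytearray(len(buf))
    for k in range(itemsize):
        out[k::itemsize] = buf[k * n:(k + 1) * n]
    return bytes(out)


# Synthetic, smoothly varying float32 values standing in for vertex coordinates.
values = [1000.0 + 0.25 * i for i in range(4096)]
raw = struct.pack(f"<{len(values)}f", *values)
shuffled = byte_shuffle(raw, 4)

assert byte_unshuffle(shuffled, 4) == raw
print("plain:", len(zlib.compress(raw, 6)), "shuffled:", len(zlib.compress(shuffled, 6)))
```

On smooth data like this the shuffled buffer typically compresses far better, because the constant sign/exponent bytes and the all-zero low mantissa bytes end up in long uniform runs. Opaque byte streams such as vertex_fragments have no fixed element width, which is one reason the table leaves them unshuffled.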

Appendix E: Downsampling Algorithms

The reference implementation provides one coarsening strategy:

  • Per-object coarsening (coarsening_method = "per_object"): each level’s vertices are produced by reducing each surviving object’s fragments to one metavertex per coarse bin, with attribute reduction following the per-attribute convention (mean / mode / sum, declared in the array metadata). Object identity is preserved (preserves_object_ids = true); dropped objects leave empty manifest slots.
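A minimal sketch of the per-object rule, assuming one object's vertices are available as coordinate tuples with one scalar attribute, and that coarse bins are axis-aligned cells of a fixed coarse_bin_shape anchored at the origin. The function name and data shapes are hypothetical; this is not the zarr_vectors.multiresolution implementation, and choosing which objects survive is left to the caller.

```python
from collections import Counter, defaultdict
from statistics import fmean

# Per-attribute reducers named in the text (mean / mode / sum).
REDUCERS = {
    "mean": fmean,
    "sum": sum,
    "mode": lambda vals: Counter(vals).most_common(1)[0][0],
}


def coarsen_object(vertices, attr, coarse_bin_shape, reduction="mean"):
    """Collapse one object's vertices to one metavertex per occupied coarse bin.

    vertices: list of coordinate tuples; attr: one scalar per vertex.
    Returns (metavertices, reduced_attr), ordered by bin index.
    """
    buckets = defaultdict(list)  # coarse-bin index tuple -> vertex row numbers
    for row, v in enumerate(vertices):
        key = tuple(int(c // s) for c, s in zip(v, coarse_bin_shape))
        buckets[key].append(row)

    reduce = REDUCERS[reduction]
    metavertices, reduced = [], []
    for key in sorted(buckets):
        rows = buckets[key]
        ndim = len(vertices[rows[0]])
        # Metavertex position: mean of the member vertex positions on each axis.
        metavertices.append(tuple(fmean(vertices[r][d] for r in rows) for d in range(ndim)))
        reduced.append(reduce([attr[r] for r in rows]))
    return metavertices, reduced


verts = [(0.0, 0.0), (1.0, 1.0), (5.0, 0.0), (6.0, 2.0)]
print(coarsen_object(verts, [1.0, 3.0, 10.0, 20.0], (4.0, 4.0)))
# ([(0.5, 0.5), (5.5, 1.0)], [2.0, 15.0])
```

Because the object is processed on its own, its OID carries over unchanged to the coarser level, which is what preserves_object_ids = true records.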

Future strategies (mesh edge-collapse decimation, skeleton path simplification, streamline point reduction) plug into the same coarsening_method slot; see zarr_vectors.multiresolution.

Appendix F: Query Patterns

Common access patterns and which arrays they touch:

| Pattern | Arrays read |
| --- | --- |
| Bounding-box at one level | vertices/<chunk> for intersected chunks |
| Per-bin sub-chunk filter | vertex_fragments/<chunk> → row-slice into vertices/<chunk> |
| Single object reconstruction | object_index/data → per-chunk vertex_fragments/<chunk> → vertices/ |
| All objects in a group | groups/data → object ids → object_index/data → vertices |
| Per-object attribute query | object_attributes/<name>/data (no vertex read) |
| Pyramid drill-down (fine → coarse) | links/+1/<chunk> (when cross_level_storage != "none") |
| Cross-chunk traversal | cross_chunk_links/0/data |
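For the bounding-box pattern, a reader first enumerates the chunk coordinates a query box touches at the chosen level. A sketch, assuming the chunk grid starts at a known origin (zeros by default), uses the level's effective chunk_shape, and takes a half-open box [lo, hi); the helper name is hypothetical.

```python
import itertools
import math


def intersected_chunks(lo, hi, chunk_shape, origin=None):
    """Yield the chunk coordinates touched by the half-open box [lo, hi)."""
    if origin is None:
        origin = [0] * len(chunk_shape)
    ranges = []
    for lo_i, hi_i, c, o in zip(lo, hi, chunk_shape, origin):
        if hi_i <= lo_i:
            return  # empty box on this axis: no chunks to read
        first = math.floor((lo_i - o) / c)
        last = math.ceil((hi_i - o) / c) - 1  # last chunk whose start lies below hi_i
        ranges.append(range(first, last + 1))
    yield from itertools.product(*ranges)


print(list(intersected_chunks((0, 0, 0), (100, 64, 10), (64, 64, 64))))
# [(0, 0, 0), (1, 0, 0)]
```

Each yielded coordinate names one vertices/<chunk> array to read (the chunk-key spelling is not shown here). Chunks on the boundary are only partly inside the box, so their vertex rows still need a per-row containment test after decoding.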

Appendix G: Migration Guide

There is no in-place migration utility across major ZV versions (0.4 → 0.5 → 0.6 → 0.7). Each step changed the on-disk record layout in a way that breaks readers built for the previous version; stores must be rewritten from source.

See Appendix K below for the per-version summary.

Appendix H: Performance Considerations

  • Chunk size: larger chunks amortize per-chunk overhead at the cost of larger minimum-read units. v0.7 lets each coarser pyramid level override chunk_shape with a positive-integer multiple of the level-0 chunk_shape.

  • Bin grid: enabling a per-chunk bin grid (base_bin_shape) lets point-cloud queries narrow to individual bins without decoding the whole chunk, which matters when `vertex_count_per_chunk > 100k`.

  • Fragment vs object granularity: fragments are the unit of pyramid coarsening and re-use. When objects are very small (one fragment apiece, like nuclei), fragments and objects correspond 1-to-1; when objects are very large (a neuron with 10⁵ vertices), fragments naturally chunk the object’s vertices.

  • Per-link attributes: link_attributes/<name>/<delta>/<chunk> read cost scales with the number of link rows in links/<delta>/<chunk> — the same fragment index that partitions vertices partitions these attributes.
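To make the bin-grid point concrete: if a chunk's vertex rows are ordered by bin, a per-bin offsets table turns a bin query into one contiguous row slice. The sketch below builds such a table with a counting sort. It is an in-memory illustration only, not the vertex_fragments v1 byte layout (§7.3); the function names and the row-major bin numbering are assumptions.

```python
import math


def bin_of(v, chunk_origin, bin_shape, bins_per_axis):
    """Row-major flat bin index of vertex v inside its chunk."""
    idx = 0
    for c, o, s, n in zip(v, chunk_origin, bin_shape, bins_per_axis):
        b = min(int((c - o) // s), n - 1)  # clamp vertices on the chunk's upper face
        idx = idx * n + b
    return idx


def sort_by_bin(vertices, chunk_origin, bin_shape, bins_per_axis):
    """Counting-sort vertex rows by bin.

    Returns (ordered_vertices, offsets) such that bin k owns rows
    offsets[k]:offsets[k + 1] of ordered_vertices.
    """
    keys = [bin_of(v, chunk_origin, bin_shape, bins_per_axis) for v in vertices]
    counts = [0] * math.prod(bins_per_axis)
    for k in keys:
        counts[k] += 1
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)
    cursor = offsets[:-1]  # next free row for each bin (a copy, not a view)
    ordered = [None] * len(vertices)
    for v, k in zip(vertices, keys):
        ordered[cursor[k]] = v
        cursor[k] += 1
    return ordered, offsets


verts = [(15, 5), (2, 3), (12, 18), (4, 11)]
ordered, offsets = sort_by_bin(verts, (0, 0), (10, 10), (2, 2))
k = 2  # bin with ix = 1, iy = 0
print(ordered[offsets[k]:offsets[k + 1]])  # [(15, 5)]
```

A query touching only bin k then reads offsets[k] and offsets[k + 1] and slices those rows, which is the "row-slice into vertices/<chunk>" step in the Appendix F table.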

Appendix I: Extensibility

  • Custom geometry types: writers may add a new value to geometry_types outside the canonical set (point_cloud, line, polyline, streamline, skeleton, graph, mesh), at the cost of dropping conformance-level checks for that geometry.

  • Custom metadata: arbitrary keys may be added under the zarr_vectors or zarr_vectors_level namespaces; readers should ignore unknown keys.

  • Capability tokens: stores advertise optional features via format_capabilities. Readers that don’t recognize a token must either treat the corresponding feature as absent or refuse to open the store.

  • Version evolution: hard-break version bumps (0.4 → 0.5 → 0.6 → 0.7) are the project’s only versioning mechanism; no shim layer ships with the implementation.
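A reader-side sketch of the capability rule, assuming format_capabilities is a list of token strings in root metadata. KNOWN_CAPABILITIES, MANDATORY_CAPABILITIES, the strict flag, and the function name are hypothetical; the only tokens used are the two named in Appendix K, where fragment_index is described as mandatory.

```python
KNOWN_CAPABILITIES = {"fragment_index", "shared_fragments"}  # illustrative subset
MANDATORY_CAPABILITIES = {"fragment_index"}


class UnsupportedStoreError(Exception):
    """Raised when a reader declines to open a store."""


def negotiate_capabilities(root_meta: dict, strict: bool = False) -> set[str]:
    tokens = set(root_meta.get("format_capabilities", []))
    missing = MANDATORY_CAPABILITIES - tokens
    if missing:
        raise UnsupportedStoreError(f"missing mandatory capabilities: {sorted(missing)}")
    unknown = tokens - KNOWN_CAPABILITIES
    if unknown and strict:
        # Strict policy: refuse to open rather than guess.
        raise UnsupportedStoreError(f"unrecognized capabilities: {sorted(unknown)}")
    # Lenient policy: unknown tokens are dropped, i.e. their features are treated as absent.
    return tokens & KNOWN_CAPABILITIES


print(negotiate_capabilities({"format_capabilities": ["fragment_index", "some_future_token"]}))
# {'fragment_index'}
```

Using .get on the metadata dict also fits the custom-metadata rule: keys the reader does not look up are simply ignored.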

Appendix J: References

Appendix K: Change Log

Each entry is a hard break — no migration utility ships; rewrite stores from source between major versions.

  • 0.7.0 — per-level chunk_shape override on LevelMetadata. RootMetadata.chunk_shape remains the level-0 default; pyramid levels may carry a positive-integer-multiple chunk-shape override so coarser levels can grow chunks the way OME-Zarr image pyramids do via voxel-size scaling. Cross-pyramid-level link arrays already carry both endpoints’ chunk coords inline; the per-axis multiplier is exposed by chunk_scale_factor(root, level).

  • 0.6.0 — fragment-index schema. vertex_group_offsets was replaced by vertex_fragments (a v1 byte layout with header, range bitmap, range table, and CSR explicit list — see §7.3). Inline self-describing link blobs at delta = 0 were split into links/0/<chunk> (flat payload) + link_fragments/<chunk> (fragment index). object_index/data adopted the manifest-block encoding (modes 0 / 1 / 2 — single / range / explicit) with chunk-local fragment references. The shared_vertex_groups capability token was renamed to shared_fragments and the new fragment_index capability is mandatory.

  • 0.5.0 — NGFF alignment + format simplification. Several on-disk simplifications shipped without a version bump (consumers should pin to a specific point release): renamed format_version → zv_version; moved axes to NGFF multiscales[0].axes; dropped per-array dtype duplication; removed vertex_counts/, metanode_children/, cross_chunk_faces/, attributes/<name>/<key>_offsets, and object_index/pending/; replaced the (K, 2) paired offset layout with a flat (K,) int64 array of vertex offsets (later replaced again by the fragment index in 0.6).

  • 0.4.1 — bare-integer resolution-level group names (0/, 1/). Renamed from resolution_0/, resolution_1/, … to mirror OME-Zarr.

  • 0.4 — multiscale-link arrays. Introduced the <delta> sub-folder layout for links/, cross_chunk_links/, link_attributes/, and cross_chunk_link_attributes/ and the cross_level_depth / cross_level_storage writer knobs. This is the version where cross-pyramid-level links became expressible.

Appendix L: Mapping from Neuroglancer Precomputed Annotations

The Neuroglancer precomputed annotation format stores small geometric primitives — points, lines, axis-aligned bounding boxes, and ellipsoids — with per-annotation properties, per-segment relationships, and a multi-resolution random-subsample spatial index. It serves a similar purpose to zarr-vectors but with narrower geometry semantics and a different multi-resolution model. This appendix maps the two layouts so authors can pick the right target and converters can translate between them.

L.1 Conceptual mapping

| Neuroglancer Precomputed Annotations | Zarr-vectors equivalent |
| --- | --- |
| info["@type"] = "neuroglancer_annotations_v1" | zarr.json["zarr_vectors"]["zv_version"] + geometry_types |
| annotation_type (geometry discriminator) | geometry_types (list) + per-record link_width |
| dimensions, lower_bound, upper_bound | NGFF multiscales[0].axes (with units) + bounds |
| properties[] (per-annotation attributes) | vertex_attributes/<name>/<chunk> (per-vertex), or object_attributes/<name>/data (per-object) when one annotation = one object |
| relationships[] (per-segment links) | groups/data + group_attributes/<name>/data (groups keyed by segment id) |
| by_id/ (annotation ID → record) | object_index/data (manifest keyed by dense [0, B) OID) |
| spatial/<level>/ (multi-res grid + random subsample) | Pyramid levels 0/, 1/, … with per-object coarsening + cross_level_storage for fine→coarse mapping |
| spatial[level].grid_shape × chunk_size | Per-level effective chunk_shape (root + optional v0.7 LevelMetadata.chunk_shape override) |
| spatial[level].limit (max annotations / cell) | Implicit via vertex count / bin_shape; pyramid coarsening controlled by reduction_factor and coarsening_method |
| Sharded vs unsharded by_id/ and spatial/ | Zarr v3 chunk-key encoding handles both transparently; sharding is a backend concern, not a schema choice |
| Random subsampling between levels | coarsening_method = "per_object" with object_sparsity ∈ (0, 1] and preserves_object_ids = true |

L.2 Mapping the geometry primitives

Precomputed annotations are standalone primitives, each carrying a fixed-size positional record (POLYLINE, with its count prefix, is the exception). Zarr-vectors expresses them via the geometry_types list plus the appropriate per-record layout:

| Precomputed annotation_type | Positional record | Zarr-vectors representation |
| --- | --- | --- |
| POINT | 1 vector | geometry_types = ["point_cloud"]; one fragment per annotation, single vertex row. |
| LINE | 2 vectors (endpoint A, B) | geometry_types = ["polyline"] or ["line"]; one fragment per annotation, 2-vertex range; no links/ needed under links_convention = "implicit_sequential". |
| AXIS_ALIGNED_BOUNDING_BOX | 2 vectors (min, max) | Two options. (a) geometry_types = ["point_cloud"] with per-object attributes min_corner, max_corner carrying the box (one annotation = one object). (b) geometry_types = ["line"] with the diagonal endpoints; readers reconstruct the AABB. Option (a) preserves “this is a box” semantics; (b) is simpler when boxes coexist with lines. |
| ELLIPSOID | 2 vectors (center, radii) | geometry_types = ["point_cloud"] with per-object attributes center and radii. Same shape as option (a) for AABB. Centers populate vertices/<chunk>; the radii object_attribute carries the second vector. |
| POLYLINE | uint32 count + N vectors | geometry_types = ["polyline"]; one fragment per chunk the polyline crosses, each a range [start, start + count) over that chunk’s vertices/<chunk>; the manifest in object_index/data carries one block per crossed chunk; links_convention = "implicit_sequential". |

Multi-geometry stores are natural in zarr-vectors (just list every geometry in geometry_types); the precomputed format restricts a single store to one annotation_type and would require co-locating several stores to mix kinds.
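A sketch of the LINE and ELLIPSOID rows above, assuming annotations have already been decoded into Python tuples (decoding the precomputed binary records is out of scope) and that everything lands in a single chunk. The function names and the output dictionary shapes are hypothetical. Each LINE becomes one two-vertex fragment recorded as a (start, count) range; each ELLIPSOID contributes its center as a vertex and its center and radii as per-object attributes.

```python
def lines_to_zv(lines):
    """lines: list of (endpoint_a, endpoint_b). One 2-vertex fragment per annotation."""
    vertices, fragments = [], []
    for a, b in lines:
        fragments.append((len(vertices), 2))  # (start, count) range over vertices
        vertices.extend([a, b])
    # Under links_convention = "implicit_sequential" no links/ rows are emitted.
    return {"geometry_types": ["line"], "vertices": vertices, "fragments": fragments}


def ellipsoids_to_zv(ellipsoids):
    """ellipsoids: list of (center, radii). Object ID = position in the input list."""
    return {
        "geometry_types": ["point_cloud"],
        "vertices": [center for center, _ in ellipsoids],  # one vertex row each
        "fragments": [(oid, 1) for oid in range(len(ellipsoids))],
        "object_attributes": {
            "center": [center for center, _ in ellipsoids],
            "radii": [radii for _, radii in ellipsoids],
        },
    }


out = lines_to_zv([((0, 0, 0), (1, 1, 1)), ((2, 2, 2), (3, 3, 3))])
print(out["fragments"])  # [(0, 2), (2, 2)]
```

A real converter would also split records by chunk and write the object_index manifest; both are omitted to keep the mapping itself visible.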

L.3 Properties and relationships

Per-annotation properties in precomputed correspond directly to per-vertex (or per-object) attributes:

  • Numeric properties (uint8, int8, …, float32) → typed vertex_attributes/<name>/<chunk> or object_attributes/<name>/data. Zarr-vectors uses the array’s dtype directly; no per-property enum table is needed at the schema level.

  • rgb / rgba → either an (N, 3) / (N, 4) uint8 attribute, or three / four channels of a multi-channel attribute with declared channel_names.

  • enum_values / enum_labels → carried as channel_names / per-attribute side metadata. Zarr-vectors does not currently reserve a top-level enum-mapping slot; writers stamp the mapping as a JSON dict in the attribute’s .zattrs.

Relationships (each annotation linked to a list of segment IDs) map onto zarr-vectors groups:

  • The precomputed relationships[<rel_name>] becomes a groups/data array. With several relationships, each relationship name needs its own groups-like structure, typically expressed today by rechunking by relationship (see §8).

  • Each group corresponds to one segment id. Its membership list is the annotation IDs (= object IDs in zarr-vectors).

  • The inverse, annotation → list of segments, is the transpose of the group-membership matrix (which annotations belong to which groups). Zarr-vectors recovers it by iterating groups/data; the forward direction (one group’s member list) is the lookup that powers “show all annotations on segment X” in precomputed.
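Appendix D describes groups/data as a ragged CSR of object IDs plus offsets. The sketch below packs a segment → annotation relationship into that shape and recovers the inverse with one pass over all groups. Keeping the offsets in a separate list, numbering groups densely in sorted segment-id order, and the function names are assumptions for illustration; group_ids stands in for a group attribute holding each group's segment id.

```python
from collections import defaultdict


def pack_groups(segment_to_oids):
    """Pack {segment_id: [oid, ...]} into CSR form (group_ids, offsets, members)."""
    group_ids = sorted(segment_to_oids)
    offsets, members = [0], []
    for seg in group_ids:
        members.extend(sorted(segment_to_oids[seg]))
        offsets.append(len(members))
    return group_ids, offsets, members


def invert_groups(group_ids, offsets, members):
    """Object ID -> list of segment IDs, built by iterating every group once."""
    oid_to_segments = defaultdict(list)
    for g, seg in enumerate(group_ids):
        for oid in members[offsets[g]:offsets[g + 1]]:
            oid_to_segments[oid].append(seg)
    return dict(oid_to_segments)


group_ids, offsets, members = pack_groups({42: [0, 2], 7: [2]})
print(group_ids, offsets, members)  # [7, 42] [0, 1, 3] [2, 0, 2]
print(invert_groups(group_ids, offsets, members))  # {2: [7, 42], 0: [42]}
```

The forward lookup for one segment is a single slice, members[offsets[g]:offsets[g + 1]], while the inverse touches every group.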

L.4 The spatial index

Both formats use a multi-resolution spatial grid to keep query cost bounded, but the selection rule differs:

  • Precomputed: each level has a per-cell limit. A cell with more annotations than the limit gets a random subsample at this level; the rest propagate to finer children. Cell coverage is controlled by grid_shape × chunk_size, anchored at lower_bound. Levels coarsen by integer division of cells.

  • Zarr-vectors: each level has a fixed bin_shape (and optionally an overridden chunk_shape, v0.7). Coarsening is per-object: each surviving object’s vertices are aggregated into metavertices at the coarser bin grid. object_sparsity ∈ (0, 1] in level metadata records what fraction of objects survived.

Equivalents:

| Concept | Precomputed | Zarr-vectors |
| --- | --- | --- |
| Grid cells per axis at level L | spatial[L].grid_shape | ceil((bounds_max - bounds_min) / chunk_shape_L) |
| Cell size at level L | spatial[L].chunk_size | RootMetadata.chunk_shape × per-level chunk_scale_factor (v0.7) |
| LOD selection knob | spatial[L].limit | reduction_factor + per-level object_sparsity |
| Cross-level identity | annotation ID is shared across levels | OID-preserving pyramid (preserves_object_ids = true) |
| Drillable parent→child mapping | implicit (random subsample) | optional links/<delta>/<chunk> arrays (§9.6) |
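The first two Zarr-vectors rows can be computed directly. A sketch, assuming the v0.7 rule that a level's chunk_shape override is a positive-integer multiple of RootMetadata.chunk_shape on every axis; level_scale_factor is a hypothetical stand-in that mimics, but is not, the chunk_scale_factor(root, level) helper named in Appendix K, and both functions take plain lists instead of metadata objects.

```python
import math


def level_scale_factor(root_chunk_shape, level_chunk_shape=None):
    """Per-axis integer multiplier of a level's chunk shape over the level-0 default."""
    if level_chunk_shape is None:  # no override: the level uses RootMetadata.chunk_shape
        return [1] * len(root_chunk_shape)
    factors = []
    for root_c, level_c in zip(root_chunk_shape, level_chunk_shape):
        if level_c <= 0 or level_c % root_c:
            raise ValueError(f"{level_c} is not a positive-integer multiple of {root_c}")
        factors.append(level_c // root_c)
    return factors


def grid_cells_per_axis(bounds_min, bounds_max, root_chunk_shape, level_chunk_shape=None):
    """ceil((bounds_max - bounds_min) / chunk_shape_L) on each axis."""
    factors = level_scale_factor(root_chunk_shape, level_chunk_shape)
    return [
        math.ceil((hi - lo) / (c * f))
        for lo, hi, c, f in zip(bounds_min, bounds_max, root_chunk_shape, factors)
    ]


print(level_scale_factor([64, 64, 64], [128, 128, 64]))  # [2, 2, 1]
print(grid_cells_per_axis([0, 0, 0], [1000, 1000, 1000], [64, 64, 64], [128, 128, 64]))  # [8, 8, 16]
```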

Precomputed’s “drill the visible cells until you’ve returned at most limit annotations per cell” maps to zarr-vectors’ “ask each level for the object set; stop when object_sparsity * source_count fits your budget.” The precomputed model is simpler and gets random sampling for free; the zarr-vectors model is more general (handles extended objects, not just points) at the cost of an explicit coarsener.
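The zarr-vectors side of that rule can be sketched as a level picker, assuming each level records object_sparsity relative to the level-0 object count (source_count) and that levels are listed from fine to coarse. The function name and the dict shape are hypothetical.

```python
def pick_level(levels, source_count, budget):
    """Return the finest level whose expected object count fits the budget.

    levels: list of {"level": int, "object_sparsity": float}, ordered fine -> coarse.
    Falls back to the coarsest level when nothing fits.
    """
    for lvl in levels:
        if lvl["object_sparsity"] * source_count <= budget:
            return lvl["level"]
    return levels[-1]["level"]


levels = [
    {"level": 0, "object_sparsity": 1.0},
    {"level": 1, "object_sparsity": 0.25},
    {"level": 2, "object_sparsity": 0.05},
]
print(pick_level(levels, source_count=10_000, budget=3_000))  # 1
```

Unlike the per-cell limit, this budget is global to the query; a viewer could apply it per visible chunk instead by substituting a per-chunk object count for source_count.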

L.5 Practical conversion notes

  • Single annotation type → single zarr-vectors store is a straightforward 1:1 transcode. Pick the geometry mapping from §L.2; emit one fragment per annotation; populate object_index/ in dense OID order from the original by_id listing.

  • Properties transcode field-for-field; preserve the original uint8 / int8 / … types rather than upcasting.

  • Relationships transcode to groups keyed by segment id. In practice, very large segment-id spaces (10⁸+ segments) suggest using object_attributes/segment_id rather than groups/data if most segments touch zero or one annotation.

  • Spatial index does NOT transcode 1:1. The pyramid is rebuilt using the per-object coarsener; the precomputed limit rule has no direct zarr-vectors equivalent. A reasonable default is reduction_factor = 8, object_sparsity ≈ limit / max_cell_count per level, and cross_level_storage = "implicit" if downstream readers need to drill from coarse to fine.

  • Sharded vs unsharded is invisible to the schema mapping — zarr-vectors writes per-chunk blobs into a Zarr v3 store regardless of the backend’s sharding.
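The "dense OID order" step in the first bullet can be sketched as a remap from sparse precomputed annotation IDs to [0, B), assigned in the order the IDs appear in the by_id listing. The sketch assumes the listing fits in memory; keeping the original ID in a reverse list (for example as a hypothetical object attribute) is an illustrative choice, not something this document prescribes.

```python
def dense_oids(by_id_listing):
    """Assign dense OIDs 0..B-1 in the order annotation IDs appear in the listing."""
    oid_of = {}
    for aid in by_id_listing:
        if aid in oid_of:
            raise ValueError(f"duplicate annotation id {aid}")
        oid_of[aid] = len(oid_of)
    # Reverse mapping indexed by dense OID (dicts preserve insertion order).
    original_id = list(oid_of)
    return oid_of, original_id


oid_of, original_id = dense_oids([900, 17, 5000])
print(oid_of)       # {900: 0, 17: 1, 5000: 2}
print(original_id)  # [900, 17, 5000]
```

The same table can rewrite relationship member lists, so groups/data stores dense OIDs rather than the original annotation IDs.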

L.6 What zarr-vectors adds over precomputed annotations

  • Connected geometries (polylines with branches, skeletons, meshes) live in the same store, not just zero-dimensional primitives. link_width = 1, 2, 3, ... and the fragment-index partitioning carry the connectivity.

  • Fragment / vertex re-use: explicit fragment indices let multiple objects reference the same per-chunk vertex rows (shared_fragments), saving storage on overlapping geometries (streamline bundles, mesh rims).

  • Per-link attributes: edges / faces carry their own attribute arrays (link_attributes/<name>/<delta>/<chunk>), not just the per-annotation properties precomputed supports.

  • Cross-pyramid-level links (§9.6) make the pyramid drillable — given a coarse metavertex, walk back to the fine-level vertices that produced it. Precomputed’s random-subsample model loses this mapping at construction time.

  • Pluggable coarsening: coarsening_method = "per_object" is the default, but mesh decimation and streamline point-reduction slot into the same level structure.

L.7 What precomputed annotations preserve that zarr-vectors does not (yet)

  • Sharded back-end as a first-class schema concern. Zarr-vectors delegates sharding to the underlying Zarr v3 store; it does not expose sharding blocks per index.

  • Per-property enum tables as a typed schema field (enum_values / enum_labels). Zarr-vectors carries this as free-form .zattrs metadata.

  • The “limit” LOD heuristic — automatic, occupancy-driven downsampling without writing a coarsening pass. In zarr-vectors the writer must pick bin_ratio / chunk_scale_factor / object_sparsity deliberately.