Serving DataTrees#

Xpublish treats xarray.DataTree as its core data primitive. A bare xarray.Dataset is just a one-node tree under the hood, so everything you’ve learned so far about serving Datasets applies unchanged when you switch to trees.

Serving a single DataTree#

You can publish a DataTree directly with SingleDatasetRest or the .rest accessor — the API is identical to the Dataset case:

import xarray as xr
import xpublish

dt = xr.DataTree(name="root")
dt["a"] = xr.DataTree(dataset=xr.Dataset({"x": ("i", [1, 2, 3])}))
dt["a/b"] = xr.DataTree(dataset=xr.Dataset({"y": ("j", [10.0, 20.0])}))

rest = xpublish.SingleDatasetRest(dt)
# or, equivalently:
dt.rest.serve()

Serving a collection of trees (and datasets)#

Rest accepts a mapping whose values can be either Dataset or DataTree objects in any combination:

rest = xpublish.Rest(
    {
        "flat": xr.Dataset({"var": ("x", [1, 2, 3])}),
        "tree": dt,
    }
)
rest.serve()

The flat dataset is wrapped in a single-node tree internally, so it shows up in the /groups listing as just ["/"].

Navigating groups via the URL#

Per-dataset routes can include an optional {group_path:path} segment to navigate into a node of the tree. The included dataset_info plugin uses this convention to expose group-aware variants of its endpoints:

URL	What it returns
`/datasets/tree/`	HTML repr of the root node
`/datasets/tree/keys`	Variable keys at the root
`/datasets/tree/groups`	List of every group path in the tree
`/datasets/tree/tree`	HTML repr of the full DataTree
`/datasets/tree/groups/a/keys`	Variable keys at the `/a` node
`/datasets/tree/groups/a/b/info`	Schema info at the `/a/b` node

Group paths can be arbitrarily nested — the {group_path:path} parameter matches across slashes. An unknown group returns a 404.

Dataset provider plugins for trees#

The provider hook for plugins is xpublish.plugins.hooks.PluginSpec.get_datatree(). It receives both the dataset_id and the requested group path, and returns the xarray.DataTree rooted at that group (or None to pass to the next plugin). The returned tree’s root corresponds to the requested group.

Important

group must be declared as a positional parameter (no default) on your hookimpl. Pluggy does not forward arguments that have defaults, so a signature like def get_datatree(self, dataset_id, group="") will silently receive an empty string regardless of the URL. See the plugin user guide for details.

The lazy-by-group pattern#

For backends where loading the whole tree is expensive (Zarr v3, Icechunk, remote object stores), implement get_datatree so it opens only the requested group and wraps it in a single-node tree:

import xarray as xr
from xpublish import Plugin, hookimpl


class IcechunkProvider(Plugin):
    name: str = "icechunk"

    @hookimpl
    def get_datasets(self):
        return list(self._known_repos)

    @hookimpl
    def get_datatree(self, dataset_id: str, group: str):
        store = self._store_for(dataset_id)
        if store is None:
            return None
        ds = xr.open_zarr(store, group=group or None, consolidated=False)
        return xr.DataTree(dataset=ds)

Each request opens just the one group being viewed, so cost stays proportional to what’s actually queried.

Choosing between `get_dataset` and `get_datatree`#

Both provider hooks are first-class. Use the one that fits your data:

get_dataset() — for providers that only ever serve a flat Dataset. Xpublish wraps the returned Dataset in a single-node DataTree internally. Requests for a non-root group return 404.
get_datatree() — for providers that want to expose hierarchical data, or that benefit from per-group lazy loading (Zarr/Icechunk-style backends).

A plugin may implement both; get_datatree is consulted first. Switching from one to the other is mechanical:

# Flat provider
@hookimpl
def get_dataset(self, dataset_id: str):
    return xr.tutorial.open_dataset(dataset_id)


# Equivalent hierarchical provider (still flat, but exposed via the tree hook)
@hookimpl
def get_datatree(self, dataset_id: str, group: str):
    if group:
        return None  # we only serve a flat dataset
    return xr.DataTree(dataset=xr.tutorial.open_dataset(dataset_id))