Serving DataTrees

Xpublish treats xarray.DataTree as its core data primitive. A bare xarray.Dataset is just a one-node tree under the hood, so everything you’ve learned so far about serving Datasets applies unchanged when you switch to trees.

Serving a single DataTree

You can publish a DataTree directly with SingleDatasetRest or the .rest accessor — the API is identical to the Dataset case:

import xarray as xr
import xpublish

dt = xr.DataTree(name="root")
dt["a"] = xr.DataTree(dataset=xr.Dataset({"x": ("i", [1, 2, 3])}))
dt["a/b"] = xr.DataTree(dataset=xr.Dataset({"y": ("j", [10.0, 20.0])}))

rest = xpublish.SingleDatasetRest(dt)
rest.serve()

# or, equivalently, through the accessor:
dt.rest.serve()

Serving a collection of trees (and datasets)

Rest accepts a mapping whose values can be either Dataset or DataTree objects in any combination:

rest = xpublish.Rest(
    {
        "flat": xr.Dataset({"var": ("x", [1, 2, 3])}),
        "tree": dt,
    }
)
rest.serve()

The flat dataset is wrapped in a single-node tree internally, so it shows up in the /groups listing as just ["/"].

Dataset provider plugins for trees

The provider hook for plugins is xpublish.plugins.hooks.PluginSpec.get_datatree(). It receives the dataset_id and the requested group path, and returns the xarray.DataTree rooted at that group, or None to defer to the next plugin.

Important

group must be declared as a required positional parameter (no default) on your hookimpl. Pluggy passes only the parameters that have no default, so with a signature like def get_datatree(self, dataset_id, group="") the group argument is always the empty string, regardless of the URL, and no error is raised. See the plugin user guide for details.
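The behavior can be reproduced with pluggy alone. The sketch below is a stand-alone demonstration, not Xpublish code: the "demo" project name, the Spec class, and both plugin classes are made up for illustration, and only the get_datatree signature mirrors the hook discussed here.

```python
import pluggy

# "demo" is an arbitrary project name for this sketch.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class Spec:
    @hookspec(firstresult=True)
    def get_datatree(self, dataset_id, group):
        """Return something for the requested group."""


class PositionalGroup:
    @hookimpl
    def get_datatree(self, dataset_id, group):
        return group  # receives the value passed by the caller


class DefaultedGroup:
    @hookimpl
    def get_datatree(self, dataset_id, group=""):
        return group  # pluggy never passes this argument


def call_with(plugin):
    """Register a single plugin and call the hook with a non-root group."""
    pm = pluggy.PluginManager("demo")
    pm.add_hookspecs(Spec)
    pm.register(plugin)
    return pm.hook.get_datatree(dataset_id="air", group="a/b")


positional_result = call_with(PositionalGroup())
defaulted_result = call_with(DefaultedGroup())

print(repr(positional_result))
print(repr(defaulted_result))
```

Both plugins register without complaint, but only the first one sees "a/b"; the second falls back to its default every time.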

The lazy-by-group pattern

For backends where loading the whole tree is expensive (Zarr v3, Icechunk, remote object stores), implement get_datatree so it opens only the requested group and wraps it in a single-node tree:

import xarray as xr
from xpublish import Plugin, hookimpl


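class IcechunkProvider(Plugin):
    name: str = "icechunk"

    # self._known_repos and self._store_for() are assumed to be defined
    # elsewhere on this plugin; they are not shown in this example.

    @hookimpl
    def get_datasets(self):
        return list(self._known_repos)

    @hookimpl
    def get_datatree(self, dataset_id: str, group: str):
        store = self._store_for(dataset_id)
        if store is None:
            return None  # not ours; let the next plugin try
        # Map the empty (root) group path to None before opening.
        ds = xr.open_zarr(store, group=group or None, consolidated=False)
        # Wrap only the requested group in a single-node tree.
        return xr.DataTree(dataset=ds)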

Each request opens just the one group being viewed, so cost stays proportional to what’s actually queried.

Choosing between get_dataset and get_datatree

Both provider hooks are first-class. Use the one that fits your data:

  • get_dataset() — for providers that only ever serve a flat Dataset. Xpublish wraps the returned Dataset in a single-node DataTree internally. Requests for a non-root group return 404.

  • get_datatree() — for providers that want to expose hierarchical data, or that benefit from per-group lazy loading (Zarr/Icechunk-style backends).

A plugin may implement both; get_datatree is consulted first. Switching from one to the other is mechanical:

# Flat provider
@hookimpl
def get_dataset(self, dataset_id: str):
    return xr.tutorial.open_dataset(dataset_id)


# Equivalent hierarchical provider (still flat, but exposed via the tree hook)
@hookimpl
def get_datatree(self, dataset_id: str, group: str):
    if group:
        return None  # we only serve a flat dataset
    return xr.DataTree(dataset=xr.tutorial.open_dataset(dataset_id))

See also

For a complete upgrade guide aimed at server admins and plugin authors, see Migrating to the DataTree API.