Serving DataTrees#
Xpublish treats xarray.DataTree as its core data primitive. A bare
xarray.Dataset is just a one-node tree under the hood, so everything
you’ve learned so far about serving Datasets applies unchanged when you switch
to trees.
Serving a single DataTree#
You can publish a DataTree directly with SingleDatasetRest
or the .rest accessor — the API is identical to the Dataset case:
import xarray as xr
import xpublish
dt = xr.DataTree(name="root")
dt["a"] = xr.DataTree(dataset=xr.Dataset({"x": ("i", [1, 2, 3])}))
dt["a/b"] = xr.DataTree(dataset=xr.Dataset({"y": ("j", [10.0, 20.0])}))
rest = xpublish.SingleDatasetRest(dt)
rest.serve()
# or, equivalently:
dt.rest.serve()
Serving a collection of trees (and datasets)#
Rest accepts a mapping whose values can be either
Dataset or DataTree objects in any combination:
rest = xpublish.Rest(
    {
        "flat": xr.Dataset({"var": ("x", [1, 2, 3])}),
        "tree": dt,
    }
)
rest.serve()
The flat dataset is wrapped in a single-node tree internally, so it shows up
in the /groups listing as just ["/"].
Dataset provider plugins for trees#
The provider hook for plugins is
xpublish.plugins.hooks.PluginSpec.get_datatree(). It receives both the
dataset_id and the requested group path, and returns the
xarray.DataTree rooted at that group (or None to pass to the next
plugin). The returned tree’s root corresponds to the requested group.
Important
group must be declared as a positional parameter (no default) on your
hookimpl. Pluggy does not forward arguments that have defaults, so with a
signature like def get_datatree(self, dataset_id, group="") the parameter
always keeps its default, an empty string, regardless of the URL. See the
plugin user guide for details.
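One way to catch this mistake early is a small check in a plugin's test suite. The sketch below uses only the standard library; `params_with_defaults`, `GoodProvider`, and `BadProvider` are hypothetical names for illustration and are not provided by Xpublish or pluggy.

```python
import inspect


def params_with_defaults(func):
    """List the parameter names of `func` that declare a default value."""
    signature = inspect.signature(func)
    return [
        name
        for name, param in signature.parameters.items()
        if param.default is not inspect.Parameter.empty
    ]


class GoodProvider:
    def get_datatree(self, dataset_id, group):
        return None


class BadProvider:
    # `group` has a default, so the hook caller would not pass it in.
    def get_datatree(self, dataset_id, group=""):
        return None


print(params_with_defaults(GoodProvider.get_datatree))
print(params_with_defaults(BadProvider.get_datatree))
```

Asserting that the returned list is empty for every hookimpl makes the problem fail loudly in tests instead of surfacing as requests that always hit the root group.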
The lazy-by-group pattern#
For backends where loading the whole tree is expensive (Zarr v3, Icechunk,
remote object stores), implement get_datatree so it opens only the
requested group and wraps it in a single-node tree:
import xarray as xr
from xpublish import Plugin, hookimpl
class IcechunkProvider(Plugin):
    name: str = "icechunk"

    @hookimpl
    def get_datasets(self):
        return list(self._known_repos)

    @hookimpl
    def get_datatree(self, dataset_id: str, group: str):
        store = self._store_for(dataset_id)
        if store is None:
            # Not one of ours: let the next plugin try.
            return None
        # An empty group path becomes None, which opens the root group.
        ds = xr.open_zarr(store, group=group or None, consolidated=False)
        return xr.DataTree(dataset=ds)
Each request opens just the one group being viewed, so cost stays proportional to what’s actually queried.
Choosing between get_dataset and get_datatree#
Both provider hooks are first-class. Use the one that fits your data:
get_dataset(): for providers that only ever serve a flat Dataset. Xpublish wraps the returned Dataset in a single-node DataTree internally. Requests for a non-root group return 404.
get_datatree(): for providers that want to expose hierarchical data, or that benefit from per-group lazy loading (Zarr/Icechunk-style backends).
A plugin may implement both; get_datatree is consulted first. Switching from
one to the other is mechanical:
# Flat provider
@hookimpl
def get_dataset(self, dataset_id: str):
    return xr.tutorial.open_dataset(dataset_id)

# Equivalent hierarchical provider (still flat, but exposed via the tree hook)
@hookimpl
def get_datatree(self, dataset_id: str, group: str):
    if group:
        return None  # we only serve a flat dataset
    return xr.DataTree(dataset=xr.tutorial.open_dataset(dataset_id))
See also
For a complete upgrade guide aimed at server admins and plugin authors, see Migrating to the DataTree API.