Building a dataset provider plugin
So far, we’ve been eagerly loading datasets for Xpublish to serve, but this tends not to scale well: memory needs grow and startup slows as more datasets are added. Xpublish plugins can also act as dataset providers, loading datasets only when they are requested.
This also lets organizations quickly adapt Xpublish to their own environment, rather than needing Xpublish to support it explicitly.
import xarray as xr
from requests import HTTPError
from xpublish import Plugin, Rest, hookimpl


class TutorialDataset(Plugin):
    name: str = 'xarray-tutorial-dataset'

    @hookimpl
    def get_datasets(self):
        return list(xr.tutorial.file_formats)

    @hookimpl
    def get_datatree(self, dataset_id: str, group: str):
        # The xarray tutorial datasets are flat, so we only serve the root.
        # Note: ``group`` must be a positional parameter (no default), because
        # pluggy will not forward arguments that have defaults to the hookimpl.
        if group:
            return None
        try:
            ds = xr.tutorial.open_dataset(dataset_id)
        except HTTPError:
            return None
        return xr.DataTree(dataset=ds)


rest = Rest({})
rest.register_plugin(TutorialDataset())
rest.serve()
With this plugin, Xpublish can serve the same datasets that we explicitly defined and loaded in serving multiple datasets, as well as any others supported by xr.tutorial.
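The `return None` branches in the plugin are what let several providers coexist: a provider that does not recognise a request simply declines it. The sketch below illustrates that idea in plain Python, assuming the dataset hooks are resolved in first-result fashion (as pluggy's `firstresult=True` option does). The provider functions, the `resolve` helper, and the string "datasets" are hypothetical stand-ins, not part of Xpublish.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for a provider plugin: it maps a dataset_id to a
# loaded object, or returns None when it does not recognise the id.
Provider = Callable[[str], Optional[str]]


def tutorial_provider(dataset_id: str) -> Optional[str]:
    known = {"air_temperature", "rasm"}
    return f"tutorial:{dataset_id}" if dataset_id in known else None


def archive_provider(dataset_id: str) -> Optional[str]:
    return f"archive:{dataset_id}" if dataset_id.startswith("archive-") else None


def resolve(dataset_id: str, providers: List[Provider]) -> Optional[str]:
    """Return the first non-None answer, mimicking first-result dispatch."""
    for provider in providers:
        result = provider(dataset_id)
        if result is not None:
            return result
    # No provider claimed the id; a server would report "not found" here.
    return None


providers = [tutorial_provider, archive_provider]
print(resolve("rasm", providers))          # answered by tutorial_provider
print(resolve("archive-2020", providers))  # tutorial declines, archive answers
print(resolve("missing", providers))       # nobody answers
```

Nothing is loaded until `resolve` is called, which is the on-request behaviour the provider hooks give you.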
The plugin implements xpublish.plugins.hooks.PluginSpec.get_datatree(), which receives both dataset_id and a group path so it can serve hierarchical data. The simpler get_dataset() hook is also first-class; pick it for providers that only ever serve flat datasets. See the DataTrees tutorial for the lazy-by-group pattern used by Zarr/Icechunk-backed providers.
Note
For more details on building dataset provider plugins, please see the plugin user guide.