Building a dataset provider plugin

So far, we’ve been eagerly loading datasets for Xpublish to serve, but this tends not to scale well: every dataset has to fit in memory, and loading them all up front slows startup. Xpublish plugins can also act as dataset providers and load datasets on request.

This also lets organizations quickly adapt Xpublish to work in their own environment, rather than needing Xpublish to support it explicitly.

import xarray as xr
from requests import HTTPError

from xpublish import Plugin, Rest, hookimpl


class TutorialDataset(Plugin):
    name: str = 'xarray-tutorial-dataset'

    @hookimpl
    def get_datasets(self):
        return list(xr.tutorial.file_formats)

    @hookimpl
    def get_datatree(self, dataset_id: str, group: str):
        # The xarray tutorial datasets are flat, so we only serve the root.
        # Note: ``group`` must be a positional parameter (no default) — pluggy
        # will not forward arguments that have defaults to the hookimpl.
        if group:
            return None
        try:
            ds = xr.tutorial.open_dataset(dataset_id)
        except HTTPError:
            return None
        return xr.DataTree(dataset=ds)


# Start with no eagerly loaded datasets; the plugin supplies them on request.
rest = Rest({})
rest.register_plugin(TutorialDataset())
rest.serve()

With this plugin, Xpublish can serve the same datasets that we explicitly defined and loaded in serving multiple datasets, as well as any others supported by xr.tutorial.
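The comment in get_datatree about defaults can be checked with pluggy alone, without Xpublish. The following is a minimal sketch, not Xpublish code: the "demo" project name, DemoSpec, PositionalGroup, DefaultedGroup, and call_hook are all hypothetical names made up for this illustration. It registers two implementations of a hook that takes dataset_id and group, one declaring group positionally and one giving it a default, and calls each with the same arguments.

```python
import pluggy

# Hypothetical project name used only for this illustration.
hookspec = pluggy.HookspecMarker("demo")
hookimpl = pluggy.HookimplMarker("demo")


class DemoSpec:
    """Hypothetical hook specification mirroring the two-argument shape."""

    @hookspec(firstresult=True)
    def get_datatree(self, dataset_id, group):
        """Return something for the given dataset id and group path."""


class PositionalGroup:
    @hookimpl
    def get_datatree(self, dataset_id, group):
        # ``group`` has no default, so pluggy passes the caller's value.
        return (dataset_id, group)


class DefaultedGroup:
    @hookimpl
    def get_datatree(self, dataset_id, group="default-used"):
        # ``group`` has a default, so pluggy does not treat it as a hook
        # argument and the caller's value never arrives here.
        return (dataset_id, group)


def call_hook(plugin):
    """Register a single plugin on a fresh manager and call the hook."""
    pm = pluggy.PluginManager("demo")
    pm.add_hookspecs(DemoSpec)
    pm.register(plugin)
    return pm.hook.get_datatree(dataset_id="air_temperature", group="child")


positional_result = call_hook(PositionalGroup())
defaulted_result = call_hook(DefaultedGroup())
print(positional_result)
print(defaulted_result)
```

The positional implementation should report the group it was called with, while the defaulted one should report its own default. That is why the plugin above declares group without a default.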

The plugin implements xpublish.plugins.hooks.PluginSpec.get_datatree(), which receives both dataset_id and a group path so it can serve hierarchical data. The simpler get_dataset() hook is also first-class — pick it for providers that only ever serve flat datasets. See the DataTrees tutorial for the lazy-by-group pattern used by Zarr/Icechunk-backed providers.

Note

For more details on building dataset provider plugins, please see the plugin user guide.