-
I had some momentum, so I took a swing at implementing a plugin system: #140
-
Thanks for your thoughtful comments @abkfenris!
Not sure what exactly you mean here. It is already easy to add/remove routers (or plugins in #140), but maybe your suggestion is for Xpublish to provide "standardized" extension mechanisms for other components of the application (e.g., data loaders, middlewares, settings, etc.)?
In general I fully agree with your thoughts. IMO Xpublish core (this repository) should provide only the bare minimum set of routers (plugins). I'd even move the zarr router into a separate repository. I think it is better to have a lot of individual repositories / packages, each providing some Xpublish plugin that defines a set of API routes, assuming that those plugins are easily discoverable.
We might also want to make those plugins flexible and parameterizable. For example, allow restricting certain plugins or individual routes to authenticated users (#100), e.g., by adding the possibility to inject additional fastapi dependencies like those provided by fastapi-users. Such an extension mechanism is implemented in titiler (in general I think we could take inspiration from titiler for many things).
It would be nice if Xpublish core could also provide convenient extension mechanisms for application settings (possibly reusing pydantic's [...]).
Contrary to API routers, Xpublish core could come batteries included regarding all the common boilerplate that would help make deployment easier, e.g., a set of basic middlewares for logging, diagnostics, etc., extra dependencies like #54... It could even provide a basic command-line interface and configuration file system, like pygeoapi but simpler, although this is probably more a "nice to have" at this point.
All those helpers and extension mechanisms shouldn't prevent integrating Xpublish into other fastapi applications in order to fully leverage fastapi and avoid Xpublish "lock-in". It would be nice if Xpublish stays hackable to some reasonable point. I guess the needs will vary a lot from one application to another.
Regarding distros, one way to achieve it could be via conda(-forge) meta-packages like this one: pangeo-notebook. Not sure if/how it is possible to define meta packages on pypi, though.
-
I'll comment on some of the specific bits in the ongoing thread but I wanted to say at the top that I'm very supportive of the high level concepts @abkfenris has laid out here. I've long thought that Xpublish needs a set of opinionated deployment concepts with interchangeable routers. I do think that success here will turn on the documentation and discoverability of the external routers and deployment distros. To this end, I suggest we consider ways to group the various sub-projects in a way that makes them easy to find/use together (new github org, common docs, etc.).
-
Xpublish has the capability to become the core of the next generation of data servers. Right now it's quickly extendable and hackable, but I think we need to start building some consensus on what's next for the project to keep progressing.
Where do we go from here?
I think we need to start looking at Xpublish from several different directions.
As an extendable core
This is the most similar to what Xpublish is at this moment.
Right now Xpublish supports manual instantiation with a single dataset or a collection of datasets, and both the dataset loading and the routers are extendable, which provides a natural separation of concerns between loading datasets and serving them.
I think the current project can evolve further to define an extensible core that makes it easier to add and swap out routers, data and configuration loaders, and other elements.
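To make the separation between loading and serving a bit more concrete, here is a minimal, framework-free sketch. It is not Xpublish's actual API: `Core`, `register`, and `serve` are hypothetical names, and plain dicts stand in for `xarray.Dataset` objects.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# A loader maps a dataset id to a dataset; a router maps a dataset to a response.
Loader = Callable[[str], Any]
Router = Callable[[Any], Any]


@dataclass
class Core:
    """Hypothetical core that only wires a loader to named routers."""

    loader: Loader
    routers: Dict[str, Router] = field(default_factory=dict)

    def register(self, name: str, router: Router) -> None:
        self.routers[name] = router

    def serve(self, dataset_id: str, name: str) -> Any:
        dataset = self.loader(dataset_id)   # loading concern
        return self.routers[name](dataset)  # serving concern


def dict_loader(dataset_id: str) -> dict:
    # Stand-in for a real loader (files, catalog, object store, ...).
    store = {"air": {"temp": [1.0, 2.0], "time": [0, 1]}}
    return store[dataset_id]


def keys_router(dataset: dict) -> list:
    # Stand-in for a real router: list the variable names.
    return sorted(dataset)


core = Core(loader=dict_loader)
core.register("keys", keys_router)
print(core.serve("air", "keys"))  # ['temp', 'time']
```

Because the core only knows about the two callables, either side can be swapped without touching the other, which is the property the extension points would need to preserve.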
Alternatively, the core and extension points could be broken out into a separate package while the Xpublish package stays as an xarray.Dataset.rest accessor.
As a distro (or many)
Many data managers don't want to always be mucking around in the code of their data server. They would rather feed their server a config file and have an array of services stood up for their datasets.
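As a rough idea of what "feed the server a config file" could look like, here is a small sketch. The JSON layout, the `datasets` and `plugins` keys, and the `ServerConfig` / `parse_config` names are all assumptions made up for illustration; a real distro might well prefer YAML or TOML.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServerConfig:
    datasets: Dict[str, str]  # dataset id -> path or URL (hypothetical layout)
    plugins: List[str]        # names of the plugins to enable


def parse_config(text: str) -> ServerConfig:
    """Parse and minimally validate a JSON config document."""
    raw = json.loads(text)
    datasets = raw.get("datasets", {})
    plugins = raw.get("plugins", [])
    if not datasets:
        raise ValueError("config must list at least one dataset")
    return ServerConfig(datasets=dict(datasets), plugins=list(plugins))


EXAMPLE = """
{
  "datasets": {"air_temperature": "data/air_temperature.zarr"},
  "plugins": ["zarr", "info"]
}
"""

config = parse_config(EXAMPLE)
```

A distro would then hand `config.datasets` to its data loader and `config.plugins` to whatever assembles the routers, so the data manager never has to edit Python.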
Various organizations also like to say 'we've standardized on such-and-such'. For example, IOOS (the U.S. Integrated Ocean Observing System) has said that the various regions serving them data should implement ERDDAP and THREDDS. (I do data management and development for one of the IOOS regions.)
So I think we should start at least one Xpublish-based server that's designed to be something that various data managers would look at running. By building a few Xpublish-based distros we can figure out what is best kept in the core package, where the core should be extended, and what might be better off as a plugin.
Different distros may be focused on different audiences. We see some of that with current data servers and systems. ERDDAP is largely focused on distilling data for you, whereas THREDDS is more focused on providing services (there is some crossover). We may also see that different audiences are producing and consuming data with different levels of metadata and structure.
As a collection of routers/plugins
In between we have individual packages or plugins for various types of routers and other Xpublish extension points. Pulling the routers and other functionality out into standalone plugins makes them easier to iterate on and reason about, and provides the flexibility to assemble different collections of plugins into different distros or one-off instances.
For example, some routers may require more processing than others (e.g., dynamically reprojecting and generating web map tiles), so an admin who doesn't want to run a dask cluster might leave those plugins out. An admin may also want to swap out or customize the data loader.
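One way to picture that kind of choice is to let each plugin declare what it needs and filter on it when assembling an instance. This is purely a sketch: `PluginInfo`, the `needs_cluster` flag, and the plugin names other than the zarr router are invented for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PluginInfo:
    name: str
    needs_cluster: bool = False  # hypothetical flag for compute-heavy plugins


AVAILABLE = [
    PluginInfo("zarr"),
    PluginInfo("info"),
    PluginInfo("map_tiles", needs_cluster=True),  # e.g. dynamic reprojection
]


def select_plugins(
    available: Iterable[PluginInfo],
    allow_cluster: bool,
    exclude: Iterable[str] = (),
) -> List[str]:
    """Return plugin names to enable, skipping excluded or too-heavy ones."""
    excluded = set(exclude)
    return [
        p.name
        for p in available
        if p.name not in excluded and (allow_cluster or not p.needs_cluster)
    ]


lightweight = select_plugins(AVAILABLE, allow_cluster=False)
```

Here `lightweight` ends up without the tile plugin, while a deployment that does run a cluster could pass `allow_cluster=True` and get everything.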
While we are here, a few plugin ideas
Similar projects that we may be able to collaborate with