Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GNIP 97: New metadata editor #11511

Open
2 of 5 tasks
etj opened this issue Sep 19, 2023 · 23 comments
Open
2 of 5 tasks

GNIP 97: New metadata editor #11511

etj opened this issue Sep 19, 2023 · 23 comments
Assignees
Labels
gnip A GeoNodeImprovementProcess Issue

Comments

@etj
Copy link
Contributor

etj commented Sep 19, 2023

GNIP 97 - New metadata editor

Overview

The metadata editor still relies on a legacy template engine which is quite hard to customize.
Also it uses unmaintained vulnerable js libraries that should be removed.

Proposed By

  • @etj Emanuele Tajariol

Assigned to Release

This proposal is for GeoNode 5

State

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Motivation

  • Allow easier customization
  • Allow different editing frontends
  • Make a clear distincton between descriptive metadata and operational metadata (owner, group)
  • Remove old unmaintained vulnerable js libraries

Proposal

The idea is to switch to standard procedures to document the metadata schema and to return the metadata info.

The metadata schema is presented using the JSON schema conventions.

The GeoNode API has been extended to provide

  • /api/v2/metadata/schema metadata schema
  • /api/v2/metadata/instance/<PK> resource metadata instances handling (GET/PUT)
  • /api/v2/metadata/xxx other utility calls for the metadata handling (autocomplete endpoints referred into the schema, etc)

One of the basic driving features has been the customization of the schema. The implementation relies on a set of metadata handlers that can be easily extended. We embedded into GeoNode the optional inspire module, mostly a (working) showcase that documents how the GeoNode base model can be customized.

Implementation

Intro

The package geonode.metadata contains all the metadata code.

The manager module contains the base engine that calls methods on the registered field handlers.
Handlers are the objects that manage the field definition (the json schema), and that take care of converting database objects into json schema instances and viceversa.
A single handler may handle one or more fields.

MetadataManager

The geonode.metadata.manager.MetadataManager class provides these main methods:

  def get_schema(self, lang=None)`

Builds and returns the metadata json schema.
The schema is created by looping on the sorted set of metadata handlers, asking them to update the schema incrementally via def update_schema(self, jsonschema: dict, context, lang=None).
Handlers may add their own field, remove fields added by previously called
The schema localization is only related to the title and description annotations

This method is used by the /api/v2/metadata/schema endpoint.
The default language used is the one set in the cookies, or it can be overriden by adding the query param lang.

   def build_schema_instance(self, resource: ResourceBase, lang=None)

Builds and returns the json schema instance related to the resource.

The instance is created by examining the json schema and looping on each field.
Then, the related handler is called to fill in the serialized value for such field.

Called handler methods:

  • def load_serialization_context(resource: ResourceBase, jsonschema: dict, context: dict)
    Allow handlers to pre-load some common info for all the handled fields
  • def get_jsonschema_instance(resource: ResourceBase, field_name, context, errors, lang=None)
    Returns the instance of the sub-schema associated with the field field_name.
   def update_schema_instance(self, resource, json_instance)

Parses the incoming json schema instance and stores the value into the DB.

Persistence is done by examining the json schema and looping on each field.
Then, the related handler is called to parse and store the deserialized value(s) for such field.
If the field is a complex object, it's up to the handler parse and handle it.

Called handler methods:

  • def load_deserialization_context(self, resource: ResourceBase, jsonschema: dict, context: dict)
    Allow handlers to pre-load some common info for all the handled fields
  • def update_resource(resource: ResourceBase, field_name: str, json_instance: dict, context: dict, errors: dict, **kwargs)
    Logic to parse and persist the field

Handlers

The list of handlers is declared in geonode.metadata.settings.METADATA_HANDLERS.
Order is important since later declared handlers may want to customize previously defined fields.
Within each field subschema there are some geonode-specific annotations (custom annotation are allowed by the json schema standard), some are needed for backend logic (for instance geonode:handler'), other for the client UI (e.g. ui:options`)

BaseHandler

BaseHandler handles most of the fields declared inside the ResourceBase class.
The BaseHandler will read the main schema from a file (metadata/schemas/base.json ) , augmenting its info programmatically (such as autocomplete URLs).

SparseHandler

This handler is not actively used by the default GeoNode schema, since it's more aimed at customizations.
You may see an example of its use in the inspire package.

The idea for the SparseField (the single model change in metadata) is to have a "vertical" table with free form fields:
image

In the default case, its use is as simple as declaring the field:

  "instrumentations": {
    "type": "array",
    "title": "instrumentations",
    "description": "Provides information about any instruments used in the data collection. The description should include vendor, model number, optional equipment, etc.",
    "items": {
      "type": "string",
      "minLength": 1,
      "default": ""
    },
    "minItems": 1,
    "geonode:handler": "sparse"
  }

Some custom annotation

  • geonode:handler: key to the handlers defined in METADATA_HANDLERS; tells which is the handler to be called to handle the field's content. Pls note that this annotation may be replaced by subsequent handler if they want to handle the field differently.
  • geonode:required: the jsonschema specs wants the required array outside the subschema definition. By defining this annotation as true, the metadata manager will add the current field in the required list.

Localization

So far the static labels for metadata editor have been localized using gettext() and i18n .po files.

Since the jsonschema is much more dynamic in its behaviour, allowing very custom schema, we need other ways for localizing the schema fields labels.

The proposed/implemented idea is to use a thesaurus with a well known identifier labels_i18n where

  • Thesaurus:
    • identifier = "labels_i18n"
  • ThesaurusKeyword:
    • about = field name
    • alt_label= field name
  • ThesaurusKeywordLabel:
    • localized stuff

In order to avoid clashes, the schema fields titles should be prefixed with the metadata handler id ("inspire", "cnr", ...)

The implementation will look for translations in the thesaurus, and then will the gettext as fallback.

Some improvements to the thesauri manament commands will be done in order to better manage updates.
For instance, we may want to

  • merge a thesaurus file into an existing thesaurus
  • import from a thesaurus file only a subset of the keywords
  • ...

Backwards Compatibility

TODO

Future evolution

This refactoring allows for other major improvemetns in GeoNode.
Since this is quite a big improvement in GeoNode that introduces breaking changes, the plans before releasing a stable 5.0 include some other improvements/refactoring, that will be discussed each in its own issue:

  • contact roles refactoring
    • improve the model allowing non-system users to be contact points
    • easier customization
  • codelists into thesauri
    • remove many custom codelist tables
    • improve localization
  • move owner and group outside the metadata editor, since they are not informative metadata, but operative info, and should be handler in a differet localtion and with different privileges. At the moment we don't want the metadata schema to chhange according to user privileges.

Most of the pure metadata fields in ResourceBase can be moved as SparseFields, improving the overall performance of the ORM procedures.

Feedback

TODO

Voting

Project Steering Committee:

  • Alessio Fabiani:
  • Francesco Bartoli:
  • Giovanni Allegri:
  • Simone Dalmasso:
  • Toni Schoenbuchner:
  • Florian Hoedt:

Links

Remove unused links below.

@etj
Copy link
Contributor Author

etj commented Sep 19, 2023

Some comments from another issue referencing this topic:

@etj etj changed the title GNIP =DRAFT=: New metadata editor GNIP _WIP_: New metadata editor Sep 19, 2023
@mwallschlaeger
Copy link
Member

Thanks for the proposal @gannebamm . I just wanted to add some of my thoughts on this topic to the discussion. First of all I want to give some impression on the current interaction with metadata inside of geonode. As geonode ist a project that is growing for long time it has a long history of internal APIs. In the current implementation the metadata values are accessed through Django methods like Querrysets, and Forms and since some time also by the django-rest-api including their serializers and Viewsets. Therefore when adding new fields to the metadata model the following parts need editing:

  • rest v2 API (serializer, Viewset)
  • Django Forms need to be created in order to edit the metadata in the metadata editor
  • and if somewhere else metadata values got accessed they can use querrysets
  • django templates

This different methods also require different validation strategies which require some maintenance when editing the metadata model.

Therefore, as we now have a more or less robust rest API, I would suggest to only use the API for interaction with the metadata model. Additionally the base.forms and templates could be removed, and a lot of code needs to be edited to make metadata interaction generally API based. But afterwards, there would be place for a new metadata editor only using rest v2 to interact with geonode django-rest-api backend. This would also allow wider possibilities for geonode metadata editors and would make space for geonode project allow different templates for the metadata editor like (slim, medium, ISO19115:2003).

Still if somebody requires more or other fields for metadata, she need to change the database model, rest API and UI and something not mentioned yet, the pyCSW interaction. For the pyCSW interaction I go definitely with the issue #9147 which would make it easier to bring changes to metadata model to the CSW API.

@giohappy
Copy link
Contributor

giohappy commented Sep 19, 2023

@gannebamm @mwallschlaeger The high level design I'm planning for the rewrite of the metadata editor is the following:

  • implement a model for the configuration of the metadata editor panels that the client will use to build the new editor. This model will include at least the following:
    • fieldsets (that can be rendered as panels, blocks of widgets, tabs, etc.)
    • fieldsets:
      • fields: list of field configurations
      • name: name of the fieldset
    • field configuration:
      • label
      • name
      • widget type (select, textarea, text input, date, etc.)
      • enums: either a fixed list of values or the API endpoint from where the list can be retrieved dynamically (e.g. categories, regions, etc.)
      • cardinality: 0..n, 1, 1..n, etc.
      • advanced: true/false

As you will notice, this configuration is very similar to what we have for the new filters and facets API.

This configuration could be initially serialized (JSON) and served to the client statically. It could be overridden inside a project, similar to the geocode config overrides.
A second step could be to make the configuration build dynamically and define an interface for modules to register additional fieldsets, fields, and field configurations dynamically.

The goal is to:

  • make configurations, APIs, and the client for the metadata editor converge on what we already do for filters, avoiding duplicated models and configuration schemas where possible
  • make it generic enough to be able to implement other custom forms, beyond the metadata editor
  • merge the Wizard / Advanced visualizations into a single editor, with fieldsets organized in collapsible panels (for example) and with togglable advanced fields

Ok, this is a very rough description. I wanted to take more time to write it down, but given the WIP GNIP has been created already I think it was useful to share it right away.

@mwallschlaeger
Copy link
Member

@giohappy thanks for sharing this idea with us. I was spinning my head around this for some time the recent days. If i understand you correct this is only related to the UI part, so the JSON would not configure any API endpoints or database columns behind the API right?

I think generally your idea could be a good approach to make it easier to edit metadata editor in geonode. But for some projects i was working on in the past on which I applied the same approach, I ended up with blowing and extending the JSON definition more and more because the definitions are not fine grained enough. And when I think about my latest experiences with the widgets and forms in geonode I guess it's going to be very hard to implement all the aspects of this into a simple JSON model. I still like the idea and approach to make it more easy to edit the metadata editor view.

@gannebamm
Copy link
Contributor

Just a quick mention of some work done by @t-book in that regard: https://github.com/csgis/custom_metadata

@etj etj changed the title GNIP _WIP_: New metadata editor GNIP 97: New metadata editor Oct 10, 2023
@etj etj added the gnip A GeoNodeImprovementProcess Issue label Oct 10, 2023
@t-book
Copy link
Contributor

t-book commented Oct 16, 2023

This would also mean implementing the metadata editor with React as well, right? If so, I am rather skeptical in terms of complexity. ( Based on experiences with the development with mapstore2 ). But I would wait for the final GNIP.

@gannebamm
Copy link
Contributor

@t-book What other alternatives would be maintainable? One of the goals of @giohappy is

make configurations, APIs, and the client for the metadata editor converge on what we already do for filters, avoiding duplicated models and configuration schemas where possible

Which implies using mapstore2 as the framework.

Even though I also see mapstore2 developing as a daunting and complex task. However, I am no front-end developer; therefore, this is heavily biased and likely due to inexperience.

@ridoo
Copy link
Contributor

ridoo commented Oct 23, 2023

@giohappy as far I can see, the API you have in mind would provide client instructions how to render widgets based on metadata which is might be configurable on the backend side.

Do you have any reasons to include such a coupling between REST API and client? When it comes to metadata, I would have expected the client to make sense of the metadata for its own scope (a client does not always render things).

You already have an OpenAPI description available. Did you accounted this already, and why not to extend metadata this way? Even JSON schema might be an option (but I have to admit not to thought about this option deeper, yet).

If "coupling" gets necessary, maybe configurable layouting on the client side can be an option.

@giohappy
Copy link
Contributor

Do you have any reasons to include such a coupling between REST API and client? When it comes to metadata, I would have expected the client to make sense of the metadata for its own scope (a client does not always render things).

It depends on what you mean by coupling. My proposal is to have a way to tell the client what we want to edit, and how to lay it out (given a list of available widgets/components). This configuration must be dynamic because we want projects to extend and/or modify it. Or even, in the future, create custom forms beyond the metadata editor.

Where should these configurations stay? How do you give it to the client? An API is what I would use. I'm not expecting to implement such an API in GeoNode itself but in the geonode-mapstore-client app. This app has already a lot of "coupling" with the client, being the app that renders the client. At the same time, it decouples the client from GeoNode's core.

You already have an OpenAPI description available. Did you accounted this already, and why not to extend metadata this way? Even JSON schema might be an option (but I have to admit not to thought about this option deeper, yet).

Here we're talking of something different. We want a model to describe forms, I don't see how this can be covered by the current API.

This would also mean implementing the metadata editor with React as well, right? If so, I am rather skeptical in terms of complexity. ( Based on experiences with the development with mapstore2 ). But I would wait for the final GNIP.

React doesn't imply MapStore2. We can see if MapStore2's routing and state management can help or not here, but we're not assuming that we MUST use MapStore2.
On the other hand, we promote React2 to avoid using vanilla Javascript for such an important and complex component and to harmonize the client components.
Of course, this can be discussed with the people that will work on this. What we really care is avoid Frankesteins :)

We're still fighting with the plethora of approaches in the legacy templates. Parts written with Angular, others with jQuery + dozens of unmaintained libraries.

@ridoo
Copy link
Contributor

ridoo commented Oct 24, 2023

Where should these configurations stay? How do you give it to the client? An API is what I would use. I'm not expecting to implement such an API in GeoNode itself but in the geonode-mapstore-client app. This app has already a lot of "coupling" with the client, being the app that renders the client. At the same time, it decouples the client from GeoNode's core.

Seems, that I thought you wanted to add this to the GeoNode API (/api/v2). In general, I agree, that geonode-mapstore-client has lot of coupling with the UI .. it actually provides the UI (with some GeoNode integration). Here, such API would be better located than in the GeoNode API of course.

However, until now I have thought of the backend implementation of geonode-mapstore-client is a temporary approach on the path to completely separate frontend from backend. Adding REST API functionality in the contrib module here, feels a bit like cheating 😄, but I may not see all the parts playing a role here. Do you have more strategic insights how geonode-mapstore-client will evolve (a link would also be fine)?

@gannebamm
Copy link
Contributor

@t-book

This would also mean implementing the metadata editor with React as well, right? If so, I am rather skeptical in terms of complexity. ( Based on experiences with the development with mapstore2 ).

@giohappy

React doesn't imply MapStore2. We can see if MapStore2's routing and state management can help or not here, but we're not assuming that we MUST use MapStore2.
On the other hand, we promote React2 to avoid using vanilla Javascript for such an important and complex component and to harmonize the client components.
Of course, this can be discussed with the people that will work on this. What we really care is avoid Frankesteins :)

@giohappy I completely understand the need to converge on one main frontend framework and to get rid of old loose ties (like angular stuff). Would you @t-book think that a 'plain' React-based metadata editor would be easier to extend/develop for, or is this still to much overhead in your opinion?


@giohappy

Where should these configurations stay? How do you give it to the client? An API is what I would use. I'm not expecting to implement such an API in GeoNode itself but in the geonode-mapstore-client app. This app has already a lot of "coupling" with the client, being the app that renders the client. At the same time, it decouples the client from GeoNode's core.

@ridoo

[...] Here, such API would be better located than in the GeoNode API of course.

However, until now I have thought of the backend implementation of geonode-mapstore-client is a temporary approach on the path to completely separate frontend from backend. Adding REST API functionality in the contrib module here, feels a bit like cheating 😄, but I may not see all the parts playing a role here.

What is the worst case in using the geonode-mapstore-client django app to enable that API? I understand your resentments and that thus API would put pressure on the geonode-mapstore-client application and, therefore, make it harder to completely decouple GeoNode core from its UI components. On the other hand you could use the geonode-mapstore-client django app as example to eg build a geonode-vue-client app, right?

@ridoo
Copy link
Contributor

ridoo commented Nov 2, 2023

What is the worst case in using the geonode-mapstore-client django app to enable that API? I understand your resentments and that thus API would put pressure on the geonode-mapstore-client application and, therefore, make it harder to completely decouple GeoNode core from its UI components. On the other hand you could use the geonode-mapstore-client django app as example to eg build a geonode-vue-client app, right?

I would not speak from "worst case" here. As said, my impression was there will be two separate things in the end (of a path I cannot oversee currently):

  1. having a headless GeoNode and
  2. a pure frontend client which is decoupled from the backend

At the moment the geonode-mapstore-client is a django module shipping client side code. To be honest, I did not dig deep enough into it to reason about why this is actually necessary. Ok, there is some route mapping, but also (which led me think the current status is an intermediary solution) still a lot of Django templates which then include the JS.

However, I cannot rate about decisions, in particular because I can't oversee the big picture. To me, however, a completely decoupled client makes a good testing candidate for the headless backend: "Is it possible to create a client based on the REST API available". Of course this may make it harder to realize things, especially with regard to the issue we actually try to discuss here. In any sense, it puts focus on alternate, but generic solutions, i.e. in this case separating "what" (backend) from "how" (frontend).

This however, leaves me with the question, why the client not just uses the metadata to render forms and widgets? Actually, the data is described already by its schema (more or less), isn't it? Perhaps, I do miss something really important here 🙈

EDIT: Keep in mind, I do not bring such a big Django dev history.

@ridoo
Copy link
Contributor

ridoo commented Nov 6, 2023

@giohappy I forgot to reply on this one

You already have an OpenAPI description available. Did you accounted this already, and why not to extend metadata this way? Even JSON schema might be an option (but I have to admit not to thought about this option deeper, yet).

Here we're talking of something different. We want a model to describe forms, I don't see how this can be covered by the current API.

What I mean is this: there is an OpenAPI description available (which takes quite long for whatever reason) .. however, dependent how this gets generated you may think of a client which takes those metadata descriptions to render widgets, right? For example a dataset is described here:

"Dataset": {
  "type": "object",
  "description": "DREST-compatible model-based serializer.",
  "properties": {
    "pk": {
      "type": "string",
      "readOnly": true
    },
    "uuid": {
      "type": "string",
      "readOnly": true
    },
    "name": {
      "type": "string",

  ...
}

Of course, the schema needs more love and would have to be improved here. But a proper OpenAPI schema would be a benefit for more use cases than a custom API within the geonode-mapstore-client IMHO.

There are also libraries which support you rendering things from json schema (combined with templating).

@kilichenko-pixida
Copy link
Contributor

kilichenko-pixida commented Nov 9, 2023

Where should these configurations stay? How do you give it to the client? An API is what I would use. I'm not expecting to implement such an API in GeoNode itself but in the geonode-mapstore-client app. This app has already a lot of "coupling" with the client, being the app that renders the client. At the same time, it decouples the client from GeoNode's core.

It seems to me rendering the component in React from the the configuration it gets from the mapstore-client isn't going to help with the coupling with the core - instead it would either mean maintataining the correspondance between the GeoNode core metadata models and the mapstore-client configurations for React rendering or alternatively providing at least some endpoint in the GeoNode API that would expose the necessary information on the metadata model (which is then used in the mapstore-client to make a configuration palatable for React).

The former apporoach seems like a recipe for errors when metadata models get updated or extended and in the later case why not just expose the metadata schema in a way that would be directly easily palatable and sufficient for rendering?

I naively think it could be done in a way that it's not just hard-coded JSONs but is derived from the models (perhaps also reusing the code for serialization?). And it seems like both a custom API like @giohappy suggested and an approach with a standardized schema description @ridoo mentions would work, just with different trade-offs. But I might be just making many wrong assumtions, please correct me if I am thinking in the wrong direction.

Also, I am not sure I understand the general direction here, please help me out. Is Django app meant to stay within the mapstore client? What functions will it perform that are currently done by Django in the GeoNode core? What is its current scope of responsibilities?

@kilichenko-pixida
Copy link
Contributor

kilichenko-pixida commented Nov 9, 2023

And when I think about my latest experiences with the widgets and forms in geonode I guess it's going to be very hard to implement all the aspects of this into a simple JSON model. I still like the idea and approach to make it more easy to edit the metadata editor view.

@mwallschlaeger what do you think are going to be the the hard aspects? Are there particular parts of the metadata models you're worried about?

@gannebamm
Copy link
Contributor

@kilichenko-pixida will create a simple POC using react-jsonschema-form to test its capabilities and how it could interact with the OpenAPI schema. This should help to follow up on this long discussion thread and have some implementation tests to criticise (good or bad).

@kilichenko-pixida
Copy link
Contributor

kilichenko-pixida commented Jan 25, 2024

@giohappy
As @gannebamm mentioned above, we've worked on a POC for the approach @ridoo suggested.

To recap, the idea is to use an existing React library to render the form to edit the metadata based on the schema, and, as much as possible, to reuse the automatically generated OpenAPI schema we already have. Here is the stand-alone React draft implementation.

The key relevant findings so far:

  1. The suggested library appears to be suitable for the purpose and could be extended as needed. Custom widgets for editing more complex fields are possible and the appearance is fully customizable with CSS.
  2. OpenAPI schema structure is such, that theoretically, it can provide all the necessary info to the React client to render the form.
  3. Static helper strings and encoded options are necessary for the form generation. They are currently not in the schema, but in principle, the schema structure has the properties to support them. Whether and how they should be integrated into the schema is up for discussion.
  4. The current state of the OpenAPI schema is insufficient for the purpose. With my limited background, it's hard to say how much effort is needed to fix it.

The problems with the currently generated schema are manifold:
4.1. The generation is slow - it takes around 5s in the dev environment. This is probably solvable with some form of caching but it hasn't been tried yet.
4.2. The drf_spectacular library used for the introspection into the model and generating the schema seems to currently be poorly integrated. It produces multiple runtime errors (see the log in the linked repo) and as a result, many fields are missing from the schema or are represented incorrectly. It can be fixed manually, but the amount of both upfront and maintenance effort required might defeat the purpose of relying on automatic schema generation.
4.3. There is a bidirectional mismatch between what is marked as readOnly in the API (in the serializers and therefore in the schema) and what is actually editable through the advanced metadata editor.

For a more comprehensive overview of the findings, including further considerations about the schema, please see a prior lengthier post on the matter.

Given the points above, at least several key questions come up:
A) What is the general direction of the OpenAPI schema development? Is it meant to be looked into any time soon? What purpose should it eventually serve?
B) Is the poor integration of the drf_spectacular something that's already known about? Are we missing something important about it?

@ridoo
Copy link
Contributor

ridoo commented Jan 25, 2024

Thanks @kilichenko-pixida for summarizing your findings.

I just wanted to emphasize why we evaluated this path:

  • A good OpenAPI schema gives a structured view how an API is structured. Clients can "understand" how to use it, and how the actual model/payload looks like
  • Using this schema as kind of a contract, a client could use it to generate widgets from it.
  • Furthermore, it is able to generate payload matching the schema, which in turn is acceptable API payload
  • Using the introspection API might also take into account customized metadata (more or less automatically)

Other advantages on that approach would hopefully:

  • Further improve the API (which might be necessary anyway, when moving away from the Django templates)
  • Add machine readable description of the exposed GeoNode API
  • A map-store-client not needing to add another API for internal use (which have to be maintained as well)
  • Smoothen the path for alternate clients to act upon the GeoNode API

@giohappy
Copy link
Contributor

It took some time to read everything and connect the many comments :)

In general, I agree with the approach of basing the form client component / app (whatever it is) on the OpenAPI schema. Theoretically, it's the best and cleanest approach, although with a list of shortcomings that I will list later.
Problems are:

  • as you also noticed, the current introspection solution based on drf_spectacular is (very) far from perfect. It's been introduced as a quick and dirty thing by @afabiani in the middle of the refactoring to GN 4, but it's never been part of a dedicated development task. It was something to start, but that wasn't developed further. I've spent some time on it in the past, and I haven't found easy solutions to make it generate a correct and complete schema without heavy manual work. I think one of the problems is the dynamic rest framework, which adds quite a lot of complexity to DRF itself.
  • there's no plan for the OpenAPI at the moment on our side

So, yes, basing the forms on the OpenAPI would be nice, but in reality, we don't have it now.
This doesn't mean we are against its development. I think we SHOULD have a correct and extensive OpenAPI for GeoNode

Basing the form only on the OpenAPI has its limitations by the way.

  • there are many information that need to be adapted to the current user, project, etc. (labeling, validation, permission on fields, etc.). Generating the OpenAPI on the fly is heavy, as you already noticed. Probably we can use faster tools, but I suspect it will always be heavy. Most of the APIs generate static schemas, also for this reason.
  • not all the "metainformation" for a form can be easily described with an OpenAPI schema (grouping of fields, for example?)

S, we don't have a properly generated OpenAPI yet, and we're not sure that we can express with it everything that is needed for a complex form.

This is why, in the end, we have considered to have a dedicated API. Yes, there will be some duplication, and we must take care of keeping this API aligned with the resource models, but this gives us complete freedom and flexibility to arrange it for the specific purposes of a form, also in terms of performance and dynamicity.

If any of you have a solution for the OpenAPI approach, and a concrete implementation to support the form requirements, I'm fully open to it!

@giohappy
Copy link
Contributor

FYI we're moving this activity forward. Soon we will submit a tech proposal for the refactoring of the metadata services and frontend (editor).

We are keeping into account the proposal to make the metadata editor leverage on the OpenAPI and/or the JSON Schema specifications, but we're finding cases that fall outside the scope of these schemas, that would require custom schemas and protocols in any case. For example, the most important thing that isn't covered well (at least from my findings) is telling how to retrieve dynamic values / enums for selectors. For example, how do you tell the client where and how to retrieve the list of users for selecting owner, poc, etc.?

We don't want to fight against the specification or have to build complex schemas to circumvent any limitations.
For this reason, we'll push forward the idea of using an API dedicated to the description of the metadata schema, along with a new API for their management (CRUD ops).

Another pillar for refactoring is building the metadata management API on top of multiple (configurable) model fields (concert or virtual). We want to detach the metadata structure from the Django models and the DB and let any model contribute its fields to the metadata, even extending them for custom purposes.

The goal is to reduce gain flexibility and reduce the number of places where customizations are required to implement custom metadata models (at the moment it means integrating your own models with the Django models through signaling, override the many metadata templates, etc.).

And finally, this goes hand in hand with the upgrade to PyCSW 3, which will probably be part of this activity.

@gannebamm
Copy link
Contributor

Hi @giohappy
Nice to see there is some ongoing work here! We had some discussions about the whole concept of metadata in GeoNode. One of the major issues we discovered is that metadata entries for publications (in terms of research) are static and must be static by design. The dynamic approach of GeoNode which saves user-objects to contacts is therefore against that principle. Here is an example to make this more clear:

Scientist sci-A works currently for organisation unit org-A. If sci-A uploads dataset data-A he/she will be listed as owner with his/her other contact infos like affiliation to org-A. Later sci-A switches to a new organisation unit like org-B. Due to the dynamic nature of GeoNodes metadata it will result in changed metadata if retrieved in the future for data-A.

Even though this is the prefered way to e.g. contact sci-A, this shall not happen from our point of view. It would for example change the list of datasets from org-A even though data-A was created while sci-A was part of it.

I hope this makes our issue clear?

We are aware of the edge-case-iness from our research institutes point of view. We are also aware that our edge-case will not change how the broader community will develop this feature and may think about other ways to solve our issue. One briefly drafted idea was to use the new #12124 to save static metadata files per dataset.

@mwallschlaeger what about ZALF? Have you also discussed this issue?

@giohappy
Copy link
Contributor

Hi @gannebamm.
I understand your point, The dynamic nature of many metadata fields, and their relations with other model fields, works well in many cases, but far less in others. Yours is an example, but we also have clients that were forced to use the "Metadata upload preserver" option to keep the original uploaded metadata file, and to avoid it being regenerated by GeoNode.

I don't think there's a one-fits-all solution. There could be more advanced options to opt-out from dynamic fields, have metadata versioning, and so on, but all this goes beyond the plans for this refactoring.

@giohappy
Copy link
Contributor

giohappy commented Aug 2, 2024

We have started investigating the various options, including react-jsonschema-form. We have also looked at the PoC from @kilichenko-pixida.

This is our current plan:

  • Investigate which technology to use for the new metadata API. drf-spectacular (which we use in GeoNode) should already support generating JSON Schema. We need to check how far we can get with it. Another option is adopting Django Ninja for the metadata API but I don't know if it is worth it.
  • Investigate if the UI Schema from react-jsonschema-form covers all our needs, including the most specific one: telling the client where and how to retrieve the values for a field (external or internal API endpoint)
  • What approach should be adopted to generate and provide the UI schema to the client? It should be generated dynamically from fields, rather than being a static configuration. This is more or less the same as having a custom metadata editor API @ridoo, but adopting an already defined schema (UI Schema)
  • How to implement the editor. A pure React app would complicate things a lot, because we need to integrate security, CRSF tokens, etc. We will need to depend on the GeoNode MapStore client
  • For extensibility, we envision (for the future) exposing an API to let external code provide custom widgets and fields. This is just an idea, but we have no plan to implement it for now

We will keep you updated about our experiments....

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
gnip A GeoNodeImprovementProcess Issue
Projects
None yet
Development

No branches or pull requests

7 participants