GNIP 93: Allowing changes to the dataset attributes table via REST API #10873

Open · 1 of 5 tasks
mwallschlaeger opened this issue Apr 3, 2023 · 7 comments
@mwallschlaeger
Member

mwallschlaeger commented Apr 3, 2023

GNIP 93 - Allowing changes to the dataset attributes table via REST API

Overview

I guess this might be too small for a GNIP, but it might require a discussion by the PSC.

Currently, making changes to the attribute table of a dataset is only possible via the Advanced Metadata Editor form. To further RESTify GeoNode, it would be nice to have the possibility to change attributes via the REST API.

Proposed By

@mwallschlaeger : Marcel Wallschlaeger

Assigned to Release

This proposal is for GeoNode 4.1.

State

  • Under Discussion
  • In Progress
  • Completed
  • Rejected
  • Deferred

Motivation

Our main motivation is the requirement to upload data into our GeoNode instance using an external service that communicates with GeoNode exclusively through the REST API. Therefore, setting or changing the attributes of a dataset in this workflow requires attribute manipulation via the REST interface.

Proposal

Currently, the Dataset REST endpoint exposes the attributes as a read_only field. I would leave this construct as it is and add another endpoint below the Dataset, something like:

router.register(r"datasets/(?P&lt;pk&gt;\d+)/attributes", views.DatasetAttributesViewSet, "attributes")

This endpoint would handle the changes to the attributes of a dataset, allowing updates to "description", "attribute_label", "visible", "featureinfo_type", and "display_order".
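
To make the proposal concrete, here is a purely hypothetical usage sketch. Neither the endpoint nor the payload schema exists yet, so the URL layout (derived from the router registration above), ids, credentials, and the featureinfo_type value are all assumptions:

# Illustrative sketch of the proposed (not yet existing) endpoint.
import requests

payload = {
    "description": "Water temperature at 2 m depth",
    "attribute_label": "Temperature",
    "visible": True,
    "featureinfo_type": "type_decimal",  # assumed value, check the Attribute model choices
    "display_order": 1,
}

resp = requests.patch(
    "https://geonode.example.org/api/v2/datasets/42/attributes/7/",  # placeholder host and ids
    json=payload,
    auth=("admin", "admin"),  # basic auth for brevity; a token would also work
)
resp.raise_for_status()
print(resp.json())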

Backwards Compatibility

I don't think this breaks backwards compatibility.

Future evolution

For our specific use case we further want to add fields to the Attribute model which might be interesting for the whole GeoNode project: a "unit" and "keywords" for a column, so that in a later implementation users could find datasets containing a specific type of attribute. This would make it easier to build dashboards and stories that compare the same units and keyword tags across different datasets. A rough sketch follows.
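
A minimal sketch of what such an extension could look like; these are hypothetical additions, not existing fields on geonode.layers.models.Attribute:

from django.db import models

class Attribute(models.Model):
    # ... existing fields: attribute, attribute_type, description,
    # attribute_label, visible, display_order, featureinfo_type, ...

    # hypothetical new fields proposed above
    unit = models.CharField(max_length=255, null=True, blank=True)  # e.g. "degC"
    keywords = models.TextField(null=True, blank=True)  # e.g. comma-separated tags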

Feedback

Update this section with relevant feedback, if any.

Voting

Project Steering Committee:

  • Alessio Fabiani:
  • Francesco Bartoli:
  • Giovanni Allegri:
  • Simone Dalmasso:
  • Toni Schoenbuchner:
  • Florian Hoedt:

Links

@giohappy giohappy changed the title GNIP #93 - Allowing changes to the dataset attributes table via REST API GNIP - 93: Allowing changes to the dataset attributes table via REST API Jul 4, 2023
@giohappy giohappy changed the title GNIP - 93: Allowing changes to the dataset attributes table via REST API GNIP 93: Allowing changes to the dataset attributes table via REST API Jul 4, 2023
@gannebamm
Contributor

I see the need for these capabilities. We are researching how an attribute value update would be possible via DRF and REST. @kilichenko-pixida will post some of his findings soon.

@kilichenko-pixida

> I see the need for these capabilities. We are researching how an attribute value update would be possible via DRF and REST. @kilichenko-pixida will post some of his findings soon.

As mentioned, I am working on this issue. As of now, I am not adding a separate endpoint for the attributes; I think I found a way to update them through PATCH api/v2/datasets/<dataset_id>.

I didn't figure out what kind of JSON payload might just work out of the box with the DRF serializers for this type of nested field, so instead of passing it as {"attribute_set": [{...}, ...]} (which would cause an exception), it has to be passed as {"data": {"attribute_set": [{...}, ...]}}. This is later read and parsed in the update method of the DatasetSerializer class (overriding the default from its parent class), and the attributes are directly updated from there as well.

I am also preparing a geonodectl command which will allow passing a JSON file like {"attribute_set": [{...}, ...]} to update the attributes.
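
For reference, a minimal sketch of what the direct API call could look like with the wrapping described above; host, credentials, ids, and the per-attribute keys are placeholders, not a confirmed schema:

# The attribute_set payload is wrapped in a "data" key so it bypasses the
# DRF field validation and reaches the overridden update() of DatasetSerializer.
import requests

payload = {
    "data": {
        "attribute_set": [
            {
                "pk": 7,  # assumed key for the attribute id
                "attribute_label": "Temperature",
                "description": "Water temperature at 2 m depth",
                "visible": True,
                "display_order": 1,
            }
        ]
    }
}

resp = requests.patch(
    "https://geonode.example.org/api/v2/datasets/42/",  # placeholder host and dataset id
    json=payload,
    auth=("admin", "admin"),
)
resp.raise_for_status()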

@kilichenko-pixida

@gannebamm @ridoo @mwallschlaeger

Here is a summary of the relevant changes that have already happened. I have been working on this issue, and it resulted in two already merged PRs: one to geonode to add the attribute processing ability, and another to geonodectl to make patching from JSON files possible, as mentioned in this issue. As per testing in the dev environment, patching of attributes is now possible, but the current implementation might not be a good solution for direct use of the API as described in this GNIP.

The reason is that currently the attribute_set values need to be passed wrapped in a "data" field, which allows us to bypass internal DRF validation and access them from the update method within the DatasetSerializer class, e.g. {"data": {"attribute_set": [{...}, ...]}}.

This "wrapping" is done automatically when patching attributes from the geonodectl, but that makes patching directly through the API not obvious (though now at least possible). Details on why it was done this way could be found in the discussion on the first PR to geonode - TLDR We don't know how to do it, but it might turn out to be trivial.

@mattiagiupponi
Contributor

Hi @gannebamm @kilichenko-pixida @ridoo @mwallschlaeger

I was checking the issue and I have a couple of notes about the implementation.

attributes as a model

The attributes field serves a purpose and comes from GeoServer, so we must handle it with care. For example, we retrieve the attribute names and types from GeoServer through a function:

def set_attributes_from_geoserver(layer, overwrite=False):
    """
    Retrieve layer attribute names & types from Geoserver,
    then store in GeoNode database using Attribute model
    """
    attribute_map = []
    if getattr(layer, "remote_service") and layer.remote_service:
        server_url = layer.remote_service.service_url
        if layer.remote_service.operations.get("GetCapabilities", None) and layer.remote_service.operations.get(
            "GetCapabilities"
        ).get("methods"):
            for _method in layer.remote_service.operations.get("GetCapabilities").get("methods"):
                if _method.get("type", "").upper() == "GET":
                    server_url = _method.get("url", server_url)
                    break
    else:
        server_url = ogc_server_settings.LOCATION
    if layer.subtype in ["tileStore", "remote"] and layer.remote_service.ptype == "gxp_arcrestsource":
        dft_url = f"{server_url}{(layer.alternate or layer.typename)}?f=json"
        try:
            # The code below will fail if http_client cannot be imported
            req, body = http_client.get(dft_url, user=_user)
            body = json.loads(body)
            attribute_map = [
                [n["name"], _esri_types[n["type"]]] for n in body["fields"] if n.get("name") and n.get("type")
            ]
        except Exception:
            tb = traceback.format_exc()
            logger.debug(tb)
            attribute_map = []
    elif layer.subtype in {"vector", "tileStore", "remote", "wmsStore", "vector_time"}:
        typename = layer.alternate if layer.alternate else layer.typename
        dft_url_path = re.sub(r"\/wms\/?$", "/", server_url)
        dft_query = urlencode(
            {"service": "wfs", "version": "1.0.0", "request": "DescribeFeatureType", "typename": typename}
        )
        dft_url = urljoin(dft_url_path, f"ows?{dft_query}")
        try:
            # The code below will fail if http_client cannot be imported or WFS not supported
            req, body = http_client.get(dft_url, user=_user)
            doc = dlxml.fromstring(body.encode())
            xsd = "{http://www.w3.org/2001/XMLSchema}"
            path = f".//{xsd}extension/{xsd}sequence/{xsd}element"
            attribute_map = [
                [n.attrib["name"], n.attrib["type"]]
                for n in doc.findall(path)
                if n.attrib.get("name") and n.attrib.get("type")
            ]
        except Exception:
            tb = traceback.format_exc()
            logger.debug(tb)
            attribute_map = []
            # Try WMS instead
            dft_url = (
                server_url
                + "?"
                + urlencode(
                    {
                        "service": "wms",
                        "version": "1.0.0",
                        "request": "GetFeatureInfo",
                        "bbox": ",".join([str(x) for x in layer.bbox]),
                        "LAYERS": layer.alternate,
                        "QUERY_LAYERS": typename,
                        "feature_count": 1,
                        "width": 1,
                        "height": 1,
                        "srs": "EPSG:4326",
                        "info_format": "text/html",
                        "x": 1,
                        "y": 1,
                    }
                )
            )
            try:
                req, body = http_client.get(dft_url, user=_user)
                soup = BeautifulSoup(body, features="lxml")
                for field in soup.findAll("th"):
                    if field.string is None:
                        field_name = field.contents[0].string
                    else:
                        field_name = field.string
                    attribute_map.append([field_name, "xsd:string"])
            except Exception:
                tb = traceback.format_exc()
                logger.debug(tb)
                attribute_map = []
    elif layer.subtype in ["raster"]:
        typename = layer.alternate if layer.alternate else layer.typename
        dc_url = f"{server_url}wcs?{urlencode({'service': 'wcs', 'version': '1.1.0', 'request': 'DescribeCoverage', 'identifiers': typename})}"
        try:
            req, body = http_client.get(dc_url, user=_user)
            doc = dlxml.fromstring(body.encode())
            wcs = "{http://www.opengis.net/wcs/1.1.1}"
            path = f".//{wcs}Axis/{wcs}AvailableKeys/{wcs}Key"
            attribute_map = [[n.text, "raster"] for n in doc.findall(path)]
        except Exception:
            tb = traceback.format_exc()
            logger.debug(tb)
            attribute_map = []
    # Get attribute statistics & package for call to really_set_attributes()
    attribute_stats = defaultdict(dict)
    # Add new layer attributes if they don't already exist
    for attribute in attribute_map:
        field, ftype = attribute
        if field is not None:
            if Attribute.objects.filter(dataset=layer, attribute=field).exists():
                continue
            elif is_dataset_attribute_aggregable(layer.subtype, field, ftype):
                logger.debug("Generating layer attribute statistics")
                result = get_attribute_statistics(layer.alternate or layer.typename, field)
            else:
                result = None
            attribute_stats[layer.name][field] = result
    set_attributes(layer, attribute_map, overwrite=overwrite, attribute_stats=attribute_stats)

and later set with this function:

def set_attributes(layer, attribute_map, overwrite=False, attribute_stats=None):
    """*layer*: a geonode.layers.models.Dataset instance
    *attribute_map*: a list of 2-lists specifying attribute names and types,
        example: [ ['id', 'Integer'], ... ]
    *overwrite*: replace existing attributes with new values if name/type matches.
    *attribute_stats*: dictionary of return values from get_attribute_statistics(),
        of the form to get values by referencing attribute_stats[<dataset_name>][<field_name>].
    """
    # we need 3 more items; description, attribute_label, and display_order
    attribute_map_dict = {
        "field": 0,
        "ftype": 1,
        "description": 2,
        "label": 3,
        "display_order": 4,
    }
    for attribute in attribute_map:
        if len(attribute) == 2:
            attribute.extend((None, None, 0))
    attributes = layer.attribute_set.all()
    # Delete existing attributes if they no longer exist in an updated layer
    for la in attributes:
        lafound = False
        for attribute in attribute_map:
            field, ftype, description, label, display_order = attribute
            if field == la.attribute:
                lafound = True
                # store description and attribute_label in attribute_map
                attribute[attribute_map_dict["description"]] = la.description
                attribute[attribute_map_dict["label"]] = la.attribute_label
                attribute[attribute_map_dict["display_order"]] = la.display_order
        if overwrite or not lafound:
            logger.debug("Going to delete [%s] for [%s]", la.attribute, layer.name)
            la.delete()
    # Add new layer attributes if they don't exist already
    if attribute_map:
        iter = len(Attribute.objects.filter(dataset=layer)) + 1
        for attribute in attribute_map:
            field, ftype, description, label, display_order = attribute
            if field:
                _gs_attrs = Attribute.objects.filter(dataset=layer, attribute=field)
                if _gs_attrs.count() == 1:
                    la = _gs_attrs.get()
                else:
                    if _gs_attrs.exists():
                        _gs_attrs.delete()
                    la = Attribute.objects.create(dataset=layer, attribute=field)
                la.visible = ftype.find("gml:") != 0
                la.attribute_type = ftype
                la.description = description
                la.attribute_label = label
                la.display_order = iter
                iter += 1
                if not attribute_stats or layer.name not in attribute_stats or field not in attribute_stats[layer.name]:
                    result = None
                else:
                    result = attribute_stats[layer.name][field]
                if result:
                    logger.debug("Generating layer attribute statistics")
                    la.count = result["Count"]
                    la.min = result["Min"]
                    la.max = result["Max"]
                    la.average = result["Average"]
                    la.median = result["Median"]
                    la.stddev = result["StandardDeviation"]
                    la.sum = result["Sum"]
                    la.unique_values = result["unique_values"]
                    la.last_stats_updated = datetime.datetime.now(timezone.get_current_timezone())
                try:
                    la.save()
                except Exception as e:
                    logger.exception(e)
    else:
        logger.debug("No attributes found")

In the end, the attributes must always stay coherent with GeoServer and the original dataset.
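
Given that split of ownership, one way to enforce it at the API level could be a serializer that keeps the GeoServer-derived fields read-only while exposing the presentation fields from the GNIP as writable. A hedged sketch; the class name and exact field list are assumptions, not existing GeoNode code:

from rest_framework import serializers

from geonode.layers.models import Attribute


class DatasetAttributeSerializer(serializers.ModelSerializer):
    # Hypothetical serializer: the fields GeoNode syncs from GeoServer stay
    # read-only, the presentation-only fields remain editable.
    class Meta:
        model = Attribute
        read_only_fields = ("attribute", "attribute_type")  # owned by GeoServer
        fields = read_only_fields + (
            "pk",
            "description",
            "attribute_label",
            "visible",
            "featureinfo_type",
            "display_order",
        )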

API implementation

Sometimes working with dynamic-rest is a headache, I agree. The field always expects a list of IDs rather than a payload to update the values.

> The reason is that currently, the attribute_set values need to be passed wrapped in a "data" field which allows us to bypass internal DRF validation and access it from the update method within the DatasetSerializer class. E.g. {"data": {"attribute_set": [{...}, ...]}}.

This seems like a workaround, which might not be the best approach in case we need to work with other fields in the future.

We define a specific action and extend_schema in the viewset, allowing the user to add a schema to the API without creating a new URL.

For example, the extra_metadata:

@extend_schema(
    methods=["get", "put", "delete", "post"], description="Get/Update/Delete/Add extra metadata for resource"
)
@action(
    detail=True,
    methods=["get", "put", "delete", "post"],
    permission_classes=[IsOwnerOrAdmin, UserHasPerms(perms_dict={"default": {"POST": ["base.add_resourcebase"]}})],
    url_path=r"extra_metadata",  # noqa
    url_name="extra-metadata",
)
def extra_metadata(self, request, pk, *args, **kwargs):
    _obj = get_object_or_404(ResourceBase, pk=pk)
    if request.method == "GET":
        # get list of available metadata
        queryset = _obj.metadata.all()
        _filters = [{f"metadata__{key}": value} for key, value in request.query_params.items()]
        if _filters:
            queryset = queryset.filter(**_filters[0])
        return Response(ExtraMetadataSerializer().to_representation(queryset))
    if not request.method == "DELETE":
        try:
            extra_metadata = validate_extra_metadata(request.data, _obj)
        except Exception as e:
            return Response(status=500, data=e.args[0])
    if request.method == "PUT":
        """
        update specific metadata. The ID of the metadata is required to perform the update
        [
            {
                "id": 1,
                "name": "foo_name",
                "slug": "foo_sug",
                "help_text": "object",
                "field_type": "int",
                "value": "object",
                "category": "object"
            }
        ]
        """
        for _m in extra_metadata:
            _id = _m.pop("id")
            ResourceBase.objects.filter(id=_obj.id).first().metadata.filter(id=_id).update(metadata=_m)
        logger.info("metadata updated for the selected resource")
        _obj.refresh_from_db()
        return Response(ExtraMetadataSerializer().to_representation(_obj.metadata.all()))
    elif request.method == "DELETE":
        # delete single metadata
        """
        Expect a payload with the IDs of the metadata that should be deleted. Payload be like:
        [4, 3]
        """
        ResourceBase.objects.filter(id=_obj.id).first().metadata.filter(id__in=request.data).delete()
        _obj.refresh_from_db()
        return Response(ExtraMetadataSerializer().to_representation(_obj.metadata.all()))
    elif request.method == "POST":
        # add new metadata
        """
        [
            {
                "name": "foo_name",
                "slug": "foo_sug",
                "help_text": "object",
                "field_type": "int",
                "value": "object",
                "category": "object"
            }
        ]
        """
        for _m in extra_metadata:
            new_m = ExtraMetadata.objects.create(resource=_obj, metadata=_m)
            new_m.save()
            _obj.metadata.add(new_m)
        _obj.refresh_from_db()
        return Response(ExtraMetadataSerializer().to_representation(_obj.metadata.all()), status=201)

def _get_request_params(self, request, encode=False):
    try:
        return (
            QueryDict(request.body, mutable=True, encoding="UTF-8")
            if encode
            else QueryDict(request.body, mutable=True)
        )
    except Exception as e:
        """
        The request with the barer token access to the request.data during the token verification
        so in this case if the request.body cannot not access, we just re-access to the
        request.data to get the params needed
        """
        logger.debug(e)
        return request.data

This function defines all the code required to add, update, delete, and handle the extra_metadata attribute of a resource. The corresponding endpoint is /api/v2/resources/{pk}/extra_metadata.

I would suggest a similar approach for handling the attributes (see the sketch after this list). This would have several benefits:

  • It will limit the scope of the API to a specific usage.
  • It will keep the serializer simple by avoiding any additional logic related to a particular field update.
  • It will make maintenance easier.
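
A hedged sketch of what such an action could look like on the dataset viewset, mirroring the extra_metadata example above; the action name, the permission setup, and DatasetAttributeSerializer (sketched in the previous section) are assumptions, not merged code:

from django.shortcuts import get_object_or_404
from drf_spectacular.utils import extend_schema
from rest_framework.decorators import action
from rest_framework.response import Response

from geonode.base.api.permissions import IsOwnerOrAdmin
from geonode.layers.models import Dataset

# inside DatasetViewSet:
@extend_schema(methods=["get", "patch"], description="Get/Update the attributes of a dataset")
@action(
    detail=True,
    methods=["get", "patch"],
    permission_classes=[IsOwnerOrAdmin],
    url_path=r"attributes",  # -> /api/v2/datasets/{pk}/attributes
    url_name="attributes",
)
def attributes(self, request, pk, *args, **kwargs):
    _obj = get_object_or_404(Dataset, pk=pk)
    if request.method == "GET":
        return Response(DatasetAttributeSerializer(_obj.attribute_set.all(), many=True).data)
    # PATCH: expect a list of {"pk": <attribute id>, <editable fields>} entries;
    # attribute name/type stay read-only, so coherence with GeoServer is preserved
    for entry in request.data:
        attribute = get_object_or_404(_obj.attribute_set, pk=entry.pop("pk"))
        serializer = DatasetAttributeSerializer(attribute, data=entry, partial=True)
        serializer.is_valid(raise_exception=True)
        serializer.save()
    return Response(DatasetAttributeSerializer(_obj.attribute_set.all(), many=True).data)

This would keep the DatasetSerializer untouched and scope attribute edits to a single, documented URL, in line with the benefits listed above.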

@ridoo
Contributor

ridoo commented Jan 25, 2024

@mattiagiupponi thanks for your feedback. I am getting more and more into the rest_framework stuff. There is so much "magic" underneath... once you have cases that diverge from the "normal" (tm) ones, you need a much deeper understanding of how it all works.

In this specific case, I would be curious how you plan to do edits via the REST API when dropping the legacy metadata editor templates. If you plan to re-use the API v2 (which I assume you will) we should revise it with regard to such changes. To my current understanding, the API v2 is too limited to accept all necessary metadata changes of a dataset, right?

@mattiagiupponi
Contributor

> In this specific case, I would be curious how you plan to do edits via the REST API when dropping the legacy metadata editor templates. If you plan to re-use the API v2 (which I assume you will) we should revise it with regard to such changes. To my current understanding, the API v2 is too limited to accept all necessary metadata changes of a dataset, right?

We are still evaluating how to change it, but some API changes will certainly be required.

@giohappy
Contributor

giohappy commented Jan 25, 2024

@ridoo we're speaking of dataset attributes, not metadata in general here, right?

