improved cache tagging & purging for relations #5650

usu · 2023-07-05T16:38:55Z

Q	A
Branch?	3.1
Tickets
License	MIT
Doc PR	TBD

This PR improves the cache tagging logic. The aim is to keep the full exactness (no stale data in the HTTP cache) but drastically reducing the amount of wrongly purged data. I open this PR as draft to receive some early feedback before progressing further and writing tests.

Problems with current implementation of cache tagging

Each response contains cache tags for all IRIs referenced. No difference is made between linked entities and embedded entities. However, in many write cases, no purge is needed for linked entities (for example if the change only affects properties within the linked entity).
For each write operation, the affected item is purged as well as all its related objects. This double-interaction (all related objects included in cache tags + all related objects purged on write) is especially bad for unidirectional relations. A mere property change on /items/1 would purge all /items/x, if they have a unidirectional link to a common entity. (Unidirectional here is not to be understood from a database/doctrine perspective, but from an API/serialization perspective; =property is only serialized from one side of the relationship).

Changes proposed in this PR

Response change tags:

pure IRIs are only included in the cache tags for entities that are fully embedded in the response (=requested item + embedded entities / subentities; or items of a collection in case of a collection request)
For each relation property , 1 cache tag is added in the format "{iri}#{property}" (happy to use another delimiter if # is not a good one). For xToMany relations, this reduces the number of cache tags from N to 1 (helping towards Varnish cache tags with a big number of entities #3168).

For write operations, the following tags are purged:

Post

the collection which was posted on
all related objects, from the related objects side (=does't purge the related object, if the relation is unidirectional)

Drop

the IRI of the item itself
the collection the item belonged to
all related objects, from the related objects side (=does't purge the related object, if the relation is unidirectional)

Update

the IRI of the item itself
related objects of changed relations only, from the related objects side (=does't purge the related object, if the relation is unidirectional)

Example

{
  "_links": {
    "self": {
      "href": "/items/1"
    },
    "linkedParent": {
      "href": "/linked_parents/2"
    },
    "linkedChildren": [
      {
        "href": "/linked_children/3"
      }
    ],
    "embeddedParent": {
      "href": "/embedded_parents/4"
    },
    "embeddedChildren": [
      {
        "href": "/embedded_children/5"
      }
    ]
  },
  "_embedded": {
    "embeddedParent": {
      "_links": {
        "self": {
          "href": "/embedded_parents/4"
        }
      },
      "dummyProperty": "",
      "id": 4
    },
    "embeddedChildren": [
      {
        "_links": {
          "self": {
            "href": "/embedded_children/5"
          },
          "embeddedGrandchildren": []
        },
        "_embedded": {
          "embeddedGrandchildren": []
        },
        "dummyProperty": "",
        "id": 5
      }
    ]
  },
  "id": 1
}

Request	Response cache tags
GET /items/1	/items/1 /items/1#linkedParent /items/1#linkedChildren /items/1#embeddedParent /items/1#embeddedChildren /embedded_parents/4 /embedded_children/5 /embedded_children/5#embeddedGrandchildren

Request	Purged tags (in bold tag which matches the cache tags from above)	Comment
PATCH /linked_parents/2 { stringProperty: "dummy" }	/linked_parents/2	Only reponses where /linked_parents/2 is directly embedded are purged. The cached reponse of "GET /items/1" is still valid
PATCH /linked_children/3 { parent: "/items/10" }	/linked_children/3 /items/1#linkedChildren /items/10#linkedChildren	"GET /items/1" is properly purged
POST /linked_children { parent: "/items/1" }	/linked_children /items/1#linkedChildren	"GET /items/1" is properly purged
PATCH /embedded_parents/4 { dummyProperty: "dummy" }	/embedded_parents/4	"GET /items/1" is properly purged
POST /items { embeddedParent: "/embeddedParents/4"}	/items /embeddedParents/4#children	"GET /items/1" is still valid, because the children property was not serialized on the embeddedParent entity
POST /embedded_grandchildren { parent: "/embedded_children/5" }	/embedded_grandchildren /embedded_children/5#embeddedGrandchildren	"GET /items/1" is properly purged

Happy to add additional examples, if the logic is not clear.

manual cache dependencies

For manual getters inside the entity, the logic doesn't know a priori, which dependencies the getter relies on. This can be manually controlled with extraProperties['cacheDependencies'].

    #[ApiProperty(readableLink: true, extraProperties: ['cacheDependencies' => ['childrenA', 'childrenB']])]
    #[SerializedName('children')]
    public function getCombinedChildren(): array {
        return array_merge(
            $this->childrenA->getValues(),
            $this->childrenB->getValues()
        );
    }

Is this PR will be accepted, we can also transform this into an official property.

Open points & requested feedback

Does this approach makes sense? Would like to have a feedback before moving on with writing tests.
I'm personally working mostly with haljson format. Still need to do some review/checks, if this also works for the other formats.
Would it makes sense to outsource the actual calculation of the cache tags into seperate event listeners instead of hard-baking it into AbstractItemNormalizer? This would it make easier for users to extend or overwrite the cache tag generation with their own logic.

The last point is also related towards the ability to easily extend and/or replace the cache tag system. As of now:

Tag collection: Is not easily replaceable, as the logic is hardbaked into the Normalizers (the only way to change the logic is to replace the complete Normalizer). It's somewhat extendable (as documented), by adding additional tags to the request.
Purge operations: Is replaceable by replacing the PurgeHttpCacheListener with an own implementation. Not easily extendable, as all logic to generate and push the purge tags is isolated within PurgeHttpCacheListener.

Is there a specific reason, why FOSHttpCacheBundle is not utilized? It has both $responseTagger->addTags() and $cacheManager->invalidateTags() which makes it really easy, to hook into the cache system from anywhere in the logic (e.g. to invalidate additional tags in the state processor for specific operations).

(A public function addTagsFor was for example suggested in #5086, but never implemented.)

usu · 2023-07-13T18:44:37Z

@soyuka Do you mind having a look at this to judge whether this is worth pursuing?

stale · 2023-09-12T03:33:18Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

improved cache tagging & purging for relations

25d0d38

This was referenced Aug 14, 2023

POC: HTTP reverse proxy cache ecamp/ecamp3#3610

Merged

chore(HttpCache): outsource cache tag collection into normalizer events #5753

Closed

feat(serializer): collect cache tags using a TagCollector #5758

Merged

stale bot added the stale label Sep 12, 2023

stale bot closed this Sep 19, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

improved cache tagging & purging for relations #5650

improved cache tagging & purging for relations #5650

usu commented Jul 5, 2023

usu commented Jul 13, 2023

stale bot commented Sep 12, 2023

improved cache tagging & purging for relations #5650

improved cache tagging & purging for relations #5650

Conversation

usu commented Jul 5, 2023

Problems with current implementation of cache tagging

Changes proposed in this PR

Example

manual cache dependencies

Open points & requested feedback

usu commented Jul 13, 2023

stale bot commented Sep 12, 2023