Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improved cache tagging & purging for relations #5650

Closed
wants to merge 1 commit into from

Conversation

usu
Copy link
Contributor

@usu usu commented Jul 5, 2023

Q A
Branch? 3.1
Tickets
License MIT
Doc PR TBD

This PR improves the cache tagging logic. The aim is to keep the full exactness (no stale data in the HTTP cache) but drastically reducing the amount of wrongly purged data. I open this PR as draft to receive some early feedback before progressing further and writing tests.

Problems with current implementation of cache tagging

  • Each response contains cache tags for all IRIs referenced. No difference is made between linked entities and embedded entities. However, in many write cases, no purge is needed for linked entities (for example if the change only affects properties within the linked entity).
  • For each write operation, the affected item is purged as well as all its related objects. This double-interaction (all related objects included in cache tags + all related objects purged on write) is especially bad for unidirectional relations. A mere property change on /items/1 would purge all /items/x, if they have a unidirectional link to a common entity. (Unidirectional here is not to be understood from a database/doctrine perspective, but from an API/serialization perspective; =property is only serialized from one side of the relationship).

Changes proposed in this PR

Response change tags:

  • pure IRIs are only included in the cache tags for entities that are fully embedded in the response (=requested item + embedded entities / subentities; or items of a collection in case of a collection request)
  • For each relation property , 1 cache tag is added in the format "{iri}#{property}" (happy to use another delimiter if # is not a good one). For xToMany relations, this reduces the number of cache tags from N to 1 (helping towards Varnish cache tags with a big number of entities  #3168).

For write operations, the following tags are purged:

Post

  • the collection which was posted on
  • all related objects, from the related objects side (=does't purge the related object, if the relation is unidirectional)

Drop

  • the IRI of the item itself
  • the collection the item belonged to
  • all related objects, from the related objects side (=does't purge the related object, if the relation is unidirectional)

Update

  • the IRI of the item itself
  • related objects of changed relations only, from the related objects side (=does't purge the related object, if the relation is unidirectional)

Example

{
  "_links": {
    "self": {
      "href": "/items/1"
    },
    "linkedParent": {
      "href": "/linked_parents/2"
    },
    "linkedChildren": [
      {
        "href": "/linked_children/3"
      }
    ],
    "embeddedParent": {
      "href": "/embedded_parents/4"
    },
    "embeddedChildren": [
      {
        "href": "/embedded_children/5"
      }
    ]
  },
  "_embedded": {
    "embeddedParent": {
      "_links": {
        "self": {
          "href": "/embedded_parents/4"
        }
      },
      "dummyProperty": "",
      "id": 4
    },
    "embeddedChildren": [
      {
        "_links": {
          "self": {
            "href": "/embedded_children/5"
          },
          "embeddedGrandchildren": []
        },
        "_embedded": {
          "embeddedGrandchildren": []
        },
        "dummyProperty": "",
        "id": 5
      }
    ]
  },
  "id": 1
}
Request Response cache tags
GET /items/1 /items/1
/items/1#linkedParent
/items/1#linkedChildren
/items/1#embeddedParent
/items/1#embeddedChildren
/embedded_parents/4
/embedded_children/5
/embedded_children/5#embeddedGrandchildren
Request Purged tags
(in bold tag which matches the cache tags from above)
Comment
PATCH /linked_parents/2
{ stringProperty: "dummy" }
/linked_parents/2 Only reponses where /linked_parents/2 is directly embedded are purged.

The cached reponse of "GET /items/1" is still valid
PATCH /linked_children/3
{ parent: "/items/10" }
/linked_children/3
/items/1#linkedChildren
/items/10#linkedChildren
"GET /items/1" is properly purged
POST /linked_children
{ parent: "/items/1" }
/linked_children
/items/1#linkedChildren
"GET /items/1" is properly purged
PATCH /embedded_parents/4
{ dummyProperty: "dummy" }
/embedded_parents/4 "GET /items/1" is properly purged
POST /items
{ embeddedParent: "/embeddedParents/4"}
/items
/embeddedParents/4#children
"GET /items/1" is still valid, because the children property was not serialized on the embeddedParent entity
POST /embedded_grandchildren
{ parent: "/embedded_children/5" }
/embedded_grandchildren
/embedded_children/5#embeddedGrandchildren
"GET /items/1" is properly purged

Happy to add additional examples, if the logic is not clear.

manual cache dependencies

For manual getters inside the entity, the logic doesn't know a priori, which dependencies the getter relies on. This can be manually controlled with extraProperties['cacheDependencies'].

    #[ApiProperty(readableLink: true, extraProperties: ['cacheDependencies' => ['childrenA', 'childrenB']])]
    #[SerializedName('children')]
    public function getCombinedChildren(): array {
        return array_merge(
            $this->childrenA->getValues(),
            $this->childrenB->getValues()
        );
    }

Is this PR will be accepted, we can also transform this into an official property.

Open points & requested feedback

  • Does this approach makes sense? Would like to have a feedback before moving on with writing tests.
  • I'm personally working mostly with haljson format. Still need to do some review/checks, if this also works for the other formats.
  • Would it makes sense to outsource the actual calculation of the cache tags into seperate event listeners instead of hard-baking it into AbstractItemNormalizer? This would it make easier for users to extend or overwrite the cache tag generation with their own logic.

The last point is also related towards the ability to easily extend and/or replace the cache tag system. As of now:

  • Tag collection: Is not easily replaceable, as the logic is hardbaked into the Normalizers (the only way to change the logic is to replace the complete Normalizer). It's somewhat extendable (as documented), by adding additional tags to the request.
  • Purge operations: Is replaceable by replacing the PurgeHttpCacheListener with an own implementation. Not easily extendable, as all logic to generate and push the purge tags is isolated within PurgeHttpCacheListener.

Is there a specific reason, why FOSHttpCacheBundle is not utilized? It has both $responseTagger->addTags() and $cacheManager->invalidateTags() which makes it really easy, to hook into the cache system from anywhere in the logic (e.g. to invalidate additional tags in the state processor for specific operations).

(A public function addTagsFor was for example suggested in #5086, but never implemented.)

@usu
Copy link
Contributor Author

usu commented Jul 13, 2023

@soyuka Do you mind having a look at this to judge whether this is worth pursuing?

@stale
Copy link

stale bot commented Sep 12, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 12, 2023
@stale stale bot closed this Sep 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant