feat: improve guidance on total counts (#731) (#788)

* feat: improve guidance on total counts (#731)

Signed-off-by: tkrop <tronje.krop@zalando.de>

* feat: improve wording as suggested (#731)

Co-authored-by: Miha Lunar <mlunar@gmail.com>

---------

Signed-off-by: tkrop <tronje.krop@zalando.de>
Co-authored-by: Miha Lunar <mlunar@gmail.com>
zalando · Nov 21, 2023 · 6555c13
1 parent b916b92
commit 6555c13
Showing 3 changed files with 55 additions and 30 deletions.
diff --git a/chapters/http-headers.adoc b/chapters/http-headers.adoc
@@ -43,7 +43,7 @@ components:
     Default:
       headers:
         ETag:
-          $ref: '#/components/(parameters|headers)/ETag
+          $ref: '#/components/(parameters|headers)/ETag'
 ----
 
 *Note:* It is a question of taste whether headers for responses are defined in

diff --git a/chapters/pagination.adoc b/chapters/pagination.adoc
@@ -6,27 +6,31 @@
 == {MUST} support pagination
 
 Access to lists of data items must support pagination to protect the service
-against overload as well as to support client side iteration and batch processing
-experience. This holds true for all lists that are (potentially) larger than
-just a few hundred entries.
+against overload as well as to support client side iteration and batch
+processing experience. This holds true for all lists that are (potentially)
+larger than just a few hundred entries.
 
 There are two well known page iteration techniques:
 
 * **Offset-based pagination**: numeric offset identifies the first page-entry
-* **Cursor-based pagination** — aka key-based pagination: a unique key identifies the first page-entry
-  (see also https://dev.twitter.com/overview/api/cursoring[Twitter API] or 
+* **Cursor-based pagination** — aka key-based pagination: a unique key
+  identifies the first page-entry (see also
+  https://dev.twitter.com/overview/api/cursoring[Twitter API] or
   https://developers.facebook.com/docs/graph-api/results[Facebook API])
 
-The technical conception of pagination should also consider user experience (see 
-https://www.smashingmagazine.com/2016/03/pagination-infinite-scrolling-load-more-buttons/[Pagination Usability Findings In eCommerce]),
-for instance, jumping to a specific page is far less used than navigation via {next}/{prev}
-page links (see <<161>>). This favors an API design using cursor-based instead of 
-offset-based pagination -- see <<160>>.
+:smashing-pagination: https://www.smashingmagazine.com/2016/03/pagination-infinite-scrolling-load-more-buttons/
+
+The technical conception of pagination should also consider user experience
+(see {smashing-pagination}[Pagination Usability Findings In eCommerce]), for
+instance, jumping to a specific page is far less used than navigation via
+{next}/{prev} page links (see <<161>>). This favors an API design using
+cursor-based instead of offset-based pagination -- see <<160>>.
 
 **Note:** To provide a consistent look and feel of pagination patterns,
 you must stick to the common query parameter names defined in <<137>>.
 
 
+
 [#160]
 == {SHOULD} prefer cursor-based pagination, avoid offset-based pagination
 
@@ -38,10 +42,10 @@ Before choosing cursor-based pagination, consider the following trade-offs:
 
 * Usability/framework support:
   ** Offset-based pagination is more widely known than cursor-based pagination,
-    so it has more framework support and is easier to use for API clients
+    so it has more framework support and is easier to use for API clients.
 * Use case - jump to a certain page:
   ** If jumping to a particular page in a range (e.g., 51 of 100) is really a
-   required use case, cursor-based navigation is not feasible.
+   required use case, cursor-based navigation may not be feasible.
 * Data changes may lead to anomalies in result pages:
   ** Offset-based pagination may create duplicates or lead to missing entries
      if rows are inserted or deleted between two subsequent paging requests.
@@ -52,23 +56,23 @@ Before choosing cursor-based pagination, consider the following trade-offs:
   ** Very big data sets, especially if they cannot reside in the main memory of
      the database.
   ** Sharded or NoSQL databases.
-* Cursor-based navigation may not work if you need the total count of results.
 
 The {cursor} used for pagination is an opaque pointer to a page, that must
 never be *inspected* or *constructed* by clients. It usually encodes (encrypts)
-the page position, i.e. the unique identifier of the first or last page element, the
-pagination direction, and the applied query filters (or a hash over these) to safely 
-recreate the collection (see also best practice <<cursor-based-pagination>> below).
+the page position, i.e. the unique identifier of the first or last page
+element, the pagination direction, and the applied query filters (or a hash
+over these) to safely recreate the collection (see also best practice
+<<cursor-based-pagination>> below).
 
 
 [#248]
 == {SHOULD} use pagination response page object
 
 [[pagination-fields]]
 For iterating over collections (result sets) we propose to either use cursors
-(see <<160>>) or simple hypertext control links (see <<161>>). To implement these
-in a consistent way, we have defined a response page object pattern with the
-following field semantics:
+(see <<160>>) or simple hypertext control links (see <<161>>). To implement
+these in a consistent way, we have defined a response page object pattern with
+the following field semantics:
 
 * [[self]]{self}:the link or cursor pointing to the same page.
 * [[first]]{first}: the link or cursor pointing to the first page.
@@ -136,14 +140,15 @@ ResponsePage:
 ----
 
 *Note:* While you may support cursors for {next}, {prev}, {first}, {last}, and
-{self}, it is best practice to replace these with pagination links -- see <<161>>.
+{self}, it is best practice to replace these with pagination links -- see
+<<161>>.
 
 
 [#161]
 == {SHOULD} use pagination links
 
-To simplify client design, APIs should support <<165, simplified hypertext controls>> 
-as standard pagination links where applicable:
+To simplify client design, APIs should support <<165, simplified hypertext
+controls>> as standard pagination links where applicable:
 
 [source,json]
 ----
@@ -161,11 +166,24 @@ as standard pagination links where applicable:
 }
 ----
 
-See also <<248>> for details on the pagination fields and page result object. 
+See also <<248>> for details on the pagination fields and page result object.
+
+
+[#254]
+== {SHOULD} avoid a total result count
+
+In pagination responses you should generally avoid providing a _total result
+count_, since calculating it is a costly operation that is usually not required
+by clients. Counting the total number of results for complex queries usually
+requires a full scan of all involved indexes, as it is difficult to calculate
+and cache it in advance. While this is only an implementation detail, it is
+important to consider that providing these total counts over the life-span
+of a service might become expensive as the data set grows over time.
+
+As clients may integrate against these counts over time alongside data
+set growth, removing them will be more difficult than not providing them
+in the first place.
 
-*Remark:* You should avoid providing a total count unless there is a clear
-need to do so. Very often, there are significant system and performance
-implications when supporting full counts. Especially, if the data set grows
-and requests become complex queries and filters drive full scans. While this
-is an implementation detail relative to the API, it is important to consider
-the ability to support serving counts over the life of a service.
+If your consumer really requires a total result count in the response, you may
+support this requirement via the {Prefer} header adding the directive
+`return=total-count` (see also <<181>>).
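
Reviewer sketch (not part of the commit): the new rule #254 makes the total result count opt-in through the `Prefer` header. The Python below is a minimal illustration of how a service might honor that directive, under several assumptions: the helper names `parse_prefer` and `build_page`, the response field `total_count`, and the `allow_total_count` switch are all hypothetical and do not come from the guideline. The header parsing is deliberately simplified; it ignores preference parameters and does not handle commas inside quoted values.

```python
from typing import Callable, Dict, List, Optional


def parse_prefer(header_value: Optional[str]) -> Dict[str, str]:
    """Parse a Prefer header value into a {name: value} dict (simplified)."""
    preferences: Dict[str, str] = {}
    if not header_value:
        return preferences
    for item in header_value.split(","):
        # Drop optional parameters after ';' (ignored in this sketch).
        token = item.split(";", 1)[0].strip()
        if not token:
            continue
        name, _, value = token.partition("=")
        # Preference names are case-insensitive; if a preference is
        # repeated, only the first occurrence is kept.
        preferences.setdefault(name.strip().lower(), value.strip().strip('"'))
    return preferences


def build_page(
    items: List[object],
    prefer_header: Optional[str],
    count_total: Callable[[], int],
    allow_total_count: bool = True,
) -> Dict[str, object]:
    """Build a page response, counting only when the client asked for it.

    `count_total` stands in for the expensive counting query and is called
    lazily. `allow_total_count=False` models a service that decides to
    ignore the preference, which the directive explicitly permits.
    """
    page: Dict[str, object] = {"items": items}
    requested = parse_prefer(prefer_header).get("return") == "total-count"
    if allow_total_count and requested:
        page["total_count"] = count_total()  # hypothetical field name
    return page
```

Passing the count as a callable keeps the expensive query off the default path: clients that send no `Prefer` header never trigger it, and the service can switch the feature off without changing the response contract for other clients.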
diff --git a/models/headers-1.0.0.yaml b/models/headers-1.0.0.yaml
@@ -107,6 +107,9 @@ If-None-Match:
 
 
 Prefer:
+  # Do not import this schema directly, since processing directives are usually
+  # highly customized. Instead, copy the schema to your API and adjust it to
+  # your needs.
   name: Prefer
   in: header
   required: false
@@ -123,6 +126,10 @@ Prefer:
       return using **204** (No Content) without resource (minimal) or using
       **200** or **201** with resource (representation) in the response body on
       success.
+    * **return=<total-count>** is used to suggest the server to return a total
+      result count in a collection requests supporting pagination. Since this
+      is a costly operation, it should be used with care, and the service may
+      decide to ignore this request.
     * **wait=<delta-seconds>** is used to suggest a maximum time the server has
       time to process the request synchronously.
     * **handling=<strict|lenient>** is used to suggest the server to be strict