Add pagination #279

Merged: 32 commits merged on Sep 1, 2023.

Commits:
75de903
Add workaround for `req_body_json()` and content type
mgirlich Aug 15, 2023
24997de
Add `req_paginate()`
mgirlich Aug 15, 2023
8d44aec
Fix workaround
mgirlich Aug 15, 2023
f327c33
WIP
mgirlich Aug 17, 2023
43f350a
Make `req_paginate()` more lower level
mgirlich Aug 18, 2023
66fd240
Quick documentation
mgirlich Aug 18, 2023
1fe5359
Export `in_query()`, `in_header()`, and `in_body()`
mgirlich Aug 18, 2023
134b27e
Refactor
mgirlich Aug 18, 2023
a940b0b
Change interface to anonymous functions
mgirlich Aug 31, 2023
daf53c6
Fix documentation
mgirlich Aug 31, 2023
76b08b1
Actually check arguments in `check_function2()`
mgirlich Aug 31, 2023
600c9da
Add standalone cli
mgirlich Aug 31, 2023
277a4dc
No need for standalone cli
mgirlich Aug 31, 2023
c2d708d
Add some basic tests
mgirlich Aug 31, 2023
24de724
Remove `calculate_n_pages()`
mgirlich Sep 1, 2023
92b38dd
Improve documentation for `req_paginate()`
mgirlich Sep 1, 2023
ec09637
Rename to `paginate_req_perform()`
mgirlich Sep 1, 2023
61a33ae
Link to `*_req_perform()` from `req_perform()`
mgirlich Sep 1, 2023
aa2d77f
Export `paginate_next_request()`
mgirlich Sep 1, 2023
ba3c937
Check for pagination policy in `paginate_req_perform()`
mgirlich Sep 1, 2023
f0063ed
Simplify `req_paginate_offset()`
mgirlich Sep 1, 2023
01a509e
Store offset in request
mgirlich Sep 1, 2023
39252aa
Fix example for `paginate_req_perform()`
mgirlich Sep 1, 2023
a35ee5e
Rename to `req_paginate_token()`
mgirlich Sep 1, 2023
e5209cf
Kind of support an infinite amount of pages
mgirlich Sep 1, 2023
b3aef39
Add more tests
mgirlich Sep 1, 2023
e54a904
Fix test
mgirlich Sep 1, 2023
1427759
Avoid modern R syntax
mgirlich Sep 1, 2023
ddef0a6
More documentation tweaks
mgirlich Sep 1, 2023
f4b1766
Add pagination to pkgdown yaml
mgirlich Sep 1, 2023
cf1f435
Remove workaround
mgirlich Sep 1, 2023
9d87912
Fix pkgdown
mgirlich Sep 1, 2023
6 changes: 6 additions & 0 deletions NAMESPACE
@@ -41,6 +41,8 @@ export(oauth_flow_refresh)
export(oauth_token)
export(obfuscate)
export(obfuscated)
export(paginate_next_request)
export(paginate_req_perform)
export(req_auth_basic)
export(req_auth_bearer_token)
export(req_body_file)
@@ -61,6 +63,10 @@ export(req_oauth_device)
export(req_oauth_password)
export(req_oauth_refresh)
export(req_options)
export(req_paginate)
export(req_paginate_next_url)
export(req_paginate_offset)
export(req_paginate_token)
export(req_perform)
export(req_progress)
export(req_proxy)
237 changes: 237 additions & 0 deletions R/paginate.R
@@ -0,0 +1,237 @@
#' Pagination
#'
#' Use `req_paginate()` to specify how to request the next page in a paginated
#' API. Use [paginate_req_perform()] to fetch all pages.
#' If you need more control, use a combination of [req_perform()] and
#' [paginate_next_request()] to iterate through the pages yourself.
#'
#' There are also helpers for common pagination patterns:
#' * `req_paginate_next_url()` when the response contains a link to the next
#'   page.
#' * `req_paginate_offset()` when the request specifies the offset (i.e. the
#'   index of the first element to return) and the page size.
#' * `req_paginate_token()` when the response contains a token
#'   that identifies the next page.
#'
#' @inheritParams req_perform
#' @param next_request A callback function that takes two arguments (the
#'   current request and its response) and returns:
#'
#' * a new [request] to request the next page or
#' * `NULL` if there is no next page.
#' @param n_pages A function that extracts the total number of pages from
#' the [response].
#'
#' @return A modified HTTP [request].
#' @seealso [paginate_req_perform()] to fetch all pages. [paginate_next_request()]
#' to generate the request to the next page.
#' @export
#'
#' @examples
#' page_size <- 150
#'
#' request("https://pokeapi.co/api/v2/pokemon") %>%
#' req_url_query(limit = page_size) %>%
#' req_paginate_next_url(
#' next_url = function(resp) resp_body_json(resp)[["next"]],
#' n_pages = function(resp) {
#' total <- resp_body_json(resp)$count
#' ceiling(total / page_size)
#' }
#' )
req_paginate <- function(req,
                         next_request,
                         n_pages = NULL) {
  check_request(req)
  check_function2(next_request, args = c("req", "resp"))
  check_function2(n_pages, args = "resp", allow_null = TRUE)

  req_policies(
    req,
    paginate = list(
      next_request = next_request,
      n_pages = n_pages
    )
  )
}
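The `n_pages` callback in the examples turns the total item count reported by the API into a page count by dividing by the page size and rounding up. A minimal base-R sketch of just that arithmetic; the helper name `n_pages_from_count()` and the totals are made up for illustration:

```r
# Hypothetical helper mirroring the `n_pages` callback in the examples:
# a partially filled last page still counts as a page, hence ceiling().
n_pages_from_count <- function(total, page_size) {
  ceiling(total / page_size)
}

n_pages_from_count(1000, 150)  # 7 (six full pages plus one partial page)
n_pages_from_count(300, 150)   # 2 (exact multiple, no partial page)
```

In `paginate_req_perform()` this value is then capped by `max_pages`.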

#' Perform a paginated request
Review comment (Member): Should we document this with `req_paginate()`?

#'
#' @inheritParams req_perform
#' @param resp An HTTP [response].
#' @param max_pages The maximum number of pages to request.
#' @param progress Display a progress bar?
#'
#' @return A list of responses.
#' @export
#'
#' @examples
#' page_size <- 150
#'
#' req_pokemon <- request("https://pokeapi.co/api/v2/pokemon") %>%
#' req_url_query(limit = page_size) %>%
#' req_paginate_next_url(
#' next_url = function(resp) resp_body_json(resp)[["next"]],
#' n_pages = function(resp) {
#' total <- resp_body_json(resp)$count
#' ceiling(total / page_size)
#' }
#' )
#'
#' responses <- paginate_req_perform(req_pokemon)
paginate_req_perform <- function(req,
                                 max_pages = 20L,
                                 progress = TRUE) {
  check_request(req)
  check_has_pagination_policy(req)
  check_number_whole(max_pages, allow_infinite = TRUE, min = 1)
  check_bool(progress)

  resp <- req_perform(req)
  f_n_pages <- req$policies$paginate$n_pages %||% function(resp) Inf
  n_pages <- min(f_n_pages(resp), max_pages)
  # The implementation below doesn't truly support an infinite number of
  # pages, but 100e3 should be plenty.
  if (is.infinite(n_pages)) {
    n_pages <- 100e3
  }

  out <- vector("list", length = n_pages)
  out[[1]] <- resp
  # Number of pages fetched so far. Tracked explicitly because the loop
  # variable is not usable when the loop body never runs (n_pages == 1).
  n_fetched <- 1L

  if (progress) {
    cli::cli_progress_bar(
      "Paginate",
      total = n_pages,
      format = "{cli::pb_spin} Page {cli::pb_current}/{cli::pb_total} | ETA: {cli::pb_eta}",
      current = 1L
    )
  }

  for (page in seq2(2, n_pages)) {
    req <- paginate_next_request(resp, req)
    if (is.null(req)) {
      break
    }

    resp <- req_perform(req)
    out[[page]] <- resp
    n_fetched <- page

    if (progress) {
      cli::cli_progress_update()
    }
  }
  if (progress) {
    cli::cli_progress_done()
  }

  # Remove the unused end of `out` in case the pagination loop exits before
  # `n_pages` pages have been fetched.
  if (n_fetched < n_pages) {
    out <- out[seq2(1, n_fetched)]
  }

  out
}
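To illustrate the control flow of the pagination loop without any HTTP, here is a base-R sketch in which requests and responses are plain values. `paginate_sketch()`, `perform()`, `next_req()` and the three-page "API" are hypothetical stand-ins for `paginate_req_perform()`, `req_perform()`, `paginate_next_request()` and a real server. It shows the two ways the loop ends (the callback returns `NULL`, or the page cap is reached) and the trimming of unused slots, including the single-page case.

```r
# Sketch of the driver loop: fetch the first page, then keep asking the
# callback for the next request until it returns NULL or the cap is hit.
paginate_sketch <- function(req, perform, next_req, max_pages = 20L) {
  resp <- perform(req)
  out <- vector("list", length = max_pages)
  out[[1]] <- resp
  n_done <- 1L
  for (page in seq_len(max_pages)[-1]) {
    req <- next_req(resp, req)
    if (is.null(req)) break  # no further page
    resp <- perform(req)
    out[[page]] <- resp
    n_done <- page
  }
  out[seq_len(n_done)]  # drop slots that were never filled
}

# Hypothetical API with 3 pages; a "request" is just a page number.
perform <- function(req) list(page = req, has_next = req < 3)
next_req <- function(resp, req) if (resp$has_next) req + 1L else NULL

length(paginate_sketch(1L, perform, next_req))                  # 3
length(paginate_sketch(1L, perform, next_req, max_pages = 2L))  # 2
length(paginate_sketch(1L, perform, next_req, max_pages = 1L))  # 1
```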

#' @export
#'
#' @rdname paginate_req_perform
paginate_next_request <- function(resp, req) {
  check_response(resp)
  check_request(req)
  check_has_pagination_policy(req)

  next_request <- req$policies$paginate$next_request
  next_request(resp = resp, req = req)
}

#' @param next_url A function that extracts the url to the next page from the
#' [response].
#' @rdname req_paginate
#' @export
req_paginate_next_url <- function(req,
                                  next_url,
                                  n_pages = NULL) {
  check_function2(next_url, args = "resp")

  next_request <- function(req, resp) {
    url <- next_url(resp)

    if (is.null(url)) {
      return(NULL)
    }

    req_url(req, url)
  }

  req_paginate(
    req,
    next_request,
    n_pages = n_pages
  )
}
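The callback built by `req_paginate_next_url()` reads the next URL from the response and stops when there is none. A base-R sketch of that logic, assuming a "request" is a plain list with a `url` field (a hypothetical stand-in for an httr2 request) and that assigning `req$url` stands in for `req_url(req, url)`; the URLs and response bodies are made up:

```r
# Build a next_request(req, resp) callback from a next_url(resp) extractor.
make_next_request <- function(next_url) {
  function(req, resp) {
    url <- next_url(resp)
    if (is.null(url)) {
      return(NULL)  # no link to a next page: stop paginating
    }
    req$url <- url
    req
  }
}

# Hypothetical parsed bodies in which `next` holds the URL of the next page
# and is NULL on the last page.
next_url <- function(resp) resp[["next"]]
next_request <- make_next_request(next_url)

req <- list(url = "https://example.com/items?page=1")
resp1 <- list(results = 1:3, `next` = "https://example.com/items?page=2")
resp2 <- list(results = 4:5, `next` = NULL)

next_request(req, resp1)$url  # "https://example.com/items?page=2"
next_request(req, resp2)      # NULL
```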

#' @param offset A function that applies the new offset to the request. It takes
#' two arguments: a [request] and an integer offset.
#' @param page_size A whole number that specifies the page size, i.e. the number
#'   of elements per page.
#' @rdname req_paginate
#' @export
req_paginate_offset <- function(req,
                                offset,
                                page_size,
                                n_pages = NULL) {
  check_function2(offset, args = c("req", "offset"))
  check_number_whole(page_size)

  next_request <- function(req, resp) {
    # The current offset is stored in the pagination policy so that it
    # travels with the request from page to page.
    cur_offset <- req$policies$paginate$offset
    cur_offset <- cur_offset + page_size
    req$policies$paginate$offset <- cur_offset
    offset(req, cur_offset)
  }

  out <- req_paginate(
    req,
    next_request,
    n_pages = n_pages
  )

  out$policies$paginate$offset <- 0L
  out
}
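The offset helper keeps its state on the request itself and advances it by `page_size` on every call. A base-R sketch of that state handling, where a plain list field `req$offset` stands in for `req$policies$paginate$offset` and `set_offset()` is a hypothetical `offset` callback that records the value as a query parameter:

```r
# Build a next_request(req, resp) callback that advances a stored offset.
make_offset_next <- function(offset, page_size) {
  function(req, resp) {
    cur_offset <- req$offset + page_size
    req$offset <- cur_offset
    offset(req, cur_offset)
  }
}

# Hypothetical `offset` callback: record the offset as a query parameter.
set_offset <- function(req, offset) {
  req$query <- list(offset = offset)
  req
}

next_request <- make_offset_next(set_offset, page_size = 50)
req <- list(offset = 0L)  # the first page starts at offset 0
offsets <- numeric()
for (i in 1:3) {
  req <- next_request(req, resp = NULL)  # the response is never consulted
  offsets <- c(offsets, req$query$offset)
}
offsets  # 50 100 150
```

Since the response is never inspected, this helper relies on `n_pages` (or `max_pages` in `paginate_req_perform()`) to know when to stop.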

#' @param set_token A function that applies the new token to the request. It
#' takes two arguments: a [request] and the new token.
#' @param next_token A function that extracts the next token from the [response].
#' @rdname req_paginate
#' @export
req_paginate_token <- function(req,
                               set_token,
                               next_token,
                               n_pages = NULL) {
  check_function2(set_token, args = c("req", "token"))
  check_function2(next_token, args = "resp")

  next_request <- function(req, resp) {
    token <- next_token(resp)

    if (is.null(token)) {
      return(NULL)
    }

    set_token(req, token)
  }

  req_paginate(
    req,
    next_request,
    n_pages = n_pages
  )
}
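Token pagination follows the same pattern: extract a token from the response, apply it to the request, and stop when no token is returned. A base-R sketch, assuming a hypothetical in-memory "API" (`pages`, keyed by token) and plain lists for requests and responses; looking up `pages[[req$token]]` stands in for `req_perform(req)`:

```r
pages <- list(
  first = list(items = 1:2, next_token = "b"),
  b     = list(items = 3:4, next_token = "c"),
  c     = list(items = 5:6, next_token = NULL)  # last page: no token
)

set_token <- function(req, token) {
  req$token <- token
  req
}
next_token <- function(resp) resp$next_token

# Same shape as the callback built by req_paginate_token().
next_request <- function(req, resp) {
  token <- next_token(resp)
  if (is.null(token)) {
    return(NULL)
  }
  set_token(req, token)
}

req <- list(token = "first")
items <- integer()
repeat {
  resp <- pages[[req$token]]
  items <- c(items, resp$items)
  req <- next_request(req, resp)
  if (is.null(req)) break
}
items  # 1 2 3 4 5 6
```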

check_has_pagination_policy <- function(req, call = caller_env()) {
  if (!req_policy_exists(req, "paginate")) {
    cli::cli_abort(
      c(
        "{.arg req} doesn't have a pagination policy.",
        i = "You can add pagination via `req_paginate()`."
      ),
      call = call
    )
  }
}
3 changes: 3 additions & 0 deletions R/req-perform.R
@@ -45,6 +45,9 @@
#' [response]; otherwise throws an error. Override this behaviour with
#' [req_error()].
#' @export
#' @seealso [multi_req_perform()] to perform multiple requests in parallel.
#' [paginate_req_perform()] to fetch all pages of a request paginated via
#' [req_paginate()].
#' @examples
#' request("https://google.com") %>%
#' req_perform()
62 changes: 62 additions & 0 deletions R/utils.R
@@ -168,3 +168,65 @@ local_write_lines <- function(..., .env = caller_env()) {
writeLines(c(...), path)
path
}

check_function2 <- function(x,
                            ...,
                            args = NULL,
                            allow_null = FALSE,
                            arg = caller_arg(x),
                            call = caller_env()) {
  check_function(
    x = x,
    allow_null = allow_null,
    arg = arg,
    call = call
  )

  if (!is.null(x)) {
    .check_function_args(
      f = x,
      expected_args = args,
      arg = arg,
      call = call
    )
  }
}

# Basically copied from rlang. Can be removed when https://github.com/r-lib/rlang/pull/1652
# is merged
.check_function_args <- function(f,
                                 expected_args,
                                 arg,
                                 call) {
  if (is_null(expected_args)) {
    return(invisible(NULL))
  }

  actual_args <- fn_fmls_names(f) %||% character()
  if (identical(actual_args, expected_args)) {
    return(invisible(NULL))
  }

  n_expected_args <- length(expected_args)
  n_actual_args <- length(actual_args)

  if (n_expected_args == 0) {
    cli::cli_abort(
      "{.arg {arg}} must have no arguments, not {n_actual_args} argument{?s}.",
      call = call,
      arg = arg
    )
  }

  if (n_actual_args == 0) {
    arg_info <- "instead of no arguments"
  } else {
    arg_info <- "not {.arg {actual_args}}"
  }

  cli::cli_abort(
    paste0("{.arg {arg}} must have the {cli::qty(n_expected_args)}argument{?s} {.arg {expected_args}}, ", arg_info, "."),
    call = call,
    arg = arg
  )
}
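The core of `.check_function_args()` is an exact comparison of formal argument names. A base-R approximation, using `formals()` in place of rlang's `fn_fmls_names()` and a hypothetical helper name `has_exact_args()`:

```r
# Hypothetical helper: does `f` have exactly the argument names `expected`,
# in that order? Mirrors the identical() comparison in .check_function_args().
has_exact_args <- function(f, expected) {
  actual <- names(formals(f))
  if (is.null(actual)) {
    actual <- character()  # a function without arguments
  }
  identical(actual, expected)
}

has_exact_args(function(req, resp) NULL, c("req", "resp"))  # TRUE
has_exact_args(function(resp, req) NULL, c("req", "resp"))  # FALSE: order matters
has_exact_args(function(r, s) NULL, c("req", "resp"))       # FALSE: names matter
has_exact_args(function() NULL, character())                # TRUE
```

Because the comparison uses `identical()`, a callback written as `function(resp, req)` is rejected even though `paginate_next_request()` calls `next_request()` with named arguments; `function(...)` is rejected as well.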
6 changes: 6 additions & 0 deletions _pkgdown.yml
@@ -42,6 +42,7 @@ reference:
- req_perform
- req_stream
- multi_req_perform
- paginate_req_perform

- subtitle: Control the process
desc: >
@@ -53,6 +54,11 @@ reference:
- req_throttle
- req_retry

- title: Pagination
contents:
- req_paginate
- paginate_next_request

- title: Handle the response
contents:
- starts_with("resp_")
4 changes: 2 additions & 2 deletions man/jwt_claim.Rd
