✨ Add basic ESearch support #333

Open
wants to merge 6 commits into
base: master
from

nevans
Collaborator

@nevans nevans commented Sep 26, 2024

Parse ESEARCH into ESearchResult, with support for generic RFC4466 syntax and RFC4731 ESEARCH return data. For compatibility, ESearchResult#to_a returns an array of integers (sequence numbers or UIDs) whenever any ALL result is available.

Gather ESEARCH response to #search/#uid_search. If the server returns both ESEARCH and SEARCH, both are cleared from the responses hash, but only the ESEARCH is returned.

When the server doesn't send any search responses and return options are passed, return an empty ESearchResult. This empty result will have the appropriate tag and uid values, but no data. Otherwise return an empty SearchResult.
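The selection rules in the two paragraphs above can be sketched roughly as follows. This is only an illustration, not the code in this PR: `pick_search_result` and `EmptyESearch` are hypothetical names, and the responses hash is assumed to map a response name to an array of already-parsed results.

```ruby
# Hypothetical stand-in for an empty ESearchResult (tag and uid set, no data).
EmptyESearch = Struct.new(:tag, :uid, :data)

# responses: Hash of response name => Array of parsed results (assumed shape).
def pick_search_result(responses, tag:, uid:, return_opts: nil)
  # Clear both kinds from the hash, even though only one is returned.
  esearch = responses.delete("ESEARCH")
  search  = responses.delete("SEARCH")

  # Prefer ESEARCH when the server sent both.
  return esearch.last if esearch && !esearch.empty?
  return search.last  if search && !search.empty?

  # No search response at all: the "empty" type depends on whether
  # return options were passed. A plain Array stands in for SearchResult.
  return_opts ? EmptyESearch.new(tag, uid, []) : []
end
```

With both responses present, the ESEARCH value is returned and neither key is left in the hash afterwards.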

Add keyword param for search return options, and process it differently from criteria. Also extract a return argument out of the criteria array, and process it like the return kwarg. (FYI: I discovered https://bugs.ruby-lang.org/issues/20956 while working on this).
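As a rough sketch of the second half of that paragraph: pulling a leading `RETURN` and its option list out of an array of criteria might look like the following. The helper name `extract_return_opts` is hypothetical, and the matching in this PR may well be stricter or looser than this.

```ruby
# Hypothetical helper: splits a leading "RETURN" keyword and its option
# list out of an Array of search criteria.
# Returns [return_opts_or_nil, remaining_criteria].
def extract_return_opts(criteria)
  return [nil, criteria] unless criteria.is_a?(Array)

  keyword, opts, *rest = criteria
  if keyword.is_a?(String) && keyword.casecmp?("RETURN") && opts.is_a?(Array)
    [opts, rest]      # options are handled separately from the criteria
  else
    [nil, criteria]   # nothing to extract; leave the criteria untouched
  end
end
```

The extracted options can then go through the same code path as the `return` kwarg, while the remaining elements are formatted as ordinary criteria.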

TODO list

  • ESearchResult < Data.define(:tag, :uid, :data)
    • #data is a generic Array of [name, value] pairs. Not using Hash because some names can be repeated, and the order is significant even between different names. data.assoc(name) can be used for quick lookup.
      • TODO: test cases for unknown extensions
      • TODO: document that ExtensionData will be used for unknown return data
    • Methods for compatibility with SearchResult:
      • #to_a returns numbers from #all, #partial, or an empty array
      • #modseq
    • Methods for ESEARCH return data: #min, #max, #all, #count
    • Methods for ESEARCH + CONDSTORE: #modseq
  • Parse ESEARCH into ESearchResult
    • Return data for unknown extensions: use ExtensionData.new(tagged_ext_value)
    • Return data for ESEARCH and CONDSTORE (MIN, MAX, ALL, COUNT, MODSEQ): parse as Integer or SequenceSet
  • Support in #search and #uid_search
    • Support for return options
      • return kwarg: must be an Array, and is processed differently from criteria.
      • Extract return from array criteria and process it like the return kwarg, not like criteria.
      • When the server doesn't send any search response but return options are used, return an empty ESearchResult rather than an empty SearchResult.
      • Document supported return options
    • Gather ESEARCH responses in addition to SEARCH responses. If server sends both, prefer to return the ESEARCH responses.
  • For consistency, add charset kwarg to #search, #uid_search
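To make the data shape in the list above concrete, here is a minimal sketch. It assumes Ruby 3.2+ for `Data`; `ESearchSketch` is a hypothetical name, and a plain Array of integers stands in for the SequenceSet that `ALL` would really parse to.

```ruby
# Hypothetical sketch of the shape described above; not the class in this PR.
# data is an Array of [name, value] pairs, so repeated names and ordering
# are preserved, and Array#assoc gives a quick lookup of the first match.
ESearchSketch = Data.define(:tag, :uid, :data) do
  def min   = data.assoc("MIN")&.last
  def max   = data.assoc("MAX")&.last
  def count = data.assoc("COUNT")&.last
  def all   = data.assoc("ALL")&.last

  # Compatibility with SearchResult: numbers from ALL, or an empty array.
  def to_a = all.to_a
end

result = ESearchSketch.new(tag: "RUBY0001", uid: true,
                           data: [["MIN", 3], ["ALL", [3, 5, 8]]])
result.min   # first value stored under "MIN"
result.to_a  # numbers stored under "ALL"
```

Return data that was not sent simply yields `nil` from its accessor, and `to_a` falls back to an empty array.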

TODO: Remove the following from this PR (make new PRs for them):

  • ESearchResult methods and data classes
    • remove #to_a returning numbers from #partial
    • remove #partial
      • remove ESearchResult::PartialResult
    • remove #updates
    • remove #relevancy
  • Parse PARTIAL, ADDTO, REMOVEFROM, RELEVANCY as generic ExtensionData

Related future PRs:

  • Support for PARTIAL (RFC9394, RFC5267)
  • Support for CONTEXT=* (RFC5267)
    • Automatically update a specified SequenceSet?
    • Register a special updates response handler?
  • Support for ESORT (RFC5267)
  • Support for SEARCH=FUZZY (RFC6203)
    • Method to zip #relevancy scores with numbers from #all or #partial
    • Unsorted ESearchResult #to_a, #all, #partial
  • SequenceSet methods:
    • unsorted #numbers
    • More methods for accessing or mutating SequenceSet without affecting the order
    • #prepend - like #append but adds to the beginning
    • #insert(index, value)
    • #remove(index_or_range)
    • unsorted version of #[], #at, #slice
    • unsorted #zip with array of numbers (for RELEVANCY)
    • unsorted #zip with another SequenceSet (for COPYUID)

@nevans
Collaborator Author

nevans commented Oct 8, 2024

FWIW: I discovered in testing that Yahoo does not return ESEARCH results when RETURN (PARTIAL 1:500) is used, contrary to the PARTIAL RFC that was written by Yahoo engineers!

@nevans nevans added this to the v0.6 milestone Oct 8, 2024
@nevans nevans added the IMAP4rev2 Requirement for IMAP4rev2, RFC9051 label Oct 14, 2024
@nevans nevans force-pushed the basic-esearch-support branch 4 times, most recently from 59137e9 to 7dc2006 Compare October 25, 2024 21:21
nevans added a commit to nevans/net-imap that referenced this pull request Nov 8, 2024
This affects `#search`, `#uid_search`, `#sort`, `#uid_sort`, `#thread`,
and `#uid_thread`.

Prior to this, sending a parenthesized list in the search criteria for
any of these commands required the use of strings, which are converted
to `RawData`, which has security implications with untrusted inputs.

With this change, arrays will only be converted into SequenceSet when
_every_ element in the array is a valid SequenceSet input.  Otherwise,
the array will be left alone, which allows us to send parenthesized
lists without using strings and RawData.

For example, some searches this change enables:

* Combining criteria to pass into `OR`, `NOT`, `FUZZY`, etc.
  * `search(["not", %w(flagged unread)])`
    converts to: `SEARCH not (flagged unread)`
* Adding return options (we should also add a return kwarg).
  * `uid_search(["RETURN", ["PARTIAL", 1..50], "UID", 12345..67890])`
    converts to: `UID SEARCH RETURN (PARTIAL 1:50) UID 12345:67890`
  * Note that `PARTIAL` supports negative ranges, which can't be coerced
    to SequenceSet.  They'll need to be sent as strings, for now.
  * Note that searches with return options should return ESEARCH
    results, which are currently unsupported.  See ruby#333.

This _should_ be backward compatible: previously these inputs would
raise an exception.
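
The coercion rule described in this commit message can be illustrated with a small standalone sketch. It does not use net-imap: `format_criteria`, `format_arg`, `format_seq`, `seqset_input?`, and `SEQSET_STRING` are all hypothetical names, and the validity check is a simplification of what SequenceSet really accepts.

```ruby
# Simplified, hypothetical check for "valid SequenceSet input":
# positive integers, ranges of positive integers, or strings like "1:5,9".
SEQSET_STRING = /\A(?:\d+|\*)(?::(?:\d+|\*))?(?:,(?:\d+|\*)(?::(?:\d+|\*))?)*\z/

def seqset_input?(value)
  case value
  when Integer then value.positive?
  when Range
    first, last = value.begin, value.end
    first.is_a?(Integer) && first.positive? &&
      (last.nil? || (last.is_a?(Integer) && last.positive?))
  when String then SEQSET_STRING.match?(value)
  else false
  end
end

def format_seq(el)
  el.is_a?(Range) ? "#{el.begin}:#{el.end || "*"}" : el.to_s
end

def format_arg(arg)
  case arg
  when Array
    if !arg.empty? && arg.all? { |el| seqset_input?(el) }
      arg.map { |el| format_seq(el) }.join(",")         # a sequence set
    else
      "(#{arg.map { |el| format_arg(el) }.join(" ")})"  # a parenthesized list
    end
  when Range, Integer
    raise ArgumentError, "not a valid sequence set: #{arg.inspect}" unless seqset_input?(arg)
    format_seq(arg)
  else
    arg.to_s
  end
end

# Top-level criteria are joined with spaces rather than parenthesized.
def format_criteria(criteria)
  criteria.map { |arg| format_arg(arg) }.join(" ")
end
```

Under these assumptions, `format_criteria(["not", %w(flagged unread)])` produces the `not (flagged unread)` form shown above, while a negative range such as `-500..-1` raises, which matches the note that those still need to be sent as strings.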
@nevans nevans force-pushed the basic-esearch-support branch 2 times, most recently from d86e849 to efd2760 Compare November 8, 2024 23:02
@nevans nevans force-pushed the basic-esearch-support branch from efd2760 to 0587646 Compare November 11, 2024 19:01
@nevans nevans force-pushed the basic-esearch-support branch 6 times, most recently from fd7b0f9 to a535b1f Compare November 25, 2024 17:10
@nevans nevans force-pushed the basic-esearch-support branch 10 times, most recently from acdb6f8 to d38357b Compare December 14, 2024 22:17
@nevans nevans force-pushed the basic-esearch-support branch 3 times, most recently from f4cabfa to b486078 Compare December 15, 2024 23:14
nevans and others added 6 commits December 15, 2024 18:22
Parses `ESEARCH` into `ESearchResult`, with support for generic RFC4466
syntax and RFC4731 `ESEARCH` return data.

For compatibility, `ESearchResult#to_a` returns an array of integers
(sequence numbers or UIDs) whenever any `ALL` result is available.

If the server returns both `ESEARCH` and `SEARCH`, both are cleared from
the responses hash, but only the `ESEARCH` is returned.

When the server doesn't send any search responses:  If return options
are passed, return an empty `ESearchResult`.  It will have the appropriate
`tag` and `uid` values, but no `data`.  Otherwise return an empty
`SearchResult`.

This also extracts the `return` kwarg out of the `criteria` array, so it
can be processed differently.

This looks like a bug in prism:
```
$ rbenv shell 3.4.0-rc1
$ ruby -e 'pp ([["foo"]] in [/\Afoo\b/i | [/\Afoo\z/i, *]])'
false
$ ruby --parser=parse.y -e 'pp ([["foo"]] in [/\Afoo\b/i | [/\Afoo\z/i, *]])'
true
```
@nevans nevans force-pushed the basic-esearch-support branch from b486078 to 35ab834 Compare December 15, 2024 23:22
@nevans nevans marked this pull request as ready for review December 15, 2024 23:29
Labels
IMAP4rev2 Requirement for IMAP4rev2, RFC9051
1 participant