-
Notifications
You must be signed in to change notification settings - Fork 226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes DISCO-2470: Implement the foundation of a Firefox Suggest component #5723
Conversation
660cfe3
to
ad34efd
Compare
b64c945
to
bef40d8
Compare
|
||
This component currently supports the basic Suggest experience only. The basic experience shows suggestions for sponsored and web content from a canned dataset. The component downloads the dataset from [Remote Settings](https://remote-settings.readthedocs.io/en/latest/), stores the suggestions in a local database, and makes them available to the Firefox address bar. Because matching is done locally, Mozilla never sees the user's query. | ||
|
||
The opt-in "Improved Firefox Suggest Experience", which sends user queries to a [Mozilla-owned proxy server](https://mozilla-services.github.io/merino/intro.html) for server-side matching, is not currently supported. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗒️ I'm hedging a little bit with the "not currently" here 😅 Eventually, I'd love for us to add a Merino client to this component, and have it return suggestions in the same format. We could use that to power AccuWeather suggestions, and bring the whole opt-in Merino experience to mobile. For now, though, I wanted to make it clear that we only support the "offline" experience.
I also used "basic Suggest experience" and "improved Suggest experience" here, instead of "offline" and "online" or "Suggest" and "Suggest Plus", because we haven't used that terminology outside of our team yet, and "improved" is how the SUMO FAQ page describes it.
} | ||
|
||
#[test] | ||
fn ingest() -> anyhow::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🗒️ This is an end-to-end test that exercises the happy path. We'll definitely want to add more coverage, but I'm curious about folks' opinions on the right balance of tests to add in this PR, vs. a follow-up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A followup is fine IMO, especially if we assume you'd like others on the team to help contribute to them. Your team can then make their own decision about when the coverage is "good enough"
components/suggest/src/store.rs
Outdated
/// the application. A mobile device on a metered connection might want to | ||
/// request a small subset of the Suggest data and fetch the rest later, | ||
/// while a desktop on a fast link might request the entire dataset on first | ||
/// launch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 Some other ideas for using constraints:
- Imagine a future where we have "offline" location-based suggestions. We could have a scheme that chunks these suggestion by region in Remote Settings, and constrain the ingestion to just the chunk based on the user's current (coarse) location—so that we'll only download suggestions near where the user is, instead of downloading the entire dataset and using up space to store suggestions for places they might never visit.
- If the user just installed Firefox, we might want to ingest an initial set of "higher-value" suggestions that we can show right away, and ingest the rest later. This would also require changing our Remote Settings schema, as I don't think records currently know what suggestions are in their attachments—but it's a thought for making Suggest more useful to first-time users.
- If the user is on cellular, we could similarly limit the amount of work we do. Android's
WorkManager
lets you schedule a task based on the network connection type, but we might still want to fetch a smaller set of suggestions even on cellular—just not a full fetch.
components/suggest/src/store.rs
Outdated
pub struct IngestLimits { | ||
/// The maximum number of records to request from Remote Settings. | ||
/// Each record has about 200 suggestions. | ||
pub records: Option<u64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💭 I added this constraint mostly for smoke testing against a real Remote Settings, so that I could kick the tires on the schema without ingesting the entire 7 MB dataset every time...
But, I think "approximate number of suggestions" might be a more useful constraint than number of records, if it's valuable at all. "Number of records" (and chunks) is an implementation detail; if the consumer requests about 500 suggestions, we can do the calculation in the component that it's 3 records, without leaking that detail to consumers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to hiding the details of what a record
is - I like the idea of constraints in that it helps for Mobile vs Desktop network connection, but I wonder how the consumer will make the decision of the constraint.. is the plan to run some smoke tests with real devices on throttled networks to see how long the network takes? Or do we plan to have dynamic configuration of limits based on a user's network 👀
Regardless I feel like this is a followup but I'm interested in the methodology to pick a limit as I'm sure we can learn from that in other places (Nimbus, other consumers of Remote Settings, Sync?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or do we plan to have dynamic configuration of limits based on a user's network 👀
Yep, I was thinking something closer to this. I'd love to explore generalizing it to our other components, too, but definitely in the far future!
My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.
Taking that even further, we could make the limits adaptive: see how long it takes to download some data, and figure out what to prioritize and how much to download based on that.
I'm not sure it's super helpful for where we are right now, since Suggest is a ~7 MB, infrequently changing dataset (1-2x/month; when it does change, it's only a few chunks, not the entire dataset) that's chunked arbitrarily now.
But I could see us wanting to do this if we ever want to support novel suggestion types, like location-based suggestions, in the future—where mirroring the entire dataset might not work well.
How does that line up with what you're thinking? 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't realize I didn't get back to this 😅
My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.
I like it! And seems like it would serve our use case here of making sure the user gets a good experience even if they are on a low data network connection - once we ship and see how well our configuration story is in practice I'd love for us to visit if there's value in the other components, but like you said, that's in the far future!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My half-baked idea is, a consumer could use platform APIs to query whether the device is on a metered connection (and/or has Data Saver or Low Data Mode on), or to check its battery level, and then pick a set of 👋🏼 arbitrary but sensible 👋🏼 limits based on that.
I guess one vague concern about this is that "arbitrary but sensible" is really a contradiction. Picking something sensible probably means the app needs to know more about the implementation than really makes sense. I'm wondering if it's better for the app to communicate what it does know for sure (ie, it's in a kind of data-saver mode) which the component uses to choose appropriate (but not arbitrary ;) limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fantastic! While some further tests would be nice, I see no reason they can't be done in a followup - indeed, there's literally nothing here which I think must be fixed before landing even if some tweaks are desired before releasing - so I think you should land this whenever you like :)
} | ||
|
||
#[test] | ||
fn ingest() -> anyhow::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A followup is fine IMO, especially if we assume you'd like others on the team to help contribute to them. Your team can then make their own decision about when the coverage is "good enough"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @linabutler 🚀 🚀 This looks great!!
Had mostly questions as I was going through - didn't dig too much into the business logic 😛 This looks great to me and I'm super excited for another cross-platform component 🤩
components/suggest/src/store.rs
Outdated
pub struct IngestLimits { | ||
/// The maximum number of records to request from Remote Settings. | ||
/// Each record has about 200 suggestions. | ||
pub records: Option<u64>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to hiding the details of what a record
is - I like the idea of constraints in that it helps for Mobile vs Desktop network connection, but I wonder how the consumer will make the decision of the constraint.. is the plan to run some smoke tests with real devices on throttled networks to see how long the network takes? Or do we plan to have dynamic configuration of limits based on a user's network 👀
Regardless I feel like this is a followup but I'm interested in the methodology to pick a limit as I'm sure we can learn from that in other places (Nimbus, other consumers of Remote Settings, Sync?)
Codecov ReportPatch and project coverage have no change.
Additional details and impacted files@@ Coverage Diff @@
## main #5723 +/- ##
=======================================
Coverage 89.86% 89.86%
=======================================
Files 1 1
Lines 148 148
=======================================
Hits 133 133
Misses 15 15 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, Lina! This is cool stuff. 🚢
This commit adds a new component to fetch search suggestions from Remote Settings and store them in an SQLite database.
Before, creating a provider would block the calling thread until the database was set up. Now, migrations will run the first time the database is accessed, which we already expect to happen on a background thread.
29795bd
to
43245de
Compare
This commit adds an empty UniFFI interface for the new Suggest component.
In other uses, "provider" refers to the source of a suggestion, like adM, Wikipedia, AMO, etc. Since `SuggestionProvider` aggregates suggestions from all these sources, calling it a "provider" could be confusing, especially if we support filtering suggestions from specific sources in the future.
The Gecko JS UniFFI bindings don't support synchronous function calls yet (https://bugzilla.mozilla.org/show_bug.cgi?id=1842433), so the builder API will be a little awkward to use on Desktop. This commit replaces the builder with a dictionary, with default fields to make it more ergonomic.
This commit adds documentation and tests for `full_keyword`, and reimplements it using the Rust iterator methods.
This commit exposes `SuggestDb.clear()` as `SuggestStore.clear()`, and ensures that it removes all icons in addition to suggestions and metadata.
This commit changes the visibility of crate-internal types and methods from `pub` to `pub(crate)`, and clarifies the difference between the two error types: * `SuggestApiError` is the public error type, for all errors that should be handled by the application. * `Error` is the internal error type, used in the implementation of the component, and is not part of the public interface.
This commit makes it possible to combine multiple database accesses into a single transaction. Previously, each `SuggestDb` method would lock the database connection, and optionally run its own transaction before releasing the lock. This meant that a single logical operation, like ingesting all suggestions from an attachment, would involve multiple transactions. This commit moves the database accesses into a new `SuggestDao` type, which takes a reference to an existing locked connection. The new `SuggestDb::{read, write}` methods handle locking and transaction management, and take a closure that receives a `SuggestDao` for reading and writing. Co-authored-by: Tif Tran <ttran@mozilla.com>
`SuggestDb` already protects the connection with a Rust mutex, so SQLite's per-connection mutex is unnecessary.
This commit also generalizes `IngestLimits.records` to `SuggestIngestionConstraints.max_suggestions`, since the partitioning scheme and number of suggestions per Remote Settings attachment is an implementation detail.
"Fetching" is a bit of an overloaded term that can refer to querying ingested suggestions, or downloading records from Remote Settings. To reduce confusion, let's replace those uses with more precise terminology: * "Downloading" for when we request suggestions from Remote Settings. * "Ingesting" for storing the downloaded suggestions in the database. * "Querying" for when the application requests ingested suggestions.
This matches how we pass dictionaries across the FFI boundary in our other components.
A bit of future-proofing, in case we support multiple Remote Settings collections in the future.
This commit renames: * `RemoteSuggestion` to `DownloadedSuggestion` and `SuggestDataAttachmentContents` to `DownloadedSuggestDataAttachment`, for consistency with the "downloading" terminology. * `RemoteRecordId` to `SuggestRecordId`, for consistency with the other Remote Settings types.
43245de
to
c43a481
Compare
This commit adds a new Suggest component to power the Firefox Suggest feature. It downloads a dataset of suggestions from Remote Settings, stores them in an SQLite database with a simple schema, and makes them available to queries.
We'll iterate on this over time: more comprehensive tests, additions to the interface, and schema changes will definitely be coming soon, and you can take a peek at some of that upcoming work in this Jira epic! This initial PR is meant to provide a foundation for that work, and not so much a production-quality component that we'll ship right away.
It can be reviewed commit-by-commit—that might be helpful for folks newer to Rust and Application Services, as an example of how to get a new component off the ground—or as a mega-diff. For the newer folks, some of the patterns in this new component show up in our other components, too; if you're curious, I definitely recommend checking out the
.udl
files for the others!/cc @mozilla/disco-services
Pull Request checklist
[ci full]
to the PR title.Branch builds: add
[firefox-android: branch-name]
to the PR title.