Merge sync15 adapter #12
Conversation
I'm not exactly sure if this is a good idea, really.
…at we can avoid raw JSON Values for typical use
… for getting records
…sability improvements also
Rebasing master here failed fairly hard for reasons I don't fully understand.
…lection request query params
@vladikoff AFAICT the CI stuff is broken everywhere - is this correct? Also, how have you and @eoger been building fxa-rust-client?
pub id: String,

// It's not clear to me if this actually can be empty in practice.
// firefox-ios seems to think it can...
It's not one of the publicly-documented fields of a BSO at [1], so absolutely it can be empty.
[1] https://mozilla-services.readthedocs.io/en/latest/storage/apis-1.5.html#basic-storage-object
}

/// Marker trait that indicates that something is a sync record type. By not implementing this
/// for EncryptedPayload, we can statically prevent double-encrypting.
❤️
Wow, awesome!
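For readers less familiar with Rust, here is a minimal sketch of the marker-trait trick being praised above; the field lists and the encrypt stub are illustrative, not the adapter's actual definitions:

```rust
use serde::{Deserialize, Serialize};

pub trait Sync15Record: Serialize {}

#[derive(Serialize, Deserialize)]
pub struct PasswordRecord {
    pub id: String,
    pub password: String,
}
impl Sync15Record for PasswordRecord {}

#[derive(Serialize, Deserialize)]
pub struct EncryptedPayload {
    pub iv: String,
    pub hmac: String,
    pub ciphertext: String,
}
// Deliberately *no* `impl Sync15Record for EncryptedPayload`.

// Only Sync15Record types can be encrypted, so passing an already-encrypted
// payload back in is a compile error rather than a runtime bug.
pub fn encrypt<T: Sync15Record>(record: &T) -> EncryptedPayload {
    let _cleartext = serde_json::to_string(record).unwrap();
    EncryptedPayload { iv: String::new(), hmac: String::new(), ciphertext: String::new() }
}
```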
Hey Thom, looks good (granted it was just a cursory look and I barely know what's going on here). I'd question if the #[inline]'s are worth the trouble
use std::borrow::{Borrow, Cow};
use util::ServerTimestamp;

/// Tokenserver's timestamp is X-Timestamp and not X-Weave-Timestamp.
stale comment?
    }
}

#[derive(Debug, Clone, PartialEq)]
a good candidate (probably among others) for #[derive(Default)]
I didn't do this because it's a bad idea to create one of these without specifying a collection.
Regarding the #[inline]s: they don't force the compiler to inline (…)
This is amazing work Thom! Many of the comments here are more to help me get up to speed with Rust, and some are probably just plain wrong.
Note that I don't think it's necessary to address these issues before landing - many of them are to start a discussion and should probably just be moved to issues.
Obviously my Rust is very poor, so my r+ doesn't mean a whole lot, but as far as I'm concerned we should get this landed ASAP.
let _ = io::stdout().flush(); // Don't care if flush fails really.
let mut s = String::new();
io::stdin().read_line(&mut s).expect("Failed to read line...");
if let Some('\n') = s.chars().next_back() { s.pop(); }
Using trim_matches seems like it would be clearer, although presumably requiring additional allocation of string objects. For my education, is it written this way for efficiency or for some other reason?
Nope, I wouldn't worry about allocations as a result of user input, which should be rare. I just saw that https://doc.rust-lang.org/std/str/pattern/trait.Pattern.html was not stable, and got confused as to whether or not trim_matches was usable.
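For illustration, a small sketch of the prompt pattern under discussion using the stable trim_end_matches; the `prompt` helper name is made up, and the manual pop in the original works just as well:

```rust
use std::io::{self, Write};

// trim_end_matches takes a &[char] pattern and returns a &str, so the only
// extra cost versus the manual pop is the final to_string() allocation.
fn prompt(msg: &str) -> io::Result<String> {
    print!("{}", msg);
    let _ = io::stdout().flush(); // Don't care if flush fails really.
    let mut s = String::new();
    io::stdin().read_line(&mut s)?;
    Ok(s.trim_end_matches(&['\n', '\r'][..]).to_string())
}

fn main() {
    let name = prompt("name: ").expect("Failed to read line...");
    println!("hello, {}", name);
}
```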
}

#[repr(C)]
pub struct PasswordRecordC {
As I mention below, I wonder if we can just expose a json string instead of individual record types.
Yeah that's possible.
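A hedged sketch of what the JSON-string alternative could look like over FFI; the function names and the hard-coded JSON are purely illustrative:

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Instead of a field-by-field PasswordRecordC, hand the consumer one JSON
// string and let it parse on its side.
#[no_mangle]
pub extern "C" fn sync15_passwords_get_all_json() -> *mut c_char {
    // In the real adapter this would be the serialized list of records.
    let json = r#"[{"id":"abc123","hostname":"https://example.com"}]"#;
    CString::new(json).unwrap().into_raw()
}

/// The consumer hands the string back so Rust can free it.
#[no_mangle]
pub unsafe extern "C" fn sync15_string_free(s: *mut c_char) {
    if !s.is_null() {
        drop(CString::from_raw(s));
    }
}
```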
sync15-adapter/src/lib.rs
Outdated
};

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct Sync15ServiceInit {
This file seems to be all about the Sync15Service, so I'm surprised to see it named lib.rs - is that a rust convention of some sort?
The main file for a crate is always lib.rs for libraries and main.rs for binaries. I can move this into its own module if you'd like, though.
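For readers new to Rust conventions, a sketch of what a thinned-out crate root could look like; inline modules are used only to keep the example self-contained, and in the real crate these would be separate files such as service.rs and record_types.rs:

```rust
// A slim lib.rs: module declarations and re-exports only.
pub mod service {
    // Sync15Service would live in service.rs.
    pub struct Sync15Service;
}

pub mod record_types {
    // Known record formats would live in record_types.rs.
    pub struct PasswordRecord {
        pub id: String,
    }
}

// The crate root then only exposes the public surface.
pub use record_types::PasswordRecord;
pub use service::Sync15Service;
```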
sync15-adapter/src/lib.rs
Outdated
root_key: KeyBundle,
client: Client,
// We update this when we make requests
last_server_time: Cell<ServerTimestamp>,
How is this intended to be used?
More specifically, I'm thinking it might make sense to implement a couple of "state" objects which can be created by this module and used by the consumer, which might help implement a few things we need to consider sooner rather than later. For example:
- A ServiceState object with things like the last timestamp, the result of meta/global, any backoff-type headers, and possibly even some error states, particularly around auth - but in general it would be opaque to the consumer. E.g. I could imagine remote_setup returning one of these, and functions which get records from the server could take it as an input. Backoff handling could then be implemented by the get functions throwing until the backoff period is over; they could also throw if syncIDs or other state from meta/global has changed, etc.
- A CollectionState object which also tracks the last timestamp for the collection and possibly other state. In general it works much like the ServiceState - generally opaque to the consumer, but it would allow better tracking of the lastModified from the last sync they did, etc.

(Note I'm not suggesting this happen before merging; I'm just starting a discussion to see what you think and understand where this is heading.)
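To make the suggestion concrete, here is a very rough sketch of the two state objects; field names and types are illustrative, not a concrete API proposal:

```rust
use std::collections::HashMap;
use std::time::Instant;

pub struct ServiceState {
    pub last_server_timestamp: f64,
    pub backoff_until: Option<Instant>,
    pub global_sync_id: String,
    pub engine_sync_ids: HashMap<String, String>,
}

pub struct CollectionState {
    pub collection: String,
    pub last_modified: f64,
}

impl ServiceState {
    /// Record-fetching functions could call this and refuse to run until the
    /// backoff period is over.
    pub fn check_backoff(&self) -> Result<(), String> {
        if let Some(until) = self.backoff_until {
            if Instant::now() < until {
                return Err("server requested backoff; try again later".into());
            }
        }
        Ok(())
    }
}
```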
Further to @mhammond's comment: this is a lot less like the iOS state machine than I was expecting.
Yeah, this is crufty and not well thought-out at all. I'm going to try and move it to be more like a simplified version of the iOS state machine (simplified since we don't need to handle as many cases), which I hadn't looked at.
sync15-adapter/src/lib.rs
Outdated
// shouldn't need to take ownership of it...
*req.body_mut() = Some(Vec::from(bytes).into());
let mut resp = self.svc.exec_request(req, false)?;
Ok(PostResponse::from_response(&mut resp)?)
I can see where the timestamp from the post response ends up in the resp object, but can't see where last_sync_remote is updated, so committing one CollectionUpdate then creating and committing another will fail (which, unsurprisingly, took me quite some time to verify was actually true, but I am learning :)
(FWIW, desktop has the same problem of not updating the lastModified timestamp after a post, although the code never actually makes multiple posts in that way)
As mentioned above though, I'm not quite convinced this should be tracked in the server instance anyway.
    _ => {}
}

// Can't change this in match arms without NLL
What's "NLL"?
I assume rust-lang/rust-roadmap-2017#16
Yeah, there are a lot of cases where you can't modify a value because a reference is "alive" even though you aren't using it anymore. It's annoying.
I find this doesn't happen much in practice. Usually it's an indicator that you're doing something very mutable/procedural and should rethink your approach.
Otherwise, you can wrap your read in its own scope:
    let new_batch_id: Option<String>;
    // Scoped read.
    {
        match &thing {
            &Some::Thing(val) => new_batch_id = Some("foo".to_string()),
            _ => new_batch_id = None,
        }
    }
    thing.batch_id = new_batch_id;

or use an Entry, or move your value into the match and move the same value or a replacement out of it, or match on a Copy value from inside your structure, or something else.
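For completeness, a compilable variant of the same idea using made-up types: compute the new value as the result of the match expression, so no borrow is still alive when the struct is mutated.

```rust
#[derive(Debug)]
enum BatchState {
    InBatch(String),
    NotBatching,
}

#[derive(Debug)]
struct Uploader {
    state: BatchState,
    batch_id: Option<String>,
}

fn main() {
    let mut thing = Uploader {
        state: BatchState::InBatch("foo".to_string()),
        batch_id: None,
    };
    // The read is an expression; its borrow of `thing.state` ends here.
    let new_batch_id = match &thing.state {
        BatchState::InBatch(id) => Some(id.clone()),
        BatchState::NotBatching => None,
    };
    thing.batch_id = new_batch_id;
    println!("{:?}", thing);
}
```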
pub duration: u64,
// This is treated as optional by at least the desktop client,
// but AFAICT it's always present.
pub hashed_fxa_uid: String,
... which does raise the whole "how and who does telemetry" :)
    .send()?;

if !resp.status().is_success() {
    warn!("Non-success status when fetching token: {}", resp.status());
eventually we will want to communicate 401s etc back to the consumer
We do, via TokenserverHttpError. It probably should be parsed out more explicitly though.
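As an illustration only (the real TokenserverHttpError may look quite different), surfacing these statuses to the consumer could be as simple as classifying them into an error enum:

```rust
#[derive(Debug, PartialEq)]
pub enum TokenFetchError {
    Unauthorized,                      // 401: credentials rejected, re-auth needed
    Backoff { retry_after_secs: u64 }, // 503 plus a Retry-After hint
    Other { status: u16 },
}

pub fn classify_status(status: u16, retry_after_secs: Option<u64>) -> Option<TokenFetchError> {
    match status {
        200..=299 => None,
        401 => Some(TokenFetchError::Unauthorized),
        503 => Some(TokenFetchError::Backoff {
            retry_after_secs: retry_after_secs.unwrap_or(0),
        }),
        s => Some(TokenFetchError::Other { status: s }),
    }
}
```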
use openssl;
use base64;

pub fn base16_encode(bytes: &[u8]) -> String {
I'm surprised you had to hand-roll this, but 👍
I wrote most of a crate for this over the weekend, mainly since rust-hex (which I found out about after writing this) is much, much slower (and it's hard to optimize given that its API is based on std::io::{Read, Write}). But yeah, I had to ask in the Rust IRC if there was a built-in way, and apparently there is not :(
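For context, a minimal version of what a hand-rolled hex encoder looks like; the actual implementation in this PR may differ in details:

```rust
pub fn base16_encode(bytes: &[u8]) -> String {
    const HEX_CHARS: &[u8; 16] = b"0123456789abcdef";
    let mut result = vec![0u8; bytes.len() * 2];
    let mut index = 0;
    for &byte in bytes {
        result[index] = HEX_CHARS[(byte >> 4) as usize];
        result[index + 1] = HEX_CHARS[(byte & 0x0f) as usize];
        index += 2;
    }
    // We know statically that this unwrap is safe, since we only write ASCII.
    String::from_utf8(result).unwrap()
}

fn main() {
    assert_eq!(base16_encode(&[0xde, 0xad, 0xbe, 0xef]), "deadbeef");
}
```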
/// and NaN timestamps, which the server certainly won't send, but the guarantee would make me feel
/// better.
#[derive(Debug, Copy, Clone, PartialEq, PartialOrd, Deserialize, Serialize, Default)]
pub struct ServerTimestamp(pub f64);
this is very cool - I wonder if we should have something similar for durations? (I think mentat has something similar)
Well, std::time::Duration exists. But yeah, I was thinking we might want something for the kinds of durations the server sends, which are in floating-point seconds.
sync15-adapter/src/lib.rs
Outdated
}

pub fn all_records<T>(&mut self, collection: &str) ->
        error::Result<Vec<BsoRecord<MaybeTombstone<T>>>> where T: Sync15Record {
Consider whether it's better to do this with generics or with trait objects. You're using monomorphization here, and that means there are separate all_records implementations for every T. Presumably the set of methods will grow, and the set of record types will grow to 5+…
Another decision you can make is to have the bit that depends on the record type — decryption to cleartext — be inside the generic method, and the rest (the first three lines) be in a separate non-generic method. That'll keep code size smaller.
I don't expect this API to live that long, FWIW. If you exclude deserializing after decryption, this code is not very big; even with the deserialization it's only 7k or so after assembly for the password records (and you'll eventually have to pay the cost of the code size from deserialization anyway).
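For reference, a sketch of the split @rnewman describes above, with simplified stand-in types rather than the adapter's real API:

```rust
use serde::de::DeserializeOwned;

pub struct Sync15Service; // stand-in for the real service type

pub struct CleartextBso {
    pub payload_json: String,
}

impl Sync15Service {
    /// Non-generic part (fetch + decrypt): compiled exactly once.
    fn all_records_cleartext(&mut self, _collection: &str) -> Result<Vec<CleartextBso>, String> {
        // ... GET the collection, verify HMACs, decrypt each payload ...
        Ok(vec![])
    }

    /// Generic part: only the deserialization step is monomorphized per T.
    pub fn all_records<T: DeserializeOwned>(&mut self, collection: &str) -> Result<Vec<T>, String> {
        self.all_records_cleartext(collection)?
            .into_iter()
            .map(|bso| serde_json::from_str(&bso.payload_json).map_err(|e| e.to_string()))
            .collect()
    }
}
```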
sync15-adapter/src/record_types.rs
Outdated
#[derive(Debug, Clone, Hash, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct PasswordRecord {
I somewhat agree with @mhammond, but only to the extent that I think they should be in a separate file. I think it's better to have canonical implementations than to make consumers figure out the object formats themselves!
It's always possible for the consumer to specify their own T: Sync15Record and bypass these.
- Tombstone stuff is now in tombstone.rs
- record_types.rs just includes type definitions for known record formats.
- service code is now in service.rs
- lib.rs just has crate decls, module decls, and exports.
Merging based on markh's r+. This doesn't mean I think this code is production-ready, just that it shouldn't spend more time in a side branch. I addressed some easy things (file structure, basically) and I'll file issues for the harder stuff.
Post-landing review, but this is a great start, thank you! 🎉 My biggest concern is that amorphous Sync15Service object, and some confusion around BsoRecord, MaybeTombstone, and Sync15Record.
sync15-adapter/src/record_types.rs
Outdated
#[derive(Debug, Clone, Hash, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "camelCase")]
pub struct PasswordRecord {
I think this depends on where and how we draw the line between the Sync client and storage. I could imagine having a separate storage module that implements Sync15Record, and handles translating between Sync records and the persistence layer. That way, we could also teach the store about on-disk encryption and content-dependent reconciliation, without involving Sync. It might also make it easier to swap out storage for Mentat while using the Sync 1.5 server.
impl<T> BsoRecord<T> {
    /// If T is a Sync15Record, then you can/should just use record.into() instead!
    #[inline]
    pub fn new_non_record<I: Into<String>, C: Into<String>>(id: I, coll: C, payload: T) -> BsoRecord<T> {
What does this do?
    pub ciphertext: String,
}

// This is a little cludgey but I couldn't think of another way to have easy deserialization
This seems OK... maybe call it something like ENCRYPTED_PAYLOAD_SERIALIZED_OVERHEAD to make it clearer that we're serializing an empty record for that calculation?
lazy_static! {
    // The number of bytes taken up by padding in a EncryptedPayload.
    static ref EMPTY_ENCRYPTED_PAYLOAD_SIZE: usize = serde_json::to_string(
        &EncryptedPayload { iv: "".into(), hmac: "".into(), ciphertext: "".into() }
Worth factoring this out into a Default impl, or not really?
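A sketch combining the two suggestions in this thread (derive(Default) plus the more descriptive constant name); the details are illustrative:

```rust
use lazy_static::lazy_static;
use serde::{Deserialize, Serialize};

#[derive(Default, Serialize, Deserialize)]
pub struct EncryptedPayload {
    pub iv: String,
    pub hmac: String,
    pub ciphertext: String,
}

lazy_static! {
    // The number of bytes the JSON envelope adds around an empty payload.
    pub static ref ENCRYPTED_PAYLOAD_SERIALIZED_OVERHEAD: usize =
        serde_json::to_string(&EncryptedPayload::default()).unwrap().len();
}
```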
impl<'a, T> CollectionUpdate<'a, T> where T: Sync15Record {
    pub fn new(svc: &'a Sync15Service, allow_dropped_records: bool) -> CollectionUpdate<'a, T> {
        let coll = T::collection_tag();
        let ts = svc.last_modified_or_zero(coll);
svc is really starting to look like a mix of REST calls and collection state management that we could tease apart into HTTPClient and Collections...
#[derive(Deserialize, Serialize, Clone, Debug, PartialEq, Eq)]
#[serde(untagged)]
pub enum MaybeTombstone<T> {
So, we have BsoRecord (the outer envelope, IIUC), MaybeTombstone, and Sync15Record. Could we flatten this a bit? Instead of calling this MaybeTombstone, why not Payload with Tombstone and Record variants?
        self.to_update.push(rec_or_tombstone);
    }

    pub fn add_record(&mut self, record: T) {
If we morphed MaybeTombstone into a Payload type, could we pass it directly here, instead of having separate methods for adding records and tombstones?
    error::unexpected(
        "Tried to authorize bad URL using hawk (no port -- unknown protocol?)"))?;

let header = hawk::RequestBuilder::new(
Eventually this could cache builders by URL; I imagine there are some routes we'll hit more frequently than others...
        index += 2;
    }
    // We know statically that this unwrap is safe, since we can only write ascii
    String::from_utf8(result).unwrap()
Would String::from_utf8_lossy be a good fit here?
        &self.token
    }

    pub fn new(request_client: &Client, base_url: &str, access_token: String, key_id: String) -> Result<TokenserverClient> {
Playing a bit more with this, I'm not sure I'm a fan of new making an HTTP request when you construct it... what happens if the network is flaky, or the server is down? Could this be lazy, so that we don't fail to create the objects we need for a sync?
I tweaked this in #42 - let me know what you think!
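One possible shape for a lazy token fetch, sketched here without claiming it matches what #42 actually does:

```rust
pub struct TokenserverClient {
    base_url: String,
    access_token: String,
    key_id: String,
    token: Option<String>, // fetched lazily
}

impl TokenserverClient {
    /// Construction is cheap and cannot fail on network errors.
    pub fn new(base_url: &str, access_token: String, key_id: String) -> TokenserverClient {
        TokenserverClient {
            base_url: base_url.to_string(),
            access_token,
            key_id,
            token: None,
        }
    }

    /// Performs the HTTP request only the first time it's called, then caches.
    pub fn token(&mut self) -> Result<&str, String> {
        if self.token.is_none() {
            // ... fetch from the tokenserver using base_url/access_token/key_id ...
            self.token = Some("placeholder-token".to_string());
        }
        Ok(self.token.as_deref().unwrap())
    }
}
```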
There's really no reason for me to be working in a side branch anymore. Sorry this got so big but on the bright side it's well over halfway to what it needs to be.
I dunno what the best way to review this will be, sorry in advance.
Some things to note:
There's a demo
You can actually try it out. There's an example called boxlocker-parity that actually has more functionality than boxlocker now, since it can do record uploads and deletes too. You can run it by doing:
$ git co tcsc/sync15-adapter-WIP
$ cd sync15-adapter
$ cargo run --example boxlocker-parity
This will run the ../boxlocker/boxlocker.py script to log in the first time, although at this point it could maybe use fxa-rust-client / sandvich? I don't know if that's usable yet; last time I tried it, it wasn't.
It will give you options to add/delete password records and will update them with a batched upload, possibly over multiple posts.
Potential issues
One thing to note: this takes fxa-rust-client out of the workspace. Apparently the build is broken and this is blocked on a new update to rust-openssl? Ask @eoger - I don't understand this, but AFAICT it's still broken in master, and I have to assume there's some uncommitted config or something that is used to make it work for the demo builds.
Another thing to note is the commit history is a mess. Earlier I tried to rebase and found that even though it seemed right, it broke the build, so I just did merge commits (it turned out the build was broken for the above reason). So, IDK. I can probably rebase at this point, but it should just land as a merge commit IMO
Things that work:
Things that don't work or need more work
…$timestamp). This works in theory and there are tests that we generate a relatively sane URL, but I haven't tried it in practice, so it might be broken.
Also, the API isn't the nicest - too much stuff is public, etc...