Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Celo: Serialization changes #130

Closed
wants to merge 1 commit into from
Closed

Conversation

ValarDragon
Copy link
Member

@ValarDragon ValarDragon commented Dec 13, 2020

Description

Copies celo's serialization changes from #127

The key change is it adds serialize_uncompressed_unchecked.

Currently the implication of serialize_unchecked is to serialize in the uncompressed form, so this PR makes sense in the context of needing to serialize unchecked, compressed data.

The usecase for that is not clear to me, as the check needed for deserializing an affine curve point on a prime order curve is effectively the same as getting the affine representation of the point. (We currently don't implement arithmetic in montgomery form AFAIK)

Is the usecase for skipping subgroup checks for compressed curve points?

@jon-chuang, @Pratyush, is the above correct for where serializing unchecked compressed comes in handy?

I'll handle fixing the merge conflicts, and copying over the changes to make elliptic curves implement this if we agree on the need for the change.

We also have the question of whether the default should be for serialize_unchecked to be compressed or uncompressed. The current form of this PR is a breaking change that will fail silently in many places, by changing the implementation of serialize_unchecked.

Not sure what the best API to be using here right now is.


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (main)
  • Linked to Github issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the Github PR explorer

Co-authored-by: jon-chuang <9093549+jon-chuang@users.noreply.github.com>
@ValarDragon ValarDragon marked this pull request as draft December 13, 2020 07:27
@ValarDragon ValarDragon changed the title Celo: Serialization Celo: Serialization changes Dec 13, 2020
@Pratyush
Copy link
Member

Is the usecase for skipping subgroup checks for compressed curve points?

Yes, this is the intended usecase, I believe. I’m not sure what the overhead of decompression vs subgroup checks is for common curves

@Pratyush
Copy link
Member

@jon-chuang Could you elaborate further on the applications/potential use cases for this? Isn't taking the square root pretty expensive?

@jon-chuang
Copy link
Contributor

Hi, yes, the use case is indeed to skip the subgroup check. We can use a batched subgroup check instead which is about 30-75x faster, iirc.

@jon-chuang
Copy link
Contributor

It would be nice to specialise the impl for vec or [T] where T: AffineCurve etc. to perform the batch subgroup check. Perhaps a change for the future.

@Pratyush
Copy link
Member

Pratyush commented Dec 16, 2020

Hmm since it's useful, I propose renaming (de)serialize_unchecked to (de)serialize_compressed_unchecked. I am undecided on whether we should also rename serialize to serialize_compressed in the process. WDYT @ValarDragon @jon-chuang @kobigurk?

@jon-chuang
Copy link
Contributor

jon-chuang commented Dec 16, 2020

Upon further thought, I personally dont like the idea of having checked and unchecked in the API, and am more in favour of one of the two solutions:

  • Remove such checks entirely, then the responsibility/manner of checking falls on the programmer
  • create a trait for each type representing the batch version of checking functions, with a default impl doing nothing, and impl for curves doing a batch check, and have this function invoked for the slice and vec data types. Well, this is probably pretty orthogonal to the question though.

@Pratyush
Copy link
Member

Remove such checks entirely, then the responsibility/manner of checking falls on the programmer

I feel like this is probably going to lead to protocol problems elsewhere, and furthermore breaks the contracts of the curve types (which assume that the input is in the prime order subgroup)

@Pratyush
Copy link
Member

We could try playing around with adding a default parameter to the serialize and serialize_uncompressed methods, so that by default it performs the checks, but if you change the default type parameter to a different one (which avoids the checks), then it does not perform on-curve checks.

@jon-chuang
Copy link
Contributor

jon-chuang commented Dec 16, 2020

Hmm yeah, I guess the solution proposed is meant to allow for minimal breakage.

I guess compression and data integrity checks do form a reasonable quadrant on the type of (de)serialisation one would like generically speaking. One however only cares about toggling data integrity checks if they are relatively expensive, so they matter little to e.g. a bool deserialization.

Would you mind giving an example of the type parameter method?

@ValarDragon
Copy link
Member Author

we could separate the unchecked serialization methods into its own trait, that is auto-implemented if canonical serialize is implemented

@ValarDragon
Copy link
Member Author

ValarDragon commented Dec 16, 2020

Im in favor of the approach of having 4 methods

* serialize_compressed
* serialize_uncompressed
* serialize_compressed_unchecked
* serialize_uncompressed_unchecked

I think its fine to expose the four options here, and let downstream applications decide which method they want, and which methods they want to expose.

@jon-chuang
Copy link
Contributor

jon-chuang commented Dec 16, 2020

Hmm since it's useful, I propose renaming (de)serialize_unchecked to (de)serialize_compressed_unchecked. I am undecided on whether we should also rename serialize to serialize_compressed in the process. WDYT @ValarDragon @jon-chuang @kobigurk?

Wrt to this question, are you proposing to have e.g. deserialize_compressed.. alongside deserialize_uncompressed.., only for the unchecked version of the methods?

@jon-chuang
Copy link
Contributor

Im in favor of the approach of having 4 methods

* serialize_compressed
* serialize_uncompressed
* serialize_compressed_unchecked
* serialize_uncompressed_unchecked

I think its fine to expose the four options here, and let downstream applications decide which method they want, and which methods they want to expose.

Hmm, well barring curves, most types do not have an option to compress, so shouldn't we have a default that leads to minimal breakage?

@Pratyush
Copy link
Member

Pratyush commented Dec 16, 2020

Here's some example code, that allows incorporating the batch checks as well:

trait CanonicalDeserialize {
	Specifies how the check should be performed
	type Validator: Checker<Self>;
	
	fn deserialize<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get item from R
		if C::should_check() {
			assert!(Self::Validator::check(&[item]));
		}
		item
	}

	
	fn deserialize_uncompressed<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get uncompressed item from R
		if C::should_check() {
			assert!(Self::Validator::check(&[item]));
		}
		item
	}
}

/// Actually performs the check
trait Checker<T> {
	fn check(&[T]) -> bool;
}

trait ShouldCheck {
	fn should_check() -> bool;
}

struct AlwaysCheck;

impl ShouldCheck for AlwaysCheck {
	fn should_check() -> bool {
		true
	}
}

struct NoCheck;

impl ShouldCheck for NoCheck {
	fn never_check() -> bool {
		false
	}
}

Here's how you would use it to do both checked and unchecked serialization:

let item = T::deserialize(&mut reader)?;
let item = T::deserialize::<_, NoCheck>(&mut reader)?;

and here's how you would use the Checker API to implement batch checking without specialization

impl<T: CanonicalDeserialize> CanonicalDeserialize for Vec<T> {
	type Validator = VecChecker<T>;
	
	fn deserialize<R: Read, C: ShouldCheck = AlwaysCheck>(reader: R) -> Self {
		// get items from R
		if C::should_check() {
			assert!(T::Validator::check(items.as_slice()));
		}
		item
	}
}

struct VecChecker<T>(PhantomData<T>);
impl<T: CanonicalDeserialize> Checker<Vec<T>> for VecChecker<T> {
	fn check(items:&[Vec<T>]) -> bool {
		for item in items {
			T::check(item.as_slice())
		}
	}
}

The idea would be to have different implementations of checkers for different types.

@Pratyush
Copy link
Member

Pratyush commented Dec 16, 2020

Note that the changes relating to the Validator associated type are completely orthogonal to the changes related to the ShouldCheck trait; we could adopt either, neither, or both.

@jon-chuang
Copy link
Contributor

jon-chuang commented Dec 16, 2020

Here's some example code, that allows incorporating the batch checks as well:

I like this strategy as it exposes optional fine grained control with fewer breaking changes and more separation of duties.

I don't, however, like the use of type parameters as an optional argument...

@jon-chuang
Copy link
Contributor

jon-chuang commented Dec 16, 2020

Instead of the above approach, I propose instead to have a trait with overridable default impls

trait ValidData {
    fn validate(value: Self) -> Result {
        Ok(())
    }
    fn validate_batch(values: &[Self]) -> Result {
        for value in values {
            Self::check(value)?;
        }
        Ok(())
    }
}

Then one writes for impl<T: ValidData> CanonicalDeserialize for Vec<T>

fn deserialize<.., C: ..>(...) {
   for ... {
       deserialize(..); // use default
   }
   if C::should_validate() {
       T::validate_batch(values)?;
   }
}

Also validate sounds more precise.

@jon-chuang
Copy link
Contributor

Hi, @Pratyush @ValarDragon, could I get permission to push to non-protected branches?

@jon-chuang
Copy link
Contributor

Btw, where is serialize_derive actually used? I can't seem to find it used anywhere. For instance, derive_canonical_deserialize isn't.

@weikengchen
Copy link
Member

The serialize-derive repo is often used in a project by adding a dependency on serialize with the feature derive.
https://github.com/arkworks-rs/algebra/blob/master/serialize/src/lib.rs#L19

This allows the Rust to derive the canonical serialization and deserialization.
e.g., https://github.com/arkworks-rs/poly-commit/blob/master/src/data_structures.rs#L107

@weikengchen
Copy link
Member

By the way, thanks a lot for the serialization system by Celo. It has helped one of my projects significantly!

@jon-chuang
Copy link
Contributor

Hi @weikengchen thanks, I guess it's not used in algebra itself, with arrays and simple structs. I see that it's necessary for cases where there are nested structs.

That's great, but according to the above, though, the API seems like it's about to change again.

@jon-chuang
Copy link
Contributor

jon-chuang commented Dec 21, 2020

@Pratyush unfortunately it seems that your strategy doesn't work, becuase:
defaults for type parameters are only allowed in struct, enum, type, or trait definitions.
So I'm still looking for a nice workaround to specify default behaviour without introducing breaking changes and/or clutter.

One workaround is to have the traits Canonical(de)Serialize take a type parameter. However, this is obviously not flexible enough as within the same file, one might want to invoke e.g. serialize both with and without validation.

While ugly, it does seem that exposing the 4 functions with _unchecked is the most straitforward way to implement the desired functionality. Alternatively, while introducing breaking changes, a cleaner method might be to create an optional enum selector Option<SerializationMode> (encompassing both compression and validation procedures). While on the semantic level, rustc understands this as invoking a mode at runtime, I believe LLVM will inline the branches based on the mode invoked.

@jon-chuang
Copy link
Contributor

@weikengchen actually with regards to your example https://github.com/arkworks-rs/poly-commit/blob/master/src/data_structures.rs#L107, it does seem one can get rid of the ugly code bloat by implementing canonical (de)serialize for the String type.

@weikengchen
Copy link
Member

Yes. I will take a look!

@jon-chuang
Copy link
Contributor

Things I am currently unsure about: how to impl derive to incorporate (de)serialize_uncompressed_unchecked.

@weikengchen
Copy link
Member

weikengchen commented Dec 21, 2020

Is the problem due to whether it should default to (de)serialize_uncompressed or (de)serialize_unchecked when a specific implementation is not given?

@Pratyush
Copy link
Member

Pratyush commented Dec 21, 2020

btw Jon, an alternative to prevent downstream users from having to implement all methods manually would be to have a "Mode", as outlined in #145

The idea is to have a mode enum:

enum Compress {
	Yes,
	No,
}
enum Validate {
	Yes,
	No
}

And this enum is passed into a single method

trait Serialize {
	fn serialize_with_mode<R: Read>(self, reader: R, compression: Compression, validation: Validation) -> ioResult;
	
	fn serialize_compressed(...) {
		self.serialize_with_mode(reader, Compress::Yes, Validate::Yes)
	}
	
	fn serialize_compressed_unchecked(...) {
		self.serialize_with_mode(reader, Compress::Yes, Validate::No)
	}
	
	fn serialize_uncompressed(...) {
		self.serialize_with_mode(reader, Compress::No, Validate::Yes)
	}
	
	fn serialize_uncompressed_unchecked(...) {
		self.serialize_with_mode(reader, Compress::No, Validate::No)
	}
}

So users only have to implement the single method, and everything else is implemented by default?

This should also make writing the Derive macro implementation easier.

(This can be combined with the above Validator associated type idea, or your ValidData trait idea)

@Pratyush
Copy link
Member

Pratyush commented Jul 13, 2021

Incorporating the ideas above, let me propose the following design:

pub enum Compress {
	Yes,
	No,
}
pub enum Validate {
	Yes,
	No,
}

pub trait Valid {
	fn check(&self) -> Result<(), SerializationError>;
	fn batch_check(batch: &[self]) -> Result<(), SerializationError> {
		for item in batch {
			item.check()?;
		}
		Ok(())
	}
}

pub trait CanonicalSerialize {
	fn deserialize_with_mode<W: Write>(writer: W, compress: Compress) -> Result<(), SerializationError> {
		// write item to W, handling compression as per `compress`
	}
	
	fn serialize_compressed(...) {
		self.serialize_with_mode(reader, Compress::Yes)
	}
	
	fn serialize_uncompressed(...) {
		self.serialize_with_mode(reader, Compress::No)
	}
}

pub trait CanonicalDeserialize: Valid {
	fn deserialize_with_mode<R: Read>(reader: R, compress: Compress, validate: Validate) -> Result<Self, SerializationError> {
		// get item from R, handling compression as per `compress`
		if let Validate::Yes = validate {
			Self::check(&item)?
		}
		item
	}
	
	fn deserialize_compressed(...) -> ... {
		Self::serialize_with_mode(reader, Compress::Yes, Validate::Yes)
	}
	
	fn deserialize_compressed_unchecked(...) -> ... {
		self.serialize_with_mode(reader, Compress::Yes, Validate::No)
	}
	
	fn deserialize_uncompressed(...) -> ... {
		Self::deserialize_with_mode(reader, Compress::No, Validate::Yes)
	}
	
	fn deserialize_uncompressed_unchecked(...) -> ... {
		Self::deserialize_with_mode(reader, Compress::No, Validate::No)
	}
}

impl<T: Valid> CanonicalDeserialize for Vec<T> {
	fn deserialize_with_mode<R: Read>(reader: R, compress: Compress, validate: Validate) -> Self {
		// get items: Vec<T> from R, handling compression as per `compress`
		if let Validate::Yes = validate {
			T::batch_check(&items)?
		}
		items
	}
}

@Pratyush Pratyush added P-medium Priority: medium T-feature Type: new features T-performance Type: performance improvements T-refactor Type: cleanup/refactor labels Sep 9, 2021
@ghost ghost mentioned this pull request Apr 5, 2022
@Pratyush Pratyush closed this in #463 Sep 2, 2022
@Pratyush Pratyush deleted the celo_serialize_rebased branch October 26, 2022 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P-medium Priority: medium T-feature Type: new features T-performance Type: performance improvements T-refactor Type: cleanup/refactor
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants