Support ExtraData de-duplication in MySQL backend #1968

rolandshoemaker · 2019-11-12T20:39:12Z

(This is a rather complicated proposition, so it's more of a speculative request, rather than a concrete proposal.)

The CT personality builds leaves for inclusion whose ExtraData field contains the user-submitted chain, minus the end-entity certificate. This data has relatively low cardinality and takes up a considerable amount of space in the storage backend.

It would be great if there were an option to store the ExtraData in a separate table (or something similar) from LeafData so that it could be deduplicated (keyed on a hash of the data, for example), with a reference to the relevant row in LeafData in a many-to-one setup. In certain setups this could save >50% of current storage requirements. It seems likely that this optimization is only really relevant to the CT usage of Trillian, so I'm not entirely sure whether it could cause issues for other personalities. It would likely make the MySQL queries slightly more expensive.
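To make the idea concrete, here is a minimal sketch of the many-to-one layout, not Trillian's actual schema. It uses an in-memory SQLite database as a stand-in for MySQL, and the table names, column names, and helper functions (`extra_data`, `leaf_data`, `open_store`, `add_leaf`, `get_leaf`) are all hypothetical. The choice of SHA-256 as the dedup key is also an assumption.

```python
import hashlib
import sqlite3


def open_store():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE extra_data (
            hash BLOB PRIMARY KEY,   -- SHA-256 of the chain bytes (assumed key)
            data BLOB NOT NULL
        );
        CREATE TABLE leaf_data (
            leaf_id INTEGER PRIMARY KEY,
            leaf_value BLOB NOT NULL,
            extra_data_hash BLOB NOT NULL REFERENCES extra_data(hash)
        );
        """
    )
    return conn


def add_leaf(conn, leaf_value, extra_data):
    """Store a leaf; the chain bytes are written only if not already present."""
    digest = hashlib.sha256(extra_data).digest()
    conn.execute(
        "INSERT OR IGNORE INTO extra_data (hash, data) VALUES (?, ?)",
        (digest, extra_data),
    )
    cur = conn.execute(
        "INSERT INTO leaf_data (leaf_value, extra_data_hash) VALUES (?, ?)",
        (leaf_value, digest),
    )
    return cur.lastrowid


def get_leaf(conn, leaf_id):
    """Read a leaf back together with its chain; this is the extra join."""
    return conn.execute(
        "SELECT l.leaf_value, e.data FROM leaf_data l "
        "JOIN extra_data e ON e.hash = l.extra_data_hash "
        "WHERE l.leaf_id = ?",
        (leaf_id,),
    ).fetchone()
```

Two leaves submitted with the same chain end up sharing a single `extra_data` row, at the cost of a join on every read, which is the "slightly more expensive queries" trade-off.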


Martin2112 · 2019-12-18T10:15:45Z

It's an interesting request, though as you say possibly more a CT thing than generic.

We have considered options like not storing leaf data in the database. As it's immutable, it could be served from edge caches or similar. That was more for performance than for saving disk space, though. For example, you could pack up ranges of leaves and serve them much faster than MySQL can.

Not sure we'd want to make this sort of schema change now, but it's possible that experiments along these lines could be done with a modified CT personality that stores some sort of cache ID instead of the leaf data. That might be a place to start.

pphaneuf · 2019-12-18T14:29:10Z

From a purely Trillian point of view, ExtraData should probably not be in there at all, as it has absolutely nothing to do with the Merkle tree.

Having the data together makes it possible (even though it's not the case at the moment) to make get-entries (which makes up about 75% of read requests on our CT logs, the rest being mostly get-sth, and trace amounts of get-sth-consistency) have a much lower latency, by doing only one sequential read, instead of 1 sequential read + N lookups (where N is the number of entries fetched). This would probably be fewer lookups if they were deduped, though.
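As a rough illustration of that read pattern, the sketch below assumes a range of leaves has already come back from one sequential read, each carrying a hash that points at its chain. The function name `fetch_entries` and the dict-based `chain_store` are hypothetical stand-ins, not Trillian or CTFE code. With deduplication, the number of extra lookups is the number of distinct chains in the range, not N.

```python
def fetch_entries(leaves, chain_store):
    """Resolve chains for a range of leaves from one sequential read.

    leaves: list of (leaf_value, chain_hash) tuples.
    chain_store: mapping from chain_hash to chain bytes (stand-in for a table).
    Returns the assembled entries and the number of chain lookups performed.
    """
    # One lookup per distinct hash, however many leaves share it.
    unique_hashes = {chain_hash for _, chain_hash in leaves}
    chains = {h: chain_store[h] for h in unique_hashes}
    entries = [(value, chains[h]) for value, h in leaves]
    return entries, len(unique_hashes)
```

For a range of three leaves where two share a chain, this performs two lookups instead of three; storing the chain inline with each leaf would need none.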

Note that if Trillian didn't use revisions for subtrees of logs (which I believe it doesn't need, only maps need them?), you might be looking at both a speedup of proof retrieval and a very sizable reduction in storage requirements (I don't have numbers handy, but if someone told me about 50%, I'd believe it). It would also probably speed up sequencing.

pav-kv · 2020-05-14T15:52:33Z

This discussion has migrated to google/certificate-transparency-go#691, as it was CT-specific. Generalisations are possible, but we tend to think that they should happen on the personality side.

I propose to close this issue here.

pav-kv · 2020-05-14T15:57:20Z

A good follow-up request from this one would be optionally interleaving the ExtraData field into the sequenced leaf data. This would help make get-entries calls faster for those kinds of use cases.

RJPercival added the feature label Nov 12, 2019

paulmattei added the Low Priority label Mar 4, 2020

rolandshoemaker mentioned this issue May 13, 2020

proposal: consider adding a chain storage interface to ctfe google/certificate-transparency-go#691

Closed

pav-kv closed this as completed May 14, 2020
