Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What happens when an IDGraph bloats up? #1442

Closed
Kailai-Wang opened this issue Mar 13, 2023 · 11 comments · Fixed by #1478
Closed

What happens when an IDGraph bloats up? #1442

Kailai-Wang opened this issue Mar 13, 2023 · 11 comments · Fixed by #1478
Assignees
Labels
D3-chore tasks that need to be completed but don’t provide any additional features/functionality I3-high should be completed within 5 working days

Comments

@Kailai-Wang
Copy link
Collaborator

Context

We included the encrypted IDGraph in the returning event in #1080

But what if a user links too many identities and the IDGraph grows into a huge object? Maybe you can't even submit extrinsic anymore.

We need to:

  • test when it starts to be problematic
  • devise a plan to fix it

@jonalvarezz maybe we have to discuss the necessity of having such IDGraph in the returning event again.


✔️ Please set appropriate labels and assignees if applicable.

@Kailai-Wang Kailai-Wang added I3-high should be completed within 5 working days D3-chore tasks that need to be completed but don’t provide any additional features/functionality labels Mar 13, 2023
@grumpygreenguy
Copy link
Contributor

The first question that comes to mind then would be, how would IDHub retrieve the full IDGraph for a given user (assuming we have the shielding key)? Would we need to scrape the whole chain and/or maintain an index? 🤔

@jonalvarezz
Copy link
Contributor

Thank you, Kai, I am also lacking data and Parachain implementation details to answer this, so I'd appreciate it if you could elaborate how much is "too many identities" and why it may prevent users to submit more extrinsics.

Also if it helps, The Identity Hub, and in general, any client, doesn't need to get the IDGraph as part of every response (#1080). We introduced it as a quick workaround as there was no other way for a client to retrieve it, say from the Parachain's Storage pallet.

From a client perspective, the current solution is inconvenient and we need another way of retrieving the IdGraph without involving a user transaction to happen.

Would revisiting it solve this issue too?

@Kailai-Wang
Copy link
Collaborator Author

Kailai-Wang commented Mar 13, 2023

Thanks @grumpygreenguy @jonalvarezz

how would IDHub retrieve the full IDGraph for a given user (assuming we have the shielding key)?

Short answer is IDHub shouldn't be able to retrieve it without the user's signature (agreement) :)
IDH can store the result of each request and assemble the IDGraph on its own. Only if the user explicitly requests the full IDGraph (or agrees on such requests) can we proceed and "retrieve" it.

I'd appreciate it if you could elaborate how much is "too many identities" and why it may prevent users to submit more extrinsics.

We need test ourselves. But I have the impression that each extrinsic has a limit of several MB (like 5-6?) and the extrinsic needs to be packed into a PoV (proof of validity) which has a hard limit too (also several MB), otherwise the validators on the relaychain won't be able to verify the PoV in time.

But even if the size is not a problem, submitting a long extrinsic can be very slow. It bloats up the block size as well - so theoretically an attacker can try to clog the network and fill your node disk quickly by keeping sending the identity-related extrinsics.

we need another way of retrieving the IdGraph without involving a user transaction to happen

We can have a synchronous RPC - either via parachain or directly to the worker, but that requires the user signature too. So it's a privileged RPC request.

@jingleizhang
Copy link
Contributor

what if a user links too many identities and the IDGraph grows into a huge object

This is possible and actually we don't know what will happen afterwards. A test is needed.

@jonalvarezz
Copy link
Contributor

Thank you for elaborating, Kai and Eric.

Short answer is IDHub shouldn't be able to retrieve it without the user's signature (agreement)

Would it incur fees too? if it doesn't, how about:

  1. Return the IDGraph on IdentityVerified only – limit it to the 20 most recent entries
  2. Support retrieving the whole IDGraph from the Parachain (synchronous RPC)

Impact for clients
Using the IDHub as example,

  1. The app will refresh the local IDGraph every time there is an identity linked
  2. Feature a "Download latest IDGraph"/ "Sync IDGraph" button that submits the user transaction to retrieve it.

@Kailai-Wang
Copy link
Collaborator Author

It won't incur any fees.

  1. Return the IDGraph on IdentityVerified only – limit it to the 20 most recent entries -> We can do that. But can you double-check if this is enough? (e.g. you don't need any IDGraph returned in create_identity)
  2. Support retrieving the whole IDGraph from the Parachain (synchronous RPC) -> I just realised it could be hard - the substrate RPC can only query the on-chain state, but the IDGraph is stored in the sidechain (so offchain from parachain's perspective). We can however directly provide the tee-worker's endpoint for such queries - ideally it stands behind a proxy/firewall: the user's request will go to some central endpoint which forward the request to tee-worker. Is that feasible?

@jonalvarezz
Copy link
Contributor

jonalvarezz commented Mar 15, 2023

Return the IDGraph on IdentityVerified only – limit it to the 20 most recent entries -> We can do that. But can you double-check if this is enough?

Yes, it is – Although, limitting the number of returned entries to a small number only makes sense if we have an alternative to get the entire list.

We can however directly provide the tee-worker's endpoint for such queries - ideally it stands behind a proxy/firewall: the user's request will go to some central endpoint which forward the request to tee-worker. Is that feasible?

I don't know 😂 – is this something we can scale and offer to any client wanting to use our Parachain?


Maybe we can start by introducing a defensive change and cap the IDGraph to the first 100 items (I leave the number to you/ I am using a magic number that should be safe for the alpha releases). The how to independently get the entire IDGraph in one call (point 2) seems to need further discussion and research.

Additionally, the IDHub is updating the local IDGraph on IdentityVerified only. We can remove the IDGraph from the IdentityCreated payload. it should help to keep the block size small

@Kailai-Wang
Copy link
Collaborator Author

Thanks @jonalvarezz

We'll take the following actions (correct me if I'm wrong):

  • cap the IDGraph to a limited size of elements
  • only keep the IDGraph for IdentityVerified and remove IDGraph in other events
  • combined with Associate the errors with the request extrinsics #1272
    • we shall have request_ext_hash for error events
    • we shall have both identity and request_ext_hash for "normal" events
  • investigate allowing direct RPC to workers to retrieve IDGraph synchronously (unlikely available in alpha launch)

cc @jingleizhang @kaylawangnow

@jonalvarezz
Copy link
Contributor

It sounds good to me. Thanks for the wrap-up, Kai.

@Kailai-Wang Kailai-Wang self-assigned this Mar 15, 2023
@Kailai-Wang
Copy link
Collaborator Author

Will address the first two items in the PR linked to this issue.

@jingleizhang
Copy link
Contributor

sounds good, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
D3-chore tasks that need to be completed but don’t provide any additional features/functionality I3-high should be completed within 5 working days
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants