Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Reveal Substring Template #207

Merged
merged 32 commits into from
Oct 1, 2024
Merged

Feat: Reveal Substring Template #207

merged 32 commits into from
Oct 1, 2024

Conversation

shreyas-londhe
Copy link
Member

@shreyas-londhe shreyas-londhe commented Aug 1, 2024

Description

This PR introduces a new circom template called SubstringMatch that verifies if a given substring exists within a larger string at a specified index.

The SubstringMatch template uses a Random Linear Combination (RLC) approach to perform efficient substring comparisons. This method allows for constant-time verification regardless of the input string length, making it suitable for use in zk-SNARKs.

Type of Change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Functionality and Usage Example

The SubstringMatch template takes the following parameters and inputs:

  • Parameters:

    • maxLength: The maximum length of the input string
    • maxSubstringLength: The maximum length of the substring to be matched
  • Inputs:

    • in: An array of ASCII values representing the input string
    • startIndex: The starting index of the substring in the input string
    • revealedString: An array of ASCII values representing the substring to be matched
    • r: A random value used for the RLC calculation
  • Output:

    • isValid: A signal that is 1 if the substring matches at the given index, 0 otherwise

Example usage:

pragma circom 2.1.6;

include "substring-match.circom";

template Example() {
  var maxLength = 100;
  var maxSubstringLength = 20;
  
  component substringMatch = SubstringMatch(maxLength, maxSubstringLength);
  
  // Input string: "Hello, World!"
  substringMatch.in <== [72, 101, 108, 108, 111, 44, 32, 87, 111, 114, 108, 100, 33, 0, 0, ...]; // Padded with zeros
  
  // Start index: 7 (position of "W")
  substringMatch.startIndex <== 7;
  
  // Substring to match: "World"
  substringMatch.revealedString <== [87, 111, 114, 108, 100, 0, 0, ...]; // Padded with zeros
  
  // Random value for RLC
  substringMatch.r <== 123456789;
  
  // The output will be 1 (true) because "World" is found at index 7 in "Hello, World!"
  signal output result;
  
  result <== substringMatch.isValid;
}

This example demonstrates how to use the SubstringMatch template to verify that the substring "World" exists within the string "Hello, World!" starting at index 7.

Checklist:

  • I have discussed with the team prior to submitting this PR
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • New and existing unit tests pass locally with my changes

@shreyas-londhe shreyas-londhe changed the title Feat/substring matching Feat: Substring Match Template Aug 1, 2024
@Divide-By-0
Copy link
Member

Divide-By-0 commented Aug 3, 2024

Can we add an option for someone to constrain 'only one' or something, meaning that if its on it checks the rest of the string, and if any other substring matches then it returns an argument of 0? Can you also calculate r automatically with fiat shamir on all the inputs? So like the query string, the large string, and the index?

@shreyas-londhe
Copy link
Member Author

@Divide-By-0

Can we add an option for someone to constrain 'only one' or something, meaning that if its on it checks the rest of the string, and if any other substring matches then it returns an argument of 0?

I think these are two different use cases:

  1. Proving that a substring exists in a string at an index.
  2. Proving that a substring exists in a string.

The current template is optimised to follow Case 1.

What I suggest we can do is, we can have two variations for the two cases. For Case 2, we can use Zac's string search algorithm (https://github.com/noir-lang/noir_string_search) which is efficient for that case.

/// @input revealedString An array of ASCII values representing the substring to be matched
/// @input r A random value used for the RLC calculation
/// @output isValid A signal that is 1 if the substring matches at the given index, 0 otherwise
template SubstringMatch(maxLength, maxSubstringLength) {
Copy link
Member

@saleel saleel Aug 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if we VarLeftShift in by startIndex and then compare first endIndex - startIndex bytes of both shiftedIn and revealedString? (Or even simply reveal that as output, instead of revealString as an input). We can do this mostly with existing circuits like PackByteSubarray.

(Not sure which is more constraint efficient but something to compare if you are down)

But PR is good otherwise 🚀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out @saleel. The SelectSubArray template is indeed much more efficient than our current approach.

SelectSubArray
Screenshot 2024-08-21 at 09 06 24

SubstringMatch
Screenshot 2024-08-21 at 09 08 57

And I assume that it has been audited before too. I believe that using SelectSubArray is better for revealing substrings. We can delete this PR.

cc: @Divide-By-0

Copy link
Member

@saleel saleel Aug 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. Thank you for the benchmarks.
Yes, if there is no explicit need for comparing values for equality in circuit we can simply use SelectSubArray. Otherwise this can be modified to do "SelectSubArray + Compare packed bytes"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we just want to reveal the substring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Divide-By-0 FYI ^
I think can close this PR if you also agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@saleel made some changes, can you have a look?

shreyas-londhe and others added 17 commits September 12, 2024 23:47
Fixed a bug in retrieving the DKIM public key:

For some emails, the DKIM public key is not stored TXT record directly in `selector._domainkey.domain`, but instead points to a CNAME.

For example, when retrieving the DKIM public key for the following domain: `protonmail._domainkey.proton.me`, we first need to resolve the CNAME for this domain: `protonmail.domainkey.drfeyjwh4gwlal4e2rhajsytrp6auv2nhenecpzigu7muak6lw6ya.domains.proton.ch`.

However, using the "dns" npm package, the CNAME cannot be resolved.

With this update, all DNS resolutions will be performed using google DNS-over-HTTPS (DoH).
Co-authored-by: saleel <saleel@saleel.xyz>
* Fix SelectRegexReveal

* Add a new test for the SelectRegexReveal template.
@shreyas-londhe shreyas-londhe changed the title Feat: Substring Match Template Feat: Reveal Substring Template Sep 13, 2024
packages/circuits/helpers/reveal-substring.circom Outdated Show resolved Hide resolved
}).rejects.toThrow("Assert Failed");
});

it("should not match when substring is not at the beginning", async () => {
Copy link
Member

@Divide-By-0 Divide-By-0 Sep 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait what, it should match in the test on line 65 right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is test if for the helper circuit CheckSubstringMatch which is used to check for matches from each starting index - so I think it correct.
Btw, why cant we reveal data by taking a startIndex and length as input? I think that would be still cheaper. Is there a use-case we need to enforce there is only one substring?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for answering that @saleel, and you are correct. CheckSubstringMatch checks only a part of the input (basically a window), so this template is actually used in CountSubstringOccurrences.

Btw, why cant we reveal data by taking a startIndex and length as input? I think that would be still cheaper. Is there a use-case we need to enforce there is only one substring?

Yes, there is a use-case where the json data might be like:

{
    command: "do this",
    other_string: "command: \"do this instead\""
}

And we want to reveal n characters after the substring command. Someone can spoof it like this.


signal output substring[maxSubstringLength];

// substringStartIndex should be less than maxLength
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really necessary given the check on substringStartIndex + substringLength? Additionally we have been assuming valid index on util circuits, and asks the consuming circuit to do these range checks. This is to avoid redundant checks when main circuit use many of these utils.
Though I think its ok to do substringStartIndex + substringLength < maxLength in here.

We can also assert substringLength is not Zero?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. I'll add a note above the template regarding this 🙌

@Divide-By-0 Divide-By-0 merged commit c79e1a7 into main Oct 1, 2024
5 checks passed
@shreyas-londhe shreyas-londhe deleted the feat/substring-matching branch October 2, 2024 04:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants