Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@sigstore/verify does not verify the DSSE hash for Intoto 0.0.2 Transparency Log entries #1206

Closed
ericcornelissen opened this issue Jul 12, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@ericcornelissen
Copy link

ericcornelissen commented Jul 12, 2024

Description

The implementation of @sigstore/verify for verifying transparency log entries of intoto 0.0.2 statements, here, does not currently verify the (DSSE) hash value (i.e. tlogEntry.spec.content.hash?.value) of the entry it is verifying.

I tried to implement a verification following the calculation of the hash in the @sigstore/sign package, here (see also #151), but failed to verify entries using the npm CLI (npm audit signatures).

I stumbled upon this because I was trying to understand how npm provenance works (literally what every field in the provenance is for) and I'm unable to determine what this hash value is supposed to be...

Version

1.2.1

@ericcornelissen ericcornelissen added the bug Something isn't working label Jul 12, 2024
@bdehamer
Copy link
Collaborator

If I recall correctly, there are some issues with the intoto Rekor type which make it difficult to reliably re-calculate the hash value for verification.

Ultimately, what we need to do is ensure that the referenced Rekor entry matches the data that has been signed. We accomplish this by ensuring that the signature count matches, that the signatures are the same, and the DSSE payload is identical.

The issues with the intoto type were the reason that the newer dsse Rekor type was created. We haven't yet switched to the dsse type for npm cause there are still a lot of clients in the wild that wouldn't be able to verify the newer type. At some point we will make the switch.

@ericcornelissen
Copy link
Author

ericcornelissen commented Jul 13, 2024

If I recall correctly, there are some issues with the intoto Rekor type which make it difficult to reliably re-calculate the hash value for verification.

As far as I can tell the intoto Rekor type does not specify what the hash value must be with great precision. It just says it should be "the hash value for the archive". Besides that not being how it's implemented here, I also checked the sha256sum of the tarballs from the npm packages I'm looking at and got different values. Also, the fact that you're using JSON should (at least within the context of sigstore-js) not matter since you're using a canonicalized JSON serializer here (which I used when trying to validate the hash values).

Ultimately, what we need to do is ensure that the referenced Rekor entry matches the data that has been signed. We accomplish this by ensuring that the signature count matches, that the signatures are the same, and the DSSE payload is identical.

Per my understanding you're already doing all of these:

so then what is the hash value for?

The issues with the intoto type were the reason that the newer dsse Rekor type was created. We haven't yet switched to the dsse type for npm cause there are still a lot of clients in the wild that wouldn't be able to verify the newer type. At some point we will make the switch.

Looking at #631, it doesn't seem to be changing the verification so at first sight it's unclear to me how that relates to my question - the dsse type still has the envelopeHash.

@bdehamer
Copy link
Collaborator

I was looking at the Python implementation to try and jog my memory about this decision.

Ultimately, I think we couldn't reliably verify the hash value across implementations due to the fact that it was under-specified. We decided to abandon it and instead verify the constituent components of the envelope individually.

@ericcornelissen
Copy link
Author

ericcornelissen commented Jul 15, 2024

Please correct me if I'm wrong in understanding that: you don't verify the hash because it's underspecified/non-canonicalized JSON1 across implementations2 and you don't plan to fix that.3 Instead you'd prefer to switch to a different rekord type which also doesn't specify this aspect4.

If so, I guess it's reasonable that you don't want to start canonicalization now and break backwards compatibility. However, your proposed solution (waiting for the new rekord type) does not address this, at least so it seems to me. Also, I feel like this problem should be acknowledged somewhere in the docs.

Back to the topic of verifying the hash though, a suggestion: one way to overcome the problem of non-canonicalized JSON is to try all possible (minified) serializations. I tried to implement this to find hashes that work and it actually worked! I put the full code below, but first: from my testing the following check works fine in the context of npm provenance (worked perfectly when running npm audit signatures against https://github.com/ericcornelissen/shescape/tree/5c2f62c37f16ff45ed93bdc103fd3a3ad6b51141):

// This is a JavaScript implementation of <https://github.com/sigstore/sigstore-js/blob/c1d15d1c4f10e4a6f9f02224819d5d353bf9370a/packages/verify/src/tlog/intoto.ts#L38>
const core_1 = require("@sigstore/core");
function verifyIntoto002TLogBody(tlogEntry, content) {
  // existing code ...

  if (
    check({
      payload: content.env.payload.toString('base64'),
      payloadType: envelope.payloadType,
      signatures: [
        {
          // note that if this is undefined, this keyid will be omitted from the serialized JSON
          keyid: envelope.signatures[0].keyid,
          publicKey: core_1.encoding.base64Decode(envelope.signatures[0].publicKey),
          sig: core_1.encoding.base64Decode(envelope.signatures[0].sig),
        },
      ],
    }, tlogEntry.spec.content.hash?.value)
    ||
    check({
      payload: content.env.payload.toString('base64'),
      payloadType: envelope.payloadType,
      signatures: [
        {
          keyid: envelope.signatures[0].keyid,
          // Interestingly, some older provenance statements only verified without the
          // public key
          // publicKey: undefined,
          sig: core_1.encoding.base64Decode(envelope.signatures[0].sig),
        },
      ],
    }, tlogEntry.spec.content.hash?.value)
  ) {
    // Do nothing because the check passed; sorry too lazy to refactor
  } else {
    throw new error_1.VerificationError({
      code: 'TLOG_BODY_ERROR',
      message: 'Envelope payload hash mismatch',
    });
  }
}
function check(obj, want) {
  // I rely on the fact that `JSON.stringify` seems to serialize keys in the order in which
  // they're declared on the object in JavaScript. This may admittedly be an incorrect
  // assumption. I also assume `JSON.stringify` omits entries where the value is undefined.
  const envelopeHash = core_1.crypto.hash(JSON.stringify(obj)).toString('hex');
  return envelopeHash === want;
}

I guess this library should work outside the context of npm provenance too, so the following code could be used to verify against arbitrarily computed hash values. I'm afraid you may considers this too computationally expensive, but I'd urge you to consider using this - maybe even only as an opt-in feature?

General approach
// This is a JavaScript implementation of <https://github.com/sigstore/sigstore-js/blob/c1d15d1c4f10e4a6f9f02224819d5d353bf9370a/packages/verify/src/tlog/intoto.ts#L38>
const core_1 = require("@sigstore/core");
function verifyIntoto002TLogBody(tlogEntry, content) {
  // existing code ...

  const payloadTypes = [
    // omit this key
    undefined,

    // actual expected value
    envelope.payloadType,
  ];
  const payloads = [
    // omit this key
    undefined,

    // actual expected value
    content.env.payload.toString('base64'),
  ];
  const keyids = [
    // omit this key (even if it's present in the data)
    undefined,

    // actual expected value
    envelope.signatures[0].keyid,
  ];
  const sigs = [
    // omit this key
    undefined,

    // actual expected value
    core_1.encoding.base64Decode(envelope.signatures[0].sig),
  ];
  const publicKeys = [
    // omit this key
    undefined,

    // empty value, to account for the following comment being incorrect:
    // https://github.com/sigstore/sigstore-js/blob/c1d15d1c4f10e4a6f9f02224819d5d353bf9370a/packages/sign/src/witness/tlog/entry.ts#L160
    "",

    // actual expected value
    core_1.encoding.base64Decode(envelope.signatures[0].publicKey),
  ];

  const want = tlogEntry.spec.content.hash?.value;
  let found = false;
  for (const payloadType of payloadTypes) {
  for (const payload of payloads) {
  for (const sig of sigs) {
  for (const keyid of keyids) {
  for (const publicKey of publicKeys) {

    // This checks all the possible serializations of all possible combinations of
    // values for each field. Probably should break early as soon as found = true.    

    if (check({ payload, payloadType, signatures: [{ keyid, publicKey, sig }] }, want)) found = true;
    if (check({ payload, payloadType, signatures: [{ keyid, sig, publicKey }] }, want)) found = true;
    if (check({ payload, payloadType, signatures: [{ publicKey, keyid, sig }] }, want)) found = true;
    if (check({ payload, payloadType, signatures: [{ publicKey, sig, keyid }] }, want)) found = true;
    if (check({ payload, payloadType, signatures: [{ sig, publicKey, keyid }] }, want)) found = true;
    if (check({ payload, payloadType, signatures: [{ sig, keyid, publicKey }] }, want)) found = true;

    if (check({ payload, signatures: [{ keyid, publicKey, sig }], payloadType }, want)) found = true;
    if (check({ payload, signatures: [{ keyid, sig, publicKey }], payloadType }, want)) found = true;
    if (check({ payload, signatures: [{ publicKey, keyid, sig }], payloadType }, want)) found = true;
    if (check({ payload, signatures: [{ publicKey, sig, keyid }], payloadType }, want)) found = true;
    if (check({ payload, signatures: [{ sig, publicKey, keyid }], payloadType }, want)) found = true;
    if (check({ payload, signatures: [{ sig, keyid, publicKey }], payloadType }, want)) found = true;


    if (check({ payloadType, payload, signatures: [{ keyid, publicKey, sig }] }, want)) found = true;
    if (check({ payloadType, payload, signatures: [{ keyid, sig, publicKey }] }, want)) found = true;
    if (check({ payloadType, payload, signatures: [{ publicKey, keyid, sig }] }, want)) found = true;
    if (check({ payloadType, payload, signatures: [{ publicKey, sig, keyid }] }, want)) found = true;
    if (check({ payloadType, payload, signatures: [{ sig, publicKey, keyid }] }, want)) found = true;
    if (check({ payloadType, payload, signatures: [{ sig, keyid, publicKey }] }, want)) found = true;

    if (check({ payloadType, signatures: [{ keyid, publicKey, sig }], payload }, want)) found = true;
    if (check({ payloadType, signatures: [{ keyid, sig, publicKey }], payload }, want)) found = true;
    if (check({ payloadType, signatures: [{ publicKey, keyid, sig }], payload }, want)) found = true;
    if (check({ payloadType, signatures: [{ publicKey, sig, keyid }], payload }, want)) found = true;
    if (check({ payloadType, signatures: [{ sig, publicKey, keyid }], payload }, want)) found = true;
    if (check({ payloadType, signatures: [{ sig, keyid, publicKey }], payload }, want)) found = true;


    if (check({ signatures: [{ keyid, publicKey, sig }], payload, payloadType }, want)) found = true;
    if (check({ signatures: [{ keyid, sig, publicKey }], payload, payloadType }, want)) found = true;
    if (check({ signatures: [{ publicKey, keyid, sig }], payload, payloadType }, want)) found = true;
    if (check({ signatures: [{ publicKey, sig, keyid }], payload, payloadType }, want)) found = true;
    if (check({ signatures: [{ sig, publicKey, keyid }], payload, payloadType }, want)) found = true;
    if (check({ signatures: [{ sig, keyid, publicKey }], payload, payloadType }, want)) found = true;

    if (check({ signatures: [{ keyid, publicKey, sig }], payloadType, payload }, want)) found = true;
    if (check({ signatures: [{ keyid, sig, publicKey }], payloadType, payload }, want)) found = true;
    if (check({ signatures: [{ publicKey, keyid, sig }], payloadType, payload }, want)) found = true;
    if (check({ signatures: [{ publicKey, sig, keyid }], payloadType, payload }, want)) found = true;
    if (check({ signatures: [{ sig, publicKey, keyid }], payloadType, payload }, want)) found = true;
    if (check({ signatures: [{ sig, keyid, publicKey }], payloadType, payload }, want)) found = true;

  }
  }
  }
  }
  }

  if (!found) {
    throw new error_1.VerificationError({
      code: 'TLOG_BODY_ERROR',
      message: 'Envelope payload hash mismatch',
    });
  }
}
function check(obj, want) {
  // I rely on the fact that `JSON.stringify` seems to serialize keys in the order in which
  // they're declared on the object in JavaScript. This may admittedly be an incorrect
  // assumption. I also assume `JSON.stringify` omits entries where the value is undefined.
  const envelopeHash = core_1.crypto.hash(JSON.stringify(obj)).toString('hex');
  return envelopeHash === want;
}

Footnotes

  1. never mind the fact that the spec says it should be the hash of an archive.

  2. never mind that you do canonicalize it in this project.

  3. the fact that the payloadHash is over the base64 encoding of a canonically serialized JSON object (namely an intoto v1.0 statement for npm provenance) is not relevant here because this value is included in the data being validated.

  4. though at least it says it should be the thing it currently is.

@bdehamer
Copy link
Collaborator

Congrats on getting verification of the hash value working! I don't think I ever tried to brute force all of the possible envelope permutations 😆

You are correct, we have purposefully decided NOT to verify the hash value due to the reasons you cited. Working with the maintainers of Rekor and the other Sigstore client libraries we settled on an alternative approach which provides the same guarantee about the TLog entry while avoiding the problems with the hash value.

I was mistaken to suggest that the dsse type was created to address this particular issue (there were other issues with the intoto type which lead to its introduction).

I appreciate the effort you've put into this analysis, however, I'm not inclined to make any changes to the verification algorithm unless there is reason to believe the current approach is exploitable in some way. Our use cases extend beyond the narrow scope of signing/verifying npm packages so its necessary to be conservative about changes which could impact compatibility.

@ericcornelissen
Copy link
Author

You are correct, we have purposefully decided NOT to verify the hash value due to the reasons you cited. Working with the maintainers of Rekor and the other Sigstore client libraries we settled on an alternative approach which provides the same guarantee about the TLog entry while avoiding the problems with the hash value.

[...]

I appreciate the effort you've put into this analysis, however, I'm not inclined to make any changes to the verification algorithm unless there is reason to believe the current approach is exploitable in some way. Our use cases extend beyond the narrow scope of signing/verifying npm packages so its necessary to be conservative about changes which could impact compatibility.

Fair enough, I'll close this now with my original question answered.

For the record, I don't think there's any problems not verifying it since it could be recomputed by an attacker anyway and isn't communicated out-of-band. In particular, anyone editing the data in the envelope could also update the hash.value because it's part of the same JSON object - except for envelope.payload but for that you do check the hash separately (with the payloadHash).

It does still leave me wondering why the hash is there (same for envelopeHash in the DSSE rekord type) 🤔

I was mistaken to suggest that the dsse type was created to address this particular issue (there were other issue with the intoto type which lead to its introduction).

Thanks for clarifying.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants