This repository has been archived by the owner on Jun 14, 2022. It is now read-only.

How to do flattening ? #37

Open
rohe opened this issue Jan 11, 2019 · 31 comments

Comments

@rohe
Owner

rohe commented Jan 11, 2019

One of the cornerstones of this draft is to allow a trust anchor (federation operator), or for that matter any intermediate entity, to limit/restrict the metadata of a leaf entity (e.g. RP/OP).
If, for instance, the trust anchor decides that the only allowed signing algorithms are elliptic curve algorithms, then the evaluated metadata of a leaf entity must only contain such algorithms.

The first attempt at supporting this is what's in the draft right now. A simple system that allows a superior to state the limits of what a leaf entity's metadata can look like.

The reality is of course a bit more complex than what the model assumes.
Take the contacts claim as an example. Here you probably want the subordinates to add their contacts to the ones that are defined by superiors.

Or imagine a claim (there is none in OP or RP metadata to my knowledge) that has an integer as its value. In some cases that value could be something on a scale, in which case the superior might want to state that you can't have a value >= 10 or < 5. In other cases the value would instead be one in a set and not something you could handle as something on an ordinal scale.

It would be nice if we could find simple rules, one for each value type, regardless of which claim it is.
If we got into exceptions (like contacts mentioned above) they would have to be very few; anything else would be a disaster.

@alejandro-perez
Contributor

alejandro-perez commented Jan 11, 2019

We could define a bunch of "policies" for flattening, with "is_subset" as the default.
Another one could be "ignore", which would mean that any definition in a lower MS should prevail.
Another one could be "is_superset", meaning you can add but not remove (e.g. imagine a "notification" claim containing a list of emails that should be notified in case of issues).

Finally, we define an additional claim called "policy", a JSON object in which each key is a claim name and each value is the name of the policy to apply:

[.....],
"policy": {
   "contacts": "is_superset",
   "a_random_claim": "ignore"
}

This would allow for further policies to be defined.

Just an idea though :)

@alejandro-perez
Contributor

I thought of two more: max and min. max is the one applied implicitly to exp, for instance.

@alejandro-perez
Contributor

alejandro-perez commented Jan 11, 2019

Another one would be frozen/fixed/readonly, indicating that the claim MUST stay as it is, i.e. it cannot be redefined.

@alejandro-perez
Contributor

IMO policy should be "is_superset", so lower MS could add policy to undefined claims, but cannot modify those already having a policy.

@alejandro-perez
Contributor

Alternatively, if this was desired to allow more complexity, a policy could be a JSON object itself, having a name and a set of parameters, but IMO that would be too much complexity.

"policy": {
   "contacts": "is_superset",
   "a_random_claim": "ignore",
   "some_integer_claim": {"name": "range", "min": 5, "max": 15}
}
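
For illustration only, here is a rough sketch of how a checker might accept both forms, a plain policy name and a parameterised object. The function name `check_policy` is hypothetical, and treating both `range` bounds as inclusive is an assumption; nothing above settles that.

```python
def check_policy(value, policy):
    """Return True if `value` satisfies `policy` (a name or a parameterised object)."""
    # A policy is either a plain name ("ignore") or an object carrying a
    # "name" plus parameters ({"name": "range", "min": 5, "max": 15}).
    if isinstance(policy, str):
        name, params = policy, {}
    else:
        name = policy["name"]
        params = {k: v for k, v in policy.items() if k != "name"}

    if name == "ignore":
        # Whatever the lower statement says is accepted.
        return True
    if name == "range":
        # Assumption: both bounds are inclusive.
        return params["min"] <= value <= params["max"]
    raise ValueError(f"unknown policy: {name}")


print(check_policy(10, {"name": "range", "min": 5, "max": 15}))
print(check_policy(20, {"name": "range", "min": 5, "max": 15}))
```

Only two policy names are handled here; the other names suggested above would each need their own branch.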

@nckroy

nckroy commented Jan 11, 2019

Here is another set of policies:
must_supply_any_value - the claim must be present with one or more values, but the values required are unspecified
must_supply_uri_value - the claim must be present with a value that is a valid URI, values required are unspecified
must_supply_url_value - the claim must be present with a value that is a valid URL, values required are unspecified
must_supply_https_value - the claim must be present with a value that is a valid HTTPS URL, values required are unspecified
must_supply_urn_value - the claim must be present with a value that is a valid URN
must_supply_email_value - the claim must be present with a value that is a valid e-mail address

@nckroy

nckroy commented Jan 11, 2019

These are meant to allow a superior to require that specific types of values are supplied, without dictating what those values have to be. This would support, for example, InCommon's Baseline Expectations program: https://spaces.at.internet2.edu/display/BE/Implementing+Baseline+Expectations+in+InCommon+Metadata

@rohe
Owner Author

rohe commented Jan 11, 2019

Looks like we're about to define a domain specific language.
Different policies can be applied to claims dependent on the claim value type.
Sounds interesting but complex. Also, I'd like to see how a set of policies defined in entity statements in a trust chain are interpreted. How to 'flatten' policies :-/

@alejandro-perez
Contributor

Looks like we're about to define a domain specific language.

Yeah :(. I would not think of it to be that complex, though.

Sounds interesting but complex. Also, I'd like to see how a set of policies defined in entity statements in a trust chain are interpreted. How to 'flatten' policies :-/

In some previous comment I mentioned that "policies" should implicitly be of type "is_superset", so lower MS could add policies, but could not delete or modify existing ones.

But all of this could be defined in a different document. It could even be defined just for a particular federation. Or a federation could just define in its SLAs how claims should be flattened (I think we already discussed this in Copenhagen, didn't we?).

In any case, those were my 2 cents :)

@rohe
Owner Author

rohe commented Jan 12, 2019

I don't believe in federations defining their own flattening functions. That would lead to an interoperability nightmare.
We need one model for how flattening is done.
Now, @alejandro-perez what you propose gets me thinking about claims requests using the claims parameter https://openid.bitbucket.io/connect/openid-connect-core-1_0.html#IndividualClaimsRequests . If we could do something similar to that ...

@rohe
Owner Author

rohe commented Jan 12, 2019

@nckroy You must remember that a leaf entity in the general case may not know which federations it belongs to (and it shouldn't have to).
Think of it this way: every leaf entity states "this is what I'm able to do, whomever I talk to", and then the policies reduce this to what is actually going to be used. So having a policy that says you have to have this or you have to have that just doesn't work. For instance, what would the leaf entity do if it belonged to two federations with conflicting views on which crypto algorithms to use, one stating that everyone should use RS256 and the other adamant about ES256 being used? The idea in the draft is that the leaf just states "I can do RS256 and ES256" (provided it can), and then at run time that statement is filtered by the policies, such that if you choose to work within federation-RS256 the resulting metadata statement for the leaf would only list RS256, and vice versa.
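
A tiny sketch of that idea, assuming each federation's policy boils down to a list of allowed algorithms. The names `leaf_algs`, `federation_policies` and `algs_in_context` are made up for the example and are not from the draft.

```python
# What the leaf says it can do, independent of any federation.
leaf_algs = ["RS256", "ES256"]

# Hypothetical per-federation restrictions on signing algorithms.
federation_policies = {
    "federation-RS256": ["RS256"],
    "federation-ES256": ["ES256"],
}


def algs_in_context(supported, allowed):
    """Keep only the leaf's algorithms that the chosen federation allows."""
    return [alg for alg in supported if alg in allowed]


for name, allowed in federation_policies.items():
    print(name, algs_in_context(leaf_algs, allowed))
```

The leaf publishes one statement; only the filtered view differs per federation.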

@rohe
Owner Author

rohe commented Jan 12, 2019

Ok, thought a bit more along the claims request path and came up with these examples:

OP metadata policy:

{
  "scopes_supported": {
    "subset_of": ["openid", "email", "profile"]
  },
  "claims_parameter_supported": {
    "value": true
  },
  "op_policy_uri": {
    "default": "https://op.example.com/policy.html"
  }
}

RP metadata policy:

{
  "response_types": {
    "subset_of": ["code", "code token"]
  },
  "grant_types": {
    "subset_of": ["authorization_code", "implicit"]
  },
  "application_type": {
    "value": "web"
  },
  "contacts": {
    "add" : "support@federation.example.com"
  },
  "policy_uri": {
    "add": "https://federation.example.com/policy.html"
  },
  "id_token_signed_response_alg": {
    "one_of": ["ES256", "ES384", "ES512"]
  },
  "token_endpoint_auth_method": {
    "value": "private_key_jwt"
  }
}

The pattern should be obvious. The key words are:

subset_of: Only these values are allowed. If the list of allowed values is ["A","B","C"] and the OP lists ["A","C","D"] as its values, the flattening would result in the set ["A","C"].

value: The value of this claim is fixed to this one allowed value.

one_of: The value of the claim must be one of the listed values.

add: There is no limitation on which values to use. This value should be added to the resulting list.

default: If no other value is given this one should be used.

Just an idea :-)
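
One possible reading of these five key words, sketched for a single claim. The function name `apply_claim_policy` is hypothetical, and several behaviours are assumptions rather than anything stated above: `value` wins over everything else, `add` turns a missing or single value into a list, and a violated `one_of` or a mistyped `subset_of` raises an error.

```python
def apply_claim_policy(policy, value=None):
    """Apply one claim's policy to a leaf's value; None means 'not supplied'."""
    if "value" in policy:
        # value: the claim is fixed, whatever the leaf said (assumed precedence).
        return policy["value"]
    if value is None and "default" in policy:
        # default: only used when the leaf gave nothing.
        value = policy["default"]
    if "add" in policy:
        # add: append the policy's value(s) to the resulting list.
        extra = policy["add"]
        extra = extra if isinstance(extra, list) else [extra]
        if value is None:
            current = []
        elif isinstance(value, list):
            current = value
        else:
            current = [value]
        value = current + [v for v in extra if v not in current]
    if value is None:
        return None
    if "subset_of" in policy:
        # subset_of: drop every value that is not explicitly allowed.
        if not isinstance(value, list):
            raise TypeError("subset_of needs a list value")
        value = [v for v in value if v in policy["subset_of"]]
    if "one_of" in policy and value not in policy["one_of"]:
        # one_of: assumed to be a hard failure rather than a silent drop.
        raise ValueError(f"{value!r} is not one of {policy['one_of']!r}")
    return value


print(apply_claim_policy({"subset_of": ["A", "B", "C"]}, ["A", "C", "D"]))
```

The call at the end is the subset_of example from the list above.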

@rohe
Owner Author

rohe commented Jan 13, 2019

Did a Proof-of-concept implementation and this is what I get:

The Federation Operator's policy:

{
    "scopes": {
        "subset_of": ["openid", "eduperson"]
    },
    "response_types": {
        "subset_of": ["code", "code id_token"]
    }
}

The organisation's policy:

{
    "contacts": {
        "add": ["helpdesk@example.com"]
    },
    "logo_uri": {
        "one_of": ["https://example.com/logo1.jpg", "https://example.com/logo2.jpg"],
        "default": "https://example.com/logo1.jpg"
    },
    "policy_uri": {
        "value": "https://example.com/policy.html"
    },
    "tos_uri": {
        "value": "https://example.com/tos.html"
    }
}

The metadata statement from the RP:

{
    "contacts": ["rp_admins@cs.example.com"],
    "redirect_uris": ["https://cs.example.com/rp1"],
    "response_types": ["code"]
}

And the result after applying the policies:

{
    'contacts': ['rp_admins@cs.example.com', 'helpdesk@example.com'],
    'redirect_uris': ['https://cs.example.com/rp1'],
    'response_types': ['code'],
    'logo_uri': 'https://example.com/logo1.jpg',
    'policy_uri': 'https://example.com/policy.html',
    'tos_uri': 'https://example.com/tos.html'
}
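
This is not the proof-of-concept code itself, just a minimal sketch that reproduces the result above from the three inputs. The function name `flatten` is hypothetical. Applying the policies one after the other, superior first, is a simplification that only works here because the two policies touch different claims; it says nothing about how conflicting policies should be combined.

```python
def flatten(metadata, *policies):
    """Apply each policy in turn (superior first) to a copy of the metadata."""
    result = dict(metadata)
    for policy in policies:
        for claim, rules in policy.items():
            if "value" in rules:
                result[claim] = rules["value"]
                continue
            if claim not in result and "default" in rules:
                result[claim] = rules["default"]
            if "add" in rules:
                current = list(result.get(claim, []))
                result[claim] = current + [v for v in rules["add"] if v not in current]
            if claim not in result:
                continue  # nothing supplied, nothing to restrict
            if "subset_of" in rules:
                result[claim] = [v for v in result[claim] if v in rules["subset_of"]]
            if "one_of" in rules and result[claim] not in rules["one_of"]:
                raise ValueError(f"{claim}: {result[claim]!r} not allowed")
    return result


federation_policy = {
    "scopes": {"subset_of": ["openid", "eduperson"]},
    "response_types": {"subset_of": ["code", "code id_token"]},
}
organisation_policy = {
    "contacts": {"add": ["helpdesk@example.com"]},
    "logo_uri": {
        "one_of": ["https://example.com/logo1.jpg", "https://example.com/logo2.jpg"],
        "default": "https://example.com/logo1.jpg",
    },
    "policy_uri": {"value": "https://example.com/policy.html"},
    "tos_uri": {"value": "https://example.com/tos.html"},
}
rp_metadata = {
    "contacts": ["rp_admins@cs.example.com"],
    "redirect_uris": ["https://cs.example.com/rp1"],
    "response_types": ["code"],
}

flattened = flatten(rp_metadata, federation_policy, organisation_policy)
print(flattened)
```

Note that `scopes` does not show up in the output: the RP did not supply it and the policy only restricts it, with no default.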

@alejandro-perez
Contributor

LGTM

@rohe
Owner Author

rohe commented Jan 14, 2019

That's all the confirmation I needed :-)

@daserzw

daserzw commented Jan 15, 2019

Looks great to me as well!

Other considerations:

  • it would be great to have a standard set of policies that MUST be implemented.
  • extensions to the standard set are possible through other specs or profiles.
  • a regex policy would probably be a useful addition to the standard set --- it should be applied to strings and list of strings.
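
To make the regex idea concrete, a small sketch of such a check for a string or a list of strings. The name `matches_regex` and the deliberately loose `EMAIL_LIKE` pattern are invented for the example, and Python's `re` syntax is used only for illustration, not as a proposal for which regex dialect to standardise on.

```python
import re

# A deliberately loose, illustrative pattern; not a real e-mail validator.
EMAIL_LIKE = r"[^@\s]+@[^@\s]+"


def matches_regex(pattern, value):
    """True if a string, or every string in a list, fully matches `pattern`."""
    compiled = re.compile(pattern)
    if isinstance(value, str):
        return compiled.fullmatch(value) is not None
    if isinstance(value, list) and all(isinstance(v, str) for v in value):
        return all(compiled.fullmatch(v) is not None for v in value)
    # Any other value type is rejected outright (an assumption).
    raise TypeError("regex policy applies to strings and lists of strings only")


print(matches_regex(EMAIL_LIKE, "helpdesk@example.com"))
print(matches_regex(EMAIL_LIKE, ["rp_admins@cs.example.com", "not-an-address"]))
```

A check like this can only judge values that are present; it cannot by itself require that a claim is supplied.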

@nckroy

nckroy commented Jan 15, 2019 via email

@nckroy

nckroy commented Jan 15, 2019

Roland, would it be possible for the Federation Operator's policy to include a requirement to supply at least one contact that is an email address? Eventually, would it be possible for the Federation Operator's policy to require at least one contact that is an email address and of a specific type, for example "Technical Contact"?

@alejandro-perez
Contributor

alejandro-perez commented Jan 15, 2019 via email

@rohe
Owner Author

rohe commented Jan 15, 2019

By using regex as @daserzw proposes you can probably check that a contact is in fact an email address. But there is no way to demand that there is a value at all. At least not for the time being.

In https://openid.bitbucket.io/connect/openid-connect-core-1_0.html#IndividualClaimsRequests there is the verb essential, but it's always up to the supplier of the information (in the case of claims requests, the OP) to do as they please. There is no way to directly enforce anything.

The only thing you can do is run a checking service that runs around and validates the metadata for all the RPs/OPs in the federation.

@nckroy

nckroy commented Jan 15, 2019 via email

@c00kiemon5ter

c00kiemon5ter commented Jan 16, 2019

Re:regular expressions, that has proven to be problematic for things like shibmd:Scope in the SAML world due to the proliferation of different regex implementations in different languages/frameworks.

You can define the regex to be of a certain standard; see POSIX BRE, POSIX ERE, PCRE, etc. Pick one to dictate the valid expression, i.e. don't depend on an implementation or programming language.
Note that different standards have different capabilities, and these affect the performance of the implementation.


  • a regex policy would probably be a useful addition to the standard set --- it should be applied to strings

why not numbers, too?

and list of strings.

why a list of strings and not a list of list of strings? I guess, what we care about are the items of those lists, no matter how nested they are. Even though, I have no use case, should this be limited by the standard?


What happens when the regex (or any rule for that matter) is applied to an incompatible value-type?


By using regex as @daserzw proposes you can probably check that a contact is in fact an email address. But there is no way to demand that there is a value at all.

an empty value will not match the regex rule; but, I think what you're saying is that the rule will not be checked, if no such claim is in the response.


"subset_of": ["code", "code id_token"]

is id_token code a valid subset? I think it should be.
Sets by definition are not ordered.
A better representation would be a list of sets:

"subset_of": [{"code"}, {"code", "id_token"}]

This allows you to implement this check (using Python's set datatype) as:

rulesets = [{"code"}, {"code", "id_token"}]
value = set("id_token code".split())
any(value.issubset(rs) for rs in rulesets)

appropriate datatypes are there for other langs.

@rohe
Owner Author

rohe commented Jan 16, 2019

@nckroy We can include essential as a key word but what will the consequences be ?
To go back to something I said a while ago: if we are talking about the general case, not the special one-level-deep version that federations use today, then a leaf entity may not know which federations it belongs to, so it will not know which policies will be applied to its metadata.

This together with Andreas' leading point: A leaf entity MUST, regardless of whether it's an RP or an OP, have an identity that is independent of whom it's going to talk to and which federations it belongs to.

Will result in a leaf entity publishing: This is what I can do !

What it doesn't mean is that the metadata used in one context (an RP talking to an OP within the confines of one federation) is absolutely the same as in another context. But what it has meant so far is that an entity's basic view of itself is the same in all contexts.

Whether an entity's metadata lives up to the expectations of a certain federation can be checked by anyone who can collect the trust chain starting with the leaf entity and ending in the trust anchor of the federation.

Such a check will definitely happen at run time when 2 parties are gathering metadata about each other.

I guess the best one can do is have the members in the federation be responsible for the entities they own.

@daserzw

daserzw commented Jan 16, 2019

Re:regular expressions, that has proven to be problematic for things like shibmd:Scope in the SAML world due to the proliferation of different regex implementations in different languages/frameworks.

You can define the regex to be of a certain standard; see POSIX BRE, POSIX ERE, PCRE, etc. Pick one to dictate the valid expression, i.e. don't depend on an implementation or programming language.
Note that different standards have different capabilities, and these affect the performance of the implementation.

I totally agree.

  • a regex policy would probably be a useful addition to the standard set --- it should be applied to strings

why not numbers, too?

Maybe I'm missing your point, but usually regex does not know about arithmetic and/or quantity, right? So basically numbers will be treated as characters in a string.

and list of strings.

why a list of strings and not a list of list of strings? I guess, what we care about are the items of those lists, no matter how nested they are. Even though, I have no use case, should this be limited by the standard?

Probably not, but consider that the less specific we are, the less clean and interoperable implementations will come out. So, for example you can have also a

What happens when the regex (or any rule for that matter) is applied to an incompatible value-type?

IMO this is a very good question. I think there are two main strategies to deal with that:

  1. keeping a map of all the claims and matching policies, and then fire an error if you spot a wrong match --- it's really cumbersome...
  2. you have basically single-valued and multiple-valued claims:
    2.1. subset and add can be applied only to multiple-valued claims.
    2.2. one_of, value, default can be applied only to single-valued claims.
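
The second strategy could be sketched roughly like this. The name `check_verbs` and the two constant sets are made up for the example, and the caller is assumed to already know whether a claim is single-valued or multiple-valued.

```python
# Which key words make sense for which kind of claim (strategy 2).
MULTI_VALUED_VERBS = {"subset_of", "add"}
SINGLE_VALUED_VERBS = {"one_of", "value", "default"}


def check_verbs(claim_policy, multi_valued):
    """Raise TypeError if the policy uses a key word its claim kind can't take."""
    allowed = MULTI_VALUED_VERBS if multi_valued else SINGLE_VALUED_VERBS
    misplaced = sorted(set(claim_policy) - allowed)
    if misplaced:
        kind = "multiple-valued" if multi_valued else "single-valued"
        raise TypeError(f"{misplaced} cannot be applied to a {kind} claim")


# A multiple-valued claim restricted with subset_of passes silently.
check_verbs({"subset_of": ["code", "code id_token"]}, multi_valued=True)

# A single-valued claim with one_of plus default passes as well.
check_verbs({"one_of": ["ES256", "ES384"], "default": "ES256"}, multi_valued=False)
```

This avoids keeping a per-claim map of permitted policies, at the cost of still needing to know which claims are multiple-valued.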


@rohe
Owner Author

rohe commented Jan 16, 2019

  • a regex policy would probably be a useful addition to the standard set --- it should be applied to strings

why not numbers, too?

Why not :-)

and list of strings.

why a list of strings and not a list of list of strings? I guess, what we care about are the items of those lists, no matter how nested they are. Even though, I have no use case, should this be limited by the standard?

There is a fine line between having a solution that covers the 'known' universe of data types and one that covers everything we can think up. I've by design not considered JSON objects for instance !
I think we should stay with simple data types that we know are in use.

What happens when the regex (or any rule for that matter) is applied to an incompatible value-type?

It MUST fail !

By using regex as @daserzw proposes you can probably check that a contact is in fact an email address. But there is no way to demand that there is a value at all.

an empty value will not match the regex rule; but, I think what you're saying is that the rule will not be checked, if no such claim is in the response.

Correct !

"subset_of": ["code", "code id_token"]

is id_token code a valid subset? I think it should be.

This is something I've always thought problematic with the standard.
It's sort of a set but at the same time not.
https://openid.net/specs/oauth-v2-multiple-response-types-1_0.html
defines the set of response types and they are ordered lists.
In reality though, any decent OAuth2/OIDC library treats them internally as sets.

@c00kiemon5ter

c00kiemon5ter commented Jan 16, 2019

This is something I've always thought problematic with the standard.
It's sort of a set but at the same time not.
https://openid.net/specs/oauth-v2-multiple-response-types-1_0.html
defines the set of response types and they are ordered lists.
In reality though, any decent OAuth2/OIDC library treats them internally as sets.

From the linked document:

  • Multiple-Valued Response Types

The OAuth 2.0 specification allows for registration of space-separated response_type parameter values. If a Response Type contains one of more space characters (%20), it is compared as a space-delimited list of values in which the order of values does not matter.

I would say it is an unordered list; this differs from a set in the sense that it can contain duplicate entries.

@rohe
Owner Author

rohe commented Jan 16, 2019

Well, what I've always been baffled about is that they actually explicitly registered multiple-valued response types. Why not just say you can combine response types and that on the wire they must be represented as strings with space-separated values, separating the value (a set of values) from the encoding?
But alas, no. This has led some implementors to assume that the registered values are the only ones allowed by the standard.

@rohe
Owner Author

rohe commented Jan 16, 2019

I want us to use "subset_of": ["code", "code id_token"]
instead of "subset_of": [{"code"}, {"code", "id_token"}]
since the standard says the values are of the form "code id_token".
We just have to make the comparison function smart enough to understand that for response_types,
"code id_token" is equivalent to "id_token code".
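
A sketch of what such a comparison could look like while keeping the space-separated strings in the policy. The function names are hypothetical, and collapsing repeated tokens (treating "code code" like "code") is an assumption.

```python
def normalize_response_type(value):
    """Turn 'code id_token' into an order-insensitive key."""
    # frozenset also collapses repeated tokens (an assumption, see above).
    return frozenset(value.split())


def filter_response_types(requested, allowed):
    """Keep the requested response types whose token sets are allowed."""
    allowed_keys = {normalize_response_type(a) for a in allowed}
    return [r for r in requested if normalize_response_type(r) in allowed_keys]


policy = {"subset_of": ["code", "code id_token"]}
print(filter_response_types(["id_token code", "token"], policy["subset_of"]))
```

The values keep the spelling the entity used; only the comparison ignores token order.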

@daserzw

daserzw commented Jan 16, 2019

I think "subset_of": ["code", "code id_token"] can be deserialized as a set of sets, something like:

metadata_policies = {"subset_of": ["code", "code id_token"]}
subset_of_ruleset = Set([Set(rule.split()) for rule in metadata_policies["subset_of"]])
value = set("id_token code".split())
any(value.issubset(rs) for rs in subset_of_ruleset)

@c00kiemon5ter

 metadata_policies = {"subset_of": ["code", "code id_token"]}
-subset_of_ruleset = Set([Set(rule.split()) for rule in metadata_policies["subset_of"]])
+subset_of_ruleset = [set(rule.split()) for rule in metadata_policies["subset_of"]]
 value = set("id_token code".split())
 any(value.issubset(rs) for rs in subset_of_ruleset)

@daserzw

daserzw commented Jan 16, 2019

simpler is better ;-)
