Any resource server implementation is forced to couple authorization and storage #379

Closed
RubenVerborgh opened this issue Feb 15, 2022 · 11 comments

@RubenVerborgh
Contributor

RubenVerborgh commented Feb 15, 2022

No description provided.

@kjetilk
Member

kjetilk commented Feb 22, 2022

As previously stated, I think there is a leap here from "resource state" to "require access to storage". Access to resource state is just that, and can be implemented by a lookup in a trie or a counting Bloom filter. The requirement that resource state be known before authorization comes from the fact that authorization is a heavy operation and should not be done needlessly. In this case, it is also important for UX, as Tim argued on the old issue.
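To make the trie/Bloom-filter remark concrete, here is a minimal Python sketch of a counting Bloom filter over resource URIs. The class and method names are hypothetical and the sizes arbitrary. Assuming every create and delete is mirrored into the filter, a negative answer means the URI definitely has no representation, so storage need not be touched; a positive answer is only probable and would still need confirming against storage.

```python
import hashlib


class CountingBloomFilter:
    """Approximate set of resource URIs that currently exist (hypothetical sketch).

    Counters (rather than single bits) allow removal when a resource is deleted.
    False positives are possible; false negatives are not, provided every
    remove() corresponds to an earlier add().
    """

    def __init__(self, size: int = 1024, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, uri: str) -> list[int]:
        # Derive k positions from salted SHA-256 digests of the URI.
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{uri}".encode("utf-8")).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.size)
        return positions

    def add(self, uri: str) -> None:
        # Called when a resource is created.
        for pos in self._positions(uri):
            self.counters[pos] += 1

    def remove(self, uri: str) -> None:
        # Called when a resource is deleted.
        for pos in self._positions(uri):
            if self.counters[pos] > 0:
                self.counters[pos] -= 1

    def might_exist(self, uri: str) -> bool:
        # False: definitely absent. True: probably present, confirm in storage.
        return all(self.counters[pos] > 0 for pos in self._positions(uri))
```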

@csarven csarven self-assigned this Mar 21, 2022
@csarven
Member

csarven commented Mar 25, 2022

I will follow up with a PR based on the information here and in other issues to further clarify the spec. There is no new information in this comment beyond the comment above and what's discussed elsewhere; I'm merely trying to expand/clarify, FWIW.


The storage and the authorization components are not intended to be coupled, and one does not require knowledge about the other. If there is text implying this, we can clarify.

To take an example, an authorization system such as WAC does not require knowledge about the resource state as part of its #authorization-evaluation . Its evaluation of an authorization is concerned with finding Authorizations that match the required parameters of an operation ( #authorization-evaluation-applicable ). WAC instructs a server whether to allow or deny operations upon a resource.
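A minimal sketch of evaluating Authorizations by identifiers alone. The `Authorization` dataclass and `is_allowed` function are hypothetical simplifications of WAC's acl:agent, acl:accessTo and acl:mode; real WAC evaluation also covers agent classes, groups, acl:default inheritance and more, none of which is modelled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Authorization:
    """Simplified stand-in for a WAC acl:Authorization (hypothetical structure)."""
    agents: frozenset[str]  # acl:agent values
    access_to: str          # acl:accessTo target URI
    modes: frozenset[str]   # e.g. {"Read", "Write", "Append", "Control"}


def is_allowed(authorizations, agent: str, target: str, mode: str) -> bool:
    """True if some Authorization matches the agent, target URI and mode.

    Only identifiers are compared; nothing here asks whether `target`
    currently has a representation.
    """
    return any(
        agent in a.agents and a.access_to == target and mode in a.modes
        for a in authorizations
    )
```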

It is the responsibility of the server/system to gather relevant information - not limited to storage and authorization, or in any particular order - and determine the most appropriate status code to respond with.


In response to the table in #379 (comment) (dated 2022-02-22):

1 GET|HEAD|OPTIONS C/R Read Write 403 404
2 DELETE C/R – Read 403 404
3 DELETE C/R Read 403 404
4 DELETE C/ Read - 403 404 (for C/ exists; same as 3)
5 DELETE C/R Append Read 403 404
6 DELETE C/R Write Read 403 404

Servers responding with 401, 403 or 404 will receive a test outcome of "pass" depending on the scenario.

When the authorization evaluation results in access denied, the server must translate that to "forbidden" and be able to respond with 403, unless another status code is more suitable (due to other information that's taken into account).

401 is as advertised: valid authentication credentials are missing, or the ones provided are refused. Information about the latter case could be obtained from the authorization system. A 403 response could potentially reveal less information, with the caveat of being less useful than 401, which may be suitable in some environments.

The 404 is deemed to be useful if/when a server has a holistic view of relevant permissions and the state of the target resource, and wants to put that knowledge to use. In the above cases (rows 1-6), since the existence of the contained resource can be known either by reading the container (C/) or reading the contained resource (C/R), responding with 403 would not minimise information leakage. 404 tells the client more clearly the nature of the rejection within the context of their access permissions. Had there not been Read on C/ or C/R (as shown in the other rows of the original table), the server would not need to check storage and could respond with 403.
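The reasoning for rows 1-6 can be sketched as a small decision function, assuming authorization has already denied the requested operation. The function name and its boolean inputs are hypothetical; they stand for the authentication result, the Read permissions reported by the authorization system, and the storage lookup.

```python
def status_for_denied_request(authenticated: bool,
                              can_read_container: bool,
                              can_read_target: bool,
                              target_exists: bool) -> int:
    """Pick a status code once authorization has denied the operation (sketch)."""
    if not authenticated:
        return 401  # credentials missing or refused
    if not (can_read_container or can_read_target):
        # The agent cannot learn whether C/R exists, so storage need not be
        # consulted and 403 leaks nothing.
        return 403
    # Existence is already knowable to this agent via Read on C/ or C/R,
    # so 404 is more informative than 403 when the target does not exist.
    return 403 if target_exists else 404
```

Because `target_exists` only matters in the last branch, a server could defer the storage lookup until it knows the agent is allowed to learn about existence.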


The above on 404 is also consistent with RFC 7231 and the Solid Protocol, in particular ( #server-post-target-not-found ):

When a POST method request targets a resource without an existing representation, the server MUST respond with the 404 status code.

Provided that the agent's authorization allows it to know whether the resource exists, 404 is applicable for the following HTTP methods: GET, HEAD, POST, DELETE.

I've updated the table for POST /C/ (quoting below):

/      | C/          | C/ exists | C/ doesn't exist
-      | -           | 403       | 403
-      | Read        | 403       | 404
-      | Append      | 201       | 403
-      | Read,Append | 201       | 404
Read   | -           | 403       | 404
Read   | Append      | 201       | 404

(There is no conflicting change. It makes it more clear by showing parent container of C/ and removes C/R which was initially used as a way to understand behaviour involving Slug and C/R potentially existing. This was previously clarified with "Servers allocate unique URIs to resources on POST C/ requests. "C/R exists" is not applicable." in any case.)
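Read as a rule, the POST /C/ table reduces to two checks. A minimal Python sketch that reproduces every row of the table above; the function name is hypothetical, and modes the table does not list (such as Write or Control) are ignored.

```python
def post_container_status(modes_on_parent: set, modes_on_container: set,
                          container_exists: bool) -> int:
    """Status code for POST /C/ (sketch).

    modes_on_parent: access modes the agent has on / (the parent of C/).
    modes_on_container: access modes the agent has on C/.
    """
    if container_exists:
        # Adding a member to an existing container needs Append on C/.
        return 201 if "Append" in modes_on_container else 403
    # C/ does not exist: reveal that with 404 only if the agent could already
    # learn it by reading / or C/; otherwise 403 leaks nothing.
    can_know = "Read" in modes_on_parent or "Read" in modes_on_container
    return 404 if can_know else 403


# Each row: (modes on /, modes on C/, status if C/ exists, status if it doesn't).
ROWS = [
    (set(), set(), 403, 403),
    (set(), {"Read"}, 403, 404),
    (set(), {"Append"}, 201, 403),
    (set(), {"Read", "Append"}, 201, 404),
    ({"Read"}, set(), 403, 404),
    ({"Read"}, {"Append"}, 201, 404),
]

for parent, container, if_exists, if_missing in ROWS:
    assert post_container_status(parent, container, True) == if_exists
    assert post_container_status(parent, container, False) == if_missing
```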

https://solidproject.org/TR/2021/protocol-20211217 specifies the request semantics for requests using the POST method targeting a container, i.e., a POST /C/ request is answered with a 201 status and a Location: R header.

As per RFC 7231, the request semantics of the POST method also allows:

  • creating the resource /C/ and responding with 201 status without the Location header.
  • updating the resource /C/ and responding with 200 status without the Location header.

However, as per #server-post-target-not-found , POST /C/ or POST /C/R alone - in absence of specifying additional semantics - will not create those target resources when they don't exist.

The request semantics for updating the state of /C/ with POST /C/, other than by adding resources, are not specified.

There is WIP for updating the state of an existing RDF-bearing resource /C/R: #305 . If/when that is resolved, we'll add a table for POST /C/R.

I'll leave the finer details of request semantics, operations and access modes to the WIP table in solid/web-access-control-spec#85 (comment) . (That comment should perhaps move to solid/specification as a new issue.)


7 PATCH C/R (Read) – Write 403 404

Is "(Read)" to be interpreted as payload without Insert or Delete - similar to row 2 in the original table?

I see this row (403/404) in the same category as rows 1-6.

8 PATCH C/R (Insert) – Write 200 403
9 PATCH C/R (Write) – Write 200 403

Is "(Write)" intended to include all write operations or did you mean "Delete" - as currently described in #n3-patch ?

I think the information on whether the graph of a patch resource is matched is missing in this table for Delete. (The table in issue 14 has a "Match" column.)

The comment in issue 14 needs to be updated for PATCH text/n3. I'll come back to this.

10 PUT C/R – Write 200 403

At the time of writing, the status codes in the table were intended to communicate the most suitable ones that the server can respond with; there are always exceptions, but the most relevant ones were mentioned at the beginning of the comment in issue 14. If we put the table/rows/status codes aside for a moment, we essentially have the following to work with:

The system needs to ask the authorization system whether the set of operations is permitted.

The system needs to determine the resource state.

System ...

Server responds.
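The steps above can be sketched as a server that queries the authorization system and storage independently and only then combines the answers. This is a simplified illustration for a PUT-like request, assuming 200 for a replaced resource and 201 for a created one; the callables are hypothetical placeholders, and the 401/404 refinements discussed earlier are omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def handle_put(is_permitted: Callable[[], bool],
               resource_exists: Callable[[], bool]) -> int:
    """Gather the authorization decision and the resource state, then respond.

    is_permitted: asks the authorization system (identifiers only).
    resource_exists: asks storage about the current resource state.
    Neither callable knows about the other, so they may run in any order,
    or concurrently as here.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        permitted_future = pool.submit(is_permitted)
        exists_future = pool.submit(resource_exists)
        permitted = permitted_future.result()
        exists = exists_future.result()
    # Only the server combines both answers into a status code.
    if not permitted:
        return 403
    return 200 if exists else 201
```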

@csarven
Member

csarven commented Mar 29, 2022

The authorization system carries access permissions granting agents the ability to perform operations on resources. The authorization system can instruct the server as to what's permitted. It is about the identifiers, not the resource states. There is no information in the authorization system about the current state of a resource (unless otherwise specified by a specification). Whether an operation is actually committed by the storage system is orthogonal. Similarly, as part of completing an operation, the server may want to instruct the authorization system to update its entries without also having to inform it about resource states.

@csarven
Member

csarven commented Apr 5, 2022

The authorization component in the Solid Protocol does not include (or expect) knowledge about resource states. Neither do the access control specs (such as WAC) require knowledge about resource states in their data model or authorization evaluation. It is all about the URI of a resource.

Related: we've also ruled out authorization rules on individual representations of a resource in an issue (pardon me, the link to the repo/issue escapes me right now), which Kjetil, Ruben and I (and Tim here and there, at least) discussed and resolved.

So, the notion of resource state is not factored into the authorization system. Knowledge about a resource actually existing - having a current representation - or not is not entering the authorization system. Just the identifier.

Having knowledge about a URI does not entail knowledge of the past, current or future state of the resource. The authorization system tracks/manages authorization rules, but it doesn't specifically need to know anything about the resource state. Authorization rules are atomic: they say something specific about identifiers. We check whether the authorization system contains the rules we are interested in.

That's my understanding of the specs but happy to be corrected. We need to agree on the fundamentals as currently written. If the architecture is insufficient, e.g., missing information about resource states, then we can have that as a separate discussion.

So for the inquiry:

Check if agent A has permission to Create a resource R

one way to express it more precisely as follows:

Check if agent with URI <A> has permission to Create a resource with URI <R>.

Or generally:

Can we create this URI... ?

Or more concrete:

Given the request semantics of a POST targeting a URI with specific resource semantics, e.g., a container, what would the inquiry be? What about a PUT targeting a specific URI? They smell like a "create" operation.

@csarven
Member

csarven commented Apr 5, 2022

Those questions are summaries. What are the possible inquiries in code, e.g., in SPARQL, that follow the request semantics of, e.g., PUT /C/R?
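One possible inquiry for PUT /C/R, written as a SPARQL ASK query over WAC terms (acl:Authorization, acl:agent, acl:accessTo, acl:mode) and built in Python. It assumes the request maps to acl:Write on C/R and uses example URIs; whether PUT /C/R also needs an inquiry about C/ (for instance when C/R does not yet exist) is exactly what this thread is debating, so only the single inquiry is shown. Its inputs are identifiers and a mode, with no term describing resource state.

```python
ACL = "http://www.w3.org/ns/auth/acl#"


def ask_query(agent_uri: str, target_uri: str, mode: str) -> str:
    """Build a SPARQL ASK query: does some Authorization grant `mode`?

    The only inputs are the agent URI, the target URI and the access mode.
    """
    return (
        f"PREFIX acl: <{ACL}>\n"
        "ASK {\n"
        "  ?auth a acl:Authorization ;\n"
        f"        acl:agent <{agent_uri}> ;\n"
        f"        acl:accessTo <{target_uri}> ;\n"
        f"        acl:mode acl:{mode} .\n"
        "}\n"
    )


# Hypothetical inquiry for PUT /C/R, assuming it maps to acl:Write on C/R.
print(ask_query("https://alice.example/profile#me", "https://example.org/C/R", "Write"))
```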

I don't see any unit of information about the resource state.

(I'm okay to continue, close this issue, or discuss elsewhere. I'll wait for Kjetil and others to respond... I don't have new information to add or a clarification to make right now.)

@kjetilk
Member

kjetilk commented Apr 5, 2022

I expressed some time ago that this is whiteboard material, and that's still where I stand. I think we need that kind of conversational bandwidth when we fail to understand each other.

I might also add that I think implementations ought to have data on resource state available as early as possible, so I admit to having been a bit indifferent on the issue, but I would nevertheless be interested in participating in a whiteboard session.

@elf-pavlik
Member

The authorization component in the Solid Protocol does not include (or expect) knowledge about resource states. Neither do the access control specs (such as WAC) require knowledge about resource states in its data model or authorization evaluation. It is all about the URI of a resource.

I believe this approach can only satisfy a subset of Authorization Use Cases, unless an additional system observes the storage and, based on changes to the state of the data, translates more complex policies into simpler rules that don't depend on the state of the data.


Could we discuss it during one of the upcoming AuthZ panel meetings?

We also have various use cases where the authorization rule depends on

  • information about who created the resource (proposed acp:creator and acp:CreatorAgent as well as the interop:creatorAccessMode)
  • information about the time of creation (proposed Time matcher in ACP)

In interop we also support inheritance based on relationships in data (e.g., all tasks of a specific project). Currently, we assume that there will be a party observing the storage and translating policies that depend on relationships expressed in data into simpler rules which reflect the same policy without depending on the data anymore. But that approach might be sub-optimal.
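The observer approach described above could look roughly like this: a process that reads project/task relationships from the stored data and flattens a project-level grant into per-task rules. All names are hypothetical. The output must be recomputed (or incrementally updated) whenever tasks are added, moved or deleted, which is one reason this approach may be sub-optimal.

```python
def materialize_rules(project_grants: dict, project_of_task: dict) -> dict:
    """Flatten 'grant on all tasks of project P' into per-task rules (sketch).

    project_grants: {project_uri: {(agent_uri, mode), ...}}  (policy level)
    project_of_task: {task_uri: project_uri}  (relationships observed in data)
    Returns {task_uri: {(agent_uri, mode), ...}}, which an authorization
    system can evaluate without looking at the data again.
    """
    rules = {}
    for task, project in project_of_task.items():
        grants = project_grants.get(project, set())
        if grants:
            rules[task] = set(grants)
    return rules
```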

@matthieubosquet
Member

matthieubosquet commented Apr 6, 2022

  1. the storage component needs knowledge about authorization
  2. authorization requires a separate access to storage (requiring at least 2 storage accesses per request)

The idea in ACP is that there is an interface between a resource server (Solid RS) and an ACP engine (AS).

  1. The resource server (or Solid Server) is authoritative in describing the request (including saying who might be the owner(s) or creator(s) of a resource; who is the requesting agent; the time at which the request was initiated...) and its associated authorization graph (the full ACL representation);
  2. The ACP engine checks if the request satisfies the authorization graph and returns a set of access modes (and potentially further descriptions; one can easily imagine, for example, an associated shape or something else...).
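A minimal sketch of that interface, assuming the resource server passes a request description and the authorization graph, and the engine returns a set of modes. `RequestDescription`, `Policy` and `granted_modes` are hypothetical simplifications, not ACP vocabulary; the creator check is only loosely modelled on the proposed creator matcher.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestDescription:
    """What the resource server asserts about the request (it is authoritative)."""
    agent: str
    creators: frozenset = frozenset()  # creator(s) of the target, if known


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified stand-in for one entry of the authorization graph."""
    modes: frozenset
    agents: frozenset = frozenset()
    allow_creator: bool = False  # loosely modelled on the proposed creator matcher


def granted_modes(request: RequestDescription, graph: list) -> set:
    """Given request x and authorization graph y, return the access modes z.

    Nothing here says whether the target resource exists.
    """
    modes = set()
    for policy in graph:
        matches_agent = request.agent in policy.agents
        matches_creator = policy.allow_creator and request.agent in request.creators
        if matches_agent or matches_creator:
            modes |= policy.modes
    return modes
```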

The AS in this scenario has no knowledge of whether a resource exists or not, it will say: "given request x and authorization graph y, the access modes z are granted".

It is up to the Solid server to decide on which status code it will use to respond.

Would specifying the way(s) a Solid server can interact with any Authorization server be something we could do? Maybe specifying a request description interface?


Note for @elf-pavlik: the ACP time matcher is not based on resource creation time; it is based on resource access time (the time of the request).

@matthieubosquet
Member

Yes, as per what the table presents.

The above table gives (a part of) a detailed view of Solid operations. I am not sure there is a better abstraction for the Solid Server than being aware of a set of modes and the operations they entail.

I can imagine pushing the abstraction further and having the AS send back a boolean, but then the AS would have to be aware of the setup required for Solid operations to be performed, which sounds more like a protocol concern to me. It also feels like more potential for duplication of information, trouble for evolution and a less clear separation of concerns (an ACL/ACP/ODRL… engine should not be usable only with Solid, and for that matter a Solid Server should not have to be aware of which authz system is in use).
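To illustrate that split, the Solid server can own the mapping from its operations to required modes and check it against whatever set of modes an engine returns. The operation names and mode assignments below are hypothetical and illustrative only; the normative mapping is defined by the Solid Protocol and the access control specs.

```python
# Hypothetical, illustrative assignments owned by the Solid server,
# not by the authorization engine.
REQUIRED_MODES = {
    "read-resource": frozenset({"Read"}),
    "append-to-resource": frozenset({"Append"}),
    "overwrite-resource": frozenset({"Write"}),
}


def operation_allowed(operation: str, granted_modes: set) -> bool:
    """Server-side check: do the granted modes cover this operation?"""
    required = REQUIRED_MODES.get(operation)
    if required is None:
        return False  # operations unknown to the server are denied
    return required <= granted_modes
```

The engine never sees the operation names, so the same engine could serve a non-Solid server that uses a different table.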

@woutermont
Contributor

Coming back to this because of my comment on #14, which brings into focus again the preference for 403 as proposed by @humont, and 'solves' cases 1-7 of this issue.

@woutermont
Contributor

Re cases 8-10 in particular, and this issue in general, I would like to add, however, that I don't think there is really any problem. Moreover, 'solving' it by changing the expected status codes for those 'few' problematic cases does not take into account that as soon as more operations need to be distinguished, more such cases will pop up. The 'issue' is thus more fundamental than that.

Both in this issue and in solid/web-access-control-spec#97, @RubenVerborgh has characterized the problem with those cases as either [A] coupling the authorization system too tightly to knowledge of the resource system or [B] coupling the resource system too tightly to knowledge of the authorization system, both of which prevent reductive request processing (a purely sequential reduction of complexity of a request).

I would like to argue that the perspectives provided by @acoburn and described by @matthieubosquet (also supported by @kjetilk) do not fit this characterization: the resource system is authoritative in describing the request; the authorization system should not be responsible for producing a 403, but only for checking the authorization graph and identifying what operations a particular agent can perform on a particular resource. *

[@RubenVerborgh:] So then it is up to the storage system to interpret those access modes, so hence the storage component needs knowledge about authorization (in this case access modes).

Yes, and no: the resource system indeed interprets the access modes, but this was its own knowledge to begin with, not that of the authorization system! Access modes (or 'scopes') are just sets of operations the resource system can perform. The semantics of these operations are defined for the resource system itself, not for the authorization system. It is the resource server that, in the first place, provides knowledge of these operations and sets thereof to the authorization system.

Note that an authorization system that maps authentication information onto operations can perfectly form a layer in reductive request processing: a request passes once, is reduced in complexity by the mapping, and no communication from the resource system to the authorization system is needed.
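A minimal sketch of such a reductive pipeline, assuming hypothetical request types and opaque operation names such as "read" and "delete" supplied by the resource system. The authorization layer turns (agent, target) into a set of permitted operations and drops the agent; the resource layer, which never calls back into authorization, picks the status code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthenticatedRequest:
    agent: str
    target: str
    operation: str


@dataclass(frozen=True)
class AuthorizedRequest:
    target: str
    operation: str
    permitted: frozenset  # operations this agent may perform on target


def authorization_layer(req: AuthenticatedRequest, grants: dict) -> AuthorizedRequest:
    """Map (agent, target) onto permitted operations, then drop the agent."""
    permitted = grants.get((req.agent, req.target), frozenset())
    return AuthorizedRequest(req.target, req.operation, frozenset(permitted))


def resource_layer(req: AuthorizedRequest, storage: dict) -> int:
    """Sees only target, operation and permitted operations; picks the status."""
    if req.operation not in req.permitted:
        return 403
    if req.operation == "read":
        return 200 if req.target in storage else 404
    if req.operation == "delete":
        if req.target not in storage:
            return 404
        del storage[req.target]
        return 204
    return 501  # operation unknown to this sketch
```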


* This is, not coincidentally, the way the widely-adopted OAuth family of authorization mechanisms works: the authorization server provides a token containing the allowed operations; the resource server checks if the request indeed concerns one of those operations. To the authorization server, operations are just opaque and static references provided by the resource server, rather than meaningful or dynamic knowledge.


6 participants