Scalability of writing ACP to every resource #206
Can you give an example of the policy you are thinking of, how you interpret it to work, and which documents you are basing that on?
I'm just looking at https://github.com/solid/authorization-panel/blob/main/proposals/acp/index.md
Thanks. So I think that does not apply to the minimal extension to WAC described here (part of PR201), which allows the sharing of "policies" (partial access control rules) by editing the ACR to point to an existing "Rule".
Yes, AFAICS, WAC references one and only one relevant ACL resource, which may sit in a hierarchy. Its algorithm for locating that resource is more complex than ACP's; ACP's algorithm for finding the ACR is very simple, since an ACR follows every resource, but ACP's algorithm for propagation is then more complex. Not complex as in hard to implement, but complex as in that I fear its runtime for Big Data[tm].
I find the propagation problem could be solved by adding `<> :include <../.acl>` to each newly created ACR. Any client that wanted to break the inheritance link could remove that triple and add extra Rules individually. In order to enforce in a coherent way that the owner of the Pod always has access, all that is needed is for each resource to contain the header
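To make the inheritance idea concrete, here is a minimal sketch of resolving effective rules through an `:include`-style link chain. The `Acr` class and `effective_rules` function are illustrative names, not from any Solid specification; the point is only that breaking inheritance is just dropping the link.

```python
# Hypothetical sketch of ACR inheritance via an ":include"-style link.
# `Acr` and `effective_rules` are made-up names for illustration.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Acr:
    rules: set = field(default_factory=set)   # local rule identifiers
    include: Optional["Acr"] = None           # e.g. <> :include <../.acl>

def effective_rules(acr: Acr) -> set:
    """Union of local rules and everything reachable via include links."""
    rules = set(acr.rules)
    parent = acr.include
    while parent is not None:
        rules |= parent.rules
        parent = parent.include
    return rules

root = Acr(rules={"owner-has-control"})
child = Acr(rules={"friends-can-read"}, include=root)       # inherits
detached = Acr(rules={"friends-can-read"}, include=None)    # link removed

assert effective_rules(child) == {"owner-has-control", "friends-can-read"}
assert effective_rules(detached) == {"friends-can-read"}
```

The design choice this illustrates: a policy change higher up takes effect immediately for every inheriting ACR, with no per-resource writes at change time.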
I must admit that I do not understand how your comment is relevant to the scalability problem I try to address here, @bblfish. It is not that it can't be done with the tools that we have; it is even pretty easy. We're probably talking past each other. The problem is the time complexity of the operation: even if it is easy to do programmatically and the semantics are straightforward, it can have high time complexity. Now, I fully admit that I'm not particularly knowledgeable about the available algorithms for implementing this stuff; it may well be that there exists some implementation that solves this problem. But I feel that the intuition that a URI space is a trie, and that trie search is worst case O(n), is fairly well established. If that intuition is correct, then the propagation mechanism won't scale for non-trivial datasets.
The relevant text you are referring to is this one, I think:
I think we agree that this entails that a change to a policy higher up the tree (closer to the root) would require the server to go through all children, grandchildren, ... ACRs, check them, and add links to the new policy or even delete links to policies that no longer exist. In the Issue 210: add :include relation proposal I just opened, a new resource would just need to add an `:include` link.
Right, that's a good point! Yeah, you'd have to examine the entire branch up to the root to determine access rights, but that is likely to be more manageable, especially with caching strategies (which would also be easier). I tend to think that WAC's easily identifiable single access control resource is a reasonable starting point (on the balance of Einstein's "Everything should be as simple as possible, but not simpler"), and that inheritance should be added later once we know the scalability properties, but I can see how your proposal makes this more feasible.
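The walk-up-the-branch lookup described above can be sketched as follows. This is an illustrative approximation of WAC-style default inheritance, not normative algorithm text; `acls` is a made-up path-to-ACL mapping with trailing slashes normalized away. The cost per access check is O(depth of the path), independent of how many other resources exist.

```python
# Illustrative sketch (not normative WAC): find the effective ACL by
# walking from the requested path toward the root.
from posixpath import dirname

def effective_acl(path, acls):
    """Return the ACL of `path`, or of its nearest ancestor that has one."""
    while True:
        if path in acls:
            return acls[path]
        if path == "/":
            return None            # no ACL anywhere up the branch
        path = dirname(path.rstrip("/")) or "/"

acls = {"/": "root.acl", "/photos": "photos.acl"}
assert effective_acl("/photos/2024/trip/img1.jpg", acls) == "photos.acl"
assert effective_acl("/docs/readme.md", acls) == "root.acl"
```

Since the result for each ancestor container changes only when that container's ACL changes, the intermediate lookups cache naturally, as the comment suggests.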
I'd like to understand the scalability of the decision to write access controls along with every resource. I'm in pretty deep waters here, and my intuition may be completely off, but at the very least we should possibly provide some implementation advice. So here goes:
If we write an access policy to all resources, I assume that we need to block all access until all of them are written. If not, we might have a situation where a controller writes an access policy only to see it take effect much later. So we have to make sure this operation can be done quickly.
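As a back-of-the-envelope sketch of the write-amplification worry: if a policy change on a container must be copied into every descendant's ACR, the change touches O(size of the subtree) resources before it can safely take effect. The tree below is a plain dict of hypothetical paths, purely for illustration.

```python
# Sketch of propagation cost: one ACR write per resource in the subtree.
from collections import deque

def propagate_policy(tree, start):
    """Visit every resource under `start`; return the number of ACR writes."""
    writes = 0
    queue = deque([start])
    while queue:
        node = queue.popleft()
        writes += 1                      # one ACR update per resource
        queue.extend(tree.get(node, []))
    return writes

tree = {
    "/": ["/a/", "/b/"],
    "/a/": ["/a/1", "/a/2"],
    "/b/": ["/b/1"],
}
# Changing a policy at the root requires touching every resource once:
assert propagate_policy(tree, "/") == 6
```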
Intuitively, a URI space is represented by a trie, and searching a trie has an average time complexity of O(log n) and a worst case of O(n). Since we know nothing about how people will organize their URI space, I think we should assume the worst case.
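A toy sketch of that trie intuition (illustrative only): inserting or looking up a path visits one node per segment, so a degenerate, deeply nested URI space makes a single traversal cost grow linearly with the number of nested containers.

```python
# Nested-dict trie; `insert` returns the number of nodes visited.
def insert(trie, path):
    """Insert `path` into the trie; return the traversal length."""
    node, visited = trie, 0
    for seg in filter(None, path.split("/")):
        node = node.setdefault(seg, {})
        visited += 1
    return visited

trie = {}
# Shallow, balanced-ish space: cost is the (short) path depth.
assert insert(trie, "/photos/2024/img.jpg") == 3
# Degenerate space: one chain 1000 containers deep, cost grows linearly.
deep = "/".join("d%d" % i for i in range(1000))
assert insert(trie, deep) == 1000
```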
My thinking is further that n is far from constant: I certainly hope that Solid takes a lion's share of an exponential future, so n(t) ~ exp(t). If the trie assumption is correct, then write speeds will get exponentially slower in the worst case, or linearly slower in the average case, as the data grows. In addition, for every resource there's an insert into a set, but I suppose that's a constant-time operation these days.
I suppose that volatile memory is getting exponentially faster, and SSDs seem to be going that way too, but HDDs aren't; their performance has improved only linearly. And then there's price/performance, which I haven't been following too closely.
It had me worried to read that every change of permissions would have to be propagated this way. Could anyone please reassure me on this one?