Access Denied when IAM policy give access (Read/Write/Listing) to only a prefix area #847
The reason it is a problem is that I want to let a user upload data to "their" area, without being able to see other areas in the bucket. E.g., this fails:
import pandas as pd
df = pd.DataFrame([[1,2,3],[4,5,6]], columns=list('abc'), index=['foo', 'bar'])
filename = 'fubar.csv'
s3loc = f's3://{topdir}/{filename}'
df.to_csv(s3loc)
I guess I can circumvent this with:
with io.StringIO() as f:
    df.to_csv(f)
    boto3.resource('s3').Bucket(bucket_name).Object(f'{prefix}/{filename}').put(Body=f.getvalue())
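The workaround above can be wrapped in a small helper. This is a minimal sketch, not part of s3fs or pandas: `upload_csv` is a hypothetical name, and `client` is assumed to be a boto3 S3 client (anything exposing `put_object(Bucket=..., Key=..., Body=...)`). It writes exactly one key, so it should not depend on any listing permission.

```python
import io


def upload_csv(df, client, bucket, key):
    """Serialize df to CSV in memory and upload it with a single PUT.

    df     -- any object with a pandas-style to_csv(buffer) method
    client -- assumed to be a boto3 S3 client, e.g. boto3.client("s3")
    """
    buf = io.StringIO()
    df.to_csv(buf)
    # put_object writes exactly one key; the bucket is never listed.
    client.put_object(Bucket=bucket, Key=key, Body=buf.getvalue().encode("utf-8"))
    return key
```

With real objects this would be called as `upload_csv(df, boto3.client("s3"), bucket_name, f"{prefix}/{filename}")`.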
It would be useful to see the exception: the calls in s3fs and the response message. You can also turn on logging for "s3fs" to see the chain of calls being made. I notice that you provide a region for s3fs but not for boto3, which is a possible difference between the two.
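Turning on the logger mentioned above takes only the standard library. A minimal sketch, assuming output to stderr is acceptable; the format string is just an example:

```python
import logging

# Send log records to stderr with timestamps; basicConfig does nothing if the
# root logger already has handlers configured.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# "s3fs" is the logger name referred to above; DEBUG shows the chain of calls.
s3fs_logger = logging.getLogger("s3fs")
s3fs_logger.setLevel(logging.DEBUG)
```

Run this before the failing call so that the requests leading up to the exception are captured.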
This is indeed puzzling, and suggests that something is being cached, perhaps even on AWS's side.
Of course, s3fs is only a convenience layer, so no one is required to use it :)
As I bumped up against this again today, I thought I would provide a bit more info. First thing: the … Attached are (sanitized) log files and stack traces from 4 cases:
Thanks for the info, @pdemarti. I'll try to decode it. Was this generated using your code at the top of this thread?
Not exactly (there is error capture, logger setup, and sanitizing of the stack traces and log files). But, basically, the calls to
What version of fsspec do you have? The pandas route tries to ensure that the "directory" exists and is writable, but it (rightly) lacks the permissions to check anything above the given path. As of 4 months ago, this code allows the PermissionError to be ignored.
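One way to answer the version question without importing the packages themselves is to read the installed distribution metadata. A small sketch; `installed_versions` is a hypothetical helper name, and the default package list is only a guess at what is relevant here:

```python
from importlib import metadata


def installed_versions(packages=("fsspec", "s3fs", "pandas", "boto3")):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print(installed_versions())
```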
I'm not sure what we should do about this error. When doing ls(path), there are three possibilities:
It's checking the second possibility that is throwing the error. In fact, it should be FileNotFound, right?
The following would change ls(no-directory) to FileNotFound:
@@ -996,14 +996,17 @@ class S3FileSystem(AsyncFileSystem):
         else:
             files = await self._lsdir(path, refresh, versions=versions)
             if not files and "/" in path:
-                files = await self._lsdir(
-                    self._parent(path), refresh=refresh, versions=versions
-                )
-                files = [
-                    o
-                    for o in files
-                    if o["name"].rstrip("/") == path and o["type"] != "directory"
-                ]
+                try:
+                    files = await self._lsdir(
+                        self._parent(path), refresh=refresh, versions=versions
+                    )
+                    files = [
+                        o
+                        for o in files
+                        if o["name"].rstrip("/") == path and o["type"] != "directory"
+                    ]
+                except IOError:
+                    pass
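The effect of the patch can be illustrated without S3. The sketch below is a simplified, synchronous stand-in, not s3fs code: `lsdir` is a hypothetical callable playing the role of `_lsdir`, and `toy_lsdir` raises `PermissionError` for anything outside an assumed allowed prefix. Since `PermissionError` is a subclass of `OSError`, of which `IOError` is an alias in Python 3, the `except IOError` clause covers the access-denied case. What the caller then does with an empty result (FileNotFound, per the comment above) is outside the sketch.

```python
import posixpath


def ls_with_fallback(path, lsdir):
    """List path; if empty, look for path as a file in its parent listing.

    An error while listing the parent is swallowed, so a caller without
    access to the parent gets an empty result instead of an exception.
    """
    files = lsdir(path)
    if not files and "/" in path:
        try:
            parent = posixpath.dirname(path)
            files = [
                o
                for o in lsdir(parent)
                if o["name"].rstrip("/") == path and o["type"] != "directory"
            ]
        except IOError:  # alias of OSError; includes PermissionError
            pass
    return files


def toy_lsdir(path, allowed="bucket/some/prefix/test-user"):
    """Hypothetical lister: only paths under `allowed` may be listed."""
    if not (path == allowed or path.startswith(allowed + "/")):
        raise PermissionError(f"Access Denied: {path}")
    return []  # nothing stored yet


# Listing the user's own (empty) area consults the parent, which is denied;
# the error is ignored rather than propagated.
print(ls_with_fallback("bucket/some/prefix/test-user", toy_lsdir))
```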
One issue with this is that if IAM only allows read access (listing and get object) to
Also, S3 being an object store, "directories" are of course just an illusion; they are "auto-created". For example, writing an object To summarize, I believe the following should help with both scenarios in this issue ( Assuming IAM read/write/list access to
Aha! First, apologies: I was using an old version ( However, that version predates the fix you mention. And indeed, I just tested with the latest version on {'fsspec': '2024.2.0', 's3fs': '2024.2.0'} And we get, for the 4 cases described earlier:
So, I think the only thing to consider would be to successfully return an empty list when doing a
>>> bkt = boto3.resource('s3').Bucket(bucket_name)
>>> list(bkt.objects.filter(Prefix='some/prefix/test-user/'))
[]
Of course, there is no way to tell the difference between not finding a path and a "virtual empty" directory - otherwise all possible directories always exist, and that seems wrong.
This is the specific case in your workflow that was not working as expected; but this has nothing to do with pandas writing.
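The ambiguity described above can be made concrete with a toy model. This is an illustration only, not S3 or s3fs code: a flat set of keys stands in for a bucket, and `list_prefix` is a hypothetical helper mimicking a prefix-filtered listing.

```python
def list_prefix(keys, prefix):
    """Return the keys that start with prefix, as a flat object store would."""
    return sorted(k for k in keys if k.startswith(prefix))


keys = {"some/prefix/other-user/data.csv"}

# A "directory" that was never written to and a path that does not exist
# give the same answer: there is no directory entry to tell them apart.
empty_area = list_prefix(keys, "some/prefix/test-user/")
typo_path = list_prefix(keys, "some/prefx/test-user/")
print(empty_area == typo_path)

# Writing one object makes every intermediate "directory" appear at once.
keys.add("some/prefix/test-user/fubar.csv")
print(list_prefix(keys, "some/prefix/test-user/"))
```

In this model, treating an empty listing as "exists but empty" would make every conceivable prefix a directory, while treating it as "not found" hides areas the user may legitimately write to; the choice is a convention, not something the store can decide.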
Yes, this is somewhat subjective, but I understand your reticence. That said, not all possible directories would always exist: if a In the other cases, is there any actionable value in raising In any case,
Oh, but people do, trust me!
Say I have the following IAM policy for a user named "test-user" (as a Python dict instead of JSON, for concision):
This works just fine:
But the following code gives me Access Denied:
If, however, I change the listing permission in IAM so the whole bucket can be listed:
Then it instantly works.
Another puzzling thing is, if I change the IAM permission back, the code above keeps working, even after starting a new Python interpreter.
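For readers unfamiliar with prefix-scoped policies, here is a hypothetical sketch of the kind of policy the title describes (Read/Write/Listing on a single prefix), written as a Python dict. It is not the reporter's actual policy: the bucket name, the prefix, and the exact statements are assumptions chosen for illustration.

```python
import json

BUCKET = "example-bucket"          # hypothetical bucket name
PREFIX = "some/prefix/test-user"   # hypothetical per-user area

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # listing is allowed only for keys under the user's prefix
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
            "Condition": {"StringLike": {"s3:prefix": [f"{PREFIX}/*"]}},
        },
        {   # objects can be read and written only under the user's prefix
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/{PREFIX}/*"],
        },
    ],
}

print(json.dumps(policy, indent=2))
```

Under a policy of this shape, a listing request for a parent prefix such as `some/prefix/` does not match the `s3:prefix` condition, which is the situation in which a client that inspects the parent "directory" would be refused.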
Versions