-
Notifications
You must be signed in to change notification settings - Fork 9.1k
HADOOP-19365. AAL support for auditing. #7723
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@steveloughran I am starting work for auditing support. This is how I think we should do it:
The span-ID here will actually be the ID for the We will also need to something similar for the referrer header:
All of this feels a bit hacky, but I can't figure out a better way to do this. Can't think of a way to attach the S3 request in AAL to the span, so we going with this overwriting values approach. it does achieve the desired end result. Always find it a bit hard to navigate around this auditing code..let me know if you think there's a better way to do this. If you're ok with my current approach, i will work on finishing up this PR. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
core design looks good
import org.apache.hadoop.fs.store.LogExactlyOnce; | ||
import org.apache.hadoop.fs.store.audit.HttpReferrerAuditHeader; | ||
import org.apache.hadoop.security.UserGroupInformation; | ||
import software.amazon.s3.analyticsaccelerator.request.RequestAttributes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
now forcing this to always be on the classpath...best off just to copy them into S3AInternalAuditConstants
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
annoyingly, if you create a new Execution attribute the SDK throws and exception:
No duplicate ExecutionAttribute names allowed but both ExecutionAttributes 16f8b8be and 234dd4d6 have the same name: span. ExecutionAttributes should be referenced from a shared static constant to protect against erroneous or unexpected collisions.
So to get the attribute, you need to use the shared attribute, which is why i need to take the dependency here.
doesn't make any sense to me why on a GET the attribute needs to be shared, will reach out to the SDK team.. but not much else I can do right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if it is a bug they fix, we can't bump the SDK right now. Is there some clever way to use reflection to do this binding, through our DynMethods code?
.build(); | ||
} | ||
|
||
String spanId = executionAttributes.getAttribute(RequestAttributes.SPAN_ID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these would need to go above 393 and passsed into the referrer, as done with the range and delete key size; having the range from the request would be good too.
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
d4af8e4
to
8be8d18
Compare
@steveloughran @mukund-thakur could you please review this PR to add audit support to AAL, just rebased it and tested again |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
🎊 +1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented, minor changes requested
import org.apache.hadoop.fs.store.LogExactlyOnce; | ||
import org.apache.hadoop.fs.store.audit.HttpReferrerAuditHeader; | ||
import org.apache.hadoop.security.UserGroupInformation; | ||
import software.amazon.s3.analyticsaccelerator.request.RequestAttributes; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
even if it is a bug they fix, we can't bump the SDK right now. Is there some clever way to use reflection to do this binding, through our DynMethods code?
// Expect 4 audited requests. One HEAD, and 3 GETs. The 3 GETs are because the read policy is WHOLE_FILE, | ||
// in which case, AAL will start prefetching till EoF on file open in 8MB chunks. The file read here | ||
// s3://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz, has a size of ~21MB, resulting in 3 GETS: | ||
// [0-8388607, 8388608-16777215, 16777216-21511173]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
slick. These are parallel GETs aren't they?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, parallel GETs
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AWSRequestAnalyzer.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java
Show resolved
Hide resolved
// for delete op, attach the number of files to delete | ||
attachDeleteKeySizeAttribute(sdkRequest); | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cut
...op-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java
Show resolved
Hide resolved
@steveloughran updated the PR, only change is adding the javadoc and cutting the extra new line. |
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
merge and backport as you desire
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
Description of PR
Relevant AAL PR: awslabs/analytics-accelerator-s3#280
modifyHttpRequest
executionInterceptor in the LoggingAuditor called, these are now available in theExecutionAttributes
and can be used for logging.Example logs:
How was this patch tested?
Tested in eu-west-1,
mvn -Dparallel-tests -DtestsThreadCount=16 clean verify
For code changes:
LICENSE
,LICENSE-binary
,NOTICE-binary
files?