HADOOP-19365. AAL support for auditing.#7723
Conversation
|
@steveloughran I am starting work for auditing support. This is how I think we should do it:
The span-ID here will actually be the ID for the We will also need to something similar for the referrer header: All of this feels a bit hacky, but I can't figure out a better way to do this. Can't think of a way to attach the S3 request in AAL to the span, so we going with this overwriting values approach. it does achieve the desired end result. Always find it a bit hard to navigate around this auditing code..let me know if you think there's a better way to do this. If you're ok with my current approach, i will work on finishing up this PR. |
|
💔 -1 overall
This message was automatically generated. |
steveloughran
left a comment
There was a problem hiding this comment.
core design looks good
| import org.apache.hadoop.fs.store.LogExactlyOnce; | ||
| import org.apache.hadoop.fs.store.audit.HttpReferrerAuditHeader; | ||
| import org.apache.hadoop.security.UserGroupInformation; | ||
| import software.amazon.s3.analyticsaccelerator.request.RequestAttributes; |
There was a problem hiding this comment.
now forcing this to always be on the classpath...best off just to copy them into S3AInternalAuditConstants
There was a problem hiding this comment.
annoyingly, if you create a new Execution attribute the SDK throws and exception:
No duplicate ExecutionAttribute names allowed but both ExecutionAttributes 16f8b8be and 234dd4d6 have the same name: span. ExecutionAttributes should be referenced from a shared static constant to protect against erroneous or unexpected collisions.
So to get the attribute, you need to use the shared attribute, which is why i need to take the dependency here.
doesn't make any sense to me why on a GET the attribute needs to be shared, will reach out to the SDK team.. but not much else I can do right now.
There was a problem hiding this comment.
even if it is a bug they fix, we can't bump the SDK right now. Is there some clever way to use reflection to do this binding, through our DynMethods code?
| .build(); | ||
| } | ||
|
|
||
| String spanId = executionAttributes.getAttribute(RequestAttributes.SPAN_ID); |
There was a problem hiding this comment.
these would need to go above 393 and passsed into the referrer, as done with the range and delete key size; having the range from the request would be good too.
|
We're closing this stale PR because it has been open for 100 days with no activity. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. |
d4af8e4 to
8be8d18
Compare
|
@steveloughran @mukund-thakur could you please review this PR to add audit support to AAL, just rebased it and tested again |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
|
🎊 +1 overall
This message was automatically generated. |
steveloughran
left a comment
There was a problem hiding this comment.
commented, minor changes requested
| import org.apache.hadoop.fs.store.LogExactlyOnce; | ||
| import org.apache.hadoop.fs.store.audit.HttpReferrerAuditHeader; | ||
| import org.apache.hadoop.security.UserGroupInformation; | ||
| import software.amazon.s3.analyticsaccelerator.request.RequestAttributes; |
There was a problem hiding this comment.
even if it is a bug they fix, we can't bump the SDK right now. Is there some clever way to use reflection to do this binding, through our DynMethods code?
| // Expect 4 audited requests. One HEAD, and 3 GETs. The 3 GETs are because the read policy is WHOLE_FILE, | ||
| // in which case, AAL will start prefetching till EoF on file open in 8MB chunks. The file read here | ||
| // s3://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz, has a size of ~21MB, resulting in 3 GETS: | ||
| // [0-8388607, 8388608-16777215, 16777216-21511173]. |
There was a problem hiding this comment.
slick. These are parallel GETs aren't they?
There was a problem hiding this comment.
yup, parallel GETs
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/AWSRequestAnalyzer.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java
Show resolved
Hide resolved
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/audit/impl/LoggingAuditor.java
Show resolved
Hide resolved
| // for delete op, attach the number of files to delete | ||
| attachDeleteKeySizeAttribute(sdkRequest); | ||
|
|
||
|
|
...op-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AAnalyticsAcceleratorStreamReading.java
Show resolved
Hide resolved
|
@steveloughran updated the PR, only change is adding the javadoc and cutting the extra new line. |
|
💔 -1 overall
This message was automatically generated. |
steveloughran
left a comment
There was a problem hiding this comment.
+1
merge and backport as you desire
|
💔 -1 overall
This message was automatically generated. |
|
💔 -1 overall
This message was automatically generated. |
Backports: * HADOOP-19394. S3A: Integrate with AAL's readVectored(). (#7720) * HADOOP-19664. S3A: Analytics stream to use Java sync client. (#7909) * HADOOP-19698. S3A: Add AAL dependency to LICENSE-binary. * HADOOP-19587. S3A: Adds in support for SSE-C to AAL (#7906) * HADOOP-19365. S3A: Adds in support for auditing for AAL. (#7723)
Description of PR
Relevant AAL PR: awslabs/analytics-accelerator-s3#280
modifyHttpRequestexecutionInterceptor in the LoggingAuditor called, these are now available in theExecutionAttributesand can be used for logging.Example logs:
How was this patch tested?
Tested in eu-west-1,
mvn -Dparallel-tests -DtestsThreadCount=16 clean verifyFor code changes:
LICENSE,LICENSE-binary,NOTICE-binaryfiles?