Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add build option for exposing self-test failure messages #433

Merged
merged 48 commits into from
Mar 7, 2025

Conversation

WillChilds-Klein
Copy link
Contributor

@WillChilds-Klein WillChilds-Klein commented Feb 17, 2025

Issue #, if available: ACCP-129

THIS FEATURE IS NOT INTENDED FOR PRODUCTION AND IS THUS LEFT UNDOCUMENTED IN OUR README

Notes

This PR builds on prior work from @amirhosv on the fips-experimentation branch to provide an alternative mode of handling AWS-LC self test failures. We call this (non-default) mode FIPS_SELF_TEST_SKIP_ABORT, and it is only usable when ACCP is built in FIPS mode.

By default, AWS-LC will abort() on self-test failures. However, as of AWS-LC v1.47, when built with the AWSLC_FIPS_FAILURE_CALLBACK build flag AWS-LC will call a weak symbol AWS_LC_fips_failure_callback function to handle self test failures instead of aborting. When ACCP is built with -DFIPS_SELF_TEST_SKIP_ABORT, ACCP defines AWS_LC_fips_failure_callback such that it appends to a queue of error strings in ACCP's native heap. To manage the accumulated error strings, we add a native std::vector wrapper ConcurrentStringVector providing a minimal, threadsafe API. Once the error queue is non-empty, all subsequent getInstance calls on an algorithm provided by ACCP will throw FipsStatusException.

We provide two functions for callers to query fips self test error state on an AmazonCorrettoCryptoProvider instance:

  • public boolean isFipsStatusOk()
  • public List<String> getFipsSelfTestFailures()

We could get away with only the latter function, but we provide isFipsStatusOk() to avoid performance costs of copying error strings over the JNI.

Testing

To adequately test the new mode, we need to build AWS-LC with FIPS_BREAK_TEST=TESTS to programmatically break AWS-LC's pairwise consistency tests (PCTs). We test against each available PCT breakage during key generation by setting the appropriate environment variable, indicating which PCT to break. Unfortunately, Java doesn't appear to have a standard utility for manipulating process environment variables at runtime, so we create our own TestUtil.setEnv that calls POSIX setenv/unsetenv over the JNI.

In addition to CI tests, I've also executed run_accp_basic_tests.sh with the new --fips-self-test-failure-no-abort flag, which will eventually be incorporated into our GitHub CI.

$ TEST_JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto.x86_64 ./tests/ci/run_accp_basic_tests.sh --fips-self-test-failure-no-abort
...
BUILD SUCCESSFUL in 40m 15s
18 actionable tasks: 13 executed, 5 up-to-date

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@WillChilds-Klein WillChilds-Klein changed the title [DRAFT] Self test failure [DRAFT] Add build option for exposing self-test failure messages Feb 17, 2025
private native int fipsStatusErrorCount();

/**
* @return true if and only if the underlying libcrypto library's FIPS related checks pass
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you guarantee that there are in fact 0 errors rather than we simply haven't run the self-test yet? There's no logic here to guard against returning a false-positive from an untested module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run AWS-LC's self-tests when ACCP's INSTANCE is constructed in a java static block. Those self-tests get executed in a threadpool, so there's a very slight potential for a race, but we do run AWS-LC self-tests immediately on load.

Copy link
Contributor Author

@WillChilds-Klein WillChilds-Klein Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, I guess we could checkTests() here, but that might be expensive.

EDIT: no, this is a function in ACCPService, not AmazonCorrettoCryptoProvider...

Copy link
Contributor Author

@WillChilds-Klein WillChilds-Klein Mar 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, even better, I can just check testsPassed!

EDIT: no, this is a function in ACCPService, not AmazonCorrettoCryptoProvider...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up going with getSelfTestStatus().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geedo0 -- looking into this a little more, this behavioral change will break callers (and our tests) when SkipAbort is enabled. we now check isFIpsStatusOk() in every call to ACCPService.newInstance, so unless the caller busy-polls getSelfTestStatus() != SelfTestStatus.NOT_RUN before using the provider, calls to newInstance will throw an exception until the power-on self tests have completed.

we could busy-poll that status in this method, and maybe that's OK because this is a very specialized build option, but that seems like a recipe for strange behavior. but maybe it's favorable in order to preserve current semantics? removing this check basically changes the semantics of true in this method from "self-tests have affirmatively passed" to "no self-test failures have occurred".

i removed this check in 34faca36e, but perhaps busy-polling in isFipsStatusOk is the better approach.

please let me know what you think.

Copy link
Contributor Author

@WillChilds-Klein WillChilds-Klein Mar 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked offline with @geedo0 and we convinced ourselves that busy polling in isFipsStatusOk is the way to go so we can guarantee we don't provide any crypto algorithms until the self tests have affirmatively passed.

Implemented busy poll with a deadline in 6d9a71a0c.

A more general solution would be to block the main thread here, but there's a scary comment advising against that. I'll do some experiments out-of-band of this PR.

@@ -157,6 +159,7 @@ static String getProperty(String propertyName, String def) {
PROVIDER_VERSION = oldVersion;
FIPS_BUILD = available && isFipsMode();
EXPERIMENTAL_FIPS_BUILD = available && isExperimentalFipsMode();
FIPS_SELF_TEST_SKIP_ABORT = available && isFipsSelfTestFailureSkipAbort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logic here is confusing me and a bit concerning. Why should all these be be false if the module is unavailable? I worry that a library load failure might break our assertions and slip through the detection cracks of failing loudly. It's happened before with our customers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was pattern-matching here, so can't speak to original intent. I think it might be reasonable to throw a RuntimeException if library loading fails, but that seems out of scope for this change.

geedo0
geedo0 previously approved these changes Mar 6, 2025
geedo0
geedo0 previously approved these changes Mar 7, 2025
@WillChilds-Klein WillChilds-Klein requested a review from geedo0 March 7, 2025 18:41
@WillChilds-Klein WillChilds-Klein enabled auto-merge (squash) March 7, 2025 20:55
Copy link
Contributor

@alexw91 alexw91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small nit, otherwise LGTM

@WillChilds-Klein WillChilds-Klein merged commit 3abb411 into corretto:main Mar 7, 2025
11 checks passed
@WillChilds-Klein WillChilds-Klein deleted the self-test-failure branch March 7, 2025 22:23
chockalingamc pushed a commit to chockalingamc/amazon-corretto-crypto-provider that referenced this pull request Mar 12, 2025
*Issue #, if available:* ACCP-129

_**THIS FEATURE IS NOT INTENDED FOR PRODUCTION AND IS THUS LEFT
UNDOCUMENTED IN OUR README**_

## Notes

This PR builds on prior work from @amirhosv on the
`fips-experimentation` branch to provide an alternative mode of handling
AWS-LC self test failures. We call this (non-default) mode
`FIPS_SELF_TEST_SKIP_ABORT`, and it is only usable when ACCP is built in
FIPS mode.

By default, AWS-LC will `abort()` on self-test failures. However, as of
AWS-LC v1.47, when built with the `AWSLC_FIPS_FAILURE_CALLBACK` build
flag AWS-LC will [call][1] a [weak symbol][2]
`AWS_LC_fips_failure_callback` function to handle self test failures
instead of aborting. When ACCP is built with
`-DFIPS_SELF_TEST_SKIP_ABORT`, ACCP defines
`AWS_LC_fips_failure_callback` such that it appends to a queue of error
strings in ACCP's native heap. To manage the accumulated error strings,
we add a native `std::vector` wrapper `ConcurrentStringVector` providing
a minimal, threadsafe API. Once the error queue is non-empty, all
subsequent `getInstance` calls on an algorithm provided by ACCP will
throw `FipsStatusException`.

We provide two functions for callers to query fips self test error state
on an `AmazonCorrettoCryptoProvider` instance:

- `public boolean isFipsStatusOk()`
- `public List<String> getFipsSelfTestFailures()`

We could get away with only the latter function, but we provide
`isFipsStatusOk()` to avoid performance costs of copying error strings
over the JNI.


[1]:
https://github.com/aws/aws-lc/blob/1d8b807ed1ae75c89beda6c73a4ae27c404fa46f/crypto/fipsmodule/bcm.c#L416
[2]:
https://github.com/aws/aws-lc/blob/1d8b807ed1ae75c89beda6c73a4ae27c404fa46f/crypto/internal.h#L1427-L1432

## Testing

To adequately test the new mode, we need to build AWS-LC with
`FIPS_BREAK_TEST=TESTS` to programmatically break AWS-LC's pairwise
consistency tests (PCTs). We test against each available PCT breakage
during key generation by setting the appropriate environment variable,
indicating which PCT to break. Unfortunately, Java doesn't appear to
have a standard utility for manipulating process environment variables
at runtime, so we create our own `TestUtil.setEnv` that calls POSIX
`setenv`/`unsetenv` over the JNI.

In addition to CI tests, I've also executed `run_accp_basic_tests.sh`
with the new `--fips-self-test-failure-no-abort` flag, which will
eventually be incorporated into our GitHub CI.

```
$ TEST_JAVA_HOME=/usr/lib/jvm/java-17-amazon-corretto.x86_64 ./tests/ci/run_accp_basic_tests.sh --fips-self-test-failure-no-abort
...
BUILD SUCCESSFUL in 40m 15s
18 actionable tasks: 13 executed, 5 up-to-date
```

---

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: Amir Vakili <[email protected]>
Co-authored-by: Amir Vakili <[email protected]>
Co-authored-by: Gerardo Ravago 🇵🇭 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants