-
Notifications
You must be signed in to change notification settings - Fork 0
external connectivity checks in health endpoint #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I've reviewed the changes. Here's a summary of potential issues and suggestions for improvement:
Review Summary:
The changes refactor the health check endpoint, switching from Strings to proper Health
objects and improving resource management. Error handling and status reporting have been improved.
File-Specific Comments:
File: src/main/java/com/redhat/sast/api/v1/resource/HealthResource.java
-
Improvement: Switching to org.eclipse.microprofile.health.Health
objects is a significant improvement. This provides a standardized and more informative way to represent the health status of the application and its dependencies.
-
Improvement: Resource management with try-with-resources
for Pipelines
and PipelineRuns
is excellent. It ensures resources are properly closed, preventing potential leaks.
-
Improvement: Injecting applicationVersion
via @ConfigProperty
is great.
-
Minor: The logic for the Tekton health check is a bit complex. It could potentially be simplified using helper functions to encapsulate the logic for checking pipelines and pipelineRuns.
-
Suggestion: The namespace log message can be simplified by using the following syntax "Tekton health check: namespace={}"
.
-
Minor: Consider using a more specific exception type when closing the resources to avoid masking other potential errors. For instance, catch KubernetesClientException
to check for possible issues while closing Tekton resources.
-
Minor: The orElse
logic could be simplified to orElse("UNKNOWN")
.
Here's an example of what to do:
private HealthCheckResponse checkTektonHealth() {
String namespace = this.namespace;
try {
NonNamespaceOperation<Pipeline, PipelineList, Resource<Pipeline>> pipelines = tektonClient.v1().pipelines().inNamespace(namespace);
try (Pipelines pipelineList = pipelines.list()) {
if (pipelineList == null) {
LOG.warn("Tekton health check: Pipelines list is null in namespace={}", namespace);
return HealthCheckResponse.down("Tekton Pipelines", "Pipelines list is null");
}
NonNamespaceOperation<PipelineRun, PipelineRunList, Resource<PipelineRun>> pipelineRuns = tektonClient.v1().pipelineRuns().inNamespace(namespace);
try (PipelineRuns pipelineRunList = pipelineRuns.list()) {
if (pipelineRunList == null) {
LOG.warn("Tekton health check: PipelineRuns list is null in namespace={}", namespace);
return HealthCheckResponse.down("Tekton PipelineRuns", "PipelineRuns list is null");
}
LOG.info("Tekton health check: successful in namespace={}", namespace);
return HealthCheckResponse.up("Tekton");
} catch (KubernetesClientException e) {
LOG.warn("Tekton health check: limited permissions in namespace={}", namespace, e);
return HealthCheckResponse.up("Tekton", "Limited Permissions");
}
} catch (KubernetesClientException e) {
LOG.error("Tekton health check: failed to list pipelines in namespace={}", namespace, e);
return HealthCheckResponse.down("Tekton Pipelines", e.getMessage());
}
} catch (Exception e) {
LOG.error("Tekton health check: general failure in namespace={}", namespace, e);
return HealthCheckResponse.down("Tekton", e.getMessage());
}
}
Okay, I've reviewed the changes. Here's a summary and detailed comments:
Health
objects and improving resource management. Error handling and status reporting have been improved.src/main/java/com/redhat/sast/api/v1/resource/HealthResource.java
Improvement: Switching to org.eclipse.microprofile.health.Health
objects is a significant improvement. This provides a standardized and more informative way to represent the health status of the application and its dependencies.
Improvement: Resource management with try-with-resources
for Pipelines
and PipelineRuns
is excellent. It ensures resources are properly closed, preventing potential leaks.
Improvement: Injecting applicationVersion
via @ConfigProperty
is great.
Minor: The logic for the Tekton health check is a bit complex. It could potentially be simplified using helper functions to encapsulate the logic for checking pipelines and pipelineRuns.
Suggestion: The namespace log message can be simplified by using the following syntax "Tekton health check: namespace={}"
.
Minor: Consider using a more specific exception type when closing the resources to avoid masking other potential errors. For instance, catch KubernetesClientException
to check for possible issues while closing Tekton resources.
Minor: The orElse
logic could be simplified to orElse("UNKNOWN")
.
private HealthCheckResponse checkTektonHealth() {
String namespace = this.namespace;
try {
NonNamespaceOperation<Pipeline, PipelineList, Resource<Pipeline>> pipelines = tektonClient.v1().pipelines().inNamespace(namespace);
try (Pipelines pipelineList = pipelines.list()) {
if (pipelineList == null) {
LOG.warn("Tekton health check: Pipelines list is null in namespace={}", namespace);
return HealthCheckResponse.down("Tekton Pipelines", "Pipelines list is null");
}
NonNamespaceOperation<PipelineRun, PipelineRunList, Resource<PipelineRun>> pipelineRuns = tektonClient.v1().pipelineRuns().inNamespace(namespace);
try (PipelineRuns pipelineRunList = pipelineRuns.list()) {
if (pipelineRunList == null) {
LOG.warn("Tekton health check: PipelineRuns list is null in namespace={}", namespace);
return HealthCheckResponse.down("Tekton PipelineRuns", "PipelineRuns list is null");
}
LOG.info("Tekton health check: successful in namespace={}", namespace);
return HealthCheckResponse.up("Tekton");
} catch (KubernetesClientException e) {
LOG.warn("Tekton health check: limited permissions in namespace={}", namespace, e);
return HealthCheckResponse.up("Tekton", "Limited Permissions");
}
} catch (KubernetesClientException e) {
LOG.error("Tekton health check: failed to list pipelines in namespace={}", namespace, e);
return HealthCheckResponse.down("Tekton Pipelines", e.getMessage());
}
} catch (Exception e) {
LOG.error("Tekton health check: general failure in namespace={}", namespace, e);
return HealthCheckResponse.down("Tekton", e.getMessage());
}
}
Review Summary:
The changes introduce a more robust health check endpoint, including checks for database and Tekton connectivity. It also adds application version information to the health response. Good improvements overall. However, there's room for improvement in error handling and clarity.
File-Specific Comments:
File: pom.xml
- Improvement: The addition of the
<resources>
section with<filtering>true</filtering>
is good. This allows placeholder values in files within thesrc/main/resources
directory to be replaced during the build process, enabling dynamic configuration based on the build environment.
File: src/main/java/com/redhat/sast/api/v1/resource/HealthResource.java
-
Clarity: The code could benefit from more descriptive variable names (e.g., instead of
dbStatus
, usedatabaseStatus
). -
Error Handling (Database): The
checkDatabaseHealth()
method catchesSQLException
but always returns "DOWN" regardless of the specific exception. It would be more informative to differentiate between connection errors, authentication failures, and other potential SQL exceptions and reflect those in the status string. Consider logging the specific exception message. -
Error Handling (Tekton): The
checkTektonHealth()
method simply catches anyException
. It would be much better to catch specific exceptions that might occur when interacting with the Tekton client (e.g., connection errors, authentication errors, permission errors). Log the exception. -
Tekton Health Check Implementation: The current Tekton health check returns "UP" if the Tekton client is injected. This only verifies that the bean was created. A more meaningful check would be to attempt a simple operation against the Tekton API (e.g., listing pipelines in the specified namespace) to ensure the client can actually communicate with the Tekton cluster.
-
Namespace Configuration: Consider adding a validation step to ensure the
sast.ai.workflow.namespace
configuration property is set. If it's not, the Tekton health check (if implemented as suggested above) will likely fail. -
Status Codes: Using
Response.Status.SERVICE_UNAVAILABLE
is appropriate when dependencies are down.
Here's an example of improved error handling and Tekton health check:
private String checkDatabaseHealth() {
try (var connection = dataSource.getConnection()) {
return "UP";
} catch (SQLException e) {
LOG.error("Database health check failed: " + e.getMessage(), e);
return "DOWN"; // Or more specific status based on SQLException
}
}
private String checkTektonHealth() {
try {
// Attempt a simple Tekton API operation
tektonClient.v1().pipelines().inNamespace(namespace).list();
return "UP";
} catch (Exception e) { //Consider specific exceptions like KubernetesClientException
LOG.error("Tekton health check failed: " + e.getMessage(), e);
return "DOWN";
}
}
Okay, I've reviewed the code differences provided. Here's a summary of potential issues and suggestions for improvement:
Review Summary:
The changes introduce more detailed health checks for the database and Tekton, which is good. However, the Tekton health check includes some logic that might be better handled with more specific error handling. Also, the health check logic is string-based and lacks proper use of the Health
object.
File: <source>
-
Issue: The health status is provided as a string, which could cause issues with consistency in the long run.
- Suggestion: Return proper
Health
objects instead of simple Strings. This allows to also incorporate additional details.
- Suggestion: Return proper
-
Issue: The Tekton health check differentiates between not being able to list pipelines and a more narrow error when trying to access pipeline runs. The latter gets handled somewhat gracefully by returning an UP status with limited permissions, which is fine. However, if
pipelines == null
, this will result in aNullPointerException
.- Suggestion: Return
DOWN
forpipelines == null
since it is an essential part of the Tekton integration.
- Suggestion: Return
-
Issue: The code duplicates
namespace
in log messages within thecheckTektonHealth
method.- Suggestion: Extract the namespace to a local variable and reuse that variable for the log messages.
-
Issue: The code does not close resources, especially
pipelines
andpipelineRuns
.- Suggestion: Properly close those resources in a
finally
block.
- Suggestion: Properly close those resources in a
-
Issue: The
applicationVersion
is hardcoded as"1.0.0-SNAPSHOT"
.- Suggestion: Inject the
applicationVersion
as a property (e.g., via@Value("${application.version}")
) for better configurability and consistency.
- Suggestion: Inject the
Next Steps:
Please address the issues above. I will then review the updated code again.
application.properties
:
-
Clarity: Adding
quarkus.application.version
is good for identifying the application version. -
Suggestion: Consider adding a brief comment explaining why
sast.ai.workflow.namespace
andsast.ai.workflow.known.false.positive.base.repo
are configurable via environment variables, especially the fallback values. This will help with understanding the configuration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, thank you for providing all the file contents and summaries. Here is a final review:
Overall Assessment:
The pull request introduces a functional health check endpoint with improved error handling and component-specific status reporting. The code is generally well-structured, readable, and addresses many of the initially identified issues. However, the Tekton health check implementation and handling of limited permissions still need attention.
Key Concerns and Suggestions:
-
Tekton Health Check Robustness: Listing pipelines might not accurately reflect the overall health of the Tekton Pipelines system. Consider alternative approaches like:
- Querying Tekton's controller manager or webhook service.
- Creating and running a very simple Tekton Task.
-
Limited Permissions Handling: Returning an "up" status with a message when permissions are limited is misleading.
- Change the status to "degraded" or "warn" to accurately reflect the state.
- Include the namespace in the
HealthResponseDto
when reporting limited permissions.
-
Exception Handling Specificity: Avoid catching generic Exception
when more specific exceptions can be caught.
- Catch
InterruptedException
, IOException
, or Kubernetes-specific exceptions separately to provide more context.
-
Configuration:
- Ensure that the
namespace
configuration has a default value and that the application handles missing namespace values gracefully, potentially failing to start if it's essential.
-
Testing:
- Add unit or integration tests to cover different scenarios, including successful health checks, database connection failures, Tekton API errors, and limited permission situations.
-
Security:
- Be careful about exposing sensitive information (e.g., database connection strings, Tekton API keys) in log messages or health check responses. Mask or redact sensitive data appropriately. Consider role-based access control for the
/health
endpoint, especially if it exposes internal component statuses.
Specific Suggestions:
-
TektonController.java
:
- Replace
listing Pipelines
with a call to Tekton's controller manager
or creating/running a simple Tekton Task.
- Change the health status to
degraded
for limited permission.
- Catch more specific exceptions instead of generic
Exception
.
- Ensure namespace is included when returning a degraded state due to permission issues.
-
application.properties
:
- Add a default value for
tekton.namespace
.
Next Steps:
- Address the concerns regarding the Tekton health check implementation.
- Implement the suggested changes for limited permissions handling and exception specificity.
- Add comprehensive unit or integration tests.
- Review and address any security concerns related to sensitive information exposure.
By addressing these points, the pull request will significantly improve the robustness, reliability, and clarity of the health check endpoint.
Okay, I've reviewed the changes. Here's a summary and some specific comments:
Tekton Health Check Robustness: Listing pipelines might not accurately reflect the overall health of the Tekton Pipelines system. Consider alternative approaches like:
- Querying Tekton's controller manager or webhook service.
- Creating and running a very simple Tekton Task.
Limited Permissions Handling: Returning an "up" status with a message when permissions are limited is misleading.
- Change the status to "degraded" or "warn" to accurately reflect the state.
- Include the namespace in the
HealthResponseDto
when reporting limited permissions.
Exception Handling Specificity: Avoid catching generic Exception
when more specific exceptions can be caught.
- Catch
InterruptedException
,IOException
, or Kubernetes-specific exceptions separately to provide more context.
Configuration:
- Ensure that the
namespace
configuration has a default value and that the application handles missing namespace values gracefully, potentially failing to start if it's essential.
Testing:
- Add unit or integration tests to cover different scenarios, including successful health checks, database connection failures, Tekton API errors, and limited permission situations.
Security:
- Be careful about exposing sensitive information (e.g., database connection strings, Tekton API keys) in log messages or health check responses. Mask or redact sensitive data appropriately. Consider role-based access control for the
/health
endpoint, especially if it exposes internal component statuses.
TektonController.java
:
- Replace
listing Pipelines
with a call toTekton's controller manager
or creating/running a simple Tekton Task. - Change the health status to
degraded
for limited permission. - Catch more specific exceptions instead of generic
Exception
. - Ensure namespace is included when returning a degraded state due to permission issues.
application.properties
:
- Add a default value for
tekton.namespace
.
Review Summary:
The changes introduce a more structured approach to health checks, including component-level health status and overall health determination. The use of static factory methods improves readability and provides a cleaner API for creating HealthResponseDto
instances. The addition of component
and message
fields allows for more detailed health information.
File-Specific Comments:
-
pom.xml: The addition of resource filtering is good. It allows variables in
src/main/resources
to be replaced during the build process. No issues here. -
src/main/java/com/redhat/sast/api/v1/dto/response/HealthResponseDto.java:
- Clarity: The new static factory methods (
up
,down
,overall
) significantly improve the clarity of how to createHealthResponseDto
objects. - Duplication: The initial constructor and the private constructor with parameters both initialize
timestamp
anddependencies
. Consider having the parameterized constructor call the default constructor to avoid duplication. determineOverallHealth
Method: The logic is clear and concise. It correctly determines the overall health based on the status of the dependencies.- Immutability: The
dependencies
map is defensively copied in the setter, which is good. However, consider making the class immutable to further improve thread safety and predictability. addDependency
Method: Good to have a method for adding dependencies and their statuses. Consider checking for null or empty component names.
- Clarity: The new static factory methods (
I'll wait for the next file diffs.
Okay, I've reviewed the code differences. Here's a summary of my findings and suggestions:
Review Summary
The changes introduce a more robust health check endpoint. Instead of hardcoding "UP" statuses, the code now attempts to connect to the database and Tekton and reflects the actual status in the response. The use of HealthResponseDto
is improved, and the code is more structured. However, there are potential improvements in exception handling, configuration, and security considerations.
File-Specific Comments
File: src/main/java/com/redhat/sast/api/v1/dto/response/HealthResponseDto.java
- Improvement: The
addDependency(HealthResponseDto componentHealth)
method is a nice addition for composing health check results. - Suggestion: Consider adding a
HealthResponseDto
constructor that takescomponent
,status
, andmessage
to streamline dependency creation.
File: src/main/java/com/redhat/sast/api/v1/resource/HealthResource.java
- Improvement: The code now checks the database and Tekton health, which is a significant improvement over the hardcoded values. The use of
@ConfigProperty
is also good. - Issue: The code catches
SQLException
,KubernetesClientException
, and genericException
but doesn't provide specific error messages in theHealthResponseDto
. This makes debugging harder. Consider logging the exceptions with appropriate severity levels (WARN, ERROR) and including relevant details in themessage
field of theHealthResponseDto
. - Suggestion: Instead of directly injecting
DataSource
, consider using a dedicated service/component for database health checks. This promotes better separation of concerns and testability. Inject this new service intoHealthResource
. - Suggestion: For Tekton health check, consider checking for the existence of specific resources (e.g., a known Pipeline) instead of just listing all pipelines/runs. This could provide a more targeted and reliable health indicator.
- Suggestion: The
namespace
variable should have a default value defined in the@ConfigProperty
annotation. If it's essential and missing, the application should probably fail to start. Consider usingdefaultValue = ""
and then checking for an empty string. If empty, throw anIllegalStateException
during application startup. - Suggestion: Consider using try-with-resources when working with
dataSource.getConnection()
to ensure resources are closed. - Suggestion: Add more logging, especially around the Tekton client interaction, to diagnose potential connectivity or permission issues.
- Security Consideration: Be careful about exposing sensitive information (e.g., database connection strings, Tekton API keys) in log messages or health check responses. Mask or redact sensitive data appropriately. Consider role-based access control for the
/health
endpoint, especially if it exposes internal component statuses. - Clarity: The lines
"UP".equals(health.getStatus())
andResponse.ok(health).build()
are unnecessarily verbose. Usehealth.getStatus().equals("UP")
and there is no need to add.build()
, just returnResponse.ok(health)
.
I will await further code differences.
Okay, I've reviewed the code differences you provided for the health check endpoint. Here's a summary of my observations and suggestions:
Review Summary:
The changes introduce a more robust health check with specific checks for database and Tekton components. Error handling is improved with more detailed logging and specific status updates. The code is generally well-structured, but there are a few areas where clarity and efficiency could be enhanced.
File: [The file name]
Here's a breakdown of specific issues and suggestions:
-
Clarity and Structure:
HealthResponseDto
Creation: The original code created a newHealthResponseDto
object within thecatch
block. The new code correctly reuses and populates health to avoid repeated instantiation.checkDatabaseHealth()
andcheckTektonHealth()
methods promote code clarity by extracting component-specific checks into separate functions.
-
Error Handling:
- Exception Logging: The code now logs exceptions using
LOG.errorf(e, ...)
which is good practice. - Specific Error Messages: Returning the exception message in
HealthResponseDto.down()
provides valuable debugging information.
- Exception Logging: The code now logs exceptions using
-
Performance/Efficiency:
- Database Query: The database health check uses
SELECT 1
, which is a simple and efficient way to verify connectivity.
- Database Query: The database health check uses
-
Security:
- Namespace Logging: Logging the namespace during Tekton health checks is helpful for debugging and auditing purposes.
-
Suggestions:
- Consider externalizing the health check query (
SELECT 1
). Although unlikely to change, externalizing it provides flexibility. - Consider adding metrics for health check execution time. This can help identify performance bottlenecks.
- File Name: Please provide the file name so I can provide more specific feedback if needed.
- Consider externalizing the health check query (
Okay, I've reviewed the changes. Here's a summary and potential issues:
Review Summary:
The changes introduce a health check endpoint for Tekton pipelines and adds the application version to the properties file. The Tekton health check attempts to list pipelines in a given namespace and returns a health status based on the outcome. Error handling is present, with different statuses returned based on the exception type.
Potential Issues and Suggestions:
-
File:
src/main/java/ai/sast/sast_ai/controller/TektonController.java
-
Issue: The health check relies on listing pipelines. While functional, this might not be the most robust or efficient way to determine Tekton's health. Listing pipelines checks the Kubernetes API server and the Tekton API, but it does not check if Tekton Pipelines controller or webhook are working. If those are down, API server and Tekton API will still work.
-
Suggestion: Consider using a more direct method to check Tekton's health, such as querying Tekton's controller manager or webhook service (if applicable) or creating a very simple Tekton task and checking to see if it runs. This could give more confidence that the system is functional.
-
Issue: Limited permissions are handled by logging a warning and returning an "up" status with a message. This might be misleading. If permissions are limited, the health check isn't fully functional.
-
Suggestion: Consider returning a "degraded" status instead of "up" with a message when permissions are limited. This more accurately reflects the state of the health check.
-
Suggestion: Include the namespace in the HealthResponseDto.
-
Issue: Catching
Exception
is generally discouraged. -
Suggestion: Catch more specific exceptions if possible to handle errors more precisely. For instance,
InterruptedException
orIOException
could be caught separately to provide more context in the error message.
-
-
File:
src/main/resources/application.properties
- Issue: None
Overall, the changes seem reasonable, but the Tekton health check could be more robust and the error handling could be more specific. The limited permissions case needs to be carefully considered to avoid misrepresenting the health status.
</details>
No description provided.