Skip to content

external connectivity checks in health endpoint #22

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 30, 2025

Conversation

JudeNiroshan
Copy link
Collaborator

No description provided.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I've reviewed the changes. Here's a summary of potential issues and suggestions for improvement:

Review Summary:

The changes refactor the health check endpoint, switching from Strings to proper Health objects and improving resource management. Error handling and status reporting have been improved.

File-Specific Comments:

File: src/main/java/com/redhat/sast/api/v1/resource/HealthResource.java

  • Improvement: Switching to org.eclipse.microprofile.health.Health objects is a significant improvement. This provides a standardized and more informative way to represent the health status of the application and its dependencies.

  • Improvement: Resource management with try-with-resources for Pipelines and PipelineRuns is excellent. It ensures resources are properly closed, preventing potential leaks.

  • Improvement: Injecting applicationVersion via @ConfigProperty is great.

  • Minor: The logic for the Tekton health check is a bit complex. It could potentially be simplified using helper functions to encapsulate the logic for checking pipelines and pipelineRuns.

  • Suggestion: The namespace log message can be simplified by using the following syntax "Tekton health check: namespace={}".

  • Minor: Consider using a more specific exception type when closing the resources to avoid masking other potential errors. For instance, catch KubernetesClientException to check for possible issues while closing Tekton resources.

  • Minor: The orElse logic could be simplified to orElse("UNKNOWN").

Here's an example of what to do:

    private HealthCheckResponse checkTektonHealth() {
        String namespace = this.namespace;
        try {
            NonNamespaceOperation<Pipeline, PipelineList, Resource<Pipeline>> pipelines = tektonClient.v1().pipelines().inNamespace(namespace);
            try (Pipelines pipelineList = pipelines.list()) {
                if (pipelineList == null) {
                    LOG.warn("Tekton health check: Pipelines list is null in namespace={}", namespace);
                    return HealthCheckResponse.down("Tekton Pipelines", "Pipelines list is null");
                }
                NonNamespaceOperation<PipelineRun, PipelineRunList, Resource<PipelineRun>> pipelineRuns = tektonClient.v1().pipelineRuns().inNamespace(namespace);
                try (PipelineRuns pipelineRunList = pipelineRuns.list()) {
                    if (pipelineRunList == null) {
                        LOG.warn("Tekton health check: PipelineRuns list is null in namespace={}", namespace);
                        return HealthCheckResponse.down("Tekton PipelineRuns", "PipelineRuns list is null");
                    }
                    LOG.info("Tekton health check: successful in namespace={}", namespace);
                    return HealthCheckResponse.up("Tekton");
                } catch (KubernetesClientException e) {
                    LOG.warn("Tekton health check: limited permissions in namespace={}", namespace, e);
                    return HealthCheckResponse.up("Tekton", "Limited Permissions");
                }
            } catch (KubernetesClientException e) {
                LOG.error("Tekton health check: failed to list pipelines in namespace={}", namespace, e);
                return HealthCheckResponse.down("Tekton Pipelines", e.getMessage());
            }
        } catch (Exception e) {
            LOG.error("Tekton health check: general failure in namespace={}", namespace, e);
            return HealthCheckResponse.down("Tekton", e.getMessage());
        }
    }
Okay, I've reviewed the changes. Here's a summary and detailed comments:

Review Summary:

The changes introduce a more robust health check endpoint, including checks for database and Tekton connectivity. It also adds application version information to the health response. Good improvements overall. However, there's room for improvement in error handling and clarity.

File-Specific Comments:

File: pom.xml

  • Improvement: The addition of the <resources> section with <filtering>true</filtering> is good. This allows placeholder values in files within the src/main/resources directory to be replaced during the build process, enabling dynamic configuration based on the build environment.

File: src/main/java/com/redhat/sast/api/v1/resource/HealthResource.java

  • Clarity: The code could benefit from more descriptive variable names (e.g., instead of dbStatus, use databaseStatus).

  • Error Handling (Database): The checkDatabaseHealth() method catches SQLException but always returns "DOWN" regardless of the specific exception. It would be more informative to differentiate between connection errors, authentication failures, and other potential SQL exceptions and reflect those in the status string. Consider logging the specific exception message.

  • Error Handling (Tekton): The checkTektonHealth() method simply catches any Exception. It would be much better to catch specific exceptions that might occur when interacting with the Tekton client (e.g., connection errors, authentication errors, permission errors). Log the exception.

  • Tekton Health Check Implementation: The current Tekton health check returns "UP" if the Tekton client is injected. This only verifies that the bean was created. A more meaningful check would be to attempt a simple operation against the Tekton API (e.g., listing pipelines in the specified namespace) to ensure the client can actually communicate with the Tekton cluster.

  • Namespace Configuration: Consider adding a validation step to ensure the sast.ai.workflow.namespace configuration property is set. If it's not, the Tekton health check (if implemented as suggested above) will likely fail.

  • Status Codes: Using Response.Status.SERVICE_UNAVAILABLE is appropriate when dependencies are down.

Here's an example of improved error handling and Tekton health check:

    private String checkDatabaseHealth() {
        try (var connection = dataSource.getConnection()) {
            return "UP";
        } catch (SQLException e) {
            LOG.error("Database health check failed: " + e.getMessage(), e);
            return "DOWN"; // Or more specific status based on SQLException
        }
    }

    private String checkTektonHealth() {
        try {
            // Attempt a simple Tekton API operation
            tektonClient.v1().pipelines().inNamespace(namespace).list();
            return "UP";
        } catch (Exception e) { //Consider specific exceptions like KubernetesClientException
            LOG.error("Tekton health check failed: " + e.getMessage(), e);
            return "DOWN";
        }
    }

Okay, I've reviewed the code differences provided. Here's a summary of potential issues and suggestions for improvement:

Review Summary:

The changes introduce more detailed health checks for the database and Tekton, which is good. However, the Tekton health check includes some logic that might be better handled with more specific error handling. Also, the health check logic is string-based and lacks proper use of the Health object.

File: <source>

  • Issue: The health status is provided as a string, which could cause issues with consistency in the long run.

    • Suggestion: Return proper Health objects instead of simple Strings. This allows to also incorporate additional details.
  • Issue: The Tekton health check differentiates between not being able to list pipelines and a more narrow error when trying to access pipeline runs. The latter gets handled somewhat gracefully by returning an UP status with limited permissions, which is fine. However, if pipelines == null, this will result in a NullPointerException.

    • Suggestion: Return DOWN for pipelines == null since it is an essential part of the Tekton integration.
  • Issue: The code duplicates namespace in log messages within the checkTektonHealth method.

    • Suggestion: Extract the namespace to a local variable and reuse that variable for the log messages.
  • Issue: The code does not close resources, especially pipelines and pipelineRuns.

    • Suggestion: Properly close those resources in a finally block.
  • Issue: The applicationVersion is hardcoded as "1.0.0-SNAPSHOT".

    • Suggestion: Inject the applicationVersion as a property (e.g., via @Value("${application.version}")) for better configurability and consistency.

Next Steps:

Please address the issues above. I will then review the updated code again.

application.properties:

  • Clarity: Adding quarkus.application.version is good for identifying the application version.

  • Suggestion: Consider adding a brief comment explaining why sast.ai.workflow.namespace and sast.ai.workflow.known.false.positive.base.repo are configurable via environment variables, especially the fallback values. This will help with understanding the configuration.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thank you for providing all the file contents and summaries. Here is a final review:

Overall Assessment:

The pull request introduces a functional health check endpoint with improved error handling and component-specific status reporting. The code is generally well-structured, readable, and addresses many of the initially identified issues. However, the Tekton health check implementation and handling of limited permissions still need attention.

Key Concerns and Suggestions:

  1. Tekton Health Check Robustness: Listing pipelines might not accurately reflect the overall health of the Tekton Pipelines system. Consider alternative approaches like:

    • Querying Tekton's controller manager or webhook service.
    • Creating and running a very simple Tekton Task.
  2. Limited Permissions Handling: Returning an "up" status with a message when permissions are limited is misleading.

    • Change the status to "degraded" or "warn" to accurately reflect the state.
    • Include the namespace in the HealthResponseDto when reporting limited permissions.
  3. Exception Handling Specificity: Avoid catching generic Exception when more specific exceptions can be caught.

    • Catch InterruptedException, IOException, or Kubernetes-specific exceptions separately to provide more context.
  4. Configuration:

    • Ensure that the namespace configuration has a default value and that the application handles missing namespace values gracefully, potentially failing to start if it's essential.
  5. Testing:

    • Add unit or integration tests to cover different scenarios, including successful health checks, database connection failures, Tekton API errors, and limited permission situations.
  6. Security:

    • Be careful about exposing sensitive information (e.g., database connection strings, Tekton API keys) in log messages or health check responses. Mask or redact sensitive data appropriately. Consider role-based access control for the /health endpoint, especially if it exposes internal component statuses.

Specific Suggestions:

  • TektonController.java:

    • Replace listing Pipelines with a call to Tekton's controller manager or creating/running a simple Tekton Task.
    • Change the health status to degraded for limited permission.
    • Catch more specific exceptions instead of generic Exception.
    • Ensure namespace is included when returning a degraded state due to permission issues.
  • application.properties:

    • Add a default value for tekton.namespace.

Next Steps:

  1. Address the concerns regarding the Tekton health check implementation.
  2. Implement the suggested changes for limited permissions handling and exception specificity.
  3. Add comprehensive unit or integration tests.
  4. Review and address any security concerns related to sensitive information exposure.

By addressing these points, the pull request will significantly improve the robustness, reliability, and clarity of the health check endpoint.

Okay, I've reviewed the changes. Here's a summary and some specific comments:

Review Summary:

The changes introduce a more structured approach to health checks, including component-level health status and overall health determination. The use of static factory methods improves readability and provides a cleaner API for creating HealthResponseDto instances. The addition of component and message fields allows for more detailed health information.

File-Specific Comments:

  • pom.xml: The addition of resource filtering is good. It allows variables in src/main/resources to be replaced during the build process. No issues here.

  • src/main/java/com/redhat/sast/api/v1/dto/response/HealthResponseDto.java:

    • Clarity: The new static factory methods (up, down, overall) significantly improve the clarity of how to create HealthResponseDto objects.
    • Duplication: The initial constructor and the private constructor with parameters both initialize timestamp and dependencies. Consider having the parameterized constructor call the default constructor to avoid duplication.
    • determineOverallHealth Method: The logic is clear and concise. It correctly determines the overall health based on the status of the dependencies.
    • Immutability: The dependencies map is defensively copied in the setter, which is good. However, consider making the class immutable to further improve thread safety and predictability.
    • addDependency Method: Good to have a method for adding dependencies and their statuses. Consider checking for null or empty component names.

I'll wait for the next file diffs.

Okay, I've reviewed the code differences. Here's a summary of my findings and suggestions:

Review Summary

The changes introduce a more robust health check endpoint. Instead of hardcoding "UP" statuses, the code now attempts to connect to the database and Tekton and reflects the actual status in the response. The use of HealthResponseDto is improved, and the code is more structured. However, there are potential improvements in exception handling, configuration, and security considerations.

File-Specific Comments

File: src/main/java/com/redhat/sast/api/v1/dto/response/HealthResponseDto.java

  • Improvement: The addDependency(HealthResponseDto componentHealth) method is a nice addition for composing health check results.
  • Suggestion: Consider adding a HealthResponseDto constructor that takes component, status, and message to streamline dependency creation.

File: src/main/java/com/redhat/sast/api/v1/resource/HealthResource.java

  • Improvement: The code now checks the database and Tekton health, which is a significant improvement over the hardcoded values. The use of @ConfigProperty is also good.
  • Issue: The code catches SQLException, KubernetesClientException, and generic Exception but doesn't provide specific error messages in the HealthResponseDto. This makes debugging harder. Consider logging the exceptions with appropriate severity levels (WARN, ERROR) and including relevant details in the message field of the HealthResponseDto.
  • Suggestion: Instead of directly injecting DataSource, consider using a dedicated service/component for database health checks. This promotes better separation of concerns and testability. Inject this new service into HealthResource.
  • Suggestion: For Tekton health check, consider checking for the existence of specific resources (e.g., a known Pipeline) instead of just listing all pipelines/runs. This could provide a more targeted and reliable health indicator.
  • Suggestion: The namespace variable should have a default value defined in the @ConfigProperty annotation. If it's essential and missing, the application should probably fail to start. Consider using defaultValue = "" and then checking for an empty string. If empty, throw an IllegalStateException during application startup.
  • Suggestion: Consider using try-with-resources when working with dataSource.getConnection() to ensure resources are closed.
  • Suggestion: Add more logging, especially around the Tekton client interaction, to diagnose potential connectivity or permission issues.
  • Security Consideration: Be careful about exposing sensitive information (e.g., database connection strings, Tekton API keys) in log messages or health check responses. Mask or redact sensitive data appropriately. Consider role-based access control for the /health endpoint, especially if it exposes internal component statuses.
  • Clarity: The lines "UP".equals(health.getStatus()) and Response.ok(health).build() are unnecessarily verbose. Use health.getStatus().equals("UP") and there is no need to add .build(), just return Response.ok(health).

I will await further code differences.

Okay, I've reviewed the code differences you provided for the health check endpoint. Here's a summary of my observations and suggestions:

Review Summary:

The changes introduce a more robust health check with specific checks for database and Tekton components. Error handling is improved with more detailed logging and specific status updates. The code is generally well-structured, but there are a few areas where clarity and efficiency could be enhanced.

File: [The file name]

Here's a breakdown of specific issues and suggestions:

  • Clarity and Structure:

    • HealthResponseDto Creation: The original code created a new HealthResponseDto object within the catch block. The new code correctly reuses and populates health to avoid repeated instantiation.
    • checkDatabaseHealth() and checkTektonHealth() methods promote code clarity by extracting component-specific checks into separate functions.
  • Error Handling:

    • Exception Logging: The code now logs exceptions using LOG.errorf(e, ...) which is good practice.
    • Specific Error Messages: Returning the exception message in HealthResponseDto.down() provides valuable debugging information.
  • Performance/Efficiency:

    • Database Query: The database health check uses SELECT 1, which is a simple and efficient way to verify connectivity.
  • Security:

    • Namespace Logging: Logging the namespace during Tekton health checks is helpful for debugging and auditing purposes.
  • Suggestions:

    • Consider externalizing the health check query (SELECT 1). Although unlikely to change, externalizing it provides flexibility.
    • Consider adding metrics for health check execution time. This can help identify performance bottlenecks.
    • File Name: Please provide the file name so I can provide more specific feedback if needed.

Okay, I've reviewed the changes. Here's a summary and potential issues:

Review Summary:

The changes introduce a health check endpoint for Tekton pipelines and adds the application version to the properties file. The Tekton health check attempts to list pipelines in a given namespace and returns a health status based on the outcome. Error handling is present, with different statuses returned based on the exception type.

Potential Issues and Suggestions:

  • File: src/main/java/ai/sast/sast_ai/controller/TektonController.java

    • Issue: The health check relies on listing pipelines. While functional, this might not be the most robust or efficient way to determine Tekton's health. Listing pipelines checks the Kubernetes API server and the Tekton API, but it does not check if Tekton Pipelines controller or webhook are working. If those are down, API server and Tekton API will still work.

    • Suggestion: Consider using a more direct method to check Tekton's health, such as querying Tekton's controller manager or webhook service (if applicable) or creating a very simple Tekton task and checking to see if it runs. This could give more confidence that the system is functional.

    • Issue: Limited permissions are handled by logging a warning and returning an "up" status with a message. This might be misleading. If permissions are limited, the health check isn't fully functional.

    • Suggestion: Consider returning a "degraded" status instead of "up" with a message when permissions are limited. This more accurately reflects the state of the health check.

    • Suggestion: Include the namespace in the HealthResponseDto.

    • Issue: Catching Exception is generally discouraged.

    • Suggestion: Catch more specific exceptions if possible to handle errors more precisely. For instance, InterruptedException or IOException could be caught separately to provide more context in the error message.

  • File: src/main/resources/application.properties

    • Issue: None

Overall, the changes seem reasonable, but the Tekton health check could be more robust and the error handling could be more specific. The limited permissions case needs to be carefully considered to avoid misrepresenting the health status.

</details>

@JudeNiroshan JudeNiroshan merged commit 7e7392b into main Jun 30, 2025
3 checks passed
@JudeNiroshan JudeNiroshan deleted the improve-health-endpoint branch June 30, 2025 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant