@dominik-burdzy
Collaborator

Add SparkHistoryDiscoveryService and supporting DTOs to automatically resolve Spark History Server URLs via Knox by probing Spark 2 and Spark 3 endpoints.
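To make the probing step concrete, here is a minimal sketch of checking one candidate endpoint with the JDK's `java.net.http.HttpClient`, not the Apache `CloseableHttpClient` that the PR uses. It assumes the Knox gateway base URL (for example `https://knox-host:8443/gateway/<topology>`) is already known and that any 2xx response means the server is alive. The class and method names are hypothetical, and the PR's way of deriving the URL from `clusterName` is not shown.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Hypothetical helper: probes a single Spark History Server candidate behind Knox. */
public class KnoxSparkHistoryProbe {

  /** Joins an assumed gateway base URL and a Knox service path such as "sparkhistory". */
  static URI candidateUri(String gatewayBaseUrl, String servicePath) {
    String base = gatewayBaseUrl.endsWith("/") ? gatewayBaseUrl : gatewayBaseUrl + "/";
    return URI.create(base + servicePath + "/");
  }

  /** Assumption: any 2xx status means the History Server behind this path is alive. */
  static boolean isAlive(int statusCode) {
    return statusCode >= 200 && statusCode < 300;
  }

  /** Sends a GET; I/O failures and interrupts are reported as "not alive". */
  static boolean probe(HttpClient client, URI uri) {
    HttpRequest request =
        HttpRequest.newBuilder(uri).timeout(Duration.ofSeconds(10)).GET().build();
    try {
      HttpResponse<Void> response =
          client.send(request, HttpResponse.BodyHandlers.discarding());
      return isAlive(response.statusCode());
    } catch (IOException e) {
      return false;
    } catch (InterruptedException e) {
      // Restore the interrupt flag so callers can still observe it.
      Thread.currentThread().interrupt();
      return false;
    }
  }
}
```

Treating network errors as "not alive" lets the caller simply move on to the next candidate path.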

@github-actions

github-actions bot commented Jan 26, 2026

Code Coverage Report

Overall Project 62.81% -0.04% 🍏
Files changed 97.78% 🍏

File Coverage
ClouderaClustersTask.java 100% 🍏
ClouderaCmfHostsTask.java 100% 🍏
ClouderaClusterCpuChartTask.java 100% 🍏
ClouderaApiHostsTask.java 100% 🍏
ClouderaHostRamChartTask.java 100% 🍏
ApiHostDto.java 100% 🍏
ApiYarnApplicationDto.java 100% 🍏
ApiConfigDto.java 100% 🍏
ApiHostListDto.java 100% 🍏
ApiRoleDto.java 100% 🍏
ApiServiceDto.java 100% 🍏
ApiClusterListDto.java 100% 🍏
ApiClusterDto.java 100% 🍏
ApiServiceListDto.java 100% 🍏
ApiRoleListDto.java 100% 🍏
ApiRoleConfigGroupRefDto.java 100% 🍏
ApiHostRefDto.java 100% 🍏
ApiConfigListDTO.java 100% 🍏
SparkHistoryDiscoveryService.java 97.39% -2.61% 🍏
ClouderaManagerConnector.java 95.68% 🍏
ClouderaYarnApplicationTypeTask.java 95.44% 🍏
ClouderaYarnApplicationsTask.java 93.52% 🍏
ClouderaManagerHandle.java 91.75% -1.55% 🍏
AbstractClouderaYarnApplicationTask.java 90.75% 🍏

@dominik-burdzy dominik-burdzy force-pushed the feature/b456377401/discover-spark-history branch 2 times, most recently from 0373d4b to 0ce3ab6 January 26, 2026 12:32
private static final Logger logger = LoggerFactory.getLogger(SparkHistoryDiscoveryService.class);

private static final List<String> CANDIDATE_PATHS =
    ImmutableList.of("spark3history", "sparkhistory");


It's nice. In general, we give users the option to override these values or provide additional ones as CLI parameters.

I think it would be useful here as well to add a parameter that users can use to specify their custom path name.

Member


In that case, we should also create a ticket to update our public documentation :)

Collaborator Author


Good idea. I added a customCadidatePaths parameter to the method. I will pass the user's argument to it from the task I will create in the next PR.
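One way the user-supplied paths could be combined with the built-in `spark3history` and `sparkhistory` candidates is sketched below. It assumes the user passes a comma-separated value and that custom paths should be probed before the defaults; the class name, the input format, and that ordering are assumptions for illustration, not the PR's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper for merging default and user-supplied Knox service paths. */
public class CandidatePaths {

  // Mirrors CANDIDATE_PATHS from the PR: Spark 3 first, then Spark 2.
  static final List<String> DEFAULT_CANDIDATE_PATHS = List.of("spark3history", "sparkhistory");

  /** Splits an assumed comma-separated user value, ignoring blanks and surrounding spaces. */
  static List<String> parseUserValue(String rawValue) {
    if (rawValue == null || rawValue.isBlank()) {
      return List.of();
    }
    List<String> paths = new ArrayList<>();
    for (String part : rawValue.split(",")) {
      String trimmed = part.trim();
      if (!trimmed.isEmpty()) {
        paths.add(trimmed);
      }
    }
    return paths;
  }

  /** Custom paths come first, then defaults; duplicates are dropped and order is kept. */
  static List<String> merge(List<String> customPaths) {
    Set<String> ordered = new LinkedHashSet<>(customPaths);
    ordered.addAll(DEFAULT_CANDIDATE_PATHS);
    return List.copyOf(ordered);
  }
}
```

A `LinkedHashSet` keeps the probe order stable while silently dropping a custom path that repeats one of the defaults.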

import com.fasterxml.jackson.annotation.JsonProperty;

@JsonIgnoreProperties(ignoreUnknown = true)
public class ApiConfigDTO {
Member


We need to standardize how we treat acronyms in class names. To keep names consistent, we should choose between strict PascalCase (ApiRoleDto) and preserving the acronym (APIRoleDTO). Let's pick one to avoid inconsistent naming in the future.

CC @vladislav-sidorovich

Collaborator Author


Good idea. I will refactor all of them to PascalCase.


/**
* Discovers the active Spark History Server URL. Probes Spark 3 and Spark 2 endpoints to see
* which one is alive.
Member


Could there be a scenario where both the Spark 3 and Spark 2 History Servers are alive?

Collaborator Author


I would expect it to be a rare case, but now that you mention it, I think it could happen when a user modernizes the environment and moves some jobs from Spark 2 to Spark 3. I reimplemented it to return all reachable URLs.
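Returning every reachable URL instead of the first match could look roughly like the sketch below. The reachability check is injected as a `Predicate<String>` (in practice it would wrap an HTTP probe through Knox), which also keeps the logic testable without a live cluster. The class name, method name, and URL layout are assumptions for illustration.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical variant of the discovery logic that keeps every reachable candidate. */
public class SparkHistoryUrls {

  /**
   * Builds one URL per candidate path and keeps those the probe accepts, in candidate order.
   * An empty result means no Spark History Server answered.
   */
  static List<String> resolveAllUrls(
      String gatewayBaseUrl, List<String> candidatePaths, Predicate<String> isReachable) {
    String base = gatewayBaseUrl.endsWith("/") ? gatewayBaseUrl : gatewayBaseUrl + "/";
    return candidatePaths.stream()
        .map(path -> base + path + "/")
        .filter(isReachable)
        .collect(Collectors.toList());
  }
}
```

With this shape, a cluster in the middle of a Spark 2 to Spark 3 migration yields both URLs, and callers can read event logs from each server.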

* Discovers the active Spark History Server URL. Probes Spark 3 and Spark 2 endpoints to see
* which one is alive.
*/
public Optional<String> resolveUrl(String clusterName, CloseableHttpClient knoxClient) {
Member


I couldn't find any active references to this method. Is it intended for use in a future CL?

Collaborator Author


Yes, exactly. It will be used by a task that parses Spark event logs and extracts Spark application metadata into JSONL.

@dominik-burdzy dominik-burdzy force-pushed the feature/b456377401/discover-spark-history branch from 0ce3ab6 to 24fbabf January 27, 2026 07:22
@dominik-burdzy dominik-burdzy force-pushed the feature/b456377401/discover-spark-history branch from 24fbabf to 81c2257 January 27, 2026 07:42

Labels

None yet

Projects

None yet

3 participants