[b456377401] Cache Spark YARN applications #1029

dominik-burdzy · 2025-11-19T12:59:28Z

Collected Spark YARN applications need to be cached because an upcoming task will connect to the Spark History Server and fetch each application's source and Spark version from the event logs.
Make ClouderaAPIHostsTask the only source of truth for caching hosts, as it is more reliable than the CMF endpoint. This simplifies the logic and makes it less error-prone.
Move common methods used in multiple tests into a utils class.
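
A minimal sketch of the caching idea described above, assuming the applications are stored on a shared handle by one task and read back by a later one. The class name `SparkAppCacheSketch`, its methods, and the use of plain application ID strings are hypothetical and are not taken from this PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for the shared connector handle; not the real ClouderaManagerHandle. */
final class SparkAppCacheSketch {
  // Stays null until the collecting task has run.
  private List<String> sparkYarnApplicationIds;

  /** Called by the task that collects YARN applications. */
  synchronized void cacheSparkYarnApplicationIds(List<String> applicationIds) {
    // Defensive copy so later changes to the caller's list do not leak into the cache.
    sparkYarnApplicationIds = Collections.unmodifiableList(new ArrayList<>(applicationIds));
  }

  /** Called by a later task, for example one that reads event logs per application. */
  synchronized List<String> getSparkYarnApplicationIds() {
    if (sparkYarnApplicationIds == null) {
      throw new IllegalStateException("Spark YARN applications have not been cached yet.");
    }
    return sparkYarnApplicationIds;
  }
}
```

Failing loudly on a read before the cache is filled is one possible design choice; it surfaces task-ordering mistakes early instead of silently returning an empty list.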


github-actions · 2025-11-20T10:23:48Z

Code Coverage Report

Overall Project	61.91%	🍏
Files changed	100%	🍏

File	Coverage
YarnApplicationType.java	100%	🍏
ClouderaAPIHostsTask.java	100%	🍏
ClouderaCMFHostsTask.java	100%	🍏
ClouderaYarnApplicationTypeTask.java	95.44%	🍏
ClouderaManagerHandle.java	90.7%	🍏

vladislav-sidorovich · 2025-11-23T10:08:36Z

.../edwmigration/dumper/application/dumper/connector/cloudera/manager/ClouderaCMFHostsTask.java

      }
    }
-
-    handle.initHostsIfNull(hosts);


Ok, such changes make sense.

vladislav-sidorovich · 2025-11-23T10:09:32Z

...edwmigration/dumper/application/dumper/connector/cloudera/manager/ClouderaManagerHandle.java

  }

-  public synchronized void initHostsIfNull(List<ClouderaHostDTO> hosts) {
-    // Todo


What is this TODO about?

I think it was about the commented-out preconditions below: uncommenting them caused some tests to fail. I resolved the TODO by changing the implementation to have a single source of truth for caching hosts and by improving the tests.
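
For illustration, a set-once host cache with explicit preconditions might look like the sketch below. This is an assumption about the general shape, not the code in this PR: the class `HostCacheSketch`, the method names, and the specific checks are hypothetical, and hosts are modeled as plain strings instead of ClouderaHostDTO.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: hosts are written exactly once, by a single task. */
final class HostCacheSketch {
  private List<String> hosts;

  /** Intended to be called only by the one task that owns host discovery. */
  synchronized void initHosts(List<String> newHosts) {
    if (newHosts == null || newHosts.isEmpty()) {
      throw new IllegalArgumentException("Hosts must not be null or empty.");
    }
    if (hosts != null) {
      // With a single source of truth, a second write indicates a bug.
      throw new IllegalStateException("Hosts are already initialized.");
    }
    hosts = Collections.unmodifiableList(new ArrayList<>(newHosts));
  }

  /** Returns the cached hosts, or null if no task has initialized them yet. */
  synchronized List<String> getHosts() {
    return hosts;
  }
}
```

With only one writer, the "already initialized" check no longer has to tolerate a competing task, which is what makes strict preconditions practical.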

vladislav-sidorovich · 2025-11-23T10:12:15Z

...edwmigration/dumper/application/dumper/connector/cloudera/manager/ClouderaManagerHandle.java

  }
+
+  @AutoValue
+  public abstract static class ClouderaYarnApplicationDTO {


Please move it to a separate class in the com.google.edwmigration.dumper.application.dumper.connector.cloudera.manager.dto package, for code consistency.

All classes like this are placed here: see ClouderaClusterDTO and ClouderaHostDTO defined above. The dto package holds DTOs representing Cloudera Manager responses. To me, the classes placed here are more like "models", as they are used in business logic. What do you think about moving all of them to a "model" package and renaming them?
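
To make the "model" idea concrete, here is a hand-written sketch of an immutable value class of the kind @AutoValue would otherwise generate. The class name `YarnApplicationModelSketch` and the fields `id` and `clusterName` are assumptions for illustration; the actual fields of ClouderaYarnApplicationDTO are not shown in this thread.

```java
import java.util.Objects;

/** Hypothetical model class; the field names are assumptions, not taken from the PR. */
final class YarnApplicationModelSketch {
  private final String id;
  private final String clusterName;

  private YarnApplicationModelSketch(String id, String clusterName) {
    this.id = Objects.requireNonNull(id, "id");
    this.clusterName = Objects.requireNonNull(clusterName, "clusterName");
  }

  /** Static factory, mirroring the create(...) convention common with AutoValue classes. */
  static YarnApplicationModelSketch create(String id, String clusterName) {
    return new YarnApplicationModelSketch(id, clusterName);
  }

  String getId() {
    return id;
  }

  String getClusterName() {
    return clusterName;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) {
      return true;
    }
    if (!(other instanceof YarnApplicationModelSketch)) {
      return false;
    }
    YarnApplicationModelSketch that = (YarnApplicationModelSketch) other;
    return id.equals(that.id) && clusterName.equals(that.clusterName);
  }

  @Override
  public int hashCode() {
    return Objects.hash(id, clusterName);
  }
}
```

Such a class carries no JSON annotations, which is the distinction drawn above between response DTOs and models used in business logic.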

vladislav-sidorovich · 2025-11-23T10:21:04Z

...igration/dumper/application/dumper/connector/cloudera/manager/model/YarnApplicationType.java


-import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
-import com.fasterxml.jackson.annotation.JsonProperty;
+public enum YarnApplicationType {


If we want to extract it into a separate class, we should not lose the meaning of known or predefined types. Javadoc with an explanation would also help.

What is the reason for extracting it? Keeping the default types in the specific task looks good and reasonable. Such an approach has limited scope and is easy to support.

The main reason is to use the "SPARK" enum value here. I think it's better to compare against something that is already defined than against a plain String.

If you prefer to keep it in the task, maybe I could create a private enum class there and keep the predefinedAppTypes var as YarnApplicationType.values()?
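
The trade-off discussed above can be sketched as follows: an enum of predefined application types, with the string list derived from it and a comparison against the enum instead of a bare String. The name `YarnApplicationTypeSketch`, the `matches` helper, and the `MAPREDUCE` constant are assumptions for illustration; only "SPARK" is mentioned in this thread.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of predefined YARN application types.
 * Only SPARK comes from the discussion; MAPREDUCE is an illustrative assumption.
 */
enum YarnApplicationTypeSketch {
  SPARK("SPARK"),
  MAPREDUCE("MAPREDUCE");

  private final String value;

  YarnApplicationTypeSketch(String value) {
    this.value = value;
  }

  String getValue() {
    return value;
  }

  /** Compares a raw type string (for example from an API response) to this constant. */
  boolean matches(String rawType) {
    return value.equals(rawType);
  }

  /** Derives the predefined type names from the enum, so the two cannot drift apart. */
  static List<String> predefinedAppTypes() {
    return Arrays.stream(values())
        .map(YarnApplicationTypeSketch::getValue)
        .collect(Collectors.toList());
  }
}
```

A call such as `YarnApplicationTypeSketch.SPARK.matches(rawType)` then replaces a comparison against the literal "SPARK", whether the enum lives in its own file or as a private nested type inside the task.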

dominik-burdzy requested review from kajgol, vladislav-sidorovich and zaldis November 19, 2025 12:59

dominik-burdzy requested a review from shevek-google as a code owner November 19, 2025 12:59

dominik-burdzy marked this pull request as draft November 19, 2025 13:02

dominik-burdzy force-pushed the feature/b456377401/cache-spark-yarn-apps branch from 21441db to a572910 Compare November 20, 2025 10:22

dominik-burdzy marked this pull request as ready for review November 20, 2025 10:33

vladislav-sidorovich reviewed Nov 23, 2025
