Skip to content

Conversation

@dominik-burdzy
Copy link
Collaborator

@dominik-burdzy dominik-burdzy commented Nov 19, 2025

  1. Collected Spark YARN applications need to be cached as they will be needed in the upcoming task connecting to Spark Server History and fetching their source and used Spark version from event logs.
  2. Make ClouderaAPIHostsTask as the only source of truth for caching hosts as this is more reliable than CMF endpoint. It simplifies the logic and make it less error-prone.
  3. Move common methods used in multiple tests into utils class.

1. Collected Spark YARN applications need to be cached as they will be needed in the upcoming task connecting to Spark Server History and fetching their source and used Spark version from event logs.
2. Make ClouderaAPIHostsTask as the only source of truth for caching hosts as this is more reliable than CMF endpoint. It simplifies the logic and make it less error-prone.
3. Move common methods used in multiple tests into utils class.
@dominik-burdzy dominik-burdzy force-pushed the feature/b456377401/cache-spark-yarn-apps branch from 21441db to a572910 Compare November 20, 2025 10:22
@github-actions
Copy link

Code Coverage Report

Overall Project 61.91% 🍏
Files changed 100% 🍏

File Coverage
YarnApplicationType.java 100% 🍏
ClouderaAPIHostsTask.java 100% 🍏
ClouderaCMFHostsTask.java 100% 🍏
ClouderaYarnApplicationTypeTask.java 95.44% 🍏
ClouderaManagerHandle.java 90.7% 🍏

@dominik-burdzy dominik-burdzy marked this pull request as ready for review November 20, 2025 10:33
}
}

handle.initHostsIfNull(hosts);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, such changes make sense.

}

public synchronized void initHostsIfNull(List<ClouderaHostDTO> hosts) {
// Todo

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What this todo is about?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it was about commented preconditions below - uncommented caused some tests failing. I fixed TODO by changing the implementation to have a single source of truth for caching hosts and by improving tests.

}

@AutoValue
public abstract static class ClouderaYarnApplicationDTO {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move it to a separate class in com.google.edwmigration.dumper.application.dumper.connector.cloudera.manager.dto; for code consistency.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All classes like this are placed here: you can check ClouderaClusterDTO and ClouderaHostDTO defined above. dto package is about dtos representing Cloudera Manager responses. For me, classes placed here are more like "models" as they are used in business logic. What do you think about moving all of them to "model" package and renaming?


import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.annotation.JsonProperty;
public enum YarnApplicationType {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to extract it into the separate we should not lose the meaning of known or predefined types. Also Java doc with explanation will help.

What is the reason to extract it? I mean to have default types in specific task looks good and reasonable. Such approach has limited scope and easy to support.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason is about using "SPARK" enum here. I think it's better to compare it to something what is already defined than to pure String.

If you prefer to keep it in the task, maybe I could create a private enum class there and keep predefinedAppTypes var as YarnApplicationType.values()?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants