-
Notifications
You must be signed in to change notification settings - Fork 63
[b456377401] Cache Spark YARN applications #1029
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
1. Collected Spark YARN applications need to be cached as they will be needed in the upcoming task connecting to Spark Server History and fetching their source and used Spark version from event logs. 2. Make ClouderaAPIHostsTask as the only source of truth for caching hosts as this is more reliable than CMF endpoint. It simplifies the logic and make it less error-prone. 3. Move common methods used in multiple tests into utils class.
21441db to
a572910
Compare
Code Coverage Report
|
| } | ||
| } | ||
|
|
||
| handle.initHostsIfNull(hosts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, such changes make sense.
| } | ||
|
|
||
| public synchronized void initHostsIfNull(List<ClouderaHostDTO> hosts) { | ||
| // Todo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this todo is about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it was about commented preconditions below - uncommented caused some tests failing. I fixed TODO by changing the implementation to have a single source of truth for caching hosts and by improving tests.
| } | ||
|
|
||
| @AutoValue | ||
| public abstract static class ClouderaYarnApplicationDTO { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move it to a separate class in com.google.edwmigration.dumper.application.dumper.connector.cloudera.manager.dto; for code consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All classes like this are placed here: you can check ClouderaClusterDTO and ClouderaHostDTO defined above. dto package is about dtos representing Cloudera Manager responses. For me, classes placed here are more like "models" as they are used in business logic. What do you think about moving all of them to "model" package and renaming?
|
|
||
| import com.fasterxml.jackson.annotation.JsonIgnoreProperties; | ||
| import com.fasterxml.jackson.annotation.JsonProperty; | ||
| public enum YarnApplicationType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to extract it into the separate we should not lose the meaning of known or predefined types. Also Java doc with explanation will help.
What is the reason to extract it? I mean to have default types in specific task looks good and reasonable. Such approach has limited scope and easy to support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main reason is about using "SPARK" enum here. I think it's better to compare it to something what is already defined than to pure String.
If you prefer to keep it in the task, maybe I could create a private enum class there and keep predefinedAppTypes var as YarnApplicationType.values()?
Uh oh!
There was an error while loading. Please reload this page.