Skip to content

Conversation

@kashikamahajan
Copy link
Collaborator

No description provided.

@iross iross self-requested a review October 14, 2025 13:47
Copy link
Collaborator

@iross iross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's in good shape overall! There are a few things worth addressing here and I think we should go ahead and fix them as part of this PR so that you can try out that aspect of the PR workflow. Commits to this branch will automatically be included in the PR, and you can mark all the comments as resolved as you push fixes.


## 📁 Repository Structure

- hold_classifer.py # Diagnoses held jobs and groups by hold reasons
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the files have been renamed since this readme was written.

if wall_time:
runtimes.append(wall_time)

from statistics import median
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You've already imported statistics, so no need to do it again here: you can just use statistics.median

print(f"{'Efficiency Notes':^80}")
print("=" * 80)

def warn(resource, efficiency):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the upper and lower bounds should be arguments. I can definitely see value in being able to set different thresholds for each resource type.


if __name__ == "__main__":
if len(sys.argv) != 2:
print("Usage: python htcondor_cluster_summary.py <ClusterId>")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name of the script is wrong

filepath = os.path.join(data_dir, f"cluster_{cluster_id}_jobs.csv")

if not os.path.exists(filepath):
print(f"File not found: {filepath}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be an informative error message for a user.

code = ad.eval("HoldReasonCode")
subcode = ad.eval("HoldReasonSubCode")

reason = ad.eval("HoldReason").split('. ')[0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment here to explain why the split. I'm guessing it eliminates a lot of the EP-specific things that would muddy up the similarities?

Dict[int, List[Tuple[str, int]]]: A dictionary mapping each HoldReasonCode to a list of (HoldReason, HoldReasonSubCode) tuples.
"""
def group_by_code(cluster_id):
global total_jobs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a global? It seems to only be used in this function.


while hits and len(all_hits) < MAX_RESULTS:
remaining = MAX_RESULTS - len(all_hits)
to_add = hits[:remaining]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing the reason for this. If hits is just the next page of results from ES, I don't see why we'd want to potentially truncate that collection.

def load_job_data(cluster_id, folder="cluster_data"):
filepath = os.path.join(folder, f"cluster_{cluster_id}_jobs.csv")
if not os.path.exists(filepath):
print(f"File not found: {filepath}")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, a user who sees this message won't necessarily understand that it was a result of a bad ClusterID (or at least a Cluster whose csv isn't where the program expected)

return jobs

# safe conversion to float
def safe_float(val):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Things like this (and the validate_params) which are repeated patterns can be moved into utils so that definition changes only need to happen once.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! It looks like you started to define things there, but haven't imported them from there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants