modules: Move feasibility/satisfiability checking into a new module #1285
Problem: feasibility.check RPCs take up too much of sched-fluxion-resource's single-threaded time. Add sched-fluxion-feasibility module that can run on multiple ranks to handle feasibility.check RPCs.
Problem: All calls to the feasibility.check RPC are in sched-fluxion-resource tests, not sched-fluxion-feasibility tests. Create a feasibility module test file, t4014, and update the other feasibility.check tests to query the feasibility module.
Problem: t4014 does not test the behavior of resource -> feasibility notification after the feasibility module restarts. This is a special case because resource needs to send first-time information after the restart, including all resources, lost resources, and resource graph expiration. Create t4015 to test feasibility notification.
Problem: PR flux-framework#1352 changed populate_resource_db to support resource shrinking, but it left in an old version of the function. Remove it.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@           Coverage Diff           @@
##           master   #1285    +/-   ##
=======================================
+ Coverage    76.4%   76.5%   +0.1%
=======================================
  Files         112     115      +3
  Lines       16700   16929    +229
=======================================
+ Hits        12765   12962    +197
- Misses       3935    3967     +32
/*****************************************************************************\
 * Copyright 2014 Lawrence Livermore National Security, LLC
 * (c.f. AUTHORS, NOTICE.LLNS, LICENSE)
Update the copyright year:
-/*****************************************************************************\
- * Copyright 2014 Lawrence Livermore National Security, LLC
- * (c.f. AUTHORS, NOTICE.LLNS, LICENSE)
+/*****************************************************************************\
+ * Copyright 2025 Lawrence Livermore National Security, LLC
+ * (c.f. AUTHORS, NOTICE.LLNS, LICENSE)
@@ -0,0 +1,1548 @@
/*****************************************************************************\
 * Copyright 2014 Lawrence Livermore National Security, LLC
- * Copyright 2014 Lawrence Livermore National Security, LLC
+ * Copyright 2025 Lawrence Livermore National Security, LLC
This is really good work. I have a few nits, and a minor point for discussion regarding the RPC name.
Apart from that, I'd like to discuss changing the satisfiability pruning filter to ALL:core,ALL:node and perhaps adding ALL:gpu as well. Adding the node type to the pruning filter by default should make the is_satisfiable function in the traverser faster and more powerful.
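To make the pruning-filter suggestion concrete, here is a minimal Python sketch of the general idea: each vertex keeps aggregate counts for the tracked resource types beneath it, so a request can be rejected at a high-level vertex without descending into its subtree. This is only an illustration under stated assumptions, not Fluxion's implementation; `Vertex`, `build_totals`, `can_skip`, and `make_cluster` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """Hypothetical resource-graph vertex; not a Fluxion type."""
    kind: str
    children: list = field(default_factory=list)
    totals: dict = field(default_factory=dict)  # tracked type -> count in subtree


def build_totals(v, tracked):
    """Fill v.totals with the count of each tracked type in v's subtree."""
    totals = {t: 0 for t in tracked}
    if v.kind in totals:
        totals[v.kind] += 1
    for child in v.children:
        for t, n in build_totals(child, tracked).items():
            totals[t] += n
    v.totals = totals
    return totals


def can_skip(v, request):
    """True if the aggregates prove v's subtree cannot meet the request.

    Types that are not tracked cannot be used for pruning, so they are ignored.
    """
    return any(v.totals[t] < n for t, n in request.items() if t in v.totals)


def make_cluster(nodes, cores_per_node):
    """Build a toy cluster -> node -> core hierarchy."""
    return Vertex(
        "cluster",
        [
            Vertex("node", [Vertex("core") for _ in range(cores_per_node)])
            for _ in range(nodes)
        ],
    )


cluster = make_cluster(nodes=2, cores_per_node=4)

# Tracking only cores: a 3-node request cannot be rejected at the root.
build_totals(cluster, ("core",))
core_only = can_skip(cluster, {"node": 3})

# Tracking cores and nodes: the same request is rejected immediately.
build_totals(cluster, ("core", "node"))
core_and_node = can_skip(cluster, {"node": 3})
```

In this toy model, tracking only cores leaves a request for three nodes undecided at the root, while tracking nodes as well rejects it immediately. That is the kind of early exit the comment above is hoping for.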
@@ -113,7 +113,7 @@ def rpc_namespace_info(self, rank, type_name, identity):

     def rpc_satisfiability(self, jobspec):
         payload = {"jobspec": jobspec}
-        return self.handle.rpc("sched-fluxion-resource.satisfiability", payload).get()
+        return self.handle.rpc("feasibility.check", payload).get()
Let's revisit this RPC name. Adhering to the existing pattern yields sched-fluxion-feasibility.satisfiability, which is clunky.
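One way to keep the naming discussion cheap is to pass the topic string as a parameter, so a rename touches one default value. The sketch below assumes only the `handle.rpc(topic, payload).get()` call shape visible in the diff above. `FakeHandle` and `FakeFuture` are hypothetical stand-ins so the example runs without a Flux instance, and the empty response is a placeholder, not the real reply format.

```python
class FakeFuture:
    """Hypothetical stand-in for the object returned by handle.rpc()."""

    def __init__(self, response):
        self._response = response

    def get(self):
        return self._response


class FakeHandle:
    """Hypothetical stub that records RPC topics instead of contacting a broker."""

    def __init__(self):
        self.calls = []

    def rpc(self, topic, payload):
        self.calls.append((topic, payload))
        return FakeFuture({})  # placeholder reply; the real payload is not shown here


def rpc_satisfiability(handle, jobspec, topic="feasibility.check"):
    """Send a jobspec to the feasibility service under a configurable topic."""
    payload = {"jobspec": jobspec}
    return handle.rpc(topic, payload).get()


handle = FakeHandle()
rpc_satisfiability(handle, {"version": 9999})
rpc_satisfiability(
    handle, {"version": 9999}, topic="sched-fluxion-feasibility.satisfiability"
)
```

Either candidate name works with the same caller, so the choice can be made on readability alone.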
@@ -0,0 +1,192 @@
/*****************************************************************************\
 * Copyright 2014 Lawrence Livermore National Security, LLC
- * Copyright 2014 Lawrence Livermore National Security, LLC
+ * Copyright 2025 Lawrence Livermore National Security, LLC
version: 9999
resources:
  - type: node
    count: 2
    with:
      - type: slot
        count: 1
        label: default
        with:
          - type: core
            count: 1
attributes:
  system:
    duration: 300
tasks:
  - command: [ "hostlist" ]
    slot: default
    count:
      per_slot: 1
Following the current directory structure convention, I think it's better to move shrink2-4 into t/data/resource/jobspecs/satisfiability.
prune-filters=ALL:core subsystems=containment policy=high &&
load_feasibility load-file=${grug} load-format=grug \
I'm wondering if we should set prune-filters to ALL:core,ALL:node in the tests and in the module.
@@ -86,6 +86,7 @@ set(ALL_TESTS
t4012-set-status.t
The commit message body should wrap at 72 characters.
Problem: Feasibility checking takes up a significant amount of sched-fluxion-resource's time that could be better spent on scheduling.
Solution: Move feasibility checking into a new module, sched-fluxion-feasibility, that can run on multiple ranks.
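The 72-column wrap requested above can be applied mechanically. The following is a small sketch using Python's standard `textwrap` module; `wrap_commit_body` is a hypothetical helper name, and hyphen breaking is turned off so that names such as sched-fluxion-feasibility are not split across lines.

```python
import textwrap


def wrap_commit_body(body, width=72):
    """Re-wrap each blank-line-separated paragraph to at most `width` columns."""
    paragraphs = body.split("\n\n")
    wrapped = [
        textwrap.fill(
            " ".join(p.split()),      # collapse existing line breaks and runs of spaces
            width=width,
            break_on_hyphens=False,   # keep hyphenated module names intact
        )
        for p in paragraphs
    ]
    return "\n\n".join(wrapped)


body = (
    "Problem: Feasibility checking takes up a significant amount of "
    "sched-fluxion-resource's time that could be better spent on scheduling.\n\n"
    "Solution: Move feasibility checking into a new module, "
    "sched-fluxion-feasibility, that can run on multiple ranks."
)
print(wrap_commit_body(body))
```

Many editors can do the same on save, for example by setting the text width to 72 for commit messages.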