Skip to content

Add a metric to track job creation to pod creation time. #13


Open
wants to merge 1 commit into main

Conversation


@wonderyl wonderyl commented Jan 15, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds a metric to measure latency from job creation to pod creation in JobLifecycleLatency.
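As a rough sketch of the idea (illustrative only; the helper name below is hypothetical and the real logic lives inside the JobLifecycleLatency measurement), the new metric boils down to the gap between a Job's creation timestamp and the creation timestamp of its earliest pod:

package jobmetrics

import (
	"time"

	batchv1 "k8s.io/api/batch/v1"
	corev1 "k8s.io/api/core/v1"
)

// createToPodStartLatency is a hypothetical helper: it returns how long it
// took from job creation until the earliest of the job's pods was created.
// The second return value is false when no pods have been observed yet.
func createToPodStartLatency(job *batchv1.Job, pods []*corev1.Pod) (time.Duration, bool) {
	var earliest time.Time
	for _, pod := range pods {
		created := pod.CreationTimestamp.Time
		if earliest.IsZero() || created.Before(earliest) {
			earliest = created
		}
	}
	if earliest.IsZero() {
		return 0, false
	}
	return earliest.Sub(job.CreationTimestamp.Time), true
}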

Which issue(s) this PR fixes:

n/a

Special notes for your reviewer:

Passed unit tests:

$ go test ./...
ok      k8s.io/perf-tests/clusterloader2/api    (cached)
?       k8s.io/perf-tests/clusterloader2/cmd    [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/chaos      [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/errors     [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/execservice        [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/flags      [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/framework  [no test files]
ok      k8s.io/perf-tests/clusterloader2/pkg/config     (cached)
?       k8s.io/perf-tests/clusterloader2/pkg/framework/config   [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/imagepreload       [no test files]
ok      k8s.io/perf-tests/clusterloader2/pkg/framework/client   (cached)
?       k8s.io/perf-tests/clusterloader2/pkg/measurement        [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/common/bundle  [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/common/dns     [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/common/executors       [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/common/metrics [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/common/network [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/common/network-policy  [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/common/probes  [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/util/checker   [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/util/gatherers [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/util/informer  [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/util/kubelet   [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/util/kubemark  [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/measurement/util/workerqueue       [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/metadata   [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/prometheus/clients [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/provider   [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/state      [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/test       [no test files]
?       k8s.io/perf-tests/clusterloader2/pkg/tuningset  [no test files]
ok      k8s.io/perf-tests/clusterloader2/pkg/measurement/common 5.871s
ok      k8s.io/perf-tests/clusterloader2/pkg/measurement/common/slos    5.515s
ok      k8s.io/perf-tests/clusterloader2/pkg/measurement/util   (cached)
ok      k8s.io/perf-tests/clusterloader2/pkg/measurement/util/runtimeobjects    (cached)
ok      k8s.io/perf-tests/clusterloader2/pkg/modifier   (cached)
ok      k8s.io/perf-tests/clusterloader2/pkg/prometheus (cached)
ok      k8s.io/perf-tests/clusterloader2/pkg/util       (cached)

@wonderyl wonderyl self-assigned this Jan 15, 2025
@wonderyl (Author)

@microsoft-github-policy-service agree

@wonderyl wonderyl changed the base branch from master to main January 19, 2025 05:48
@@ -60,6 +61,8 @@ func createJobLifecycleLatencyMeasurement() measurement.Measurement {
selector: util.NewObjectSelector(),
jobStateEntries: measurementutil.NewObjectTransitionTimes(jobLifecycleLatencyMeasurementName),
eventQueue: workqueue.New(),
podCreationTime: measurementutil.NewPodCreationEventTimes(),
eventTicker: time.NewTicker(time.Minute),
Collaborator

This means that the measurement will run every 1 minute to collect data, right? I wonder if we should use one second for a more fine-grained result, or allow users to customize the frequency with a variable?

Author

Correct me if I'm wrong, but the events are stored in etcd for a while, and every time we list, it returns all the events in its cache. As long as the interval is smaller than the lifespan of the cache, it should not miss anything.

Author

Talked to Anson; I'm working on a new version that uses a pod informer instead of ListEvents.
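For reference, a minimal sketch of that informer-based direction using plain client-go (the function and parameter names are assumptions, not the PR's actual code): an AddFunc handler records each pod's creation time as soon as the watch delivers it, instead of polling ListEvents.

package podwatch

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// watchPodCreations is a hypothetical sketch: it starts a shared pod informer
// and calls record with each pod's creation timestamp when the pod is first seen.
func watchPodCreations(c kubernetes.Interface, stopCh <-chan struct{}, record func(podName string, created time.Time)) {
	factory := informers.NewSharedInformerFactory(c, 0)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc: func(obj interface{}) {
			pod, ok := obj.(*corev1.Pod)
			if !ok {
				return
			}
			record(pod.Name, pod.CreationTimestamp.Time)
		},
	})
	factory.Start(stopCh)
	factory.WaitForCacheSync(stopCh)
}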

}
}

time.Sleep(2 * time.Second)
Collaborator

Why do we have to wait for 2 seconds in the test?

Author

The ticker is set to 1 second. The test sleeps for 2 seconds so that at least one tick has fired.
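In other words, the test relies on sleeping for at least twice the tick period so one tick is guaranteed to have been processed before the assertions run. Roughly (an illustrative sketch, not the test's actual code):

package ticksketch

import (
	"testing"
	"time"
)

// TestTickFires sketches the timing pattern: with a 1-second ticker, sleeping
// for 2 seconds guarantees the goroutine has observed at least one tick.
func TestTickFires(t *testing.T) {
	ticked := make(chan struct{}, 1)
	ticker := time.NewTicker(time.Second)
	defer ticker.Stop()
	go func() {
		<-ticker.C
		ticked <- struct{}{}
	}()
	time.Sleep(2 * time.Second)
	select {
	case <-ticked:
		// at least one tick fired, as the real test assumes
	default:
		t.Fatal("expected at least one tick within 2 seconds")
	}
}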

}
gotP50 := getMetric(createToPodStart, "Perc50")
if gotP50 != 3*time.Minute {
t.Errorf("Expect create_to_pod_start Perc50 = 3 minutes, got %v", gotP50)
Collaborator

The majority of measurements report time in seconds, so you might want to consider using seconds instead of minutes to align with upstream.

Author

"Second" is tested in Line 170, right? Are you asking for every test case to use seconds as unit?

@@ -265,6 +262,13 @@ func ListEvents(c clientset.Interface, namespace string, name string, options ..
return obj, nil
}

// ListEvents retrieves events for the object with the given name.
func ListEvents(c clientset.Interface, namespace string, name string, options ...*APICallOptions) (obj *apiv1.EventList, err error) {
Collaborator

Can we change the order of these two functions? It will make resolving conflicts easier.

@@ -130,6 +135,7 @@ func (p *jobLifecycleLatencyMeasurement) start(c clientset.Interface) error {
p.addEvent,
)
go p.processEvents()
go measurementutil.RunEveryTick(p.eventTicker, p.getFuncToListJobEvents(c), p.stopCh)
Collaborator

Or do we have to measure periodically? Could it be event-driven instead?
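For context, a RunEveryTick-style helper presumably amounts to something like the following (a sketch under that assumption; the real signature may differ): it invokes the callback on every tick until the stop channel closes, which is the periodic polling this comment questions versus an event-driven informer handler.

package measurementutil

import "time"

// RunEveryTick is a sketch of the assumed helper: it calls f on every tick of
// the ticker until stopCh is closed, then stops the ticker and returns.
func RunEveryTick(ticker *time.Ticker, f func(), stopCh <-chan struct{}) {
	for {
		select {
		case <-ticker.C:
			f()
		case <-stopCh:
			ticker.Stop()
			return
		}
	}
}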
