NETOBSERV-2402: Adding lokistack status to console plugin configmap #2142
Conversation
@OlivierCazade: This pull request references NETOBSERV-2402, which is a valid Jira issue. Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Force-pushed 8d89c3d to 6815b9b.
Codecov Report

❌ Patch coverage report. Additional details and impacted files:

@@            Coverage Diff             @@
##             main    #2142      +/-   ##
==========================================
- Coverage   71.96%   71.60%   -0.36%
==========================================
  Files          93       93
  Lines       10491    10567      +76
==========================================
+ Hits         7550     7567      +17
- Misses       2466     2513      +47
- Partials      475      487      +12
if mgr.ClusterInfo.HasLokiStack() {
	builder.Watches(&lokiv1.LokiStack{}, &handler.EnqueueRequestForObject{})
ideally, we would enqueue only when our configured lokistack is affected, not all lokistacks
but I guess that's fine, as we don't expect hundreds of lokistacks out there :-)
memodi left a comment:
Hi @OlivierCazade - adding review comments from Claude; I focused on must-fix and high-impact items, including better error reporting.
Besides the comments, it also pointed out missing test coverage on the operator side (see the sketch after this list):
Missing in Operator:
- Test for LokiStack status embedding in configmap
- Test for namespace defaulting logic
- Test for behavior when LokiStack is not found
- Test for behavior when LokiStack CRD is not available
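As a rough illustration of the namespace-defaulting item, here is a minimal, hedged sketch of a table-style test for the rule discussed in this PR (an empty spec.loki.lokiStack.namespace falls back to the FlowCollector namespace). The helper and test names are hypothetical; the real operator code inlines this logic rather than exposing such a helper.

package controller_test

import "testing"

// resolveLokiStackNamespace mirrors the defaulting rule discussed in this PR:
// an empty spec.loki.lokiStack.namespace falls back to the FlowCollector
// namespace. Hypothetical helper, for illustration only.
func resolveLokiStackNamespace(configured, fallback string) string {
	if configured == "" {
		return fallback
	}
	return configured
}

func TestLokiStackNamespaceDefaulting(t *testing.T) {
	cases := []struct {
		name       string
		configured string
		fallback   string
		want       string
	}{
		{"empty namespace defaults to FlowCollector namespace", "", "netobserv", "netobserv"},
		{"explicit namespace is kept", "openshift-logging", "netobserv", "openshift-logging"},
	}
	for _, c := range cases {
		t.Run(c.name, func(t *testing.T) {
			if got := resolveLokiStackNamespace(c.configured, c.fallback); got != c.want {
				t.Errorf("got %q, want %q", got, c.want)
			}
		})
	}
}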
if mgr.ClusterInfo.HasLokiStack() {
	builder.Watches(&lokiv1.LokiStack{}, &handler.EnqueueRequestForObject{})
from Claude review:
Problem: This will create reconcile requests for LokiStack objects, not FlowCollectors. When a LokiStack named "logging-loki" changes, it will try to reconcile a FlowCollector named "logging-loki", which likely doesn't exist.
Fix Required:
builder.Watches(&lokiv1.LokiStack{}, handler.EnqueueRequestsFromMapFunc(
	func(ctx context.Context, obj client.Object) []reconcile.Request {
		lokiStack := obj.(*lokiv1.LokiStack)
		var flowCollectors flowslatest.FlowCollectorList
		if err := mgr.GetClient().List(ctx, &flowCollectors); err != nil {
			log.FromContext(ctx).Error(err, "Failed to list FlowCollectors")
			return []reconcile.Request{}
		}
		var requests []reconcile.Request
		for _, fc := range flowCollectors.Items {
			if fc.Spec.Loki.Mode == flowslatest.LokiModeLokiStack &&
				fc.Spec.Loki.LokiStack.Name == lokiStack.Name {
				ns := fc.Spec.Loki.LokiStack.Namespace
				if ns == "" {
					ns = fc.Namespace
				}
				if ns == lokiStack.Namespace {
					requests = append(requests, reconcile.Request{
						NamespacedName: types.NamespacedName{
							Name:      fc.Name,
							Namespace: fc.Namespace,
						},
					})
				}
			}
		}
		return requests
	},
))
I did not think about making a k8s query inside the handler to filter LokiStacks.
The approach I had in mind was to wait for the FlowCollector to be created and then start a dedicated controller with a static FlowCollector name. That was adding a lot of complexity and I was not sure it was worth it.
This looks like a simpler solution, at the price of a k8s query in the handler function.
@jotak what do you think?
Not sure we need to overcomplicate things here. It's basically my comment here: #2142 (comment); having a couple of false-positive reconcile events is not so important. We're talking about LokiStack objects: we don't expect many of them, and they don't change often.
BTW, I think Claude's answer is wrong: the enqueue request is not for a flow-collector named after the LokiStack, it's for any flow-collector? (EnqueueRequestForObject{} with empty params)
If we really want to narrow down to our configured LokiStack, in other situations we just keep in the controller state the last-seen element that we want to check (we do that in a couple of places for flowcollector.spec.namespace, iirc); we could do the same with the configured LokiStack.
Like this: https://github.com/netobserv/network-observability-operator/blob/main/internal/controller/flp/flp_controller.go#L60
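For reference, a minimal sketch of that idea under the stated assumption: the reconciler keeps the LokiStack reference it last saw during reconcile, and a controller-runtime predicate only lets matching events through. Field and method names here are hypothetical; this is not the actual flp_controller code.

package flp

import (
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/event"
	"sigs.k8s.io/controller-runtime/pkg/predicate"
)

// Reconciler sketch: lastLokiStackRef would be refreshed on each reconcile
// from spec.loki.lokiStack {name, namespace}, similarly to how the namespace
// is tracked in flp_controller.go.
type Reconciler struct {
	client.Client
	lastLokiStackRef types.NamespacedName
}

// lokiStackPredicate drops events for any LokiStack other than the configured one.
func (r *Reconciler) lokiStackPredicate() predicate.Funcs {
	matches := func(obj client.Object) bool {
		return obj.GetName() == r.lastLokiStackRef.Name &&
			obj.GetNamespace() == r.lastLokiStackRef.Namespace
	}
	return predicate.Funcs{
		CreateFunc:  func(e event.CreateEvent) bool { return matches(e.Object) },
		UpdateFunc:  func(e event.UpdateEvent) bool { return matches(e.ObjectNew) },
		DeleteFunc:  func(e event.DeleteEvent) bool { return matches(e.Object) },
		GenericFunc: func(e event.GenericEvent) bool { return matches(e.Object) },
	}
}

Such a predicate could be passed as a WatchesOption via builder.WithPredicates when registering the LokiStack watch, so that only the configured LokiStack triggers a reconcile.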
lokiStack = nil
log.FromContext(ctx).Info("Could not get the LokiStack resource.")
from Claude review:
Problems:
- Logs at Info level instead of Warning
- Actual error is discarded
- Can't distinguish between "not found" and "permission denied"
- Should surface in FlowCollector status
Fix:
if err := r.Client.Get(ctx, types.NamespacedName{Name: desired.Loki.LokiStack.Name, Namespace: ns}, lokiStack); err != nil {
	lokiStack = nil
	if apierrors.IsNotFound(err) {
		log.FromContext(ctx).Info("LokiStack resource not found, status will not be available",
			"name", desired.Loki.LokiStack.Name,
			"namespace", ns)
	} else {
		log.FromContext(ctx).Error(err, "Failed to get LokiStack resource",
			"name", desired.Loki.LokiStack.Name,
			"namespace", ns)
	}
	// TODO: Consider surfacing this in FlowCollector status
}
Force-pushed 6815b9b to 89e605f, then 51f5209 to 0439130.
/retest
	return nil
}

func getLokiStatus(lokiStack *lokiv1.LokiStack) string {
Not tested, but looking at the code I think there's a problem here:
In reconcileConfigMap, the lokiStack object is created and can have the following values:
- a reference to the LokiStack that was found
- nil if an error occurred
- a reference to an empty (zero-value) struct &lokiv1.LokiStack{} when not in LokiStack mode
So in this last situation, the returned value would be "pending", which seems incorrect?
IMO, what we could do:
- if there was an error when fetching the LokiStack, set this error as the status (e.g. the console plugin could display something like "LokiStack not found")
- if the LokiStack shows a non-ready status (error/pending condition), set a message with that condition status
- if the LokiStack is ready, set it as ready
- if not in LokiStack mode, set an empty string
Also, I'm not sure it's useful to check for the presence of the LokiStack API: if it's configured in LokiStack mode BUT the API is not present, the config is wrong, so it's OK to just display the error message that would come up when trying to fetch the LokiStack?
wdyt?
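To make the proposed mapping concrete, here is a hedged sketch of what such a status helper could look like, assuming the LokiStack status exposes standard metav1 conditions with a "Ready" condition type. The package name, import path, and returned strings are assumptions for illustration, not the code as merged; the reply below describes what was actually done.

package consoleplugin

import (
	"k8s.io/apimachinery/pkg/api/meta"

	// Adjust to the loki-operator API module path actually vendored by the repo.
	lokiv1 "github.com/grafana/loki/operator/apis/loki/v1"
)

// getLokiStatusSketch maps the four cases proposed above to a status string.
// Illustration only; it is not the getLokiStatus function as merged.
func getLokiStatusSketch(lokiStack *lokiv1.LokiStack, fetchErr error) string {
	switch {
	case fetchErr != nil:
		// Error while fetching the LokiStack: surface it so the console
		// plugin can display e.g. "LokiStack not found".
		return fetchErr.Error()
	case lokiStack == nil:
		// Not in LokiStack mode: nothing to report.
		return ""
	case meta.IsStatusConditionTrue(lokiStack.Status.Conditions, "Ready"):
		return "Ready"
	default:
		// Not ready: report the latest condition message rather than
		// guessing which component is at fault.
		if n := len(lokiStack.Status.Conditions); n > 0 {
			return lokiStack.Status.Conditions[n-1].Message
		}
		return "Pending"
	}
}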
In theory, we should not enter the case of not being in LokiStack mode, but since we are passing the LokiStack as a pointer I added a nil check. I modified the function to make this more understandable.
When not in LokiStack mode, the operator falls back to the previous mode, using the StatusURL.
If the LokiStack shows a non-ready status, finding the right error might be tricky: each LokiStack component has its own status, meaning we may display the wrong error. IMO it is simpler to display a pending status and let the user investigate, since the LokiStack was provided by the user.
I removed the LokiStack API check.
Force-pushed 0439130 to 23bc1bc, then 23bc1bc to f93c1dd, then 664361f to ac940f9.
/retest
New images:

They will expire after two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:efa5a04 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-efa5a04

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-efa5a04
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
Force-pushed fee84b3 to 17f7bb2, then 3764313 to bb7a73e.
/ok-to-test
New images:

They will expire after two weeks. To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:b2dd1c3 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-b2dd1c3

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-b2dd1c3
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m
/label qe-approved
@OlivierCazade: This pull request references NETOBSERV-2402, which is a valid Jira issue. Warning: The referenced Jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
Works much better now! Thanks @OlivierCazade
/label qe-approved
Force-pushed bb7a73e to 4e968c1.
/lgtm
/cherry-pick release-1.11
/approve
@OlivierCazade: only netobserv org members may request cherry picks. If you are already part of the org, make sure to change your membership to public. Otherwise you can still do the cherry-pick manually.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: OlivierCazade
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/cherry-pick release-1.11
@OlivierCazade: once the present PR merges, I will cherry-pick it on top of release-1.11.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
@OlivierCazade: new pull request created: #2403
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Description
Add the LokiStack status to the console plugin configmap.
The LokiStack is now also watched, so that any status change triggers an update.
Dependencies
n/a
Checklist
If you are not familiar with our processes or don't know what to answer in the list below, let us know in a comment: the maintainers will take care of that.