Katalyst-colocation-orm can be installed on enhanced-k8s cluster but katalyst-colocation cannot be installed #617

Open
ozline opened this issue Jun 10, 2024 · 1 comment

Comments

@ozline
Contributor

ozline commented Jun 10, 2024

What happened?

I followed the "Colocate your application using Katalyst" guide to install Katalyst.

The guide says to install katalyst-colocation on KubeWharf enhanced Kubernetes and katalyst-colocation-orm on vanilla Kubernetes.

I set up my nodes by following "Install KubeWharf enhanced-k8s", but only katalyst-colocation-orm installs successfully; katalyst-colocation does not.

When I install katalyst-colocation, the katalyst-colocation agent reports the following error:

I0610 13:10:27.641756       1 state_checkpoint.go:121] "[cpu_plugin] State checkpoint: restored state from checkpoint"
I0610 13:10:27.641777       1 util.go:68] [katalyst-core/pkg/agent/qrm-plugins/cpu/util.GetCoresReservedForSystem] get reservedQuantityInt: 0 from ReservedCPUCores configuration
I0610 13:10:27.641787       1 util.go:77] [katalyst-core/pkg/agent/qrm-plugins/cpu/util.GetCoresReservedForSystem] take reservedCPUs:  by reservedCPUsNum: 0
I0610 13:10:27.641832       1 policy.go:950] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).cleanPools] there is no pool to delete
I0610 13:10:27.641842       1 policy.go:964] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).initReservePool] initReservePool reserve:
I0610 13:10:27.641859       1 state_mem.go:109] "[cpu_plugin] updated cpu plugin pod entries" podUID="reserve" containerName="" allocationInfo="{\"pod_uid\":\"reserve\",\"owner_pool_name\":\"reserve\",\"allocation_result\":\"\",\"original_allocation_result\":\"\",\"topology_aware_assignments\":{},\"original_topology_aware_assignments\":{},\"init_timestamp\":\"\",\"labels\":null,\"annotations\":null,\"qosLevel\":\"\"}"
I0610 13:10:27.644274       1 policy.go:1039] [katalyst-core/pkg/agent/qrm-plugins/cpu/dynamicpolicy.(*DynamicPolicy).initReclaimPool] exist initial reclaim: 0-9
I0610 13:10:27.644300       1 agent.go:102] needToRun "qrm_cpu_plugin"
I0610 13:10:27.644308       1 agent.go:91] initializing "qrm_io_plugin"
I0610 13:10:27.644320       1 agent.go:102] needToRun "qrm_io_plugin"
I0610 13:10:27.644325       1 agent.go:91] initializing "qrm_network_plugin"
W0610 13:10:27.644335       1 util.go:122] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.filterNICsByAvailability] nic: eno1 doesn't have IP address
I0610 13:10:27.644344       1 util.go:302] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.getReservedBandwidth] reservedBanwidth: 0, nicCount: 1, policy: first,
I0610 13:10:27.644361       1 state_net.go:47] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.NewNetworkPluginState] initializing new network plugin in-memory state store"
I0610 13:10:27.644372       1 util.go:37] [GenerateMachineState: katalyst-core/pkg/agent/qrm-plugins/network/state.GenerateMachineState] NIC wlp2s0's speed: -1, capacity: [0/0], reservation: 0
I0610 13:10:27.644511       1 util.go:37] [GenerateMachineState: katalyst-core/pkg/agent/qrm-plugins/network/state.GenerateMachineState] NIC wlp2s0's speed: -1, capacity: [0/0], reservation: 0
I0610 13:10:27.644531       1 state_net.go:121] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*networkPluginState).SetMachineState] updated network plugin machine state" NICMap="{\"wlp2s0\":{\"egress_state\":{\"Capacity\":0,\"SysReservation\":0,\"Reservation\":0,\"Allocatable\":0,\"Allocated\":0,\"Free\":0},\"ingress_state\":{\"Capacity\":0,\"SysReservation\":0,\"Reservation\":0,\"Allocatable\":0,\"Allocated\":0,\"Free\":0},\"pod_entries\":{}}}"
I0610 13:10:27.644543       1 state_net.go:145] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*networkPluginState).SetPodEntries] updated network plugin pod resource entries" podEntries="{}"
I0610 13:10:27.644555       1 state_checkpoint.go:136] "[network_plugin: katalyst-core/pkg/agent/qrm-plugins/network/state.(*stateCheckpoint).restoreState] state checkpoint: restored state from checkpoint"
I0610 13:10:27.644572       1 policy.go:177] [katalyst-core/pkg/agent/qrm-plugins/network/staticpolicy.(*StaticPolicy).ApplyConfig] apply configs, qosLevelToNetClassMap: map[dedicated_cores:0 reclaimed_cores:0 shared_cores:0 system_cores:0], podLevelNetClassAnnoKey: katalyst.kubewharf.io/net_class_id, podLevelNetAttributesAnnoKeys: []
I0610 13:10:27.644581       1 agent.go:102] needToRun "qrm_network_plugin"
I0610 13:10:27.644588       1 agent.go:91] initializing "periodical-handler-manager"
I0610 13:10:27.644593       1 agent.go:102] needToRun "periodical-handler-manager"
I0610 13:10:27.644600       1 agent.go:91] initializing "katalyst-agent-orm"
I0610 13:10:27.644631       1 manager.go:86] "Creating topology manager with policy per scope" topologyPolicyName=""
E0610 13:10:27.644640       1 manager.go:129] unknown policy: ""
E0610 13:10:27.644647       1 agent.go:94] Error initializing "katalyst-agent-orm"
I0610 13:10:27.644662       1 file.go:257] [GetUniqueLock] release lock successfully
I0610 13:10:28.396105       1 file.go:90] fsNotify watcher notify "/var/lib/kubelet/resource-plugins/kubelet_qrm_checkpoint": CREATE
I0610 13:10:28.396155       1 topology_adapter.go:281] qrm state file changed, notify to update topology status
I0610 13:10:28.396166       1 kubeletplugin.go:177] send topology change notification to plugin kubelet-reporter-plugin
run command error: failed to init ORM: unknown policy: ""
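
The relevant failure is easy to miss among the info-level lines: the agent initializes "katalyst-agent-orm", creates a topology manager with topologyPolicyName="", and then fails with unknown policy: "". A minimal sketch for pulling out only the error-severity lines, assuming the log uses the standard klog header (severity letter, mmdd, time, thread id, file:line); the function name `error_lines` and the `SAMPLE` constant are hypothetical and exist only for this illustration:

```python
import re

# klog header: severity letter, mmdd, time, thread id, file:line, then "] message".
KLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<tid>\d+) (?P<src>[^\s\]]+:\d+)\] (?P<msg>.*)$"
)


def error_lines(log_text):
    """Return (source, message) pairs for every E- or F-severity klog line.

    Hypothetical helper for illustration; it is not part of Katalyst.
    Lines without a klog header (such as the final "run command error")
    are skipped.
    """
    found = []
    for line in log_text.splitlines():
        m = KLOG_LINE.match(line.strip())
        if m and m.group("sev") in ("E", "F"):
            found.append((m.group("src"), m.group("msg")))
    return found


# Sample lines copied from the agent log above.
SAMPLE = '''I0610 13:10:27.644600       1 agent.go:91] initializing "katalyst-agent-orm"
I0610 13:10:27.644631       1 manager.go:86] "Creating topology manager with policy per scope" topologyPolicyName=""
E0610 13:10:27.644640       1 manager.go:129] unknown policy: ""
E0610 13:10:27.644647       1 agent.go:94] Error initializing "katalyst-agent-orm"
I0610 13:10:27.644662       1 file.go:257] [GetUniqueLock] release lock successfully
run command error: failed to init ORM: unknown policy: ""'''

for src, msg in error_lines(SAMPLE):
    print(f"{src}: {msg}")
```

On the sample, this leaves only the two lines from manager.go:129 and agent.go:94, which point at the ORM initialization rather than at the CPU, IO, or network QRM plugins.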

Only the katalyst-agent pods are failing; every other component is running:

root@debian-node-1:~# kubectl get pods -n katalyst-system
NAME                                                       READY   STATUS             RESTARTS      AGE
katalyst-colocation-katalyst-agent-f5glx                   0/1     CrashLoopBackOff   4 (36s ago)   2m32s
katalyst-colocation-katalyst-agent-jzgft                   0/1     CrashLoopBackOff   4 (52s ago)   2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-jcn9m   1/1     Running            0             2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-vpjvq   1/1     Running            0             2m32s
katalyst-colocation-katalyst-metric-85c47ff4bf-nl9sf       1/1     Running            0             2m32s
katalyst-colocation-katalyst-scheduler-77cdd9d66f-8mszz    1/1     Running            0             2m32s
katalyst-colocation-katalyst-scheduler-77cdd9d66f-c27qc    1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-ngz2x       1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-vrnzs       1/1     Running            0             2m32s
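
On a larger cluster the failing pods are harder to spot by eye. A minimal sketch that filters the plain-text `kubectl get pods` table for pods that are not healthy, assuming the default column layout shown above (NAME, READY, STATUS first); the function name `unhealthy_pods` and the `TABLE` constant are hypothetical:

```python
def unhealthy_pods(table_text):
    """Return (name, status) for pods whose STATUS is not Running or Completed.

    Hypothetical helper for illustration. It relies on NAME and STATUS coming
    before RESTARTS, because a RESTARTS value such as "4 (36s ago)" contains
    spaces and would shift any column after it.
    """
    lines = [line for line in table_text.splitlines() if line.strip()]
    header = lines[0].split()
    name_i, status_i = header.index("NAME"), header.index("STATUS")
    bad = []
    for line in lines[1:]:
        cols = line.split()
        if cols[status_i] not in ("Running", "Completed"):
            bad.append((cols[name_i], cols[status_i]))
    return bad


# Rows copied from the kubectl output above (shortened to four pods).
TABLE = """NAME                                                       READY   STATUS             RESTARTS      AGE
katalyst-colocation-katalyst-agent-f5glx                   0/1     CrashLoopBackOff   4 (36s ago)   2m32s
katalyst-colocation-katalyst-agent-jzgft                   0/1     CrashLoopBackOff   4 (52s ago)   2m32s
katalyst-colocation-katalyst-controller-59b5c89cd6-jcn9m   1/1     Running            0             2m32s
katalyst-colocation-katalyst-webhook-5f6ccc7cb-vrnzs       1/1     Running            0             2m32s
"""

for name, status in unhealthy_pods(TABLE):
    print(name, status)
```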

However, installing katalyst-colocation-orm on KubeWharf enhanced Kubernetes works fine (the agent pods are in the Running state).

What did you expect to happen?

Installing katalyst-colocation on KubeWharf enhanced Kubernetes should work.

How can we reproduce it (as minimally and precisely as possible)?

Install katalyst-colocation with Helm after installing KubeWharf enhanced Kubernetes:

helm install katalyst-colocation -n katalyst-system --create-namespace kubewharf/katalyst-colocation

Software version

No response

@pendoragon
Member

I think the katalyst-colocation Helm chart has an issue here: it enables ORM by default. We will have to fix it.

BTW, installing KubeWharf enhanced Kubernetes is error-prone, and the installation guide is not general enough to cover every scenario. If possible, I would recommend trying katalyst-colocation-orm on vanilla Kubernetes.
