This repository has been archived by the owner on Aug 9, 2024. It is now read-only.

FateCluster Fails reconciliation #34

Open
fbalicchia opened this issue Mar 24, 2021 · 10 comments

Comments

@fbalicchia

Hi,
After deploying the FATE operator and applying the kubefate and fatecluster resources, the operator seems to fail during the reconciliation phase, leaving the FateCluster resource stuck in the Creating status.
Here is the request that fails.

Here is an error log from the controller side:

2021-03-24T07:44:49.001Z DEBUG controllers.FateCluster request info {"url": "http://kubefate-kubefate-kubefate-sample.kube-fate:8080/v1/cluster/8e4c85be-4428-4f51-a55d-bac3db91816c", "type": "GET", "body": ""}

And here is the log from the service side:

2021/03/24 07:52:25 /workspace/pkg/modules/cluster_db.go:135 record not found
[0.611ms] [rows:0] SELECT * FROM clusters WHERE uuid = '8e4c85be-4428-4f51-a55d-bac3db91816c' AND clusters.deleted_at IS NULL ORDER BY clusters.id LIMIT 1
2021-03-24T07:52:25Z ERR workspace/pkg/api/cluster.go:152 > get cluster error error="record not found" uuid=8e4c85be-4428-4f51-a55d-bac3db91816c
2021-03-24T07:52:25Z ERR usr/local/go/src/net/http/server.go:1919 > Request ip=10.244.0.5 latency=1.1971 method=GET path=/v1/cluster/8e4c85be-4428-4f51-a55d-bac3db91816c status=500 user-agent=Go-http-client/1.1

Do I need to run any initialization steps before using the examples?

Thanks

@LaynePeng
Contributor

It seems the FATE cluster is still deploying, and the log from the controller is only a debug message. Does everything work once the FATE CRD has been created? Or can you describe the pods of the FATE cluster and check whether any errors show up there?
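
The pod check suggested here could be sketched as a small shell helper. This is only an illustration: the function name `inspect_fate_pods` is hypothetical, and the default namespace `fate-9999` is an assumption based on the namespace created for the sample manifests in this thread.

```shell
# Hypothetical helper: list the pods in a namespace, then describe them so that
# scheduling, image-pull, or crash-loop events become visible.
# The default namespace "fate-9999" is an assumption taken from the sample setup.
inspect_fate_pods() {
  ns="${1:-fate-9999}"
  # Stop early if the pods cannot even be listed (wrong context, missing namespace).
  kubectl get pods -n "$ns" -o wide || return 1
  kubectl describe pods -n "$ns"
}
```

It would be called as `inspect_fate_pods fate-9999`; the Events section at the end of each described pod is usually the first place where a stuck deployment shows up.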

@fbalicchia
Author

The problem seems to be that the FateCluster resource stays stuck. After applying ./config/samples/app_v1beta1_fatecluster.yaml, it remains in the Creating status, probably because the controller can't complete the reconcile loop?
Thanks for the help.

@fbalicchia
Author

fbalicchia commented Apr 9, 2021

Hi there @LaynePeng
did you manage to investigate?

@LaynePeng
Contributor

Hi there @LaynePeng
did you manage to investigate?

We still cannot reproduce this problem. Are there any other hints in the logs? @owlet42, do you have any idea about this problem?

@owlet42
Contributor

owlet42 commented Apr 10, 2021

It may be an isolated incident. If there is more log information, maybe it can be solved.

@fbalicchia
Author

Hi there,
I don't have any more logs than what you see above, but I can reproduce the problem easily with:

cat << EOF > clusterconfig-1.18.yaml
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  image: kindest/node:v1.18.8
  extraPortMappings:
  - containerPort: 31080
    hostPort: 80
  - containerPort: 31443
    hostPort: 443
EOF


kind create cluster --config clusterconfig-1.18.yaml --name fate-operator
Then, from the fate-operator root folder:
export IMG=federatedai/fate-controller:bc5420bbe25
make docker-build-without-test
kind load docker-image federatedai/fate-controller:bc5420bbe25  --name fate-operator
make deploy

k apply -f config/samples/rbac-config.yaml
k apply -f config/samples/kubefate-secret.yaml
k create ns fate-9999

k create -f ./config/samples/app_v1beta1_kubefate.yaml
k get pods -n kube-fate
kubectl create -f ./config/samples/app_v1beta1_fatecluster.yaml
kubectl get fatecluster -A

fate-9999   fatecluster-sample   9999      Creating

k logs fate-operator-controller-manager-86b58ffc9b-666sh manage -n fate-operator-system

2021-04-11T11:02:36.886Z	DEBUG	controllers.FateCluster	retry	{"retry": 3}
2021-04-11T11:02:36.887Z	DEBUG	controllers.FateCluster	request info	{"url": "http://kubefate-kubefate-kubefate-sample.kube-fate:8080/v1/cluster/562481ab-6c84-4279-888a-ff81b5e7e965", "type": "GET", "body": ""}
2021-04-11T11:02:37.641Z	DEBUG	controllers.FateCluster	request code	{"Type": "GET", "Path": "cluster/562481ab-6c84-4279-888a-ff81b5e7e965", "respCode": 500, "respBody": "{\"error\":\"record not found\"}"}
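
To see how often the controller hits this response, the "request code" lines can be condensed with a small filter. This is a rough sketch that assumes the log format shown above; the function name `summarize_request_codes` is made up for illustration and is not part of fate-operator.

```shell
# Hypothetical filter: read controller log lines on stdin and print
# "<cluster-uuid> <respCode>" for every "request code" entry.
# Lines that do not match the assumed format are silently skipped.
summarize_request_codes() {
  sed -n 's|.*"Path": "cluster/\([0-9a-f-]*\)".*"respCode": \([0-9]*\).*|\1 \2|p'
}
```

For example, piping the output of the `k logs` command above into `summarize_request_codes | sort | uniq -c` would count the responses per cluster UUID. For the last log line shown, the filter prints `562481ab-6c84-4279-888a-ff81b5e7e965 500`.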

@fbalicchia
Author

Hi @owlet42, did you manage to investigate?

@LaynePeng
Contributor

Any new update about this issue?

@fbalicchia
Author

Hi @LaynePeng, not from my side. I haven't seen any relevant commit. Do I need to repeat the test?

@fbalicchia
Author

Any new update about this issue?
