workflow-manager cluster checkins intermittently happening really frequently #71

Open
jackfrancis opened this issue Apr 22, 2016 · 6 comments

@jackfrancis
Member

see these cluster_id values:

  • dfa0c19a-2eba-4631-b92f-d884c50fe04e
  • 8b88f34c-d4ca-4344-92f0-273891e6b9a0
@jackfrancis jackfrancis added this to the v2.0-beta4 milestone Apr 22, 2016
@mboersma mboersma modified the milestones: v2.0-rc1, v2.0-beta4 May 9, 2016
@arschles
Member

A few notes here: the return value for the dfa0c19a-2eba-4631-b92f-d884c50fe04e cluster is as follows:

curl https://versions.deis.com/v2/clusters/dfa0c19a-2eba-4631-b92f-d884c50fe04e

{
    "id": "dfa0c19a-2eba-4631-b92f-d884c50fe04e",
    "firstSeen": "0001-01-01T00:00:00Z",
    "lastSeen": "0001-01-01T00:00:00Z",
    "components": [{
        "component": {
            "name": "deis-builder",
            "description": "For testing only!"
        },
        "version": {
            "train": "",
            "version": "2.0.0-beta2",
            "data": null
        }
    }, {
        "component": {
            "name": "deis-controller",
            "description": "For testing only!"
        },
        "version": {
            "train": "",
            "version": "2.0.0-beta2",
            "data": null
        }
    }, {
        "component": {
            "name": "deis-database",
            "description": "For testing only!"
        },
        "version": {
            "train": "",
            "version": "2.0.0-beta2",
            "data": null
        }
    }, {
        "component": {
            "name": "deis-logger",
            "description": "For testing only!"
        },
        "version": {
            "train": "",
            "version": "2.0.0-beta2",
            "data": null
        }
    }, {
        "component": {
            "name": "deis-minio",
            "description": "For testing only!"
        },
        "version": {
            "train": "",
            "version": "2.0.0-beta2",
            "data": null
        }
    }, {
        "component": {
            "name": "deis-registry",
            "description": "For testing only!"
        },
        "version": {
            "train": "",
            "version": "2.0.0-beta2",
            "data": null
        }
    }, {
        "component": {
            "name": "deis-router",
            "description": "For testing only!"
        },
        "version": {
            "train": "",
            "version": "2.0.0-beta2",
            "data": null
        }
    }, {
        "component": {
            "name": "deis-workflow-manager",
            "description": "For testing only!"
        },
        "version": {
            "train": "",
            "version": "2.0.0-beta2",
            "data": null
        }
    }]
}

Note that the firstSeen and lastSeen values are both the zero timestamp (0001-01-01T00:00:00Z), i.e. effectively unset, which is a symptom of #115. That, however, hides the checkin issue from clients of the API.
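
For anyone checking other clusters for the same symptom, here is a minimal sketch that flags unset timestamps in a response like the one above. The helper name `unset_timestamp_fields` is hypothetical, and treating `0001-01-01T00:00:00Z` as "unset" is an assumption based on this response (it matches how Go serializes a zero `time.Time`); the sample payload is a trimmed copy of the output above.

```python
import json

# Assumption: this string means "never set". It is what Go emits for a zero
# time.Time, and it is the value shown in the response above.
ZERO_TIME = "0001-01-01T00:00:00Z"

def unset_timestamp_fields(cluster, fields=("firstSeen", "lastSeen")):
    """Return the names of timestamp fields that are missing or zero-valued."""
    return [f for f in fields if cluster.get(f, ZERO_TIME) == ZERO_TIME]

# Trimmed copy of the response shown above (one component kept for brevity).
sample = json.loads("""
{
    "id": "dfa0c19a-2eba-4631-b92f-d884c50fe04e",
    "firstSeen": "0001-01-01T00:00:00Z",
    "lastSeen": "0001-01-01T00:00:00Z",
    "components": [{
        "component": {"name": "deis-builder", "description": "For testing only!"},
        "version": {"train": "", "version": "2.0.0-beta2", "data": null}
    }]
}
""")

print(unset_timestamp_fields(sample))
```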

Also, deis/workflow-manager#56 was recently merged. It fixed a large number of issues in the background jobs scheduler/runner, some of which have existed for a while as far as I can tell. My guess is that one or more of those problems was causing the intermittent bursts of checkins.

I'm leaving this open but moving it to the v2.0 milestone so that @jackfrancis can take another look.

@arschles arschles modified the milestones: v2.0, v2.0-rc1 May 23, 2016
@slack slack removed this from the v2.0 milestone Jun 3, 2016
@arschles
Member

arschles commented Jun 6, 2016

@jackfrancis if you have a chance to take a look at this, it would be very helpful. If this issue is still occurring, can you add more details on repro steps (if any) and/or how to see the buggy behavior? If it's not, feel free to close.

@jackfrancis
Member Author

jackfrancis commented Jun 12, 2016

This is still happening, though not as frequently. An example cluster has cluster_id 7ca7e0b7-eb37-449d-baad-37e387cc2835. It has 6 checkins in a period where we'd expect to see just 5:

deis_prod_pg=> select created_at from clusters_checkins where cluster_id = '7ca7e0b7-eb37-449d-baad-37e387cc2835';
     created_at      
---------------------
 2016-06-10 19:59:39
 2016-06-11 07:59:38
 2016-06-11 08:03:58
 2016-06-11 20:03:25
 2016-06-12 08:03:25
 2016-06-12 20:09:10
(6 rows)

(Note the two checkins just over 4 minutes apart on June 11.)
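
The "expect 5, see 6" arithmetic can be sketched as follows. The 12-hour checkin interval is an assumption inferred from the spacing of the rows above (the thread does not state it), and `expected_checkins` is a hypothetical helper, not part of workflow-manager.

```python
from datetime import datetime, timedelta

# Assumption: a healthy cluster checks in every 12 hours. This is inferred
# from the spacing of the rows above, not stated anywhere in the thread.
CHECKIN_INTERVAL = timedelta(hours=12)

def expected_checkins(timestamps, interval=CHECKIN_INTERVAL):
    """Estimate how many checkins a healthy cluster would have made between
    its first and last recorded checkin, inclusive."""
    if not timestamps:
        return 0
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    span = times[-1] - times[0]
    # round() tolerates a few minutes of drift between scheduled runs.
    return round(span / interval) + 1

# created_at values copied from the query output above.
created_at = [
    "2016-06-10 19:59:39",
    "2016-06-11 07:59:38",
    "2016-06-11 08:03:58",
    "2016-06-11 20:03:25",
    "2016-06-12 08:03:25",
    "2016-06-12 20:09:10",
]

expected = expected_checkins(created_at)
print(f"expected {expected}, recorded {len(created_at)}")  # expected 5, recorded 6
```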

@kmala
Contributor

kmala commented Jun 13, 2016

Maybe the user uninstalled and reinstalled the Deis cluster, or the workflow-manager (wfm) component restarted. Those are the possible scenarios I can think of.

@mboersma
Member

@jackfrancis is this issue ongoing, or has it been resolved with newer versions of Workflow?

@jackfrancis
Member Author

Looks like this bug is resilient.

select created_at from clusters_checkins where cluster_id = 'b4bfe468-5197-4e8c-8ea5-d1bdbc48cc85' order by created_at asc;
     created_at      
---------------------
 2016-08-20 01:22:07
 2016-08-20 13:22:07
 2016-08-21 01:22:07
 2016-08-21 09:20:27
 2016-08-21 16:03:28
 2016-08-21 16:12:18
 2016-08-21 18:32:03
 2016-08-21 20:04:19
 2016-08-21 22:10:01
 2016-08-22 01:41:24
 2016-08-22 03:51:56
 2016-08-22 05:51:05
 2016-08-22 05:55:01
 2016-08-22 09:42:42
 2016-08-22 10:45:55
(15 rows)

That's a v2.4.0 cluster:

https://versions.deis.com/v3/clusters/b4bfe468-5197-4e8c-8ea5-d1bdbc48cc85
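
To make the irregular spacing easier to see, here is a rough sketch that lists consecutive checkins arriving sooner than expected. It again assumes a 12-hour interval (inferred from the first three rows, not confirmed in this thread) and an arbitrary 10-minute tolerance; `short_gaps` is a hypothetical helper.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

def short_gaps(timestamps,
               interval=timedelta(hours=12),      # assumed checkin interval
               tolerance=timedelta(minutes=10)):  # arbitrary allowance for drift
    """Return (earlier, later, gap) for each pair of consecutive checkins
    that are closer together than interval - tolerance."""
    times = sorted(datetime.strptime(t, FMT) for t in timestamps)
    return [(a, b, b - a)
            for a, b in zip(times, times[1:])
            if b - a < interval - tolerance]

# created_at values copied from the query output above.
created_at = [
    "2016-08-20 01:22:07", "2016-08-20 13:22:07", "2016-08-21 01:22:07",
    "2016-08-21 09:20:27", "2016-08-21 16:03:28", "2016-08-21 16:12:18",
    "2016-08-21 18:32:03", "2016-08-21 20:04:19", "2016-08-21 22:10:01",
    "2016-08-22 01:41:24", "2016-08-22 03:51:56", "2016-08-22 05:51:05",
    "2016-08-22 05:55:01", "2016-08-22 09:42:42", "2016-08-22 10:45:55",
]

gaps = short_gaps(created_at)
for earlier, later, gap in gaps:
    print(f"{earlier} -> {later}: {gap}")
```

With this data, only the first two gaps are the regular 12 hours; every gap from 2016-08-21 01:22:07 onward is flagged.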

5 participants