Missing metrics on Cluster with Down Node #48

Open
tmysl opened this issue Dec 28, 2017 · 2 comments

@tmysl

tmysl commented Dec 28, 2017

Hello, we recently lost a node in our cluster and have been seeing some odd behavior ever since.

We can no longer see:

  • cluster CPU usage on graphs
  • protocol based metrics
  • job engine
  • cluster network

I'm not sure if the node loss is related, but I am seeing this in the logs:

2017-12-28 19:56:46,043:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.siq.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:56:46,043:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.jobd.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:56:46,043:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.smb2.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:56:46,043:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.nfs4.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:56:46,043:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.irp.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:56:46,044:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.lsass_in.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:56:46,044:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.lsass_out.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:56:46,044:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.papi.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:56:46,044:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.protostats.hdfs.total' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:57:16,889:isi_data_insights_daemon:WARNING: Query for stat: 'ifs.bytes.in.rate' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:57:16,889:isi_data_insights_daemon:WARNING: Query for stat: 'ifs.bytes.out.rate' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:57:16,889:isi_data_insights_daemon:WARNING: Query for stat: 'ifs.ops.in.rate' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:57:16,889:isi_data_insights_daemon:WARNING: Query for stat: 'ifs.ops.out.rate' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:57:16,889:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.cpu.idle.avg' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:57:16,889:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.cpu.intr.avg' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:57:16,889:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.cpu.user.avg' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'.
2017-12-28 19:57:16,890:isi_data_insights_daemon:WARNING: Query for stat: 'cluster.cpu.sys.avg' on 'MY-CLUSTER', returned error: 'Degraded result, some node input missing'

I've ruled out configuration and network problems, and even tried a different management IP address. We have a second cluster that is healthy and reporting metrics just fine. Is this expected for a degraded cluster?
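
For anyone triaging similar output, here is a minimal sketch of a script that groups these warnings by cluster and error message, so it is easier to see which stat keys are affected. The regular expression is inferred only from the log lines quoted above, and the function name `summarize_warnings` is hypothetical. Neither is part of the connector itself, and the daemon may emit other formats that this does not match.

```python
import re
from collections import defaultdict

# Pattern inferred from the warning lines quoted in this issue (an assumption,
# not taken from the daemon's source). The trailing period is optional because
# the last quoted line does not have one.
WARNING_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}):"
    r"(?P<logger>[^:]+):WARNING: "
    r"Query for stat: '(?P<stat>[^']+)' on '(?P<cluster>[^']+)', "
    r"returned error: '(?P<error>[^']*)'"
)


def summarize_warnings(lines):
    """Group failing stat keys by (cluster, error message).

    Lines that do not match the inferred warning format are ignored.
    """
    summary = defaultdict(set)
    for line in lines:
        match = WARNING_RE.match(line.strip())
        if match:
            key = (match.group("cluster"), match.group("error"))
            summary[key].add(match.group("stat"))
    # Sort the stat keys so the report is stable between runs.
    return {key: sorted(stats) for key, stats in summary.items()}


if __name__ == "__main__":
    # Two lines copied from the log above, used as sample input.
    sample = [
        "2017-12-28 19:56:46,043:isi_data_insights_daemon:WARNING: Query for stat: "
        "'cluster.protostats.smb2.total' on 'MY-CLUSTER', returned error: "
        "'Degraded result, some node input missing'.",
        "2017-12-28 19:57:16,889:isi_data_insights_daemon:WARNING: Query for stat: "
        "'cluster.cpu.idle.avg' on 'MY-CLUSTER', returned error: "
        "'Degraded result, some node input missing'.",
    ]
    for (cluster, error), stats in summarize_warnings(sample).items():
        print(f"{cluster}: {error}")
        for stat in stats:
            print(f"  {stat}")
```

Feeding the full daemon log to `summarize_warnings` (for example, by passing an open file object) would show whether every affected key reports the same degraded-result error or whether other errors are mixed in.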

@tenortim
Collaborator

No, it's not expected that you would be missing the data like that. I'll need to set up a test to repro but this is definitely a real bug.

tenortim added the bug label Nov 6, 2019
tenortim self-assigned this Nov 6, 2019
@tenortim
Collaborator

tenortim commented Nov 3, 2021

The degraded-result handling in the SDK does not seem to be working correctly. Still looking into it.
