RFC: Link Application Metrics with Chaos Experiments #38

Open
wants to merge 7 commits into
base: main


STRRL
Member

@STRRL STRRL commented Feb 16, 2022

Signed-off-by: STRRL [email protected]

rendered markdown

@STRRL STRRL marked this pull request as ready for review February 17, 2022 09:25
@STRRL STRRL changed the title from "WIP: Link Application Metrics with Chaos Experiments" to "RFC: Link Application Metrics with Chaos Experiments" Feb 18, 2022
@g1eny0ung g1eny0ung self-requested a review February 18, 2022 05:59
This feature only concerns Chaos Dashboard and would not affect other
components or the CRDs.

We would use Grafana as the source of application metrics, and use Grafana Panel

This seems weird. If the goal is to correlate chaos experiments with user applications, then why don't we just correlate with the metrics directly? Grafana itself is only the UI/display layer.

Member

@g1eny0ung g1eny0ung Feb 25, 2022

I will try to answer all of the following questions together, but my understanding may not be as accurate as that of @STRRL and @cwen0, so feel free to @ them.

Overall, this is more of an exploration. The goal is to give users an integrated interface that includes experiments, workflows, and panels rendered from data obtained through the Grafana API.

To achieve this, we want to fetch the Grafana panel data through the Grafana API and then customize the rendering in Chaos Dashboard. We will take the panel as the smallest unit and store it as a resource, so that any chaos experiment can be associated with the panel and its active time can be combined with the original panel data.
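
To illustrate what combining an experiment's active time with panel data could look like, here is a minimal sketch. It is not the design in this RFC: `ChaosWindow`, `annotate_points`, and the sample data are hypothetical names and values, and the panel data is assumed to have already been fetched as `(timestamp_ms, value)` pairs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ChaosWindow:
    """Hypothetical record of when an experiment was active (epoch milliseconds)."""
    name: str
    start_ms: int
    end_ms: int


def annotate_points(
    points: List[Tuple[int, float]],
    windows: List[ChaosWindow],
) -> List[Tuple[int, float, List[str]]]:
    """Attach to each (timestamp_ms, value) point the names of the
    experiments whose active window contains that timestamp."""
    annotated = []
    for ts, value in points:
        active = [w.name for w in windows if w.start_ms <= ts <= w.end_ms]
        annotated.append((ts, value, active))
    return annotated


# Illustrative data only: a latency series sampled every 10 seconds.
series = [(0, 12.0), (10_000, 13.5), (20_000, 48.2), (30_000, 51.0), (40_000, 14.1)]
windows = [ChaosWindow("network-delay", 15_000, 35_000)]

for ts, value, active in annotate_points(series, windows):
    print(ts, value, active)
```

A renderer could then shade or label the points whose list of active experiments is non-empty.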

Users can simply click "Share this panel", paste the link into the Chaos Dashboard interface, and associate anything related to Chaos Mesh with the panel's data.
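
As a rough sketch of how a pasted panel link might be turned into a stored panel reference: the code below assumes the link has the shape `<base>/d/<uid>/<slug>?viewPanel=<id>&from=...&to=...` (or `/d-solo/` with `panelId`) and that the panel id is numeric. `PanelRef` and `parse_panel_link` are hypothetical names, not part of Chaos Mesh or Grafana.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import parse_qs, urlparse


@dataclass(frozen=True)
class PanelRef:
    """Hypothetical stored reference to one Grafana panel."""
    base_url: str
    dashboard_uid: str
    panel_id: int
    time_from: Optional[str]
    time_to: Optional[str]


def parse_panel_link(link: str) -> PanelRef:
    """Extract the dashboard UID, panel id, and time range from a shared link.

    Assumes the link format described above; raises ValueError otherwise.
    """
    parsed = urlparse(link)
    segments = [s for s in parsed.path.split("/") if s]

    # The dashboard UID is assumed to follow the "d" or "d-solo" path segment.
    for i, seg in enumerate(segments):
        if seg in ("d", "d-solo") and i + 1 < len(segments):
            uid = segments[i + 1]
            prefix = "/".join(segments[:i])  # sub-path Grafana is served under, if any
            break
    else:
        raise ValueError("not a recognised dashboard link")

    query = parse_qs(parsed.query)
    raw_panel = query.get("viewPanel") or query.get("panelId")
    if not raw_panel:
        raise ValueError("link does not identify a panel")

    base_url = f"{parsed.scheme}://{parsed.netloc}"
    if prefix:
        base_url += "/" + prefix

    return PanelRef(
        base_url=base_url,
        dashboard_uid=uid,
        panel_id=int(raw_panel[0]),
        time_from=(query.get("from") or [None])[0],
        time_to=(query.get("to") or [None])[0],
    )


ref = parse_panel_link(
    "https://grafana.example.com/d/abc123/my-app"
    "?orgId=1&from=1645000000000&to=1645003600000&viewPanel=4"
)
print(ref)
```

Storing the time range as opaque strings keeps relative values such as `now-6h` intact; resolving them is left to whatever later queries Grafana.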

It's an all-in-one experience where users no longer need to install plugins, configure the API address, or complicate their panels.

But that doesn't mean we're going to abandon the Grafana datasource plugin. We want to make it more of a community-driven project, and any good ideas and suggestions we receive will be added to it.

Member

@g1eny0ung g1eny0ung Feb 25, 2022

But I still want to make the point that it is extremely difficult to integrate with the Grafana ecosystem by way of plugins. Speaking from hands-on experience, writing plugins is painful: there is no extensive documentation, and I could only write them by referring to the source code of other plugins. I think this makes it fundamentally difficult to attract more people to participate in plugin development.

draw charts by ourselves. The most important problem is that we would then also
have to provide features such as "exploring the query" and "designing panels and
dashboards", as Grafana does, so that users can tune their charts.
- Choose other Grafana alternative integration, like Datadog, Elastic stack, New

Grafana is only a frontend UI, but here we are comparing it with Datadog and the Elastic stack, which are whole observability stacks covering the UI, metrics collection, storage, and so on. I don't think these are good examples. Does this comparison make sense?


We already have a Grafana plugin that marks annotations on Grafana dashboards.
Its features also overlap with this new feature, so it might take more effort
to keep their behaviors consistent.

Why not add the link feature to the Grafana datasource, similar to how correlations between traces, exemplars, metrics, and logs are currently implemented? This seems like less effort and would achieve the same thing.

Co-authored-by: Siyu Chen <[email protected]>
Signed-off-by: STRRL <[email protected]>
@STRRL STRRL force-pushed the link-application-metrics branch from a64133b to 0eab044 March 1, 2022 05:06

Pros:

- It would prevent data loss when the Pod chaos-dashboard is deleted/recreated.
Member Author

Maybe it's time to reconsider how we treat our data in chaos-dashboard?

Was it designed to be persistent or volatile? @g1eny0ung

Member

It's hard for me to give an opinion right now, because both approaches seem to have their advantages and disadvantages. I'll give it some more thought. 🤔

@STRRL STRRL mentioned this pull request Mar 2, 2022
51 tasks
@cwen0 cwen0 added the hold label Mar 3, 2022
5 participants