
Add E2E Prometheus metrics to applications #845

Merged
7 commits merged on Nov 4, 2024

Conversation

eero-t
Contributor

@eero-t eero-t commented Nov 1, 2024

Description

This PR makes the following changes:

  • Adds request, first-token, and inter-token latency Prometheus metrics for monitoring application performance (see the sketch after this list)
    • Added to ServiceOrchestrator token processing, which seems to be the only place where this can be done
  • Adds "inprogress" HTTP metrics that can be used for scaling application microservices based on incoming requests
    • Needed because the currently provided count of already processed HTTP requests only tells how many requests the service can currently process, not how many additional ones it should be able to process
  • Fixes BaseStatistics method name typos
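
These are not the exact names or code used by the PR; a minimal prometheus_client sketch of the idea, with hypothetical metric names and a hypothetical wrap_token_stream() helper, could look like this:

```python
import time
from prometheus_client import Gauge, Histogram

# Hypothetical metric names; the PR's actual metrics use a "megaservice_" prefix.
REQUEST_LATENCY = Histogram("request_latency_seconds", "End-to-end request latency")
FIRST_TOKEN_LATENCY = Histogram("first_token_latency_seconds", "Time until the first generated token")
INTER_TOKEN_LATENCY = Histogram("inter_token_latency_seconds", "Time between consecutive generated tokens")
REQUESTS_INPROGRESS = Gauge("requests_inprogress", "Requests currently being processed")

def wrap_token_stream(tokens):
    """Yield tokens from a generator while recording latency and in-progress metrics."""
    REQUESTS_INPROGRESS.inc()
    start = prev = time.monotonic()
    first = True
    try:
        for token in tokens:
            now = time.monotonic()
            if first:
                FIRST_TOKEN_LATENCY.observe(now - start)
                first = False
            else:
                INTER_TOKEN_LATENCY.observe(now - prev)
            prev = now
            yield token
    finally:
        REQUEST_LATENCY.observe(time.monotonic() - start)
        REQUESTS_INPROGRESS.dec()
```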

Issues

opea-project/GenAIExamples#391

Type of change

List the type of change like below. Please delete options that are not relevant.

  • New feature (non-breaking change which adds new functionality)

Dependencies

No new ones.

(The prometheus_fastapi_instrumentator package imported for HttpService already pulls the prometheus_client module into the applications.)

Tests

Verified manually that the produced metrics match the ones from a benchmark that stresses the ChatQnA application.

Potential future changes (other PRs)

  • Add service monitors for all applications (to the GenAIInfra repo)
  • Add dashboards for the new metrics (to the GenAIInfra repo)
  • Add CLI options / env vars for disabling the latency, inprogress and *_created metrics (prometheus_client.disable_created_metrics())?
  • Change the ServiceOrchestrator class, and all applications and tests creating it, to provide a unique name for each orchestrator instance and use it as the metric prefix, instead of all orchestrator instances sharing the same set of megaservice_-prefixed singleton metrics (see the sketch below)
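
For illustration only, a rough sketch of the last two items, assuming a hypothetical OPEA_DISABLE_CREATED_METRICS environment variable and a per-orchestrator prefix scheme (neither is part of this PR):

```python
import os
from prometheus_client import Histogram, disable_created_metrics

# Hypothetical env var; disable_created_metrics() itself is an existing prometheus_client
# call that drops the *_created series emitted alongside counters, histograms and summaries.
if os.environ.get("OPEA_DISABLE_CREATED_METRICS"):
    disable_created_metrics()

def make_request_latency_metric(orchestrator_name: str) -> Histogram:
    """Create a latency histogram whose name is unique per orchestrator instance."""
    # Prometheus metric names only allow [a-zA-Z0-9_:], so sanitize the instance name first.
    prefix = "".join(c if c.isalnum() else "_" for c in orchestrator_name.lower())
    return Histogram(
        f"{prefix}_request_latency_seconds",
        f"End-to-end request latency of the '{orchestrator_name}' orchestrator",
    )
```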

@eero-t
Contributor Author

eero-t commented Nov 1, 2024

Regarding the duplicate inprogress metrics error in the CI tests...

Creating multiple MicroServices creates multiple HTTPServices, which create multiple prometheus-fastapi-instrumentator instances.

While prometheus-fastapi-instrumentator handled that fine for ChatQnA and the normal HTTP metrics, for some reason that was not the case for its inprogress metrics in CI.

=> I think the MicroService constructor's name argument (currently optional) needs to become mandatory, so that it can be used to make the name of the inprogress metric unique for each HTTPService instance. That requires a small change to the Gateway class, but otherwise I think everything else is fine with that.

Note: the MicroService class is not subclassed, so its class name does not help, as it is always the same. I think that part needs to be dropped, also because metric names cannot contain special characters like /.

PS. prometheus-fastapi-instrumentator requires an HTTPService instance specific Starlette instance, so it cannot be made a singleton, like I did for the metrics I'm adding directly (see the sketch below).
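
For illustration, a per-HTTPService instrumentator along these lines would give each instance its own inprogress metric; the naming scheme and helper are assumptions, not the exact code in this PR:

```python
from fastapi import FastAPI
from prometheus_fastapi_instrumentator import Instrumentator

def instrument_http_service(app: FastAPI, service_name: str) -> None:
    """Attach HTTP metrics to one FastAPI/Starlette app, with a per-service inprogress metric."""
    # Sanitize the service name, since metric names cannot contain characters like '/'.
    prefix = "".join(c if c.isalnum() else "_" for c in service_name.lower())
    Instrumentator(
        should_instrument_requests_inprogress=True,
        inprogress_name=f"{prefix}_http_requests_inprogress",
    ).instrument(app).expose(app)
```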

@eero-t
Contributor Author

eero-t commented Nov 1, 2024

Rebased the pre-commit changes into the earlier commits, and pushed the solution described above for the CI issue with enabling inprogress metrics.

I'm currently testing whether I could get a somewhat similar metric (reliably!) also from ServiceOrchestrator::execute(); see the sketch below.

If that works, enabling the "inprogress" metrics for prometheus-fastapi-instrumentator can be dropped, as the changes CI requires for that are a bit intrusive.

EDIT: On further grepping, the tests seem to check for the unwanted /MicroService name suffix...
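
A minimal sketch of what such an orchestrator-level metric could look like with plain prometheus_client, wrapping a hypothetical execute() call (the metric name and wrapper are assumptions, not this PR's code):

```python
from prometheus_client import Gauge

# Hypothetical metric name for requests currently inside the orchestrator.
ORCHESTRATOR_INPROGRESS = Gauge(
    "megaservice_requests_inprogress",
    "Requests currently being processed by the orchestrator",
)

async def execute_with_inprogress(orchestrator, *args, **kwargs):
    """Wrap a hypothetical orchestrator.execute() call with an in-progress gauge."""
    with ORCHESTRATOR_INPROGRESS.track_inprogress():
        return await orchestrator.execute(*args, **kwargs)
```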

Creating multiple MicroService()s creates multiple HTTPService()s,
which create multiple Prometheus fastapi instrumentator instances.

While the latter handled that fine for ChatQnA and normal HTTP metrics,
that was not the case for its "inprogress" metrics in CI.

Therefore MicroService constructor name argument is now mandatory, so
that it can be used to make "inprogress" metrics for HTTPService
instances unique.

PS. instrumentor requires HTTPService instance specific Starlette
instance, so it cannot be made singleton.

Signed-off-by: Eero Tamminen <[email protected]>

codecov bot commented Nov 1, 2024

Codecov Report

Attention: Patch coverage is 94.11765% with 2 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
comps/cores/mega/base_statistics.py | 50.00% | 2 Missing ⚠️

Files with missing lines | Coverage Δ
comps/cores/mega/gateway.py | 30.24% <ø> (ø)
comps/cores/mega/http_service.py | 77.46% <100.00%> (+0.65%) ⬆️
comps/cores/mega/micro_service.py | 91.13% <ø> (ø)
comps/cores/mega/orchestrator.py | 92.14% <100.00%> (+1.18%) ⬆️
comps/cores/mega/base_statistics.py | 42.10% <50.00%> (ø)

@Spycsh
Member

Spycsh commented Nov 4, 2024

LGTM

@Spycsh Spycsh merged commit a6998a1 into opea-project:main Nov 4, 2024
13 checks passed
Spycsh added a commit that referenced this pull request Nov 4, 2024
@Spycsh Spycsh mentioned this pull request Nov 4, 2024
@eero-t
Contributor Author

eero-t commented Nov 4, 2024

@Spycsh, @lvliang-intel Any suggestions on where the new metrics should be documented: in the GenAIExamples repo, or the GenAIInfra repo?

Or is it enough to add Prometheus serviceMonitors to the Helm charts for (the rest of) the OPEA applications, and some Grafana dashboards for them?

@Spycsh
Member

Spycsh commented Nov 5, 2024

Hi @eero-t, GenAIEval https://github.com/opea-project/GenAIEval/tree/main/evals/benchmark/grafana does track some Prometheus metrics and provides a naive measurement of first token latency and average token latency, which are measured on the client side instead of through Prometheus. You're welcome to add some documentation there in the future.

@eero-t
Contributor Author

eero-t commented Nov 5, 2024

> Hi @eero-t, GenAIEval https://github.com/opea-project/GenAIEval/tree/main/evals/benchmark/grafana does track some Prometheus metrics and provides a naive measurement of first token latency and average token latency, which are measured on the client side instead of through Prometheus. You're welcome to add some documentation there in the future.

The Eval repo is for evaluating and benchmarking, whereas the metrics provided by the service "frontend" are (also) meant for operational monitoring, i.e. normal, everyday use of the service.

I think the most appropriate place would be the Infra repo, as it already includes monitoring support both in the Helm charts [1] and as separate manifest files plus a couple of Grafana dashboards [2], but that is rather Kubernetes-specific.

[1] https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/monitoring.md
[2] https://github.com/opea-project/GenAIInfra/blob/main/kubernetes-addons/Observability/README.md

@Spycsh
Member

Spycsh commented Nov 6, 2024

> The Eval repo is for evaluating and benchmarking, whereas the metrics provided by the service "frontend" are (also) meant for operational monitoring, i.e. normal, everyday use of the service.
>
> I think the most appropriate place would be the Infra repo, as it already includes monitoring support both in the Helm charts [1] and as separate manifest files plus a couple of Grafana dashboards [2], but that is rather Kubernetes-specific.
>
> [1] https://github.com/opea-project/GenAIInfra/blob/main/helm-charts/monitoring.md
> [2] https://github.com/opea-project/GenAIInfra/blob/main/kubernetes-addons/Observability/README.md

Sure, thanks for pointing that out.
