
Conversation

@jfdreis commented Mar 27, 2025

Added metrics to measure communication in time and MBs when handling nilRAG
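(For illustration only, a minimal sketch of the kind of measurement described here, not the PR's actual code; the `timed_call` and `send_request` names and the metric fields are hypothetical.)

```python
import time

def timed_call(send_request, payload: bytes):
    """Time one request and report payload sizes in MB (illustrative only)."""
    start = time.perf_counter()
    response: bytes = send_request(payload)  # hypothetical request function
    elapsed_s = time.perf_counter() - start
    return response, {
        "time_s": elapsed_s,
        "sent_mb": len(payload) / (1024 * 1024),
        "received_mb": len(response) / (1024 * 1024),
    }
```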

@jimouris requested a review from jcabrero on March 27, 2025 13:53
@jimouris changed the title from Benchmarks to nilRAG Metrics on Mar 27, 2025
@jcabrero (Member) commented:

This wouldn't be very easy to work with. Consider that every request would produce a log line, and you would have to process all the logs to extract the metrics. The NilAI API produces many logs, not only the ones you're printing. I would consider writing those metrics to a file instead and extracting the information from that file. I would also consider giving the file a random name, since there are up to 50 workers and they would all collide on the same filename. Finally, I'm not sure it fully makes sense to integrate this with the whole codebase when the goal is just to gather metrics.
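(A minimal sketch of the per-worker file idea above, assuming JSON-lines output; the directory, filename pattern, and helper name are illustrative.)

```python
import json
import os
import uuid
from pathlib import Path

# One metrics file per worker process; pid + uuid avoid collisions across ~50 workers.
METRICS_DIR = Path("/tmp/nilrag_metrics")  # illustrative location
METRICS_DIR.mkdir(parents=True, exist_ok=True)
METRICS_FILE = METRICS_DIR / f"metrics-{os.getpid()}-{uuid.uuid4().hex}.jsonl"

def record_metric(entry: dict) -> None:
    """Append one metrics record as a JSON line; per-worker files can be merged offline."""
    with METRICS_FILE.open("a") as f:
        f.write(json.dumps(entry) + "\n")
```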

@jimouris (Member) commented:

> This wouldn't be very easy to work with. Consider that every request would produce a log line, and you would have to process all the logs to extract the metrics. The NilAI API produces many logs, not only the ones you're printing. I would consider writing those metrics to a file instead and extracting the information from that file. I would also consider giving the file a random name, since there are up to 50 workers and they would all collide on the same filename. Finally, I'm not sure it fully makes sense to integrate this with the whole codebase when the goal is just to gather metrics.

I recommended that @jfdreis integrate it here since we'll be making some updates to RAG soon to make it scale, and after those updates are done we'll need more metrics. So it makes sense to have it here instead of redoing and redeploying it every time we change RAG.

Totally open to suggestions on the best way to do this. Maybe a hidden /rag-metrics API endpoint (or a flag?) that calls /chat-completions and adds metrics would make sense? I agree that printing them doesn't make sense, but returning them in the chat completions response could be a good alternative.

@jfdreis, please follow up with @jcabrero to figure out the best way to do this.
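(A minimal sketch of the hidden-endpoint idea, assuming a FastAPI app and an existing chat completions handler; every name below is hypothetical and not the actual nilAI API.)

```python
import time
from fastapi import FastAPI, Request

app = FastAPI()

async def chat_completion(body: dict) -> dict:
    # Stand-in for the real /chat-completions handler.
    return {"choices": []}

@app.post("/rag-metrics")  # internal-only endpoint, kept out of public docs
async def rag_metrics(request: Request) -> dict:
    """Run the normal completion path and attach timing/size metrics to the response."""
    raw = await request.body()
    start = time.perf_counter()
    result = await chat_completion(await request.json())
    return {
        "response": result,
        "metrics": {
            "time_s": time.perf_counter() - start,
            "request_mb": len(raw) / (1024 * 1024),
        },
    }
```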

@jcabrero (Member) commented:

I wanted to double-check some points that might need further discussion. The metrics in nilRAG seem to come from the specific measurements required by @jfdreis rather than from a broader FE or product requirement.
This means the following:

  • This feature is only available for nilRAG, not for general nilAI calls, so if we wanted to merge it, it should be shaped as a more general feature.
  • I don't think it is a good idea to let any nilAI user grab the metrics. It could help an outsider identify bottlenecks and enable potential attacks.
  • The current approach seems more aligned with an internal observability requirement (i.e., feeding data to Grafana) than with providing comprehensive API access to the measurements.
  • As it stands, the code is optimized for that particular measurement, meaning that merging it would essentially transfer the maintenance responsibility to me before we repeat the final measurements with the new version of nilRAG.

I suggest that the PR stay open for now, and once I have some bandwidth, I can try to transform it into Prometheus metrics that we can access through Grafana.
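(A minimal sketch of the Prometheus direction, assuming the prometheus_client package; the metric names and port are illustrative, and a real deployment would expose /metrics from the existing app for Prometheus to scrape and Grafana to chart.)

```python
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; Prometheus scrapes them and Grafana charts them.
RAG_REQUEST_SECONDS = Histogram(
    "nilrag_request_seconds", "Time spent handling a nilRAG request"
)
RAG_BYTES_SENT = Counter(
    "nilrag_bytes_sent_total", "Bytes sent while handling nilRAG requests"
)

def handle_request(payload: bytes) -> None:
    """Record duration and payload size for one (dummy) request."""
    with RAG_REQUEST_SECONDS.time():
        RAG_BYTES_SENT.inc(len(payload))
        # ... actual nilRAG work would go here ...

if __name__ == "__main__":
    start_http_server(9100)  # exposes /metrics for Prometheus
    handle_request(b"example payload")
```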

@jimouris (Member) commented:

Makes sense. My only concern with not merging it is that if something in the API changes, then it won't work out of the box and we'll likely want to get new numbers soon.

This is only for us, as you said, so I think it'd be fine to merge it and keep it behind a hidden flag. If you plan on adding Grafana and Prometheus metrics soon, then let's keep it open; you can take over this PR and merge it once we have the Prometheus metrics.
