
Conversation

@jfdreis commented Mar 27, 2025

Added metrics to measure communication in time and MBs when handling nilRAG
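(For illustration only, a minimal sketch of the kind of measurement described here, not the PR's actual code; the `timed_call` and `send_request` names and the metric fields are hypothetical.)

```python
import time

def timed_call(send_request, payload: bytes):
    """Time one request and report payload sizes in MB (illustrative only)."""
    start = time.perf_counter()
    response: bytes = send_request(payload)  # hypothetical request function
    elapsed_s = time.perf_counter() - start
    return response, {
        "time_s": elapsed_s,
        "sent_mb": len(payload) / (1024 * 1024),
        "received_mb": len(response) / (1024 * 1024),
    }
```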

@jimouris requested a review from jcabrero on March 27, 2025 13:53
@jimouris changed the title from Benchmarks to nilRAG Metrics on Mar 27, 2025
@jcabrero (Member) commented:

This wouldn't be very easy to work with. Consider that every request would produce a log line, and you would have to process all the logs to extract the metrics. The NilAI API produces many logs, not only the ones you're printing. I would consider writing those metrics to a file instead and extracting the information from that file. I would also consider giving the file a random name, since there are up to 50 workers and they would all collide on the same filename. Finally, I'm not sure it fully makes sense to integrate this with the whole codebase when the goal is just to gather metrics.
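(A minimal sketch of the per-worker file idea above, assuming JSON-lines output; the directory, filename pattern, and helper name are illustrative.)

```python
import json
import os
import uuid
from pathlib import Path

# One metrics file per worker process; pid + uuid avoid collisions across ~50 workers.
METRICS_DIR = Path("/tmp/nilrag_metrics")  # illustrative location
METRICS_DIR.mkdir(parents=True, exist_ok=True)
METRICS_FILE = METRICS_DIR / f"metrics-{os.getpid()}-{uuid.uuid4().hex}.jsonl"

def record_metric(entry: dict) -> None:
    """Append one metrics record as a JSON line; per-worker files can be merged offline."""
    with METRICS_FILE.open("a") as f:
        f.write(json.dumps(entry) + "\n")
```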

@jimouris (Member) commented:

> This wouldn't be very easy to work with. Consider that every request would produce a log line, and you would have to process all the logs to extract the metrics. The NilAI API produces many logs, not only the ones you're printing. I would consider writing those metrics to a file instead and extracting the information from that file. I would also consider giving the file a random name, since there are up to 50 workers and they would all collide on the same filename. Finally, I'm not sure it fully makes sense to integrate this with the whole codebase when the goal is just to gather metrics.

I recommended that @jfdreis integrate it here since we'll be making some updates to RAG soon to make it scale, and after those updates are done we'll need more metrics. So it makes sense to have it here instead of redoing and redeploying it every time we change RAG.

Totally open to suggestions on the best way to do this. Maybe a hidden /rag-metrics API endpoint (or a flag?) that calls /chat-completions and adds metrics would make sense? I agree that printing them doesn't make sense, but returning them in the chat completions response could be a good alternative.

@jfdreis, please follow up with @jcabrero to figure out the best way to do this.
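(A minimal sketch of the hidden-endpoint idea, assuming a FastAPI app and an existing chat completions handler; every name below is hypothetical and not the actual nilAI API.)

```python
import time
from fastapi import FastAPI, Request

app = FastAPI()

async def chat_completion(body: dict) -> dict:
    # Stand-in for the real /chat-completions handler.
    return {"choices": []}

@app.post("/rag-metrics")  # internal-only endpoint, kept out of public docs
async def rag_metrics(request: Request) -> dict:
    """Run the normal completion path and attach timing/size metrics to the response."""
    raw = await request.body()
    start = time.perf_counter()
    result = await chat_completion(await request.json())
    return {
        "response": result,
        "metrics": {
            "time_s": time.perf_counter() - start,
            "request_mb": len(raw) / (1024 * 1024),
        },
    }
```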

@jcabrero (Member) commented:

I wanted to double-check some points that might need further discussion. The metrics in nilRAG seem to come from the specific measurements required by @jfdreis rather than from a broader FE or product requirement.
This means the following:

  • This feature is only available for nilRAG, not for general nilAI calls, so if we wanted to merge it, it should be shaped as a more general feature.
  • I don't think it is a good idea to let any nilAI user grab the metrics. It could help an outsider identify bottlenecks and enable potential attacks.
  • The current approach seems more aligned with an internal observability requirement (i.e., feeding data to Grafana) than with providing comprehensive API access to the measurements.
  • As it stands, the code is optimized for that particular measurement, meaning that merging it would essentially transfer the maintenance responsibility to me before we repeat the final measurements with the new version of nilRAG.

I suggest that the PR stay open for now, and once I have some bandwidth, I can try to transform it into Prometheus metrics that we can access through Grafana.
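(A minimal sketch of the Prometheus direction, assuming the prometheus_client package; the metric names and port are illustrative, and a real deployment would expose /metrics from the existing app for Prometheus to scrape and Grafana to chart.)

```python
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; Prometheus scrapes them and Grafana charts them.
RAG_REQUEST_SECONDS = Histogram(
    "nilrag_request_seconds", "Time spent handling a nilRAG request"
)
RAG_BYTES_SENT = Counter(
    "nilrag_bytes_sent_total", "Bytes sent while handling nilRAG requests"
)

def handle_request(payload: bytes) -> None:
    """Record duration and payload size for one (dummy) request."""
    with RAG_REQUEST_SECONDS.time():
        RAG_BYTES_SENT.inc(len(payload))
        # ... actual nilRAG work would go here ...

if __name__ == "__main__":
    start_http_server(9100)  # exposes /metrics for Prometheus
    handle_request(b"example payload")
```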

@jimouris (Member) commented:

Makes sense. My only concern with not merging it is that if something in the API changes, then it won't work out of the box and we'll likely want to get new numbers soon.

This is only for us, as you said, so I think it'd be fine to merge it and keep it behind a hidden flag. If you plan on adding Grafana and Prometheus metrics soon, then let's keep it open; you can take over this PR and merge it once we have the Prometheus metrics.
