Change content returned by warehouse raw data REST endpoint to help clients catch silent errors. #2032

Open
wants to merge 5 commits into
base: main

aaronweeden
Contributor

@aaronweeden aaronweeden commented Apr 25, 2025

Description

This PR changes the /rest/warehouse/raw-data endpoint to stream its response using chunked transfer encoding, in which each row is sent as:

<hex size of row>\r\n<row>\r\n

and after all rows have been sent:

0\r\n\r\n

This allows clients to verify that all the rows were sent by checking for the final 0\r\n\r\n.
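As an illustration of the framing described above, here is a minimal Python sketch that encodes rows as chunks and checks a received body for the final 0\r\n\r\n. The function names (encode_chunk, encode_stream, decode_stream) are hypothetical and are not taken from xdmod or xdmod-data; the sketch assumes non-empty rows and ignores chunk extensions and trailers.

```python
TERMINATOR = b"0\r\n\r\n"  # final zero-length chunk


def encode_chunk(row: bytes) -> bytes:
    """Frame one row as <hex size of row>\\r\\n<row>\\r\\n."""
    return format(len(row), "x").encode("ascii") + b"\r\n" + row + b"\r\n"


def encode_stream(rows) -> bytes:
    """Concatenate all row chunks and append the terminating chunk."""
    return b"".join(encode_chunk(row) for row in rows) + TERMINATOR


def decode_stream(body: bytes):
    """Return (rows, complete); complete is True only if the
    terminating 0\\r\\n\\r\\n was found after the last row."""
    rows = []
    pos = 0
    while True:
        eol = body.find(b"\r\n", pos)
        if eol == -1:
            return rows, False  # ran out of data before a size line ended
        try:
            size = int(body[pos:eol], 16)
        except ValueError:
            return rows, False  # malformed size line
        data_start = eol + 2
        if size == 0:
            # Terminating chunk: it must be followed by one more CRLF.
            return rows, body[data_start:data_start + 2] == b"\r\n"
        data_end = data_start + size
        if body[data_end:data_end + 2] != b"\r\n":
            return rows, False  # row was cut off mid-chunk
        rows.append(body[data_start:data_end])
        pos = data_end + 2


if __name__ == "__main__":
    rows = [b'{"a": 1}', b'{"b": 2}']
    body = encode_stream(rows)
    print(decode_stream(body))       # complete stream
    print(decode_stream(body[:-5]))  # terminator missing
```

A client that sees complete == False knows the stream ended early, even though the rows it did receive are individually well formed.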

This PR also makes static some of the functions involved in getting the raw data, since they do not manipulate instance variables.

This PR also fixes a bug when generating regression test artifacts.

The CI tests for this PR depend on ubccr/xdmod-qa#41.

ubccr/xdmod-supremm#426 updates the regression test artifacts for xdmod-supremm.

ubccr/xdmod-data#73 updates xdmod-data to support the new response and warn if the 0\r\n\r\n was not received.

Motivation and Context

The current endpoint (as implemented in #1858) sends data as a JSON text sequence, but it does not send any special chunk once all the rows have been sent. Thus, the client has no way of knowing whether the connection was closed before all the data were sent: because the response is streamed, the 200 status code is sent before any data, and the size of the content is not calculated ahead of time. Combined with the 30-minute limit on script execution time that ACCESS XDMoD currently has, this led to bugs in which xdmod-data would request raw data (e.g., 2024-01-01 through 2024-03-31 in the SUPREMM realm), the script would time out after only some of the rows had been returned, and no errors or warnings would appear on the client side.
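The silent failure can be reproduced in a small Python sketch. It assumes the JSON text sequence uses RFC 7464 framing (an RS byte, the JSON text, then a line feed); the helper names and sample rows are hypothetical and are not taken from xdmod or xdmod-data.

```python
import json

RS = b"\x1e"  # record separator that starts each JSON text (RFC 7464)


def encode_json_seq(rows) -> bytes:
    """Encode each row as RS + JSON text + LF."""
    return b"".join(RS + json.dumps(row).encode("utf-8") + b"\n" for row in rows)


def decode_json_seq(body: bytes):
    """Parse every record in the body; raises only on malformed JSON."""
    return [json.loads(part) for part in body.split(RS) if part.strip()]


if __name__ == "__main__":
    rows = [[1, "a"], [2, "b"], [3, "c"]]
    full = encode_json_seq(rows)
    # Simulate a connection that closes after the second row.
    truncated = encode_json_seq(rows[:2])
    assert full.startswith(truncated)
    # The truncated body parses without error, so the client cannot
    # distinguish it from a complete response that has only two rows.
    print(decode_json_seq(truncated))
```

Because a stream cut at a record boundary is itself a valid sequence, only an explicit end marker lets the client tell a short result from an interrupted one.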

Tests performed

This PR and ubccr/xdmod-data#73 update the regression tests.

Checklist:

  • The pull request description is suitable for a Changelog entry
  • The milestone is set correctly on the pull request
  • The appropriate labels have been added to the pull request

Labels
bug (Bugfixes), Category: Data Analytics Framework, php (Pull requests that update Php code)