Change content returned by warehouse raw data REST endpoint to help clients catch silent errors. #2032
Description
This PR changes the /rest/warehouse/raw-data endpoint to do proper streaming using chunked transfer encoding, wherein each row is sent as its own chunk and, after all rows have been sent, a terminating chunk is sent. This allows clients to verify that all the rows were sent by checking for the final 0\r\n\r\n.
While developing this PR, it was also noticed that some of the functions involved in getting the raw data can be made static since they don't manipulate instance variables; this PR updates those.
This PR also fixes a bug when generating regression test artifacts.
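To illustrate the chunk framing described above, here is a minimal Python sketch of standard HTTP/1.1 chunked transfer encoding and of a completeness check based on the terminating chunk. It is not the code from this PR or from xdmod-data: the function names (encode_chunk, encode_body, decode_body) and the row payloads are hypothetical, and it assumes one row per chunk with no chunk trailers.

```python
from typing import Iterable, List, Tuple

CRLF = b"\r\n"
TERMINATOR = b"0\r\n\r\n"  # zero-length chunk followed by an empty trailer section


def encode_chunk(payload: bytes) -> bytes:
    """Frame one payload as a chunk: hex size, CRLF, data, CRLF."""
    return f"{len(payload):x}".encode("ascii") + CRLF + payload + CRLF


def encode_body(rows: Iterable[bytes]) -> bytes:
    """Encode one chunk per row, then append the terminating chunk.

    Empty rows are skipped, since a zero-length chunk would itself
    be read as the terminator.
    """
    return b"".join(encode_chunk(row) for row in rows if row) + TERMINATOR


def decode_body(raw: bytes) -> Tuple[List[bytes], bool]:
    """Return (payloads, complete) for a raw chunked body.

    complete is True only if the terminating chunk was received;
    any truncated or malformed input yields complete == False.
    """
    payloads: List[bytes] = []
    pos = 0
    while True:
        eol = raw.find(CRLF, pos)
        if eol == -1:
            return payloads, False  # size line missing or cut off
        # Ignore any chunk extensions after ';' on the size line.
        size_token = raw[pos:eol].split(b";", 1)[0].strip()
        try:
            size = int(size_token, 16)
        except ValueError:
            return payloads, False
        if size < 0:
            return payloads, False
        data_start = eol + 2
        if size == 0:
            # Terminating chunk: with no trailers, a bare CRLF follows.
            return payloads, raw[data_start:data_start + 2] == CRLF
        data_end = data_start + size
        if raw[data_end:data_end + 2] != CRLF:
            return payloads, False  # chunk data cut off or badly framed
        payloads.append(raw[data_start:data_end])
        pos = data_end + 2


if __name__ == "__main__":
    rows = [b'["a",1]', b'["b",2]']  # hypothetical row payloads
    body = encode_body(rows)
    print(decode_body(body)[1])       # all rows plus terminator received
    print(decode_body(body[:-5])[1])  # connection closed before the terminator
```

HTTP client libraries normally decode chunk framing themselves, so a real client would more likely rely on its library's handling of an incomplete chunked body than parse the raw bytes as this sketch does.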
The CI tests for this PR depend on ubccr/xdmod-qa#41.
ubccr/xdmod-supremm#426 updates the regression test artifacts for xdmod-supremm.
ubccr/xdmod-data#73 updates xdmod-data to support the new response and warn if the 0\r\n\r\n was not received.
Motivation and Context
The current endpoint (as implemented in #1858) sends data as a JSON text sequence, but it does not send any special chunk once all the rows have been sent. Thus, the client has no way of knowing if the connection was closed before all the data were sent (because the response is streaming, the status code 200 is sent prior to any data being sent, and the size of the content is not calculated ahead of time). In tandem with the fact that ACCESS XDMoD currently has a 30-minute limit on script execution time, this led to bugs in which xdmod-data could request raw data (e.g., 2024-01-01 through 2024-03-31 in the SUPREMM realm), but only some of the rows would be returned, the script would time out, and no errors or warnings would appear on the client side.
Tests performed
This PR and ubccr/xdmod-data#73 update the regression tests.
Checklist: