Skip to content

modules/zstd: Add support for decoding compressed blocks (DSLX part) #2296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

rw1nkler
Copy link
Contributor

This PR adds the DSLX parts of #1857 in order not to depend on cocotb-related python packages supersedes #1857

This PR extends the ZstdDecoder with support for decoding compressed blocks.

The decoder is capable of decoding RAW and RLE literals as well as sequences with predefined FSE tables. A suite of DSLX tests comprising unit tests of all underlying procs and an integration test was prepared. The integration test, similarly as in #1654, first generates a random valid ZSTD frame with compressed blocks and expected decoded output. Test data is then converted to a DSLX file (example) that is imported by the integration tests file. At the beginning of the test, the default FSE decoding tables are filled with default distributions taken from RFC 8878 section 3.1.1.3.2.2. Default Distributions . Next, the encoded frame is loaded to the system memory and the decoder is configured through a set of CSRs to start the decoding process. The decoder starts the operation and writes the decoded frame back into the output buffer in the system memory. Once it finishes, it sends a pulse on the notify channel signaling the end of the decoding. The output of the decoder is compared against the decoding result from the reference library.

The PR introduces among others:

  • SequenceDecoder - responsible for decoding sequence sections of the compressed blocks
  • FseDecoder - introduced as the core part of the SequenceDecoder
  • RefillingShiftBuffer - used for storing and outputting in forward and backward fashion an arbitrary amount of bits required by the FSE decoder
  • LiteralsDecoder - capable of decoding RAW, RLE and Huffman-coded literals
  • HuffmanDecoder - used in decoding huffman-coded literals. Decoded Huffman trees are then used to decode one or four Huffman-coded streams.
  • CommandConstructor - this proc is responsible for sending packets with decoded sequences and literals to the SequenceExecutor proc
  • RamMux and RamDemux - procs used for handling requests/responses to multiple memory models. The procs interface with 3 separate memory buffers for FSE decoding tables.

@proppy
Copy link
Member

proppy commented Jun 2, 2025

Looks like the following target blow up over the internal memory limit for test (~10GB):

//xls/modules/zstd:huffman_weights_dec_dslx_test
//xls/modules/zstd:sequence_dec_dslx_test
//xls/modules/zstd:zstd_dec_dslx_test

Is that expected? (possibly related to #1042?)

@rw1nkler
Copy link
Contributor Author

rw1nkler commented Jun 2, 2025

Looks like the following target blow up over the internal memory limit for test (~10GB):

//xls/modules/zstd:huffman_weights_dec_dslx_test
//xls/modules/zstd:sequence_dec_dslx_test
//xls/modules/zstd:zstd_dec_dslx_test

Is that expected? (possibly related to #1042?)

@proppy yest the issue is caused by huge RAM consumption in DSLX interpreter in general, which was also reported in #1897

@proppy
Copy link
Member

proppy commented Jun 2, 2025

Is there a way to reduce the number of test cases so that it fits within 10G?

import xls.modules.zstd.literals_buffer;
import xls.modules.zstd.fse_table_creator;
import xls.modules.zstd.ram_mux;
// import xls.modules.zstd.zstd_frame_testcases as comp_frame;
Copy link
Member

@proppy proppy Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is that commented on purpose? is the idea here to allow different type of frame to be tested depending on what you uncomment? (if yes maybe those should be split in different tests?)

Copy link
Contributor Author

@rw1nkler rw1nkler Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be a perfect solution, however currently the interpreted does not free memory between running separate test cases, which may be a problem for this test.

Copy link
Contributor Author

@rw1nkler rw1nkler Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also tried using ir-jit evaluator for dslx tests, but it works just for 29/55 of them. The other are failing due to the following reasons:

  • Incorrect token usage in the for loop which cannot be converted to ir (these tests should use unroll_for! instead)
  • IrConversionError: Could not find name for invocation: 'enumerate', indicating that the enumerate has no representation in IR?
  • Just a SegmanetationFault on many tests

@rw1nkler
Copy link
Contributor Author

rw1nkler commented Jun 2, 2025

Is there a way to reduce the number of test cases so that it fits within 10G?

@proppy I will try to reduce the number of test cases or move them to different test procs.
The ability to filter tests could help with this separation. Please take a look at #2312

@proppy
Copy link
Member

proppy commented Jun 3, 2025

@rw1nkler as discussed, let's just tag the target as local for now w/ a TODO that point to #1897 and/or #1042.

@rw1nkler
Copy link
Contributor Author

rw1nkler commented Jun 6, 2025

I noticed that OpenROAD failed with a segmentation fault in the CI. We've actually seen quite a bit of instability with this tool internally. In many cases, simply rerunning the flow has resolved the issue.

@proppy
Copy link
Member

proppy commented Jun 6, 2025

I noticed that OpenROAD failed with a segmentation fault in the CI. We've actually seen quite a bit of instability with this tool internally. In many cases, simply rerunning the flow has resolved the issue.

@QuantamHD / @mikesinouye fyi.

@QuantamHD
Copy link
Contributor

We might need to bump openroad. There were some tsan issues resolved recently.

tags = ["manual"],
target_die_utilization_percentage = "10",
)
# TODO: Uncomment after resolving problems with too long P&R
Copy link
Member

@proppy proppy Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't manual be enough to avoid triggering it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual tag is used to exclude a target from the //xls/modules/zstd:all and //xls/modules/zstd/... targets. We use it for all targets except xls_dslx_library so that the main CI can typecheck the zstd part without running other expensive checks.

Hoverwer, there is a dedicated CI for zstd. To run all manual targets, it explicitly lists them using bazel query, which searches for groups of targets with common suffixes. I made these rules more strict to allow re-enabling some previously commented-out targets. To skip them safely, I used the _skip suffix in the target names so the CI won't match them.

@mikesinouye
Copy link
Contributor

@QuantamHD's current work is finishing the Qt bazel build hookup for upstream OpenROAD. Once that's finalized, we can clean rules_hdl's deps and bump the pointer to head for OR. There have been many stability improvements in the last 15 months so hopefully that addresses whatever issues you're observing. I think this will happen soonish, once we do let's revisit this to confirm it does.

@proppy proppy requested a review from richmckeever June 6, 2025 17:50
@rw1nkler rw1nkler force-pushed the zstd_compressed_block_dec_dslx_part branch 2 times, most recently from b352452 to 19e2fba Compare June 10, 2025 08:00
lpawelcz and others added 7 commits June 10, 2025 10:39
Remove references to buffer structs as those are not used anywhere

Signed-off-by: Pawel Czarnecki <[email protected]>
rw1nkler and others added 22 commits June 10, 2025 10:44
Signed-off-by: Robert Winkler <[email protected]>
Signed-off-by: Pawel Czarnecki <[email protected]>
* Use materialize_internal_fifos when possible
* Disable the above option for procs with loopback channels
* Add missing module names
* Add xls_fifo_wrapper verilog dependency to the synthesis of the procs without materialized internal fifos

Signed-off-by: Pawel Czarnecki <[email protected]>
Co-authored-by: Wojciech Sipak <[email protected]>
No longer applicable - there are no CC tests in ZSTD module

Signed-off-by: Pawel Czarnecki <[email protected]>
Signed-off-by: Wojciech Sipak <[email protected]>
Improve formatting, wording and fix lint issues

Co-authored-by: Dominik Lau <[email protected]>
Co-authored-by: Szymon Gizler <[email protected]>
Signed-off-by: Wojciech Sipak <[email protected]>
Co-authored-by: Pawel Czarnecki <[email protected]>
Co-authored-by: Krzysztof Obłonczek <[email protected]>
Signed-off-by: Robert Winkler <[email protected]>
Let's use predefined data first and only then add
zstd as dependency.

Signed-off-by: Wojciech Sipak <[email protected]>
Signed-off-by: Robert Winkler <[email protected]>
The CI setup for the zstd module uses a bazel query to find groups of targets
that share a common suffix. By appending `_skip` to a target name,
the query no longer matches it, effectively excluding that target from the CI run.

Signed-off-by: Robert Winkler <[email protected]>
Make Bazel queries stricter so that target names must end exactly with
the specified suffix, without allowing any additional characters after it.

Signed-off-by: Robert Winkler <[email protected]>
@rw1nkler rw1nkler force-pushed the zstd_compressed_block_dec_dslx_part branch 2 times, most recently from cffde4e to 484cc69 Compare June 10, 2025 08:51
@copybara-service copybara-service bot merged commit 0d103a6 into google:main Jun 12, 2025
7 checks passed
@kgugala kgugala deleted the zstd_compressed_block_dec_dslx_part branch June 12, 2025 15:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.