modules/zstd: Add support for decoding compressed blocks (DSLX part) #2230


Closed

Conversation


@wsipak wsipak commented May 21, 2025

This PR adds the DSLX parts of #1857, in order not to depend on cocotb-related Python packages.
It supersedes #1857.

This PR extends the ZstdDecoder with support for decoding compressed blocks.

The decoder is capable of decoding RAW and RLE literals, as well as sequences with predefined FSE tables.
A suite of DSLX tests was prepared, comprising unit tests of all underlying procs and an integration test.
The integration test, similarly to the one in #1654, first generates a random valid ZSTD frame with compressed blocks and the expected decoded output. The test data is then converted to a DSLX file (example) that is imported by the integration test file.
At the beginning of the test, the default FSE decoding tables are filled with the default distributions taken from RFC 8878, section 3.1.1.3.2.2 (Default Distributions). Next, the encoded frame is loaded into system memory and the decoder is configured through a set of CSRs to start the decoding process. The decoder then runs and writes the decoded frame back into the output buffer in system memory. Once it finishes, it sends a pulse on the notify channel signaling the end of decoding. The output of the decoder is compared against the decoding result from the reference library.
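To illustrate what "filling the FSE decoding tables" involves, here is a behavioral sketch (in Python, not DSLX) of the symbol-spreading step from RFC 8878: given a normalized count per symbol and an accuracy log, each table cell is assigned a symbol. The normalized distribution used in the example below is hypothetical, chosen only to keep the numbers small; it is not one of the RFC's predefined distributions.

```python
def fse_spread_symbols(normalized_counts, accuracy_log):
    """Assign a symbol to each of the 2**accuracy_log table cells."""
    table_size = 1 << accuracy_log
    table = [None] * table_size
    high_threshold = table_size - 1

    # Symbols with a "less than 1" probability (count == -1) each take
    # one cell at the end of the table.
    for symbol, count in enumerate(normalized_counts):
        if count == -1:
            table[high_threshold] = symbol
            high_threshold -= 1

    # Remaining symbols are spread over the table with a fixed stride
    # (odd, hence coprime with the power-of-two table size), so every
    # cell is visited exactly once.
    step = (table_size >> 1) + (table_size >> 3) + 3
    position = 0
    for symbol, count in enumerate(normalized_counts):
        for _ in range(max(count, 0)):
            table[position] = symbol
            position = (position + step) & (table_size - 1)
            while position > high_threshold:
                # Skip cells reserved for the low-probability symbols.
                position = (position + step) & (table_size - 1)
    return table


# Hypothetical distribution: counts sum to 31, plus one "-1" symbol,
# matching a table of 2**5 = 32 cells.
table = fse_spread_symbols([16, 8, 4, 2, 1, -1], 5)
```

Each symbol ends up occupying exactly `count` cells (one cell for a `-1` symbol), which is what makes the per-cell baseline/number-of-bits computation of the full FSE table possible afterwards.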

The PR introduces, among others:

  • SequenceDecoder - responsible for decoding the sequence sections of compressed blocks
  • FseDecoder - the core part of the SequenceDecoder
  • RefillingShiftBuffer - used for storing and outputting, in forward and backward fashion, an arbitrary number of bits required by the FSE decoder
  • LiteralsDecoder - capable of decoding RAW, RLE and Huffman-coded literals
  • HuffmanDecoder - used in decoding Huffman-coded literals. Decoded Huffman trees are then used to decode one or four Huffman-coded streams.
  • CommandConstructor - responsible for sending packets with decoded sequences and literals to the SequenceExecutor proc
  • RamMux and RamDemux - procs used for handling requests/responses to multiple memory models. The procs interface with 3 separate memory buffers for FSE decoding tables.
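As a rough illustration of the RefillingShiftBuffer's role, the Python sketch below (not the proc's actual interface; the class and method names are made up) models a bit buffer that shifts in whole bytes on demand while the consumer pops an arbitrary number of bits at a time, LSB-first. The real proc additionally supports backward consumption, which is omitted here for brevity.

```python
class RefillingShiftBuffer:
    """Behavioral model: refill from a byte stream, pop n bits at a time."""

    def __init__(self, data: bytes):
        self._data = data   # backing byte stream
        self._pos = 0       # index of the next byte to shift in
        self._bits = 0      # packed bits; LSB is the oldest bit
        self._count = 0     # number of valid bits currently held

    def _refill(self, needed: int) -> None:
        # Shift in whole bytes until at least `needed` bits are buffered.
        while self._count < needed:
            if self._pos >= len(self._data):
                raise EOFError("bitstream exhausted")
            self._bits |= self._data[self._pos] << self._count
            self._count += 8
            self._pos += 1

    def pop(self, n: int) -> int:
        # Output the oldest n bits, refilling first if necessary.
        self._refill(n)
        value = self._bits & ((1 << n) - 1)
        self._bits >>= n
        self._count -= n
        return value
```

The FSE decoder's reads are variable-width (the number of bits per state transition depends on the table cell), which is why the buffer must serve arbitrary `n` rather than fixed-size words.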

@proppy
Copy link
Member

proppy commented May 22, 2025

The PNR step of the action has been running for 7+ hours: https://github.com/google/xls/actions/runs/15171186417/job/42667677899?pr=2230 is that expected?

Member:

any reason we're removing this?

Contributor:

We modified the interface of the part responsible for decoding the frame header, and it was easier for us to extend the DSLX tests. This also made the tests consistent with the rest of the modules.

Member:

is that obsolete now w/ the new implementation?

Contributor:

The part responsible for checking the magic number has been moved to FrameHeaderDecoder:
https://github.com/google/xls/pull/2230/files#diff-09700fe24633d73f1e4217266bae21f08269db322cc896aec4b946ecf9842cc3R428

static const uint32_t random_frames_count = 100;
};

// Test `random_frames_count` instances of randomly generated valid
Member:

curious if that could have been done using fuzzing instead?

Contributor:

We tried using fuzzing, but the problem is that the control sections need to follow a specific structure and match the properties of the generated data. It might be possible to use fuzzing with some constraints on the frame, but there are a lot of rules to follow, which makes it complicated. In the end, it was easier to use a tool made for creating random test data that meets these requirements.

Member:

can we make all the verilog .sv files?


Done

Member:

should we bump the timeout?

Contributor:

After bumping the PR we noticed that some targets take longer than expected.
We are working on fixing that.


proppy commented May 23, 2025

The PNR step of the action has been running for 7+ hours: https://github.com/google/xls/actions/runs/15171186417/job/42667677899?pr=2230 is that expected?

@QuantamHD @mikesinouye is a 17h+ runtime expected for PnR w/ a block of this size, or are there some tweaks we should make to the config to speed things up?

@proppy (Member) May 23, 2025:

can you add:

# pytype binary, library

near the other load statements; that will help resolve the internal version of the Python rules.

Contributor:

fixed


Done

@mikesinouye (Contributor):

The PNR step of the action has been running for 7+ hours: https://github.com/google/xls/actions/runs/15171186417/job/42667677899?pr=2230 is that expected?

@QuantamHD @mikesinouye is a 17h+ runtime expected for PnR w/ a block of this size, or are there some tweaks we should make to the config to speed things up?

I notice that the target_die_utilization_percentage is quite low on many of these rules (5-10%). Core area can unexpectedly scale runtime even though the 'complexity' of the problem is easier with more free space. I know this at least affects gpl; cts is possible as well.

We've done a lot of work to speed things up since the current rules_hdl version, so I think this would be worth investigating after bumping again. We're still planning to do this soon. Everything but Qt is building with bazel in upstream OpenROAD, so we're almost there.


proppy commented May 29, 2025

Looks like there is an issue with executing some of the DSLX tests:

F0529 13:44:40.943623   21896 ast_utils.cc:568] Check failed: new_parent != nullptr node `ram::WriteReq<RAM_ADDR_WIDTH, WeightPreScanMetaDataSize(), u32:1>` had no function parent (`#[test_proc]
proc Prescan_test {
    type external_ram_addr = uN[RAM_ADDR_WIDTH];
    type external_ram_data = uN[RAM_ACCESS_WIDTH];
    type PrescanOut = WeightPreScanOutput;
    type ReadReq = ram::ReadReq<RAM_ADDR_WIDTH, RAM_NUM_PARTITIONS>;
    type ReadResp = ram::ReadResp<RAM_ACCESS_WIDTH>;
    type WriteReq = ram::WriteReq<RAM_ADDR_WIDTH, RAM_ACCESS_WIDTH, RAM_NUM_PARTITIONS>;
    type WriteResp = ram::WriteResp;
    type InternalReadReq = ram::ReadReq<RAM_ADDR_WIDTH, u32:1>;
    type InternalReadResp = ram::ReadResp<WeightPreScanMetaDataSize()>;
    type InternalWriteReq = ram::WriteReq<RAM_ADDR_WIDTH, WeightPreScanMetaDataSize(), u32:1>;
    type InternalWriteResp = ram::WriteResp;
    terminator: chan<bool> out;
    external_ram_req: chan<WriteReq> out;
    external_ram_resp: chan<WriteResp> in;
    start_prescan: chan<bool> out;
    prescan_response: chan<PrescanOut> in;
    config(terminator: chan<bool> out) {
        let (RAMExternalWriteReq_s, RAMExternalWriteReq_r) = chan<WriteReq>("Write_channel_req");
        let (RAMExternalWriteResp_s, RAMExternalWriteResp_r) = chan<WriteResp>("Write_channel_resp");
        let (RAMExternalReadReq_s, RAMExternalReadReq_r) = chan<ReadReq>("Read_channel_req");
        let (RAMExternalReadResp_s, RAMExternalReadResp_r) = chan<ReadResp>("Read_channel_resp");
        spawn ram::RamModel<RAM_ACCESS_WIDTH, RAM_SIZE, RAM_PARTITION_SIZE>(RAMExternalReadReq_r, RAMExternalReadResp_s, RAMExternalWriteReq_r, RAMExternalWriteResp_s);
        let (RAMInternalWriteReq_s, RAMInternalWriteReq_r) = chan<InternalWriteReq>("Internal_write_channel_req");
        let (RAMInternalWriteResp_s, RAMInternalWriteResp_r) = chan<InternalWriteResp>("Internal_write_channel_resp");
        let (RAMInternalReadReq_s, RAMInternalReadReq_r) = chan<InternalReadReq>("Internal_read_channel_req");
        let (RAMInternalReadResp_s, RAMInternalReadResp_r) = chan<InternalReadResp>("Internal_read_channel_resp");
        spawn ram::RamModel<WeightPreScanMetaDataSize(), RAM_SIZE, WeightPreScanMetaDataSize()>(RAMInternalReadReq_r, RAMInternalReadResp_s, RAMInternalWriteReq_r, RAMInternalWriteResp_s);
        let (PreScanStart_s, PreScanStart_r) = chan<bool>("Start_prescan");
        let (PreScanResponse_s, PreScanResponse_r) = chan<PrescanOut>("Start_prescan");
        spawn WeightPreScan(PreScanStart_r, RAMExternalReadReq_s, RAMExternalReadResp_r, PreScanResponse_s, RAMInternalReadReq_s, RAMInternalReadResp_r, RAMInternalWriteReq_s, RAMInternalWriteResp_r);
        (terminator, RAMExternalWriteReq_s, RAMExternalWriteResp_r, PreScanStart_s, PreScanResponse_r)
    }
    init {
        ()
    }
    next(state: ()) {
        let tok = join();
        let rand_state = random::rng_new(random::rng_deterministic_seed());
        for (i, rand_state) in range(u32:0, MAX_SYMBOL_COUNT / PARALLEL_ACCESS_WIDTH) {
            let (new_rand_state, data_to_send) = for (j, (rand_state, data_to_send)) in range(u32:0, PARALLEL_ACCESS_WIDTH) {
    @     0x55f9fae093fc  xls::dslx::DeduceCtx::DeduceAndResolve()
    @     0x55f9fac965a8  xls::dslx::(anonymous namespace)::DeduceVisitor::HandleLet()
    @     0x55f9fc198adb  xls::dslx::Let::Accept()
    @     0x55f9fac841e1  xls::dslx::Deduce()
    @     0x55f9fac80d45  std::__1::__function::__func<>::operator()()
    @     0x55f9fae085bc  xls::dslx::DeduceCtx::Deduce()
    @     0x55f9fac8c674  xls::dslx::(anonymous namespace)::DeduceVisitor::HandleStatement()
    @     0x55f9fc19893b  xls::dslx::Statement::Accept()
    @     0x55f9fac841e1  xls::dslx::Deduce()
    @     0x55f9fac80d45  std::__1::__function::__func<>::operator()()
    @     0x55f9fae085bc  xls::dslx::DeduceCtx::Deduce()
    @     0x55f9fac92b56  xls::dslx::(anonymous namespace)::DeduceVisitor::HandleStatementBlock()
    @     0x55f9fc1981cb  xls::dslx::StatementBlock::Accept()
    @     0x55f9fac841e1  xls::dslx::Deduce()
    @     0x55f9fac80d45  std::__1::__function::__func<>::operator()()
    @     0x55f9fae085bc  xls::dslx::DeduceCtx::Deduce()
    @     0x55f9fae093fc  xls::dslx::DeduceCtx::DeduceAndResolve()
    @     0x55f9fac81842  xls::dslx::TypecheckFunction()
    @     0x55f9fac7da70  std::__1::__variant_detail::__visitation::__base::__dispatcher<>::__dispatch[abi:ne180100]<>()
    @     0x55f9fac7b4ac  xls::dslx::typecheck_internal::TypecheckModuleMember()
    @     0x55f9fac7c0bf  xls::dslx::TypecheckModule()
    @     0x55f9fac7a9d6  xls::dslx::TypecheckModule()
    @     0x55f9fac79fbd  xls::dslx::ParseAndTypecheck()
    @     0x55f9fabe7628  xls::dslx::AbstractTestRunner::ParseAndTest()
    @     0x55f9fab75ca0  main
    @     0x7fe290a2a1ca  (unknown)
    @     0x7fe290a2a28b  __libc_start_main
/bin/bash: line 2: 21896 Aborted                 (core dumped) bazel-out/k8-opt-exec-ST-d57f47055a04/bin/xls/dslx/interpreter_main $file --compare=none --execute=false --dslx_path=:${PWD}:bazel-out/k8-opt/bin:bazel-out/k8-opt/bin::bazel-out/k8-opt/bin/ --warnings_as_errors=false
Error parsing and type checking DSLX source file: xls/modules/zstd/huffman_prescan.x

lpawelcz and others added 11 commits May 30, 2025 10:03
Remove references to buffer structs as those are not used anywhere

Signed-off-by: Pawel Czarnecki <[email protected]>
Co-authored-by: Pawel Czarnecki <[email protected]>
Co-authored-by: Robert Winkler <[email protected]>
Signed-off-by: Maciej Torhan <[email protected]>
Signed-off-by: Pawel Czarnecki <[email protected]>
Signed-off-by: Robert Winkler <[email protected]>
Signed-off-by: Krzysztof Oblonczek <[email protected]>
rw1nkler and others added 18 commits May 30, 2025 10:03
Signed-off-by: Robert Winkler <[email protected]>
Signed-off-by: Pawel Czarnecki <[email protected]>
* Use materialize_internal_fifos when possible
* Disable the above option for procs with loopback channels
* Add missing module names
* Add xls_fifo_wrapper verilog dependency to the synthesis of the procs without materialized internal fifos

Signed-off-by: Pawel Czarnecki <[email protected]>
Co-authored-by: Wojciech Sipak <[email protected]>
No longer applicable - there are no CC tests in ZSTD module

Signed-off-by: Pawel Czarnecki <[email protected]>
Signed-off-by: Wojciech Sipak <[email protected]>
Improve formatting, wording and fix lint issues

Co-authored-by: Dominik Lau <[email protected]>
Co-authored-by: Szymon Gizler <[email protected]>
Signed-off-by: Wojciech Sipak <[email protected]>
Co-authored-by: Pawel Czarnecki <[email protected]>
Co-authored-by: Krzysztof Obłonczek <[email protected]>
Signed-off-by: Robert Winkler <[email protected]>
@wsipak wsipak force-pushed the zstd_compressed_block_dec_dslx_part branch from a6afa36 to a8497ec on May 30, 2025 11:51
Let's use predefined data first and only then add
zstd as dependency.

Signed-off-by: Wojciech Sipak <[email protected]>
@wsipak wsipak force-pushed the zstd_compressed_block_dec_dslx_part branch 3 times, most recently from fc26a47 to aab0556, on May 30, 2025 12:28
@mgielda mgielda force-pushed the zstd_compressed_block_dec_dslx_part branch from aab0556 to 8f93fcc on May 30, 2025 13:03
@rw1nkler rw1nkler force-pushed the zstd_compressed_block_dec_dslx_part branch from aab0556 to 54b5d01 on May 30, 2025 13:06

wsipak commented May 30, 2025

It seems like GitHub complains about me not being CLA-approved as the PR opener (even though I have been covered by the Google CLA as a commit author for ages). This may be due to some (new?) email privacy settings in GitHub, which I have now disabled, but the PR was opened prior to that. To check whether that was indeed the problem, I'm closing this PR and a follow-up PR should be opened shortly. Sorry for the inconvenience, but it seems like a GitHub limitation.

@wsipak wsipak closed this May 30, 2025
@rw1nkler (Contributor):

Moved to #2296

10 participants