[VM] Compress contracts before storing them and decompress on load, and bill the user only for the number of bytes in the compressed representation #2926

LNow · 2021-11-13T02:28:19Z

Is your feature request related to a problem? Please describe.
Contracts with comments (especially very verbose and descriptive ones) are more expensive to use than contracts that have no comments at all. The difference in execution cost is easily explained by the difference in size.

Over time we will see more and more complex contracts, and to make them readable and somewhat understandable to a normal user they will need more and more comments.
If developers have to choose between contract readability and lower execution costs, they will start choosing the latter. As a result we will lose the most important feature of Clarity.

But what if we stored contracts in two versions?

Original one, just as written and deployed by developers.
"Minified" one, stripped of all comments and extra whitespace.

The first one would be stored on-chain just as it is right now, to keep readability. The second one could be stored on the side and used as the "executable" version, to reduce execution costs.
There is no point in loading contracts with comments into memory every single time they are called if comments play no role in execution.

Developers could pay more for contract deployment (2x storage + additional processing), but execution should be cheaper and faster.
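
As an illustration of the "minified" version described above, here is a minimal sketch. It assumes that Clarity comments start with `;;` and run to the end of the line, and that string literals are double-quoted with backslash escapes. The function name `minify_clarity` is hypothetical and not part of any existing tool.

```python
def minify_clarity(source: str) -> str:
    """Strip ';;' comments and collapse whitespace outside string literals.

    Illustrative only: this is not the node's parser, just a character scan.
    """
    out = []
    in_string = False
    pending_space = False  # whitespace seen since the last emitted token
    i = 0
    while i < len(source):
        ch = source[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(source):
                # Copy the escaped character verbatim.
                out.append(source[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif source.startswith(";;", i):
            # Skip the comment; the newline is handled as whitespace next.
            while i < len(source) and source[i] != "\n":
                i += 1
            continue
        elif ch.isspace():
            pending_space = True
        else:
            if pending_space and out:
                out.append(" ")
            pending_space = False
            if ch == '"':
                in_string = True
            out.append(ch)
        i += 1
    return "".join(out)
```

For example, `minify_clarity(";; adds one\n(define-public (inc (x uint))\n  (ok (+ x u1)))\n")` drops the comment line and joins the rest onto a single line, while text inside string literals is left untouched.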

jcnelson · 2021-11-15T16:46:29Z

Yes, probably. We can have the DB store a compressed representation of the contract, and only bill the user for loading/storing the compressed representation. No need to limit ourselves to a minified representation -- we can lz4 it for example.

LNow · 2021-11-15T17:40:33Z

Great! With such a change we could push stacksgov/sips#32 forward without thinking much about how comments affect contract size and execution costs.
As a developer I would pay more to deploy 100% readable and nicely documented code if I could be sure that my comments won't have a big negative impact on contract execution costs.

jcnelson · 2021-11-15T20:04:09Z

I updated the issue name to reflect the change that will be carried out here. It's pretty straightforward:

compress the contract before storing it
decompress it while loading it
bill the caller by the number of bytes in the compressed representation
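
The three steps above can be sketched as follows. This is an assumption-laden illustration, not the node's implementation: `ContractStore` is a hypothetical in-memory stand-in for the contract DB, and Python's standard `zlib` is used in place of lz4 only because it ships with the language.

```python
import zlib
from typing import Dict, Tuple


class ContractStore:
    """Hypothetical key-value store that keeps contracts compressed."""

    def __init__(self) -> None:
        self._db: Dict[str, bytes] = {}

    def store(self, contract_id: str, source: str) -> int:
        """Compress before storing; return the number of bytes to bill."""
        blob = zlib.compress(source.encode("utf-8"), 9)
        self._db[contract_id] = blob
        return len(blob)

    def load(self, contract_id: str) -> Tuple[str, int]:
        """Decompress on load; bill by the compressed length that was read."""
        blob = self._db[contract_id]
        return zlib.decompress(blob).decode("utf-8"), len(blob)
```

With this shape, a heavily commented contract is billed for its compressed size on both store and load, so repetitive comment text adds far less to the billed length than it does to the raw source length.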

cylewitruk · 2022-11-14T18:23:05Z

I think this issue sounds interesting, if only to learn more of the codebase (I also like saving storage space). Is this still relevant? If so I'd be happy to take an (educational) stab at it... @jcnelson

A few random thoughts:

It would be interesting to explore the separation of storage of a minimized & execution-optimized version of the contract, vs. the contract in its original form (either completely separately, or diff'ed). Rationale being that the contract, once published, will mostly be read for execution. The stacks API could read from endpoints that specifically retrieve the originally-formatted version. This would imply that the costs for storing and subsequently reading a contract would be different.
On top of the above, would it be possible to further optimize this to only retrieve called function(s) + variables?
Given the above point, could nodes in some way opt-out from storing the "original" (non-minimized) versions of contracts? If so, how to ensure that a contract's original format could never be lost? (maybe some sort of minimum-number-of-nodes that has a full copy..?)
Benchmarking a (larger) sample of existing Clarity contracts against different compression algorithms. LZ4 is across-the-board a winner. But perhaps for read-heavy workloads there is an algorithm that is slower but more efficient on the compression-side, but cheaper on the read-side? Maybe zstd?
Some sort of header likely needs to be introduced for storing/retrieving (on a node-level) so that the node can determine upon read how to handle the data.

And then exploring databases other than SQLite, for example RocksDB (since the majority of operations are KV in nature). RocksDB particularly because it supports e.g. LZ4 out-of-the-box. A change here would likely be a separate issue, pending its relevance. Never mind, now I found the other usages :)
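
The header idea from the list above could look something like the sketch below: a one-byte format tag in front of each stored record tells the node how to decode it on read. The tag values, function names, and the use of `zlib` as the codec are all assumptions made for illustration.

```python
import zlib

# Hypothetical format tags; a real node would define its own.
FORMAT_RAW = 0
FORMAT_ZLIB = 1


def encode_record(source: bytes, fmt: int) -> bytes:
    """Prefix the payload with a one-byte tag naming its encoding."""
    if fmt == FORMAT_RAW:
        payload = source
    elif fmt == FORMAT_ZLIB:
        payload = zlib.compress(source)
    else:
        raise ValueError(f"unknown format tag: {fmt}")
    return bytes([fmt]) + payload


def decode_record(record: bytes) -> bytes:
    """Read the tag and decode the payload accordingly."""
    if not record:
        raise ValueError("empty record")
    fmt, payload = record[0], record[1:]
    if fmt == FORMAT_RAW:
        return payload
    if fmt == FORMAT_ZLIB:
        return zlib.decompress(payload)
    raise ValueError(f"unknown format tag: {fmt}")
```

A tag like this would let old uncompressed records and new compressed ones coexist in the same table, at the cost of one extra byte per record.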

jcnelson · 2022-11-18T19:47:22Z

It's definitely relevant! Compressing the clarity contract text could save ~50% of the bytes loaded. In fact, changing the on-disk representation of the Clarity code and analysis metadata could be done at any time, without a consensus-breaking change or a SIP. However, in order to pass the savings on to users (e.g. by changing the amount of block space it requires), we'd need to calculate a new cost function for contract-loads. This could be done with the voting procedure described in SIP-006, or it could be done in the next hard fork -- whichever happens sooner.

> It would be interesting to explore the separation of storage of a minimized & execution-optimized version of the contract, vs. the contract in its original form (either completely separately, or diff'ed). Rationale being that the contract, once published, will mostly be read for execution. The stacks API could read from endpoints that specifically retrieve the originally-formatted version. This would imply that the costs for storing and subsequently reading a contract would be different.

I'm not sure minification gets you anything special here? If we store the code compressed, we'd get better storage savings than minification. Also, minification won't improve execution speed nearly as well as something like byte-compiling the Clarity code. So if either of these are goals -- reduced storage and execution time -- we'd probably want to explore other tactics besides minification.

> On top of the above, would it be possible to further optimize this to only retrieve called function(s) + variables?

Yes, I think this could be done. Again, changing the associated cost functions will be an involved process, but the node implementation could be changed to do this without breaking anything.

> Given the above point, could nodes in some way opt-out from storing the "original" (non-minimized) versions of contracts?

No, this is neither possible nor desirable. Contracts are part of the blocks, and all nodes must store all blocks in order to ensure that the system remains resilient to unpredictable node churn and network partitions.

> Benchmarking a (larger) sample of existing Clarity contracts against different compression algorithms. LZ4 is across-the-board a winner. But perhaps for read-heavy workloads there is an algorithm that is slower but more efficient on the compression-side, but cheaper on the read-side? Maybe zstd?

Yeah, we'd want to do this before picking a default compression algorithm. However, the choice of compression algorithm is only necessary once the cost of loading the contract from source is reduced to the cost of loading the compressed representation (i.e. by changing the cost function). The compression algorithm implementation would need to be deterministic and would almost certainly need to be vendored into the codebase to ensure that all nodes compress contracts to the exact same number of bytes.
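
To make the determinism point concrete, here is a small sketch using Python's standard `zlib` as a stand-in codec. The billed length is a function of the codec, its version, and its settings, so every node would have to use exactly the same ones. `PINNED_LEVEL`, `billed_size`, and `sizes_by_level` are hypothetical names introduced for this example.

```python
import zlib

# Assumption: every node agrees on one codec and one fixed setting.
PINNED_LEVEL = 6


def billed_size(source: bytes, level: int = PINNED_LEVEL) -> int:
    """Length of the compressed form, which is what the caller is billed."""
    return len(zlib.compress(source, level))


def sizes_by_level(source: bytes) -> dict:
    """Show how the billed size can vary with the compression setting."""
    return {level: billed_size(source, level) for level in (1, 6, 9)}
```

Calling `sizes_by_level` on a real contract may report different lengths per level, which is why a setting left to each node's discretion could lead nodes to disagree on the cost of the same contract load.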

> Some sort of header likely needs to be introduced for storing/retrieving (on a node-level) so that the node can determine upon read how to handle the data.

This is kinda-sorta done with the analysis DB, but as you can see from the code comments, it's very coarse-grained at this time.

cylewitruk · 2022-11-21T08:54:32Z

> It would be interesting to explore the separation of storage of a minimized & execution-optimized version of the contract, vs. the contract in its original form (either completely separately, or diff'ed). Rationale being that the contract, once published, will mostly be read for execution. The stacks API could read from endpoints that specifically retrieve the originally-formatted version. This would imply that the costs for storing and subsequently reading a contract would be different.

> I'm not sure minification gets you anything special here? If we store the code compressed, we'd get better storage savings than minification. Also, minification won't improve execution speed nearly as well as something like byte-compiling the Clarity code. So if either of these are goals -- reduced storage and execution time -- we'd probably want to explore other tactics besides minification.

I had written this before I had a better understanding of how things worked - I had thought the contracts were loaded as plain-text and parsed again when pulled out, but now I see that's not the case :) So this point can be ignored.

> Yeah, we'd want to do this before picking a default compression algorithm. However, the choice of compression algorithm is only necessary once the cost of loading the contract from source is reduced to the cost of loading the compressed representation (i.e. by changing the cost function). The compression algorithm implementation would need to be deterministic and would almost certainly need to be vendored into the codebase to ensure that all nodes compress contracts to the exact same number of bytes.

My quick local (and unscientific) tests on both lz4 and zstd, looking only at compression efficiency, were:

lz4: down to 8% of uncompressed size (using defaults)
zstd: down to 3% of uncompressed size (using defaults)
(for uncompressed contract sources indexed by boomcrypto)
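
A comparison like the one above can be reproduced with a short script. The sketch below is not the script used for those numbers; it uses codecs from Python's standard library (`zlib`, `bz2`, `lzma`) as stand-ins, since lz4 and zstd bindings are third-party packages. The function names are hypothetical.

```python
import bz2
import lzma
import zlib
from typing import Callable, Dict, Iterable


def compression_ratio(sources: Iterable[bytes],
                      compress: Callable[[bytes], bytes]) -> float:
    """Total compressed bytes divided by total raw bytes over all sources."""
    total_raw = 0
    total_packed = 0
    for src in sources:
        total_raw += len(src)
        total_packed += len(compress(src))
    if total_raw == 0:
        raise ValueError("no input bytes to measure")
    return total_packed / total_raw


def compare(sources: Iterable[bytes]) -> Dict[str, float]:
    """Ratio per codec, each run with its default settings."""
    samples = list(sources)
    codecs = {"zlib": zlib.compress, "bz2": bz2.compress, "lzma": lzma.compress}
    return {name: compression_ratio(samples, fn) for name, fn in codecs.items()}
```

Each contract is compressed on its own here rather than as one concatenated stream, which matches how a node would store them and penalizes codecs with large per-record overhead on small contracts.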

jcnelson · 2023-01-19T03:29:08Z

This is something we'd like to do in the near future. @cylewitruk has graciously taken on the implementation effort.

jcnelson · 2023-02-22T03:13:07Z

Assigning to @obycode for now. Please feel free to re-assign.

jcnelson added consensus-critical stacks-2.1 labels Nov 15, 2021

jcnelson changed the title ~~[Clarity VM/DB] store and use "minified" contracts during execution.~~ [Clarity] Compress contracts before storing them and decompress on load, and bill the user only for the number of bytes in the compressed representation Nov 15, 2021

obycode mentioned this issue Nov 16, 2021

SIP - Contract Documentation Standard stacksgov/sips#32

Draft

jcnelson added stacks-future and removed stacks-2.1 labels May 23, 2022

LNow closed this as completed Jan 19, 2023

jcnelson reopened this Jan 19, 2023

jcnelson assigned obycode Feb 22, 2023

jcnelson added the L1 Working Group label Feb 22, 2023

pavitthrap added ship future and removed stacks-future labels Apr 3, 2023

igorsyl added this to Stacks Nakamoto consensus Jun 8, 2023

obycode added the icebox label Jun 26, 2023

will-corcoran added this to Stacks Core Eng Aug 4, 2023

github-project-automation bot moved this to 🆕 New in Stacks Core Eng Aug 4, 2023
