fix(db): fix number of files in db, startup hang, ram issues and flushing issues #379
base: main
Conversation
LGTM if you could just add some comments on the currently used parameters.
options.set_max_subcompactions(cores as _);

options.set_max_log_file_size(10 * MiB);
options.set_max_open_files(2048);
could you add a comment describing each of those numbers?
Yes, or at least a doc for the function explaining how it addresses the issue this PR is solving.
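As a rough illustration of what such comments could look like (the function name and the rationales below are my guesses, not the actual code or the author's reasoning):

// Sketch only: hypothetical documentation of the values under discussion.
fn set_common_options(options: &mut rocksdb::Options, cores: usize) {
    // One subcompaction per core so a single large compaction can use all CPUs.
    options.set_max_subcompactions(cores as _);
    // Cap each rotated LOG file at 10 MiB so log files stay small.
    options.set_max_log_file_size(10 * 1024 * 1024);
    // Bound file-descriptor usage; 2048 should stay well under typical ulimits.
    options.set_max_open_files(2048);
}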
options.set_compression_type(DBCompressionType::Zstd);
match self {
    Column::BlockNToBlockInfo | Column::BlockNToBlockInner => {
        options.optimize_universal_style_compaction(1 * GiB);
Same here, why 1 GB?
I can't access the column size metrics as of now, so I eyeballed it.
We can tune all of this later; the basic idea is that this argument is the RAM budget of the column. I like to think that a normal machine should run Madara in at most 4 GB of RAM, and these tuning parameters are based around that.
I have not run any tests to see whether these numbers are optimal, they just looked okay to me.
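To illustrate the budget idea (the split and the per-column classification below are made up for the example, not the PR's actual numbers):

// Illustration only: with a target of roughly 4 GiB of total RAM for Madara,
// each column's memtable budget should be a fraction of that.
const MIB: usize = 1024 * 1024;
const GIB: usize = 1024 * MIB;

fn tune_column(options: &mut rocksdb::Options, is_hot_column: bool) {
    // optimize_universal_style_compaction takes a memtable memory budget and
    // derives write-buffer and compaction settings from it.
    let budget = if is_hot_column { GIB } else { 128 * MIB };
    options.optimize_universal_style_compaction(budget);
}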
options.set_max_log_file_size(10 * MiB);
options.set_max_open_files(2048);
options.set_keep_log_file_num(3);
options.set_log_level(rocksdb::LogLevel::Warn);
Why warn and not info?
This is not a change; it only shows up as a change because of how git rendered the diff.
Info is the default, and it prints column stats every so often, producing huge log files full of completely useless information.
See #345, where it was introduced.
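As an untested side note, if Info-level logs were ever wanted back, it might be possible to silence just the periodic stats dump instead of lowering the whole log level; a sketch, not part of this PR:

// Sketch only: keep Info-level logging but stop the periodic stats dump.
fn keep_info_without_stats_spam(options: &mut rocksdb::Options) {
    options.set_log_level(rocksdb::LogLevel::Info);
    // A period of 0 should disable the recurring column-family stats dump
    // that bloats the LOG file.
    options.set_stats_dump_period_sec(0);
}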
Pull Request type
What is the current behavior?
Pragma devnet has >1M SST files. This is because compaction couldn't keep up with the atomic flushes we do after every block. This resulted in db startup taking an absurd amount of time (sometimes tens of minutes, scanning most of those >1M files). In addition, the flushes themselves were taking a lot of time (~10s!).
What is the new behavior?
I could replicate the issue by copying the Pragma db locally, but it was much easier to replicate it by syncing a node with a modified version of block-import that force-flushes every block (this mimics the behavior of the sequencer, which does the same) and watching the number of files in the db.
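A sketch of the kind of reproduction harness this describes (names and structure are mine, not the actual block-import code):

// Sketch of the reproduction idea: force a flush after every imported block
// and watch the SST file count grow.
use rocksdb::{FlushOptions, DB};

fn flush_after_block(db: &DB) -> Result<(), rocksdb::Error> {
    let mut opts = FlushOptions::default();
    opts.set_wait(true); // block until the memtables have been written out
    db.flush_opt(&opts)
}

fn count_sst_files(db_path: &std::path::Path) -> std::io::Result<usize> {
    let mut count = 0;
    for entry in std::fs::read_dir(db_path)? {
        let path = entry?.path();
        if path.extension().map_or(false, |ext| ext == "sst") {
            count += 1;
        }
    }
    Ok(count)
}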
The tiered compaction is not suited to our needs, so I switched the db to universal compaction. With this compaction mode, I managed to sync 100k blocks with only about a hundred SST files in the db instead of around 30 thousand. Hopefully this fixes all of these issues, and the RAM used by RocksDB should now be bounded.
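Roughly, the switch looks like this (the specific knobs and values are illustrative, not necessarily what the PR sets):

// Illustration only: switch a column family to universal compaction and
// bound space amplification. Values are examples.
use rocksdb::{DBCompactionStyle, Options, UniversalCompactOptions};

fn use_universal_compaction(options: &mut Options) {
    options.set_compaction_style(DBCompactionStyle::Universal);
    let mut uni = UniversalCompactOptions::default();
    // Allow at most ~2x space amplification before forcing a full merge.
    uni.set_max_size_amplification_percent(200);
    options.set_universal_compaction_options(&uni);
    // Alternatively, optimize_universal_style_compaction(budget) picks the
    // style and derives buffer sizes from a memory budget, as discussed per
    // column above.
}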
Does this introduce a breaking change?
Theoretically, no. But I think you should upgrade your sequencer db by syncing a full node on it and replacing the sequencer's db with the full node's, to benefit from all of the new db options. Using an old db will work, but I don't think it will rewrite the old SST files very much.