Separate the 4 plot phases into phase1 and phase234 #266
base: main
a few high-level comments:
Thanks! We'll improve the changes later. BTW, why use bucket sort on disk instead of a multi-way merge sort?
@newtalentxp The data being sorted is usually uniformly distributed, so bucket sort performs better, at the cost of higher memory use: it is O(n) instead of O(n log n). The quicksort_last sort strategy is used to sort the buckets that are not uniformly distributed; a merge sort would probably perform better there. I use std::sort for my own plotting, which in my libstdc++ does an introsort.
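To illustrate the complexity argument above, here is a minimal in-memory sketch of a bucket sort over 32-bit keys. It is not the on-disk sort used in this repository: the function name `bucket_sort_u32` and the `bucket_bits` parameter are hypothetical. It only shows why uniformly distributed keys make the scatter step do most of the work, leaving small buckets for `std::sort`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of this repository: sorts 32-bit keys by
// scattering them into 2^bucket_bits buckets selected by their top bits,
// then sorting each bucket with std::sort.
std::vector<uint32_t> bucket_sort_u32(const std::vector<uint32_t>& keys, unsigned bucket_bits)
{
    assert(bucket_bits >= 1 && bucket_bits <= 16);
    const std::size_t num_buckets = std::size_t{1} << bucket_bits;
    const unsigned shift = 32 - bucket_bits;

    // Pass 1: count the keys per bucket, then prefix-sum into start offsets.
    std::vector<std::size_t> offsets(num_buckets + 1, 0);
    for (uint32_t k : keys) ++offsets[(k >> shift) + 1];
    for (std::size_t b = 0; b < num_buckets; ++b) offsets[b + 1] += offsets[b];

    // Pass 2: scatter every key into its bucket's slice of the output.
    std::vector<uint32_t> out(keys.size());
    std::vector<std::size_t> cursor(offsets.begin(), offsets.end() - 1);
    for (uint32_t k : keys) out[cursor[k >> shift]++] = k;

    // Pass 3: sort each bucket. With uniform keys every bucket holds about
    // n / num_buckets entries, so this pass stays cheap; a skewed input
    // concentrates the work in a few large buckets.
    for (std::size_t b = 0; b < num_buckets; ++b) {
        std::sort(out.begin() + static_cast<std::ptrdiff_t>(offsets[b]),
                  out.begin() + static_cast<std::ptrdiff_t>(offsets[b + 1]));
    }
    return out;
}
```

The extra memory mentioned in the comment shows up here as the second `out` buffer and the per-bucket offset tables.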
IMO it would be better to first add checkpoints that allow each phase to be resumed from its start. Then you can run the processes on separate machines by transferring the checkpoint data from machine to machine (or just storing it in a shared location in the first place). This is pretty easy to do at the beginning and end of each phase.
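A rough sketch of what such a phase-boundary checkpoint could look like is below. Everything in it is an assumption for illustration: the `PhaseCheckpoint` struct, the two-line text format, and the function names do not exist in this repository. The idea is only that a small record written at the end of a phase tells a later process, possibly on another machine, which phase to start next.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <optional>
#include <string>

// Hypothetical checkpoint record; the repository defines no such type.
struct PhaseCheckpoint {
    int completed_phase = 0;  // 0 = nothing finished, 1..4 = last finished phase
    std::string tmp_dir;      // location of the intermediate files (e.g. shared storage)
};

// Writes the checkpoint to "<path>.tmp" and renames it into place, so a reader
// does not see a half-written file. Note: std::rename replaces an existing
// target on POSIX systems; on Windows it fails if the target already exists.
bool save_checkpoint(const std::string& path, const PhaseCheckpoint& cp)
{
    assert(cp.completed_phase >= 0 && cp.completed_phase <= 4);
    const std::string tmp = path + ".tmp";
    {
        std::ofstream out(tmp, std::ios::trunc);
        if (!out) return false;
        out << cp.completed_phase << '\n' << cp.tmp_dir << '\n';
        if (!out.flush()) return false;
    }
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Returns std::nullopt when the file is missing or malformed.
std::optional<PhaseCheckpoint> load_checkpoint(const std::string& path)
{
    std::ifstream in(path);
    PhaseCheckpoint cp;
    if (!(in >> cp.completed_phase)) return std::nullopt;
    in.ignore();  // skip the newline that follows the phase number
    if (!std::getline(in, cp.tmp_dir)) return std::nullopt;
    if (cp.completed_phase < 0 || cp.completed_phase > 4) return std::nullopt;
    return cp;
}

// Phase to start from: 1 without a checkpoint, otherwise the one after the
// last completed phase (5 means the plot is already finished).
int next_phase_to_run(const std::optional<PhaseCheckpoint>& cp)
{
    return cp ? cp->completed_phase + 1 : 1;
}
```

With a record like this on shared storage, one machine could stop after phase 1 and a second machine could call `load_checkpoint` and continue at phase 2.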
This PR has been flagged as stale due to no activity for over 60 …
Create plot_disk_pipeline.hpp from plot_disk.hpp.
Separate the phases into phase1 and phase234 in order to make full use of resources in Kubernetes,
because the resource limits for phase1 and phase234 are not the same.
Add the -h, phase flag to cli.hpp.
Add corresponding Python bindings for plot_disk_pipeline.hpp, which has two functions, "create_plot_disk_phase1" and "create_plot_disk_phase234".