Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

seperate plot 4 phases to phase1 and phase234 #266

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

EchoAGI
Copy link

@EchoAGI EchoAGI commented May 26, 2021

create plot_disk_pipeline.hpp from plot_disk.hpp.
separate phases to phase1 and phase234, in order to fully resource usage in kubernetes.
because resource limit to phase1 and phase234 are not the same.

add -h, phase flag to cli.hpp.

add corresponding python bindings to plot_disk_pipeline.hpp, which has two functions "create_plot_disk_phase1" and "create_plot_disk_phase234"

@EchoAGI EchoAGI changed the title seperate plot phase4 to phase1 and phase234 seperate plot 4 phases to phase1 and phase234 May 26, 2021
@arvidn
Copy link
Contributor

arvidn commented May 27, 2021

a few high-level comments:

  • It would be much easier to review if you would not make formatting changes in this PR, but propose those separately
  • I think we would prefer comments in english
  • I think the justification for this change is a bit light. Could you elaborate on "in order to fully resource usage in kubernetes.
    because resource limit to phase1 and phase234 are not the same."?

@EchoAGI
Copy link
Author

EchoAGI commented May 27, 2021

a few high-level comments:

  • It would be much easier to review if you would not make formatting changes in this PR, but propose those separately
  • I think we would prefer comments in english
  • I think the justification for this change is a bit light. Could you elaborate on "in order to fully resource usage in kubernetes.
    because resource limit to phase1 and phase234 are not the same."?

Thanks! We'll improve the changes later... BTW, why not use multiple merge sort, but bucket sort on disk???

@mgraczyk
Copy link
Contributor

mgraczyk commented Jun 11, 2021

@newtalentxp newtalentxp The data being sorted is usually uniformly distributed, so the bucket sort performs better at the cost of higher memory. It is O(n) instead of O(n logn).

The quicksort_last sort strategy is used to sort the buckets that are not uniformly distributed. A merge sort would probably perform better there. I use std::sort for my own plotting, which in my libstdc++ does an introsort.

@mgraczyk
Copy link
Contributor

IMO the it would be better to first add checkpoints which allow phases to be resumed from start. Then you can run the processes on separate machines by transferring the checkpoint data from machine to machine (or just storing it on a shared location in the first place).

This is pretty easy to do at the beginning and end of each phase.

@github-actions
Copy link

'This PR has been flagged as stale due to no activity for over 60
days. It will not be automatically closed, but it has been given
a stale-pr label and should be manually reviewed.'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants