-
Notifications
You must be signed in to change notification settings - Fork 0
Recommended startup script clean up and update #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
ManavalanG
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good to go :)
JmScherer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to merge!
|
@JmScherer your comments about not knowing if the command ran correctly and if there was a way to figure out what files really needed to be recalled got me thinking and I remembered that @JmScherer @ManavalanG I made a lot of code changes and pushed to this branch after your approval, do you want to give the update a quick review before I merge? |
JmScherer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to merge!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
recall_git ran quick in my tests, but recall_dir appears to be really slow with directory I tested (/data/project/worthey_lab/projects/experimental_pipelines/mana/small_tasks/mecfs_hs_lc).
I added a few echo statements, and it appears for loop designed to remove true empty files from $NO_SIZE_FILES is really slow. I wonder if comm command would get the job done faster instead of for loop.
Update:
I tested by replacing the for loop with comm, and it appears to run quite fast. I didn't fully benchmark it but comm command ran in about 6s, while I suspect for loop would run for several minutes as the number of non-intersecting lines were 91504 in that dir. Here is the command I used:
comm -23 <(printf "%s\n" "${NO_SIZE_FILES[@]}" | sort) <(printf "%s\n" "${TRUE_EMPTY_FILES[@]}" | sort ) | wc -lPS - I turned off the actual recall files part when testing the above.
|
@ManavalanG I wasn't aware of that tool, nice find! This is much faster now and works brilliantly. I've updated the function to use |
|
@wilkb777 PR is good to be merged :) |



The following changes were made to the
recommended startup commandsstartup script:Reviewer should review the testing of the 2 new functions below.


recall_dirfunction testing:recall_gitfunction testing:New functions can be tested by checking out the
startup-scriptsbranch on Cheaha and sourcing thestartup_scripts/.recommended_startup_commandsfile and then running the commands.