@wilkb777
Member

@wilkb777 wilkb777 commented Jan 6, 2026

The following changes were made to the recommended startup commands script:

  1. Removal of outdated module loading at the start of the script
  2. Some minor formatting cleanup
  3. Addition of two functions to aid with recalling files to GPFS after the switch to Cheaha's storage model

Reviewers should check the testing of the two new functions below.
recall_dir function testing:
image
recall_git function testing:
image

The new functions can be tested by checking out the startup-scripts branch on Cheaha, sourcing the startup_scripts/.recommended_startup_commands file, and then running the commands.

@wilkb777 wilkb777 requested a review from ManavalanG January 6, 2026 15:18
@wilkb777 wilkb777 self-assigned this Jan 6, 2026
@wilkb777 wilkb777 requested a review from JmScherer January 6, 2026 15:18
@JmScherer

JmScherer commented Jan 6, 2026

recall_dir

Screenshot 2026-01-06 at 12 55 42 PM

recall_git

Screenshot 2026-01-06 at 12 56 22 PM

I wasn't sure whether the functions were doing anything when called, so I checked whether the appropriate functions were loaded, and they appear to be:

Screenshot 2026-01-06 at 12 58 37 PM
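
A check like the one described above can be done from the shell with `declare -F`, which succeeds only when the given name is a defined function. The following is a minimal sketch, not the commands actually used in the screenshot. So that it runs anywhere, it sources a temporary stand-in file with stub versions of recall_dir and recall_git; the stub bodies and the temporary file are assumptions made for illustration. On Cheaha you would source startup_scripts/.recommended_startup_commands instead.

```shell
#!/usr/bin/env bash
# Stand-in for startup_scripts/.recommended_startup_commands. The stub bodies
# are hypothetical and exist only so this sketch is self-contained.
startup_file="$(mktemp)"
cat > "$startup_file" <<'EOF'
recall_dir() { echo "stub recall_dir called with: $*"; }
recall_git() { echo "stub recall_git called with: $*"; }
EOF

# Load the function definitions into the current shell.
# shellcheck disable=SC1090
source "$startup_file"

# declare -F NAME returns 0 only if NAME is a defined shell function.
missing=0
for fn in recall_dir recall_git; do
    if declare -F "$fn" > /dev/null; then
        echo "defined: $fn"
    else
        echo "missing: $fn"
        missing=1
    fi
done

rm -f "$startup_file"
```

`type -t recall_dir` is an equivalent check; it prints `function` when the name resolves to a shell function.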

Member

@ManavalanG ManavalanG left a comment

Good to go :)


@JmScherer JmScherer left a comment

This looks good to merge!

@wilkb777
Member Author

wilkb777 commented Jan 7, 2026

@JmScherer your comments about not knowing whether the command ran correctly, and whether there was a way to figure out which files really needed to be recalled, got me thinking. I remembered that du reports a block usage of zero for files that are on CEPH. I used that together with the --apparent-size flag to distinguish truly empty files (e.g. .gitkeep) from files on CEPH that need to be recalled. This made things a little more complex in Bash, but the result is that I centralized the recall functionality into a single function and refactored recall_git and recall_dir to call it.
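
The idea can be sketched as follows. This is not the function from the PR. It is a simplified illustration assuming GNU du and find, and the name `list_recall_candidates` is hypothetical. It treats a file as a recall candidate when du reports zero allocated blocks but `du -b` (shorthand for `--apparent-size --block-size=1`) reports a non-zero size. Whether that combination really means "on CEPH" depends on the storage setup described above.

```shell
#!/usr/bin/env bash
# Print files under a directory that occupy zero blocks on disk yet have a
# non-zero apparent size. Truly empty files (e.g. .gitkeep) have zero
# apparent size too, so they are skipped.
list_recall_candidates() {
    local dir="$1" file blocks bytes
    while IFS= read -r -d '' file; do
        blocks=$(du -k -- "$file" | cut -f1)   # allocated size in KiB
        bytes=$(du -b -- "$file" | cut -f1)    # apparent (logical) size in bytes
        if [ "$blocks" -eq 0 ] && [ "$bytes" -gt 0 ]; then
            printf '%s\n' "$file"
        fi
    done < <(find "$dir" -type f -print0)
}
```

Calling `list_recall_candidates <directory>` prints one path per line. Running du twice per file is slow on large trees, so this shows only the test itself, not an efficient way to apply it.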

@JmScherer @ManavalanG I made a lot of code changes and pushed them to this branch after your approval. Do you want to give the update a quick review before I merge?

@JmScherer

I re-ran the recall_dir and recall_git functions after the function updates:

Screenshot 2026-01-07 at 3 20 20 PM


@JmScherer JmScherer left a comment

Looks good to merge!

Member

@ManavalanG ManavalanG left a comment

recall_git ran quickly in my tests, but recall_dir appears to be really slow with the directory I tested (/data/project/worthey_lab/projects/experimental_pipelines/mana/small_tasks/mecfs_hs_lc).

I added a few echo statements, and it appears that the for loop designed to remove truly empty files from $NO_SIZE_FILES is really slow. I wonder if the comm command would get the job done faster than the for loop.

Update:

I tested by replacing the for loop with comm, and it appears to run quite fast. I didn't fully benchmark it, but the comm command ran in about 6s, while I suspect the for loop would run for several minutes, as the number of non-intersecting lines was 91504 in that dir. Here is the command I used:

comm -23 <(printf "%s\n" "${NO_SIZE_FILES[@]}" | sort) <(printf "%s\n" "${TRUE_EMPTY_FILES[@]}" | sort ) | wc -l

PS - I turned off the actual recall files part when testing the above.
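
To make the behavior of the command above concrete, here is a small sketch with made-up data. The array names follow the command, but the file names are hypothetical. `comm -23` suppresses the lines unique to the second input and the lines common to both, leaving only the lines found in the first input alone: the zero-block files that are not truly empty. Both inputs must be sorted in the same collation order, so the sketch pins `LC_ALL=C` for sort and comm alike. That is a defensive choice here, not something the original command does.

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for the real file lists.
NO_SIZE_FILES=("a/.gitkeep" "a/data.bam" "b/reads.fastq" "b/empty.log")
TRUE_EMPTY_FILES=("a/.gitkeep" "b/empty.log")

# Set difference: entries of NO_SIZE_FILES that are not in TRUE_EMPTY_FILES.
to_recall=$(
    LC_ALL=C comm -23 \
        <(printf '%s\n' "${NO_SIZE_FILES[@]}" | LC_ALL=C sort) \
        <(printf '%s\n' "${TRUE_EMPTY_FILES[@]}" | LC_ALL=C sort)
)

printf '%s\n' "$to_recall"
# prints:
# a/data.bam
# b/reads.fastq
```

Sorting both lists and merging them once scales far better than comparing every entry of one array against every entry of the other, which is presumably why the loop took so long on a directory with tens of thousands of files. Because the approach is line-oriented, file names containing newlines would be split incorrectly.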

@wilkb777
Member Author

@ManavalanG I wasn't aware of that tool, nice find! This is much faster now and works brilliantly. I've updated the function to use comm now if you want to check it again.

@ManavalanG
Member

@wilkb777 PR is good to be merged :)
