Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sort tempfile logic opens up extra files #6945

Open
karlmcdowall opened this issue Dec 9, 2024 · 4 comments
Open

Sort tempfile logic opens up extra files #6945

karlmcdowall opened this issue Dec 9, 2024 · 4 comments
Labels

Comments

@karlmcdowall
Copy link
Contributor

As part of investigating this issue, found that we're using extra file descriptors to mange our tempfile logic for batched-sort operations. This is a problem because some of the GNU compatibility tests run with ulimit enforcing strict limits on the number of files that sort can open.

Looking at the open files when starting a batched sort, I see...

$ ls -l /proc/1778302/fd
total 0
lrwx------ 1 karl karl 64 Dec  8 19:43 0 -> /dev/pts/3
l-wx------ 1 karl karl 64 Dec  8 19:43 1 -> /home/karl/Rust/gnu/tests/sort/sort-merge-fdlimit.log
l-wx------ 1 karl karl 64 Dec  8 19:43 2 -> /home/karl/Rust/gnu/gt-sort-merge-fdlimit.sh.A1OJ/err/merge-random-err
lr-x------ 1 karl karl 64 Dec  8 19:43 3 -> 'pipe:[3021894]'
l-wx------ 1 karl karl 64 Dec  8 19:43 4 -> 'pipe:[3021894]'

It's the last two of these that are problematic. These two pipes are created when we set up the Tempfile logic...

    fn init_tmp_dir(&mut self) -> UResult<()> {
<snip>
        let path = self.temp_dir.as_ref().unwrap().path().to_owned();
        let lock = self.lock.clone();
// *** The line below will open the two pipes.
        ctrlc::set_handler(move || {
            // Take the lock so that `next_file_path` returns no new file path,
            // and the program doesn't terminate before the handler has finished
            let _lock = lock.lock().unwrap();
            if let Err(e) = remove_tmp_dir(&path) {
                show_error!("failed to delete temporary directory: {}", e);
            }
            std::process::exit(2)
        })
        .map_err(|e| USimpleError::new(2, format!("failed to set up signal handler: {e}")))
    }

To fix this, I propose to just simplify this logic and just remove the Tempdir altogether. Just put any temporary files directly into /tmp (which is what GNU-sort does) and have the OS do the cleanup automatically rather than create a dedicated folder for them. We can then remove this code above that uses the ctrlc crate.

I don't know the history on this, so maybe there's a good reason to use a separate folder - If so, please let me know!

@sylvestre
Copy link
Contributor

I don't think we have good reason for this. If we do, we should have a test covering it. :)

@sylvestre
Copy link
Contributor

So, don't hesitate to remove it

@karlmcdowall
Copy link
Contributor Author

Did a bit of thinking about this...

  • My initial proposal to remove the Tempdir folder and just use files directly in /tmp won't solve the problem. We would still need a way to manually track all the temporary files we create so we can clean them up in the event of a CTRL+C, and it'll just be harder to do if we don't put them all in one directory. So, let's keep the Tempdir!
  • Considered switching from using the ctrlc crate to signal-hook. This doesn't help at all though! ctrlc uses 2 file descriptors for pipes (this is the problem, we don't want to use any more descriptors than we have to). Signal-hook still uses 2 file descriptors, it just uses sockets rather than pipes. So switching to signal-Hook wouldn't help at all alas.
  • Write our own VERY lightweight signal handler, directly using the nix::sys::signal API. I think this is unfortunately the way we'll have to go.

If anyone has other ideas please let me know.
Karl

@karlmcdowall
Copy link
Contributor Author

Started a thread with the rust-ctrlc component here with a proposal that would remove the file-descriptor overhead.
With that change (and some other work-in-progress changes I have locally) I can get the sort-merge-fdlimit and sort-continue tests to pass.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants