Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove extraneous and broken self-imposed 20ms sleep (nominally-50/s-rate limit) for each EFI variable read #258

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nabijaczleweli
Copy link
Contributor

@nabijaczleweli nabijaczleweli commented Nov 21, 2023

Nominally this rate limit is defined to avoid... getting rate-limited?

But it already severely limits the rate to unusable ‒ on two of my real systems this makes efibootmgr take 168ms/194ms, which accounts for 95%/82% of the run-time (and this is under strace, so it's 100% of the run-time) ‒ for klapki 0.2, this accounts for 36% and a large (140ms!) start-up delay, and for klapki 0.3 it's well over 50%. Well before you'd ever run afoul of the real limit.

Discounting "20ms" as "The user is not going to notice." is baffling. efibootmgr is infuriatingly slow. 20ms is ping-to-america level.

Worse yet, the entire kernel rate-limiter amounts to fs/efivarfs/file.c

  static ssize_t efivarfs_file_read(struct file *file, char __user *userbuf,
                  size_t count, loff_t *ppos)
  {
          struct efivar_entry *var = file->private_data;
          unsigned long datasize = 0;
          u32 attributes;
          void *data;
          ssize_t size = 0;
          int err;

          while (!__ratelimit(&file->f_cred->user->ratelimit))
                  msleep(50);

this is the alloc_uid() ratelimit with 1s interval + 100 burst.

This means that we can (best-case) read 50 variables (read(...), read(0)) instantly, then do so again the next second.

Best-case because the current implementation is broken anyway: it sleeps for 10ms after the attribute read (sure), and then for 10ms after the two reads to read the rest of the variable (bad).

This limits libefivar to 33⅓ variables per second.

Most systems have roughly this many variables. Most programs only care about a very thin subset of them, and scarcely come close to reading enough to run afoul of the kernel limit. But even if they did, this limit is significantly harsher than the kernel limit ‒ it doesn't increase it (how could it? the limit's already there!), but severely increases latency for every single read, instead of just those over the rate.
It's strictly worse than just not doing it.

This was confirmed experimentally with strace -TTTT /bin/wc * * * * * (note the many every-variable expansions so it's noticeable, this system has fewer than 50 variables): there is both visually a very obvious "big burst, little slowdown" oscillation, but also (non-efivarfs reads filtered out)

$ awk '/^read/ {print $NF}' ss | tr -d '<>' | sort -n | cut -c -6 | uniq -c | sort -n
      1 0.8998
      1 0.9015
      1 0.9581
      1 0.9585
      1 0.9586
      5 0.0013
      9 0.0005
      9 0.0012
     46 0.0011
     70 0.0010
     84 0.0008
     85 0.0009
    102 0.0006
    115 0.0007

or indeed

$ awk '/^read/ {print $NF}' ss | tr -d '<>' | sort -n | cut -c -5 | uniq -c | sort -n
      1 0.899
      1 0.901
      3 0.958
    130 0.001
    395 0.000

(130+395)/2=262½ variables read in under a millisecond, and 4½ got limited.

But, much more importantly, the first screenful was free: 99% of programs that don't read every variable over and over and over, but fit well within the 33 (klapki's 7 and efibootmgr's 8, this is with the firmware's base boot entries + two additional ones; there isn't a non-hypothetical system in existence with 25 more boot entries).

Fixes: https://bugs.debian.org/1056344:

$ time efibootmgr
BootCurrent: 0006
Timeout: 1 seconds
BootOrder: 0006,0005,0002,0003,0004
Boot0002* UEFI:CD/DVD Drive     BBS(129,,0x0)
Boot0003* UEFI:Removable Device BBS(130,,0x0)
Boot0004* UEFI:Network Device   BBS(131,,0x0)
Boot0005* Debian GNU/Linux trixie/sid with Linux 6.4.0-1-amd64  HD(1,GPT,2823eeb4-962f-cb4d-86ea-c0a4e460bc9c,0x800,0xfa000)/File(\KLAPKI\A2B398A10F5F4AF99258999E14093599\6.4.0-1-AMD64\VMLINUZ-6.4.0-1-AMD64)69006e0069007400720064003d005c006b006c00610070006b0069005c00610032006200330039003800610031003000660035006600340061006600390039003200350038003900390039006500310034003000390033003500390039005c0036002e0034002e0030002d0031002d0061006d006400360034005c0069006e0069007400720064002e0069006d0067002d0036002e0034002e0030002d0031002d0061006d00640036003400200072006f006f0074003d007a00660073003a004100550054004f00200072006500730075006d0065003d0050004100520054004c004100420045004c003d0072006f007a006200690061006e002d007300770061007000200071007500690065007400200069006f006d006d0075003d006f006e00
Boot0006* Debian GNU/Linux trixie/sid with Linux 6.4.0-1-amd64  HD(1,GPT,2823eeb4-962f-cb4d-86ea-c0a4e460bc9c,0x800,0xfa000)/File(\KLAPKI\A2B398A10F5F4AF99258999E14093599\6.4.0-1-AMD64\VMLINUZ-6.4.0-1-AMD64)69006e0069007400720064003d005c006b006c00610070006b0069005c00610032006200330039003800610031003000660035006600340061006600390039003200350038003900390039006500310034003000390033003500390039005c0036002e0034002e0030002d0031002d0061006d006400360034005c0069006e0069007400720064002e0069006d0067002d0036002e0034002e0030002d0031002d0061006d00640036003400200072006f006f0074003d007a00660073003a004100550054004f00200072006500730075006d0065003d0050004100520054004c004100420045004c003d0072006f007a006200690061006e002d007300770061007000200071007500690065007400200069006f006d006d0075003d006f006e00

real    0m0.002s
user    0m0.001s
sys     0m0.000s

and


openat(AT_FDCWD, "/sys/firmware/efi/efivars/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
newfstatat(4, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0
fcntl(4, F_GETFD)                       = 0x1 (flags FD_CLOEXEC)
fcntl(4, F_SETFD, FD_CLOEXEC)           = 0
getdents64(4, 0x560b5530a420 /* 104 entries */, 32768) = 7568
getdents64(4, 0x560b5530a420 /* 0 entries */, 32768) = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0002-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0", 4)                  = 4
read(4, "\1\0\0\0\r\0U\0E\0F\0I\0:\0C\0D\0/\0D\0V\0D\0 \0D\0"..., 4096) = 55
read(4, "", 4041)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0003-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0", 4)                  = 4
read(4, "\1\0\0\0\r\0U\0E\0F\0I\0:\0R\0e\0m\0o\0v\0a\0b\0l\0"..., 4096) = 63
read(4, "", 4033)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0004-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0", 4)                  = 4
read(4, "\1\0\0\0\r\0U\0E\0F\0I\0:\0N\0e\0t\0w\0o\0r\0k\0 \0"..., 4096) = 59
read(4, "", 4037)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0005-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0", 4)                  = 4
read(4, "\7\0\0\0\314\0D\0e\0b\0i\0a\0n\0 \0G\0N\0U\0/\0L\0i\0"..., 4096) = 606
read(4, "", 3490)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Boot0006-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0", 4)                  = 4
read(4, "\7\0\0\0\314\0D\0e\0b\0i\0a\0n\0 \0G\0N\0U\0/\0L\0i\0"..., 4096) = 606
read(4, "", 3490)                       = 0
close(4)                                = 0
openat(AT_FDCWD, "/sys/firmware/efi/efivars/BootNext-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/firmware/efi/efivars/BootCurrent-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\6\0\0\0", 4)                  = 4
read(4, "\6\0", 4096)                   = 2
read(4, "", 4094)                       = 0
close(4)                                = 0
newfstatat(1, "", {st_mode=S_IFCHR|0620, st_rdev=makedev(0x88, 0), ...}, AT_EMPTY_PATH) = 0
write(1, "BootCurrent: 0006\n", 18BootCurrent: 0006
)     = 18
openat(AT_FDCWD, "/sys/firmware/efi/efivars/Timeout-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0", 4)                  = 4
read(4, "\1\0", 4096)                   = 2
read(4, "", 4094)                       = 0
close(4)                                = 0
write(1, "Timeout: 1 seconds\n", 19Timeout: 1 seconds
)    = 19
openat(AT_FDCWD, "/sys/firmware/efi/efivars/BootOrder-8be4df61-93ca-11d2-aa0d-00e098032b8c", O_RDONLY) = 4
read(4, "\7\0\0\0", 4)                  = 4
read(4, "\6\0\5\0\2\0\3\0\4\0", 4096)   = 10
read(4, "", 4086)                       = 0
close(4)                                = 0
write(1, "BootOrder: 0006,0005,0002,0003,0"..., 36BootOrder: 0006,0005,0002,0003,0004
) = 36
write(1, "Boot0002* UEFI:CD/DVD Drive\tBBS("..., 42Boot0002* UEFI:CD/DVD Drive  BBS(129,,0x0)
) = 42

nabijaczleweli added a commit to nabijaczleweli/efivar that referenced this pull request Nov 21, 2023
Read the whole variable at once, then copy out the attributes
(for a description of efivarfs rate limiting, see rhboot#258)
…rate limit) for each EFI variable read

Nominally this rate limit is defined to avoid... getting rate-limited?

But it already severely limits the rate to unusable
‒ on two of my real systems this makes efibootmgr take 168ms/194ms,
  which accounts for 95%/82% of the run-time
  (and this is under strace, so it's 100% of the run-time) ‒
for klapki 0.2, this accounts for 36% and a large (140ms!)
start-up delay, and for klapki 0.3 it's well over 50%.
Well before you'd ever run afoul of the real limit.

Discounting "20ms" as "The user is not going to notice." is baffling.
efibootmgr is infuriatingly slow. 20ms is ping-to-america level.

Worse yet, the entire kernel rate-limiter amounts to fs/efivarfs/file.c
-- >8 --
  static ssize_t efivarfs_file_read(struct file *file, char __user *userbuf,
                  size_t count, loff_t *ppos)
  {
          struct efivar_entry *var = file->private_data;
          unsigned long datasize = 0;
          u32 attributes;
          void *data;
          ssize_t size = 0;
          int err;

          while (!__ratelimit(&file->f_cred->user->ratelimit))
                  msleep(50);
-- >8 --
this is the alloc_uid() ratelimit with 1s interval + 100 burst.

This means that we can (best-case) read 50 variables
(read(...), read(0)) instantly, then do so again the next second.

Best-case because the current implementation is broken anyway:
it sleeps for 10ms after the attribute read (sure),
and then for 10ms after the /two/ reads to read the rest of the
variable (bad).

This limits libefivar to 33⅓ variables per second.

Most systems have roughly this many variables. Most programs only
care about a very thin subset of them, and scarcely come close to
reading enough to run afoul of the kernel limit. But even if they
did, this limit is /significantly harsher/ than the kernel limit ‒
it doesn't increase it (how could it? the limit's already there!),
but severely increases latency for /every single read/, instead of
just those over the rate.
It's strictly worse than just not doing it.

This was confirmed experimentally with strace -TTTT /bin/wc * * * * *
(note the many every-variable expansions so it's noticeable):
there is both visually a very obvious "big burst, little slowdown"
oscillation, but also (non-efivarfs reads filtered out)
  $ awk '/^read/ {print $NF}' ss | tr -d '<>' | sort -n | cut -c -6 | uniq -c | sort -n
        1 0.8998
        1 0.9015
        1 0.9581
        1 0.9585
        1 0.9586
        5 0.0013
        9 0.0005
        9 0.0012
       46 0.0011
       70 0.0010
       84 0.0008
       85 0.0009
      102 0.0006
      115 0.0007
or indeed
 $ awk '/^read/ {print $NF}' ss | tr -d '<>' | sort -n | cut -c -5 | uniq -c | sort -n
       1 0.899
       1 0.901
       3 0.958
     130 0.001
     395 0.000
(130+395)/2=262½ variables read in under a millisecond,
and 4½ got limited.

But, much more importantly, the first screenful was free:
99% of programs that don't read every variable
over and over and over, but fit well within the 33
(klapki's 7 and efibootmgr's 8,
 this is with the firmware's base boot entries + two additional ones;
 there isn't a non-hypothetical system in existence with 25 more boot entries).

Fixes: https://bugs.debian.org/1056344
Signed-off-by: Ahelenia Ziemiańska <[email protected]>
nabijaczleweli added a commit to nabijaczleweli/efivar that referenced this pull request Nov 21, 2023
Read the whole variable at once, then copy out the attributes
(for a description of efivarfs rate limiting, see rhboot#258)

Signed-off-by: Ahelenia Ziemiańska <[email protected]>
@nabijaczleweli
Copy link
Contributor Author

DCO check clearly broken, the commit has a DCO signoff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant