Low queue depth and utilization time for L2ARC device #17467

Open
@shodanshok

Description


System information

Type                  Version/Name
Distribution Name     Rocky Linux 9
Distribution Version  9.6
Kernel Version        5.14.0-570.21.1.el9_6.x86_64
Architecture          x86_64
OpenZFS Version       2.2.8-1

Describe the problem you're observing

The L2ARC device shows low queue depth (aqu-sz) and utilization time (%util) even when serving almost all ARC misses. As flash device IOPS scale with queue depth, current L2ARC read performance seems lower than ideal.
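To illustrate how much queue depth matters on flash, a quick read-only fio comparison on the cache device can be run (a sketch, not from the original report: it assumes fio with the libaio engine is installed and that /dev/sda is the L2ARC device as below; adjust the device path for your system):

# random 4k reads at queue depth 1 vs 32; --readonly guards against accidental writes
fio --name=qd1 --filename=/dev/sda --readonly --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=1 --runtime=10 --time_based
fio --name=qd32 --filename=/dev/sda --readonly --rw=randread --bs=4k --direct=1 --ioengine=libaio --iodepth=32 --runtime=10 --time_based
# the qd32 run should report several times the IOPS of the qd1 run on a typical flash device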

For a practical example, I have a backup server with primarycache=all and secondarycache=metadata where the L2ARC shows a quite spectacular hit rate:

# arcstat -f time,read,miss,miss%,dmis,dm%,pmis,pm%,mmis,mm%,size,c,avail,l2read,l2hits,l2miss,l2hit% 1
    time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%   size      c  avail  l2read  l2hits  l2miss  l2hit%
20:10:06   85K  1.6K      3   956    2   673   12  1.6K    3    15G    14G   -64M    1.6K    1.6K       0     100

At the same time, a corresponding iostat shows the following (sda is the L2ARC device, an M.2 SATA drive):

# iostat -x -k 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.26    0.00   10.94    2.01    0.00   85.79

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
sda           1563.00   6364.00     0.00   0.00    0.10     4.07    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.16  12.10

Note how low the sda queue size (aqu-sz) and %util are.
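As a back-of-the-envelope check (my own estimate, not from the report above): by Little's law the average queue size is approximately r/s × r_await, and plugging in the iostat sample above reproduces the reported aqu-sz, i.e. far less than one read in flight on average:

# aqu-sz ~= r/s * r_await (r_await is in ms, hence the /1000)
awk 'BEGIN { printf "estimated aqu-sz = %.2f\n", 1563.00 * 0.10 / 1000 }'
# prints: estimated aqu-sz = 0.16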

Describe how to reproduce the problem

A workload that exceeds the ARC but not the L2ARC, together with a warm secondary cache, is needed to reproduce the issue. Something like the following seems to work:

# create a VM with 8G ram and a 16G l2arc device

# increase l2arc params
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
echo $((128*1024*1024)) > /sys/module/zfs/parameters/l2arc_write_max
echo $((128*1024*1024)) > /sys/module/zfs/parameters/l2arc_write_boost

# create 1M files
zfs create -o compression=lz4 -o xattr=on -o secondarycache=metadata tank/fsmark
fs_mark -k -s0 -S0 -D10 -N1000 -n 1000000 -d /tank/fsmark/fsmark/

# put pressure to evict metadata from arc to l2arc
for i in `seq 1 5`; do dd if=/dev/urandom of=/tank/fsmark/random.img bs=1M count=1024; time du -hs /tank/fsmark/fsmark/*; done

# during the loop, in another terminal, run arcstat and iostat to observe that even when the l2arc hit rate is 100%, aqu-sz and %util stay low.
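To also see the queue depths ZFS itself maintains for the cache vdev, zpool iostat can be run alongside the commands above (a sketch; pool name tank as in the steps above, and the -q/-v queue-statistics flags are available in current OpenZFS releases):

# per-vdev I/O and active queue statistics at a 1-second interval
zpool iostat -qv tank 1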

Include any warning/errors/backtraces from the system logs

None.
