### System information
| Type | Version/Name |
| --- | --- |
| Distribution Name | Rocky Linux 9 |
| Distribution Version | 9.6 |
| Kernel Version | 5.14.0-570.21.1.el9_6.x86_64 |
| Architecture | x86_64 |
| OpenZFS Version | 2.2.8-1 |
### Describe the problem you're observing
The L2ARC device shows a low queue depth (`aqu-sz`) and utilization (`%util`) even when serving almost all ARC misses. Since flash device IOPS scale with queue depth, current L2ARC performance seems lower than ideal.
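A quick way to cross-check this, sketched below under the assumption that the cache device is `sda`, is to watch the L2ARC kstats next to the block layer's in-flight counter:

```sh
# Minimal sketch (assumes the L2ARC vdev is sda; adjust as needed).
# l2_hits / l2_misses / l2_read_bytes come from the standard OpenZFS arcstats kstat;
# /sys/block/<dev>/inflight shows the in-flight read/write requests,
# i.e. a queue-depth snapshot at the block layer.
grep -E '^l2_(hits|misses|read_bytes)' /proc/spl/kstat/zfs/arcstats
cat /sys/block/sda/inflight
```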
For a practical example, I have a backup server with `primarycache=all` and `secondarycache=metadata` where the L2ARC shows a quite spectacular hit rate:
```
# arcstat -f time,read,miss,miss%,dmis,dm%,pmis,pm%,mmis,mm%,size,c,avail,l2read,l2hits,l2miss,l2hit% 1
time read miss miss% dmis dm% pmis pm% mmis mm% size c avail l2read l2hits l2miss l2hit%
20:10:06 85K 1.6K 3 956 2 673 12 1.6K 3 15G 14G -64M 1.6K 1.6K 0 100
```
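For reference, that cache policy corresponds to dataset properties along these lines (the dataset name is hypothetical; the report does not name the actual pool/dataset):

```sh
# Hypothetical dataset name; only the two property values are taken from the report.
zfs set primarycache=all tank/backup
zfs set secondarycache=metadata tank/backup
```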
At the same time, the corresponding `iostat` output shows the following (`sda` is the L2ARC device, an M.2 SATA drive):
```
# iostat -x -k 1
avg-cpu: %user %nice %system %iowait %steal %idle
         1.26  0.00  10.94   2.01    0.00  85.79

Device r/s rkB/s rrqm/s %rrqm r_await rareq-sz w/s wkB/s wrqm/s %wrqm w_await wareq-sz d/s dkB/s drqm/s %drqm d_await dareq-sz f/s f_await aqu-sz %util
sda 1563.00 6364.00 0.00 0.00 0.10 4.07 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.16 12.10
```
Note how low the `sda` queue size (`aqu-sz`) and `%util` are.
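For comparison, a synthetic random-read job at a higher queue depth against the same device (a hypothetical `fio` invocation, not part of the original measurements) should show the drive sustaining far more IOPS than the ~1.5K read/s above:

```sh
# Hedged sketch: read-only fio probe at queue depth 32 against the cache
# device (assumed to be sda); run only while the device is otherwise idle.
fio --name=l2arc-qd-probe --filename=/dev/sda --readonly --direct=1 \
    --rw=randread --bs=4k --iodepth=32 --ioengine=libaio \
    --runtime=30 --time_based
```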
### Describe how to reproduce the problem
Reproducing the issue requires a workload that exceeds ARC but not L2ARC, together with a warm secondary cache. Something like the following seems to work:
```sh
# create a VM with 8G ram and a 16G l2arc device

# increase l2arc params
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
echo $((128*1024*1024)) > /sys/module/zfs/parameters/l2arc_write_max
echo $((128*1024*1024)) > /sys/module/zfs/parameters/l2arc_write_boost

# create 1M files
zfs create -o compression=lz4 -o xattr=on -o secondarycache=metadata tank/fsmark
fs_mark -k -s0 -S0 -D10 -N1000 -n 1000000 -d /tank/fsmark/fsmark/

# put pressure to evict metadata from arc to l2arc
for i in `seq 1 5`; do dd if=/dev/urandom of=/tank/fsmark/random.img bs=1M count=1024; time du -hs /tank/fsmark/fsmark/*; done

# during the loop, on another terminal, run arcstat and iostat to observe that
# even when the l2arc hit rate is 100%, aqu-sz and %util remain low.
```
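A minimal sketch of that monitoring step, using the same commands shown earlier in this report:

```sh
# Run these in a second terminal while the loop above is executing.
arcstat -f time,read,miss,miss%,l2read,l2hits,l2miss,l2hit% 1
iostat -x -k 1
```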
### Include any warning/errors/backtraces from the system logs
None.