Automatic restart of MCP2518FD canbus causes oops #6916

Open
@rmerrill-fw

Description

Describe the bug

Setting restart-ms to a nonzero value (I have been using 100 ms) results in an occasional BUG message about echo SKBs and, slightly less commonly, an OOPS.
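
Because the restart delay is central to this report, it can help to confirm the value the kernel actually applied. Below is a minimal sketch that pulls the CAN state and restart-ms out of the text printed by `ip -details link show <dev>`. The helper name `parse_can_details` and the `SAMPLE` text are hypothetical: the sample is hand-written in the general shape of that output and was not captured from the system described here.

```python
import re


def parse_can_details(text):
    """Return (state, restart_ms) parsed from `ip -details link show DEV` text.

    Either element is None if it is not present in the text.
    """
    state = None
    restart_ms = None
    for line in text.splitlines():
        stripped = line.strip()
        # Only the CAN-specific line is of interest; it starts with "can ".
        if not stripped.startswith("can "):
            continue
        m = re.search(r"\bstate\s+(\S+)", stripped)
        if m:
            state = m.group(1)
        m = re.search(r"\brestart-ms\s+(\d+)", stripped)
        if m:
            restart_ms = int(m.group(1))
    return state, restart_ms


# Illustrative, hand-written sample (an assumption, not real output from this setup).
SAMPLE = """\
3: canbus0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1024
    link/can  promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 500
"""
```

On a live system the text could come from `subprocess.run(["ip", "-details", "link", "show", "canbus0"], capture_output=True, text=True).stdout`, assuming the interface is named as in the udev rules in the reproduction steps.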

Steps to reproduce the behaviour

  1. Set up a Raspberry Pi 4 with the Waveshare 2-channel CAN FD HAT, SKU 17075 (the one with the MCP2518FD). I installed the latest Raspberry Pi OS Lite 64 and ran apt update and apt upgrade.
  2. Connect one of the two CAN buses to something that will acknowledge your frames (I just connected both of them together, but I have verified this also works with a PCAN-USB).
  3. Install this udev rules file:
KERNELS=="spi0.0", SUBSYSTEMS=="spi", DRIVERS=="mcp251xfd", ACTION=="add|bind|change", NAME="canbus0", ATTR{tx_queue_len}="1024"
KERNELS=="spi0.1", SUBSYSTEMS=="spi", DRIVERS=="mcp251xfd", ACTION=="add|bind|change", NAME="canbus1", ATTR{tx_queue_len}="1024"
KERNELS=="spi1.0", SUBSYSTEMS=="spi", DRIVERS=="mcp251xfd", ACTION=="add|bind|change", NAME="canbus2", ATTR{tx_queue_len}="1024"
KERNELS=="spi1.1", SUBSYSTEMS=="spi", DRIVERS=="mcp251xfd", ACTION=="add|bind|change", NAME="canbus3", ATTR{tx_queue_len}="1024"
  4. Disable NetworkManager, enable systemd-networkd, and add this .network file to /etc/systemd/network:
[Match]
Name=canbus*

[CAN]
BitRate=500K
BusErrorReporting=yes
RestartSec=500ms

[Link]
RequiredForOnline=no
  5. Add these lines to config.txt:
dtparam=spi=on
dtoverlay=spi1-3cs
dtoverlay=mcp251xfd,spi0-0,interrupt=25
dtoverlay=mcp251xfd,spi1-0,interrupt=24
  6. Install can-utils.
  7. Reboot.
  8. Verify that canbus0 and canbus2 are up (if your jumpers are set differently, you might have 0 and 1 instead).
  9. Assuming canbus0 has a partner that will acknowledge, start cangen -I 123 -g 1 canbus0 (it also works on canbus2 if it has a partner). Verify that traffic is flowing via counters or candump.
  10. Briefly short the CAN pair together with a wire, screwdriver, etc., while watching journalctl -f.
  11. Repeat step 10 until you see an OOPS. It should not take many attempts (usually less than 30 seconds).
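
Watching the journal by eye while shorting the bus is easy to get wrong, so here is a minimal sketch that flags the two kinds of kernel messages mentioned above. The function names and the match patterns are assumptions: the patterns are generic guesses at what a BUG line and an Oops line contain, not strings taken from the attached logs.

```python
import re
import subprocess

# Assumed patterns: typical Oops markers, and the literal word "BUG".
OOPS_RE = re.compile(r"\bOops\b|Unable to handle kernel")
BUG_RE = re.compile(r"\bBUG\b")


def classify_kernel_line(line):
    """Return "oops", "bug", or None for a single kernel log line."""
    if OOPS_RE.search(line):
        return "oops"
    if BUG_RE.search(line):
        return "bug"
    return None


def watch(stop_on="oops"):
    """Follow kernel messages via journalctl and print flagged lines.

    Returns once a line of kind `stop_on` has been seen.
    """
    proc = subprocess.Popen(
        ["journalctl", "-k", "-f", "-o", "cat"],
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        for line in proc.stdout:
            kind = classify_kernel_line(line)
            if kind is not None:
                print(f"[{kind}] {line.rstrip()}")
            if kind == stop_on:
                break
    finally:
        proc.terminate()
```

Calling `watch()` in one terminal while repeating the short-circuit step gives a clear stop condition. It only observes the log and does not change driver behaviour.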

Device(s)

Raspberry Pi 4 Mod. B

System

raspinfo.txt

Logs

Included in raspinfo

Additional context

This follows up on an email conversation between me and @marckleinebudde. I thought it best to move it somewhere we can track it rather than continue sending many emails.

I'm reporting this in raspberrypi/linux because I've reproduced it on multiple distros, but I don't have another SPI master to try it with, so I can't tell whether it is Pi-specific.

I have reproduced this issue with 6.6 and 6.12 kernels, on both Yocto and Raspberry Pi OS, with the Waveshare HAT as well as our custom board, with a CM4 and a Pi 4 Model B, etc.

I tried cherry-picking some later commits into the 6.12 kernel and it doesn't really help.

Marc suggested this patch, which did not seem to cure the issue:

--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -759,6 +759,9 @@ static void mcp251xfd_chip_stop(struct mcp251xfd_priv *priv,
 {
 	priv->can.state = state;
 
+	hrtimer_cancel(&priv->rx_irq_timer);
+	hrtimer_cancel(&priv->tx_irq_timer);
+	cancel_work_sync(&priv->tx_work);
 	mcp251xfd_chip_interrupts_disable(priv);
 	mcp251xfd_chip_rx_int_disable(priv);
 	mcp251xfd_timestamp_stop(priv);

He also suggested this patch, which I have yet to try but will try as soon as I submit this (I have been spending time verifying that the problem is not particular to our distro or hardware):

--- a/drivers/net/can/dev/dev.c
+++ b/drivers/net/can/dev/dev.c
@@ -185,7 +185,9 @@ static void can_restart_work(struct work_struct *work)
 	struct can_priv *priv = container_of(dwork, struct can_priv,
 					     restart_work);
 
+	netif_tx_lock(priv->dev);
 	can_restart(priv->dev);
+	netif_tx_unlock(priv->dev);
 }
 
 int can_restart_now(struct net_device *dev)
