bus error on deployment of an action that passes pslse #882
Comments
Hi @diamantopoulos,
Would you carefully check whether your AFU has accessed a memory address outside the buffer that was allocated in software? The error can also be caused by misalignment. Check where you call malloc(): is the buffer aligned to the cacheline size (128B), or even to the page size?
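To make the alignment check concrete, here is a minimal sketch in C of allocating a host buffer that is cacheline-aligned and padded to a whole number of cachelines, so that a full-cacheline access by the AFU cannot run past the end of the allocation. It uses only the standard `posix_memalign()`. The helper names (`round_up`, `alloc_aligned`, `is_aligned`) and the `CACHELINE_BYTES` macro are hypothetical and are not part of the SNAP API. The 128B value is taken from the comment above.

```c
#define _POSIX_C_SOURCE 200112L  /* expose posix_memalign() under strict C modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHELINE_BYTES 128u  /* hypothetical name; 128B as mentioned above */

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Return 1 if p is aligned to align bytes (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/*
 * Hypothetical helper: allocate a zeroed buffer whose start address is
 * aligned to `align` and whose size is padded up to a multiple of `align`.
 * The padded size is written to *padded_out if it is not NULL.
 * Returns NULL on failure; release the buffer with free().
 */
static void *alloc_aligned(size_t bytes, size_t align, size_t *padded_out)
{
    void *p = NULL;
    size_t padded;

    /* posix_memalign() needs a power of two that is a multiple of sizeof(void *). */
    assert(align != 0 && (align & (align - 1)) == 0);
    assert(align % sizeof(void *) == 0);

    padded = round_up(bytes, align);
    if (posix_memalign(&p, align, padded) != 0)
        return NULL;

    memset(p, 0, padded);
    if (padded_out != NULL)
        *padded_out = padded;
    return p;
}
```

For example, `alloc_aligned(1000, CACHELINE_BYTES, &padded)` requests 1000 bytes and pads the allocation to 1024. Passing the page size as `align` instead covers the stricter case. Checking `is_aligned()` on every pointer handed to the action, and comparing the size the action is told to read or write against the padded size, is a cheap way to rule out both causes suggested above.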
This is because there is a threshold that limits how many PCIe faults are allowed.
Thanks @luyong6 for the hints. It shouldn't be either of the two cases above:
Indeed, the number in
@bmesnet thanks for the hints. It's nice to know it's a rare but known issue (so it is not necessarily an issue with our Witherspoon setup).
No, it happens in several images in which I combine different architectural parameters, e.g. unrolling/pipelining depth, memory prefetching, etc. However, the parameters do not affect the I/O protocol between the action and CAPI (apart from the input/output sizes). My only concern is that I have a dataflow architecture that keeps reading from AXI as long as the accelerator consumes data, and the FIFO depth may not be enough to store the burst reads. However, this is why I always test with pslse to verify the functionality. I'm in the process of testing different combinations to understand when the problem arises.
I'm using 4defea2. I'll test with the latest and report.
Dear team,
I'm facing the following problem: I'm developing an action (GEMM, not one of the example actions) that passes RTL simulation with pslse, but when I deploy the action on the 9V3 card I get a bus error. The output of dmesg is appended below.
Some images have been tested successfully on the card, but some fail with this "bus error" output. While I'm experimenting and debugging, I've opened this issue in case there is a "known" way to debug further. Since pslse is OK, it's hard to debug on the card.
Please note that when I get a "bus error", the card usually switches to the factory image and the system is not affected. But sometimes it causes the system to reboot. (All images are within -200 ps WNS.)
On the terminal:
dmesg output: