Hello,
I'm getting errors while trying to run intensive simultaneous data transfers over BLE with a BL702 configured as a BLE peripheral device:
I'm using the latest m0s1 static library from the master branch of the bl_iot_sdk repository and an RTL8761BU USB Bluetooth adapter under Linux. I've also tried the std static library, and it behaves in exactly the same way. The LE MTU is 247 bytes. My Linux test software sends a data stream to the device (peripheral) at about 8 kB/s using the GATT write without response command, and at the same time the device firmware sends another data stream to the host (central) at about 8 kB/s using notifications. If the data streams are synchronized (i.e. the BL702 device waits for a data write, receives it, and sends the received data back to the host), I get such errors (assertions) very rarely. But if the data streams are simultaneous and asynchronous, I get an assertion after about five seconds of data transfer testing.
My investigation through disassembly and code analysis shows that, at the end of a BLE transfer packet sequence in this erroneous condition, the transfer counter rx_cnt (the received packet counter maintained by software) becomes greater than rx_desc_cnt, the received packet counter maintained by the BLE hardware controller in the EM (Exchange Memory) shared memory area. In the example above, the software thinks that it received 8 (eleven!) packets, but only 3 (three!) packets were processed by the hardware (as reported by the hardware RX descriptor counter). This condition triggers the assertion, because the hardware should process at least as many packets as the software does. So this shouldn't happen, but it does.
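The condition behind the assertion can be pictured as a simple invariant between the two counters. Below is a minimal sketch in C, not the library's code: the helper names rx_counters_consistent() and check_rx_counters() are hypothetical, and the 8-bit counter width is an assumption, since the real check lives inside the static library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the SDK): true when the software counter
 * does not exceed the number of RX descriptors the hardware reports as
 * processed. The uint8_t width is an assumption. */
bool rx_counters_consistent(uint8_t rx_cnt, uint8_t rx_desc_cnt)
{
    return rx_cnt <= rx_desc_cnt;
}

/* Hypothetical stand-in for the firmware assertion: aborts when the
 * software believes it received more packets than the hardware processed. */
void check_rx_counters(uint8_t rx_cnt, uint8_t rx_desc_cnt)
{
    assert(rx_counters_consistent(rx_cnt, rx_desc_cnt));
}
```

Any pair with rx_cnt greater than rx_desc_cnt violates the invariant, which is the situation I see in the failing case.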
It seems that the main issue is the handling of the transfer end timeout "interrupt", implemented in the lld_evt_slot_isr() handler. This function tries to finish timed-out events by calling lld_evt_end_isr(false) once the transfer time has passed:
In this case the IRQ status is checked against the pending end of transfer interrupt flags, and if no such flags are set, lld_evt_end_isr(false) gets called. But this leads to a potential race condition: if at some later moment, when lld_evt_slot_isr() returns one (1), the external interrupt flags variable gets updated with a newer interrupt status register value that has one of the "end of transfer" flags set, the same condition will be processed twice instead of exactly once. It seems to me that this is a logic error. The interrupt handler lld_evt_end_isr(false) should not be called unless the actual interrupt status bits are set.
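To make the suspected double processing concrete, here is a toy model in C. It is only a sketch of my understanding, not the library's implementation: the ble_sim structure, the IRQ_END_OF_EVENT bit and all sim_* functions are hypothetical, and the real register layout and call sequence may differ.

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQ_END_OF_EVENT 0x01u /* hypothetical "end of transfer" status bit */

struct ble_sim {
    uint32_t irq_status;  /* models the hardware interrupt status register */
    int end_handled;      /* how many times the end-of-event path has run */
};

/* Stand-in for lld_evt_end_isr(): finishes the current event. */
static void sim_end_isr(struct ble_sim *s)
{
    s->end_handled++;
}

/* Stand-in for the slot handler. `flags` is the caller's earlier snapshot of
 * the status register. With finish_on_timeout set, a timed-out event is
 * finished even though the snapshot shows no end-of-event flag. */
static void sim_slot_isr(struct ble_sim *s, uint32_t flags, bool finish_on_timeout)
{
    if (finish_on_timeout && !(flags & IRQ_END_OF_EVENT))
        sim_end_isr(s);
}

/* Simulates one timed-out event whose hardware flag is raised just after the
 * snapshot was taken. Returns how many times the event was finished. */
int sim_run_event(bool finish_on_timeout)
{
    struct ble_sim s = { 0u, 0 };
    uint32_t flags = s.irq_status;         /* snapshot: flag not visible yet */

    s.irq_status |= IRQ_END_OF_EVENT;      /* hardware raises the flag now */
    sim_slot_isr(&s, flags, finish_on_timeout);

    flags = s.irq_status;                  /* caller re-reads the register */
    if (flags & IRQ_END_OF_EVENT) {
        s.irq_status &= ~IRQ_END_OF_EVENT; /* acknowledge the interrupt */
        sim_end_isr(&s);
    }
    return s.end_handled;
}
```

In this model sim_run_event(true) finishes the same event twice, while sim_run_event(false), which leaves the end of the event to the real interrupt flag, finishes it exactly once.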
I've tried to override lld_evt_slot_isr(void) with my simple implementation, calling two functions:
extern void rwip_wakeup_end(void);
extern void ea_finetimer_isr(void);

int __wrap_lld_evt_slot_isr(void)
{
    // Handle end of wake-up
    rwip_wakeup_end();
    // Try to schedule immediately
    ea_finetimer_isr();
    return 0;
}
And this solves the data transfer error. I used the ld (linker) parameter --wrap=lld_evt_slot_isr to override the original lld_evt_slot_isr() calls in the static library provided by the bl_iot_sdk repository.
So I'm definitely sure that there is an error in the software library that triggers the assertion in applications with intensive data transfer. Please explain the implemented logic, or change it to prevent possible race conditions and the assertion being triggered.
Here is the BL702 MCU log data, captured before the assertion takes place:
As you can see from the log data above, g_endisr_miss_cnt gets incremented right after the assertion message. So I'm sure that such assertions are a direct consequence of the incorrect lld_evt_end_isr(false) calls in the lld_evt_slot_isr() handler.
I suppose that the BL702 (and maybe other MCUs) uses the RivieraWaves Bluetooth IP core and driver software, and I'm definitely sure that such a serious bug should be forwarded to RivieraWaves representatives to find the right way to solve this issue. I've tried various workarounds (e.g. increasing the connection interval, lowering the MTU size, etc.), but nothing helps. The only way to solve this issue at the moment is to use the lld_evt_slot_isr() wrapper described above.
If lld_evt_slot_isr() was implemented by BouffaloLab, I kindly ask the developers to check its implementation for the BL702 and other MCUs and fix the described errors.