Mothership/Softswitch barrier logic breaks "simple" GALS applications #285

@heliosfa

Description

This is somewhat related to #247.

The barrier release time for the current Mothership/Softswitch barrier is too short, so some parts of an application can start running and sending packets before other parts have even started. Packets that arrive at a part still waiting in the barrier are thrown away. For an asynchronous application this is not necessarily a problem, as more packets can be emitted. For a GALS application, the loss of a packet causes local synchronisation to be lost and the entire application to freeze.
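
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions and is not the Softswitch implementation: the ring topology, the integer time steps, and the names `run_gals_ring`, `release_times` and `drop_before_release` are all hypothetical. Each device sends one packet per step to both ring neighbours and advances only once it has both neighbours' packets for its current step.

```python
from collections import defaultdict

def run_gals_ring(n_devices, release_times, steps, drop_before_release=True):
    """Toy GALS ring (hypothetical model). Returns the step each device reaches."""
    # inbox[d][t] counts the step-t packets that device d has accepted.
    inbox = [defaultdict(int) for _ in range(n_devices)]
    step = [0] * n_devices        # current GALS step of each device
    sent = [False] * n_devices    # has the device sent for its current step?
    horizon = max(release_times) + 2 * steps + 2
    for now in range(horizon):
        # Released devices send their current step to both neighbours.
        deliveries = []
        for d in range(n_devices):
            if now >= release_times[d] and step[d] < steps and not sent[d]:
                sent[d] = True
                for nb in ((d - 1) % n_devices, (d + 1) % n_devices):
                    deliveries.append((nb, step[d]))
        # A destination still in the barrier throws the packet away.
        for dst, t in deliveries:
            if drop_before_release and now < release_times[dst]:
                continue
            inbox[dst][t] += 1
        # A device advances once both neighbours' packets have arrived.
        for d in range(n_devices):
            if sent[d] and inbox[d][step[d]] >= 2:
                step[d] += 1
                sent[d] = False
    return step

# Simultaneous release: every device completes all three steps.
together = run_gals_ring(4, [0, 0, 0, 0], steps=3)
# Device 3 released late: packets sent to it before release are lost.
staggered = run_gals_ring(4, [0, 0, 0, 5], steps=3)
# Same staggering, but packets are kept instead of dropped.
kept = run_gals_ring(4, [0, 0, 0, 5], steps=3, drop_before_release=False)
```

In this model the late device never receives its step-0 packets, so it never advances, and its neighbours stall behind it one after another. When the packets are kept rather than dropped, the same staggered release completes normally.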

An application that illustrates the problem: a 19x19 GALS arithmetic grid works on Ayres, while a 20x20 grid does not. Modifying the Softswitch to print "AA" whenever a packet is thrown away in the barrier shows that the 20x20 grid drops packets, which causes the application to hang. Running in debug mode exacerbates the problem and makes smaller applications fail.

Possible solutions:

  • Increase softswitch_delay(). This is not sustainable in the long run.
  • Buffer packets received during the barrier (rather than discarding them) and then play them back. This is not sustainable, and there are too many open questions over an implementation that works for every application.
  • Implement multicast for barrier release, as discussed in #247 (Mothership freezes with some large problem sizes, deadlocking the system).
  • Make the softswitches sit at a tinselIdle() call once released so that they only progress as one. This requires #242 (Softswitch: Implement Hardware Idle) and makes our softswitch/barrier release logic inherently single-application (or at least requires all applications to be launched at the same time).
  • Change the barrier release logic to use the debug UART network rather than the actual network. While it sounds bad, this removes instantiation from the normal network and means that we are not throwing away packets in the barrier. It has the added benefit of falling back to network pushback to stop started parts running too far ahead.
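
As a rough illustration of the buffer-and-replay option in the list above, the sketch below queues packets that arrive before release and replays them in arrival order afterwards. It is a hypothetical model, not Softswitch code: the class `BarrierBuffer`, its `capacity` parameter and the `handler` callback are assumptions made for the example. The bounded queue is there to show one of the open questions: once the buffer is full, packets are lost again.

```python
from collections import deque

class BarrierBuffer:
    """Hypothetical receive path that buffers packets until barrier release."""

    def __init__(self, handler, capacity):
        self.handler = handler    # called for every packet that is delivered
        self.capacity = capacity  # maximum number of packets held in the barrier
        self.pending = deque()    # packets received before release, in order
        self.released = False
        self.dropped = 0          # packets lost because the buffer was full

    def receive(self, packet):
        if self.released:
            self.handler(packet)            # normal operation after release
        elif len(self.pending) < self.capacity:
            self.pending.append(packet)     # hold until the barrier releases
        else:
            self.dropped += 1               # buffer full: packet is still lost

    def release(self):
        # Replay buffered packets in arrival order, then deliver directly.
        self.released = True
        while self.pending:
            self.handler(self.pending.popleft())

seen = []
buf = BarrierBuffer(seen.append, capacity=2)
for packet in ("a", "b", "c"):   # all three arrive before release
    buf.receive(packet)
buf.release()
buf.receive("d")                 # arrives after release
```

With a capacity of two, the third early packet is dropped, so a GALS application would still stall unless the buffer can be sized for the worst case.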

Metadata

Labels

bug (Report of a bug), high-priority (High-priority, time-critical issues)
