Skip to content

Conversation

@abutt-amd
Copy link
Collaborator

For some large designs router preprocessing takes longer than actually routing the design. This PR parallelizes the bottlenecks of router preprocessing where possible. Before these changes preprocess takes 243 seconds on a design that nearly fills 2 SLRs on the V80, and after these changes preprocess takes 74 seconds (a 3.3x speedup). Performance comparison run on my VDI with 8 threads.

Copy link
Member

@clavin-xlnx clavin-xlnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for improving this! Looks like there might be some regression failures you'll need to sort out.

@abutt-amd
Copy link
Collaborator Author

This is passing test cases locally but is now dependent on an update to the rapidwright api.

@abutt-amd
Copy link
Collaborator Author

Ok I think this should be in a reasonable place now. There's definitely some more performance that could be squeezed out by adding thread-safe APIs to design and net instead of the coarse grain synchronization I have now but experimentally this seems to only create a few seconds of overhead at most. Combined with a few additional optimizations the total runtime is now around 65-70s (~3.5x speedup).

Copy link
Member

@clavin-xlnx clavin-xlnx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've looked over the code and don't have any major comments. But I think I will defer to @eddieh-xlnx as I think he might be able to provide more substantial feedback on the parallelization techniques.

Comment on lines +2300 to +2303
Cell c = cellCache.containsKey(p) ? cellCache.get(p) : design.getCell(p.getFullHierarchicalInstName());
if (!cellCache.containsKey(p)) {
cellCache.put(p, c);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at using computeIfAbsent here. Otherwise, you're doing 4 lookups.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, can you convince me that Map<EDIFHierPortInst, Cell> cellCache adds anything here? Since you're iterating over physPins, is it expected that you will ever lookup one of those pins more than once?

I could maybe see value in Map<EDIFHierCellInst, Cell>, but even then, how often does the same pin connect to the same cell? I'd need to see some data.

String logicalPinName = p.getPortInst().getName();
Set<String> physPinMappings = c.getAllPhysicalPinMappings(logicalPinName);
Set<String> physPinMappings;
synchronized (design.getCell(c.getName())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why do you need to synchronize on the cell here? I don't think any pin mappings need to be modified in this method, nor do cells need to be created.
  2. Why is it necessary to lookup the cell again here?

Comment on lines +3535 to 3537
while (!futures.isEmpty()) {
ParallelismTools.joinFirst(futures);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

joinFirst() is for when you want to get the result from the first Future that completes so that it can be operated on. You want this instead:

Suggested change
while (!futures.isEmpty()) {
ParallelismTools.joinFirst(futures);
}
ParallelismTools.join(futures);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably need to change futures to a List too.

* @return
*/
@SuppressWarnings("unchecked")
public static Map<SiteInst, Map<Net, List<String>>> getSiteInstToNetSiteWiresMap(Design design) {
Copy link
Collaborator

@eddieh-xlnx eddieh-xlnx Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced this is necessary. My quick take is that this Map is only used by each Net (i.e. createMissingSitePinInsts(Design, Net, ...), getRoutedSitePinFromPhysicalPin(Cell, Net, ...), getAllRoutedSitePinsFromPhysicalPin(Cell, Net, ...)) which indicates to me that a global map (keyed on SiteInst) is not the best thing to do.


if (parentPhysNet == null) {
synchronized (design) {
if (!net.rename(parentHierNet.getHierarchicalNetName())) {
Copy link
Collaborator

@eddieh-xlnx eddieh-xlnx Nov 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on how many things you have to rename (admittedly, there shouldn't be many), this might be quite slow as you're essentially enforcing an atomic section here on the design singleton. It's likely better to save everything you need to rename into a concurrent collection (e.g. ConcurrentLinkedQueue) so that the actual renaming can happen outside of the parallel section.

Same for the movePinstToNewNetDeleteOldNet() call below.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants