greatly improve performance converting long AddSource chains into modified_memref chains #482
When trying to match an AddSource chain to an existing modified_memref chain, the logic was traversing up the chain to match the root. And it would do this for each subchain as it went.
For example: given the following AddSource chain:
The first `Byte 0x1230` would be allocated as a standard memref.
The next `Byte 0x1231` would be allocated as a standard memref, and a modified_memref would be created to combine them.
The third `Byte 0x1232` is also allocated as a standard memref. The code then looks at the existing modified_memrefs to see if one already exists before creating a new modified_memref that adds the first modified_memref and the new byte.
For the fourth `Byte 0x1233`, there are two modified_memrefs to examine when looking for a match. Neither matches, but when checking the second, it walks up the chain 1232->1231->1230 before deciding that it's a different chain than the 1233->1232->1231->1230 it's looking for. This is due to a flaw in the recursion where it has to match the parents of the parents in order to say the parents are the same.
For something like an AddSource chain of 500 items, the code was walking up on the order of N² parents (499+498+497+496..., roughly 125,000 in total) to not find a match.
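The cost described above can be reproduced with a small model. This is only a sketch in Python, not the rcheevos code: `Link`, `same_chain_structural`, `same_chain_identity`, and `lookup_costs` are hypothetical names, and the identity-based comparison is an assumed way to avoid the walk, not necessarily what this PR does. It counts how many parent comparisons each lookup performs while a chain is built.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(eq=False)  # eq=False keeps comparisons identity-based, like C pointers
class Link:
    address: int                     # byte added at this step of the chain
    parent: Optional["Link"] = None  # None marks the root (a standard memref)


def same_chain_structural(a: Link, b: Link, steps: List[int]) -> bool:
    """Flawed comparison: match the parents of the parents before this level."""
    steps[0] += 1
    if a.parent is None or b.parent is None:
        return a.parent is b.parent and a.address == b.address
    return same_chain_structural(a.parent, b.parent, steps) and a.address == b.address


def same_chain_identity(a: Link, b: Link, steps: List[int]) -> bool:
    """Assumed alternative: shared parents are one object, so one check suffices."""
    steps[0] += 1
    return a is b


def lookup_costs(n: int, same_chain: Callable[[Link, Link, List[int]], bool]) -> List[int]:
    """Build an n-byte chain; return the comparison count of each lookup."""
    costs: List[int] = []
    existing: List[Link] = []  # modified links created so far
    tip = Link(0x1230)         # first byte: nothing to search yet
    for i in range(1, n):
        address = 0x1230 + i
        steps = [0]
        match = None
        for candidate in existing:
            # Parents are compared first, then the newly added byte.
            if same_chain(candidate.parent, tip, steps) and candidate.address == address:
                match = candidate
                break
        if match is None:
            match = Link(address, parent=tip)
            existing.append(match)
        tip = match
        costs.append(steps[0])
    return costs


# Cost of the failed lookup for the last byte of a 100-byte chain.
print(lookup_costs(100, same_chain_structural)[-1])  # 4851 = 98 + 97 + ... + 1
print(lookup_costs(100, same_chain_identity)[-1])    # 98, one check per candidate
```

In this model, the structural comparison makes a single failed lookup quadratic in the chain length, while the identity check keeps it linear in the number of existing candidates.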
Luckily, this only happens when parsing the logic, but it's still unfortunate. With a debug build of PCSX2, loading the FFX subset was taking close to a minute. I'm sure the release build is faster, but it would still take more than a few seconds. With these changes, the subset loads almost instantaneously.
I've made two changes in this PR to improve this behavior.