Refactoring 9183, refactor SameString towards more of a case sensitive search #11374
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Pull request overview
Description of the purpose of this PR
Notes on this PR:
To test the code, I ran through all of the test IDFs that I could, a handful of them broke on both develop and my refactor branch. This could be an issue with my testing code. My testing code copied over the IDFs, one at a time to a new folder, it tried to also copy over any additional files, based on the IDF's file name. Once the files were copied over, it ran the IDF 3 times with the built code from the NREL/develop branch and 3 times with my refactor branch (alternating between them, so develop, refactor, develop...). I turned off everything that I could on the computer I was running the tests on, and disconnected it from the network, to try and reduce noise in the timings. After running, the code compared output files to make sure they matched, removing EnergyPlus version numbers and time stamps.
Those tests showed a net improvement of about 1.37% (total develop run time/total refactor run time), and an average improvement of about 0.27% (an average of all average percentage changes). The difference between those is most likely due to minimal performance changes in short running IDFs.
This commit/PR is for the first step in getting that refactor completely done. It should stop the code from doing case insensitive compares more than once. The way it's doing this is by changing the casing when SameString finds equality but with different cases.
The second step in this would be to only uppercase (vs matching the case of what it compared to), and log every time it changes the case to try to make certain the code isn't using multiple cases with-in itself.
This commit also tightens up the enum check by making that case insensitive, using equali, so all the makeUPPER wraps being done before that can be safely removed. I did that because there were inconsistencies in some cases/if elses, where most of the time makeUPPER was being used before the enum check, but several times it was forgotten. This should fix that, and hopefully fix some lurking bugs along the way.
I also sped up SameString/equali so there's less of a hit with the case insensitive searching, often making it faster than a case sensitive equality check. I did this by adding a middle character check to exit the comparison quickly if the middle character was different. This came from looking at the data we're checking and seeing that there were many cases where a large chunk of the beginning of the string, and a small chunk of the end of the string were often the same with the middle being different. This could probably be optimized more, but I'm not certain we would get much performance improvement. This change, and adding this to the enum check is where I saw the greatest performance improvement.
Pull Request Author
Reviewer
Notes on this PR:
To test the code, I ran through all of the test IDFs that I could, a handful of them broke on both develop and my refactor branch. This could be an issue with my testing code. My testing code copied over the IDFs, one at a time to a new folder, it tried to also copy over any additional files, based on the IDF's file name. Once the files were copied over, it ran the IDF 3 times with the built code from the NREL/develop branch and 3 times with my refactor branch (alternating between them, so develop, refactor, develop...). I turned off everything that I could on the computer I was running the tests on, and disconnected it from the network, to try and reduce noise in the timings. After running, the code compared output files to make sure they matched, removing EnergyPlus version numbers and time stamps.
Those tests showed a net improvement of about 1.37% (total develop run time/total refactor run time), and an average improvement of about 0.27% (an average of all average percentage changes). The difference between those is most likely due to minimal performance changes in short running IDFs.
This commit/PR is for the first step in getting that refactor completely done. It should stop the code from doing case insensitive compares more than once. The way it's doing this is by changing the casing when SameString finds equality but with different cases.
The second step in this would be to only uppercase (vs matching the case of what it compared to), and log every time it changes the case to try to make certain the code isn't using multiple cases with-in itself.
This commit also tightens up the enum check by making that case insensitive, using equali, so all the makeUPPER wraps being done before that can be safely removed. I did that because there were inconsistencies in some cases/if elses, where most of the time makeUPPER was being used before the enum check, but several times it was forgotten. This should fix that, and hopefully fix some lurking bugs along the way.
I also sped up SameString/equali so there's less of a hit with the case insensitive searching, often making it faster than a case sensitive equality check. I did this by adding a middle character check to exit the comparison quickly if the middle character was different. This came from looking at the data we're checking and seeing that there were many cases where a large chunk of the beginning of the string, and a small chunk of the end of the string were often the same with the middle being different. This could probably be optimized more, but I'm not certain we would get much performance improvement. This change, and adding this to the enum check is where I saw the greatest performance improvement.