fix Issue 15831 - Mangle back references to avoid huge symbol names #5855
Seems we had the same idea.
What was wrong with the encapsulation?
Mangling was also done in dtemplate.d (the code versioned out in this PR).
gcc developers had this idea 20+ years ago.
src/dmangle.d
@@ -88,6 +94,9 @@ private immutable char[TMAX] mangleChar =
// X // variadic T t...)
// Y // variadic T t,...)
// Z // not variadic, end of parameters
// # // Type backward reference
the back reference can distinguish itself
no need to add so many prefixes I think.
Yeah, that might work. There are some ambiguities in the current name mangling, though, so I'm not 100% sure we'd need more context.
ambiguities ?
Do they pose a problem ?
> ambiguities ?
A simple bugzilla search already lists 4: https://issues.dlang.org/buglist.cgi?quicksearch=mangl%20ambig
> Do they pose a problem ?
Sure. Not sure whether it gets worse with back references.
I see.
Fixed compilation, but still has some link errors.
I'll check it out
How would you de-mangle this?
Do you expect any specific trouble? I'll switch to buffer positions for a simpler demangler once the link problems are solved. AFAICT the mangling fails for named template mixins.
The link problems that I can see occur when dmd tries to link to druntime, because suddenly the mangle is different.
What's the status of this?
AFAICT, it mostly works on Windows (but a unittest in std.traits for mangledName), but fails on Linux. I suspect that it has to do with shared library generation. I also noticed a few possible improvements regarding symbol back references. The identifier encoding and using positional back references will make it difficult to implement
@rainers |
Yes, this was one of the "possible improvements" I mentioned. With just a positional back reference, it's a bit difficult to detect where the parent symbol ends, though. We could reverse the order of qualified names to figure that out.
I'd rather use either a length.
@rainers I finally figured out what a mangle really is :) The problem is not only string compression, but also query optimization.
Sorry, accidentally pushed a corresponding druntime branch to the wrong repository: https://github.com/dlang/druntime/tree/mangle_backref Before making things worse: what's the correct way to get rid of it?
Thanks for the quick answer.
That's probably a good sign, so I cannot really ruin master by accident.
I've just removed it in the github branch list: https://github.com/dlang/druntime/branches
Have you guys talked about this earlier and/or did both of you work on an implementation?
Just to bring up a similar point. While this seems like a good fix for the technical liability of huge symbol names, it doesn't make them more displayable. Could we possibly come up w/ better human readable symbols that are still unique? That for sure would also solve the technical issue.
OK, sadly that won't work out. Even 9.8 seconds would be a problem. @rainers do you know what the speed bottleneck is?
I just tried it in a Linux VM: building the druntime PR without this dmd PR raised the compilation time from about 10s to 70s. It seems similar on Windows, but not as extreme. I suspect that this might be caused by usage of externDFunc, which now uses the demangler at compile time to construct the mangled name. If that's the cause, I can create a shortcut for most symbols, but I'll have to investigate tomorrow. Like Sebastian, I don't see that rise in build time for the phobos unittests, though it was a few percent slower as shown in #5855 (comment).
If push comes to shove, we can always put this behind a DMD flag.
It turned out this was just another instance of the dmd inliner going wild to an extent that the optimizer can't handle reasonably. I've disabled inlining some functions (dlang/druntime@81a06e7) and it now builds slightly faster than master for me.
Awesome! Fix the failing tests?
The tests need the corresponding druntime change: dlang/druntime#1830
Created a
@wilzbach good to know! Well I've pulled dlang/druntime#1830 already because it was passing tests.
@wilzbach is there a way now to kick off the tests again?
Thanks.
I see a new test failure, will fix it in a few minutes.
The new trait is completely gone in this version.
Works for me locally. I suspect I still looked at the test results of building against the master druntime. I'm slightly confused by all the new branches, though. Is there anywhere we can check the build with both dmd/druntime patches?
That was the idea behind the branches, sorry for the confusion. In general, for a PR to branch X the same branch will be checked out when building druntime and phobos ( I think #6998 is the right one (it will checkout
Cool beans! I just checked out the
I also tested two versions of the same program (from git history):
So this looks very promising indeed. However, compile times have noticeably slowed down, so that will have to be addressed before we merge this to master. In any case, many thanks to @rainers for pulling this off!
Thank you for testing. The druntime changes are already merged to master, and the "official" PR for dmd is now #6998.
Is your project available for testing somewhere? I've been using phobos unittests for benchmarking, and that showed an increase of compilation times of a few percent. Sebastian's test above didn't show a difference at all.
Actually, I take back my comment about compile times. I just re-tested on two versions of my program:
So actually, there's a decrease in compilation times, not an increase.
Cool. That's what I was hoping for ;-) It also happens in https://issues.dlang.org/show_bug.cgi?id=15831 if symbols get really long.
{
    mangleTemplateInstance(ti);
}
else if (p.getIdent())
This change caused a regression (or exposed an underlying bug).
Type merging of immutable!T -> T is now broken if T is nested in a struct S(alias fun). Calling mangleToBuffer before and after fun has completed its semantic pass now yields two different results. So there are two copies of T floating around the AST with pointers to immutable!T.
struct MultiwayMerge()
{
    bool compFront()
    {
        return true;
    }
    struct BinaryHeap(alias less)
    {
        // Both:
        // __T13MultiwayMergeZQq__T10BinaryHeapS_DQBu__TQBqZQBu9compFrontMFZbZQBr4Impl
        // __T13MultiwayMergeZQq__T10BinaryHeapS_DQBu__TQBqZQBu9compFrontMFNaNbNiNfZbZQBz4Impl
        struct Impl
        {
            int _payload;
        }
        void initialize()
        {
            immutable Impl init;
        }
    }
    BinaryHeap!(compFront) _heap;
}
MultiwayMerge!() multiwayMerge;
> Calling mangleToBuffer before and after fun has completed its semantic pass now yields two different results.
AFAICT calling mangleToBuffer on an incompletely analyzed symbol should never happen. The old implementation reused the first (incorrect) result, so the issue could have passed unnoticed.
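To make the failure mode in this thread concrete, here is a toy model in Python (not compiler code). It assumes that merged types are looked up in a table keyed by their mangled name, and that the mangled name of a nested type embeds the mangling of the alias argument. `Func`, `nested_type_key` and `count_merged_types` are hypothetical names; only the two `compFront` manglings are taken from the comment above.

```python
class Func:
    """Toy stand-in for a function whose attributes are inferred late."""

    def __init__(self, name):
        self.name = name
        self.inferred = ""  # becomes e.g. "NaNbNiNf" after semantic analysis

    def mangle(self):
        # Length-prefixed name, then a simplified function signature.
        return f"{len(self.name)}{self.name}MF{self.inferred}Zb"


def nested_type_key(func, cache=None):
    """Key of a type nested in a template that takes `func` as an alias."""
    if cache is not None:
        # Assumed old behaviour: reuse whatever was computed first.
        return cache.setdefault(id(func), func.mangle()) + "4Impl"
    # Assumed new behaviour: recompute the mangling on every call.
    return func.mangle() + "4Impl"


def count_merged_types(use_cache):
    """Register the nested type before and after attribute inference."""
    func = Func("compFront")
    cache = {} if use_cache else None
    type_table = {}
    type_table.setdefault(nested_type_key(func, cache), object())
    func.inferred = "NaNbNiNf"  # the semantic pass completes in between
    type_table.setdefault(nested_type_key(func, cache), object())
    return len(type_table)


if __name__ == "__main__":
    print("entries with cached mangling:    ", count_merged_types(True))
    print("entries with recomputed mangling:", count_merged_types(False))
```

In this model the cached variant ends up with one table entry (built from the early, incomplete mangling), while the recomputed variant ends up with two entries for what should be a single type. That mirrors the two `Impl` symbols listed above.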
This is another implementation of a mangling scheme that uses back references for recurring types and identifiers. It also gets rid of an ambiguity in the existing mangling and encapsulates mangling better in dmangle.d.
See https://issues.dlang.org/show_bug.cgi?id=15831.
Updated:
The bad symbol 1.s.s.s.s.s.foo mangles as
_D13testexpansion__T1sTSQw__TQjTSQBf__TQtTSQBp__TQBdTSQCa__TQBoTiZQBuFiZ6ResultZQCiFQBfZQqZQCtFQCbZQBbZQDfFQCxZQBnZQDrFQDsZQBz3fooMFNaNfZv
with 138 characters (instead of 1153). Please also note some extra characters due to the gratuitously long __T prefix.
The worse symbol 1.s.s.s.s.s.s.s.s.s.s.foo grows to 253 characters (instead of 37640).
Here is a rough sketch of the implementation:
Back references are encoded as QAAa, where AAa is the relative position of the first occurrence, encoded as a base 26 number. The last character is a lower case letter instead of an upper case letter, which marks the end of the number.
Both kinds of back references, to types and to identifiers, are encoded as QAAa. To distinguish the two, the demangler needs to look up the character at the referenced position.
To capture all back references, it is no longer allowed to build the mangled string from the mangled strings of its sub-components. It's also not feasible to insert characters into the mangled string (currently done for the length of a mangled template instance), as this can invalidate back references.
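The back reference encoding described above can be sketched in a few lines of Python (an illustration, not the D code in dmangle.d). Two assumptions are made and marked in the comments: the offset is counted back from the position of the Q marker, which is consistent with the sample symbol in this description, and a referenced position that starts with a digit is treated as an identifier, anything else as a type. The function names are hypothetical.

```python
# Sample symbol quoted in this description (138 characters).
SAMPLE = ("_D13testexpansion__T1sTSQw__TQjTSQBf__TQtTSQBp__TQBdTSQCa__TQBoTiZ"
          "QBuFiZ6ResultZQCiFQBfZQqZQCtFQCbZQBbZQDfFQCxZQBnZQDrFQDsZQBz3fooMFNaNfZv")


def encode_backref(offset):
    """Encode an offset as 'Q' plus base-26 digits, last digit lower case."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # The least significant digit is lower case and terminates the number.
    digits = [chr(ord("a") + offset % 26)]
    offset //= 26
    while offset > 0:
        digits.append(chr(ord("A") + offset % 26))
        offset //= 26
    return "Q" + "".join(reversed(digits))


def decode_backref(mangled, pos):
    """Decode the reference at mangled[pos]; return (offset, next index)."""
    if mangled[pos] != "Q":
        raise ValueError("not a back reference")
    value = 0
    for i in range(pos + 1, len(mangled)):
        c = mangled[i]
        if "A" <= c <= "Z":
            value = value * 26 + (ord(c) - ord("A"))
        elif "a" <= c <= "z":
            return value * 26 + (ord(c) - ord("a")), i + 1
        else:
            raise ValueError("invalid digit in back reference")
    raise ValueError("unterminated back reference")


def resolve(mangled, pos):
    """Return (kind, target index) for the back reference at mangled[pos]."""
    offset, _ = decode_backref(mangled, pos)
    target = pos - offset  # assumption: relative to the 'Q' marker
    # Assumption: identifiers start with their length, so a digit means
    # "identifier"; everything else is treated as a type.
    kind = "identifier" if mangled[target].isdigit() else "type"
    return kind, target


if __name__ == "__main__":
    for pos in (24, 29, 33, 84):
        kind, target = resolve(SAMPLE, pos)
        print(pos, kind, SAMPLE[target:target + 2])
```

Because the offsets are relative to positions in the finished string, inserting even one character between a reference and its target makes the reference point one character off, which is why the mangled string has to be written front to back into a single buffer.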
Accompanying PRs: dlang/druntime#1830 and dlang/dlang.org#1664