fix Issue 15831 - Mangle back references to avoid huge symbol names #5855

Merged
merged 9 commits into from
Jul 16, 2017

Conversation

rainers
Member

@rainers rainers commented Jun 9, 2016

This is another implementation of a mangling scheme that uses back references for recurring types and identifiers. It also gets rid of an ambiguity in the existing mangling and encapsulates mangling better in dmangle.d.

see https://issues.dlang.org/show_bug.cgi?id=15831.

Updated:
The bad symbol 1.s.s.s.s.s.foo mangles as _D13testexpansion__T1sTSQw__TQjTSQBf__TQtTSQBp__TQBdTSQCa__TQBoTiZQBuFiZ6ResultZQCiFQBfZQqZQCtFQCbZQBbZQDfFQCxZQBnZQDrFQDsZQBz3fooMFNaNfZv with 138 characters (instead of 1153). Please also note some extra characters due to the gratuitously long __T prefix.

The worse symbol 1.s.s.s.s.s.s.s.s.s.s.foo grows to 253 characters (instead of 37640).

Here is a rough sketch of the implementation:

  • for every type that has been mangled before in the same symbol, encode it as QAAa, where AAa is the relative position of the first occurrence, encoded as a base 26 number. The last digit uses a lower case letter instead of an upper case letter so that the end of the number can be detected
  • for every identifier that has been mangled before, encode it as QAAa as well. To distinguish the two kinds of back references, the demangler needs to look up the character at the referenced position.
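
As a rough illustration of the number encoding described in the list above, here is a minimal D sketch. The helper names `encodeBackRef` and `decodeBackRef` are hypothetical and not taken from the PR. The sketch assumes that `A`/`a` stand for digit 0 and that the number is the distance from the `Q` back to the referenced position, which is consistent with the `Qw` and `QBf` references to `13testexpansion` in the sample symbol above.

```d
import std.stdio : writeln;

/// Hypothetical helper: encode a relative distance as 'Q' followed by
/// base-26 digits. Higher digits use 'A'..'Z', the last digit uses 'a'..'z'.
string encodeBackRef(size_t distance)
{
    enum base = 26;
    // The lower-case last digit marks the end of the number.
    char[] digits = [cast(char)('a' + distance % base)];
    distance /= base;
    while (distance > 0)
    {
        digits = cast(char)('A' + distance % base) ~ digits;
        distance /= base;
    }
    return "Q" ~ digits.idup;
}

/// Hypothetical helper: decode a back reference that starts at s[0] == 'Q'
/// and return the relative distance it encodes.
size_t decodeBackRef(string s)
{
    assert(s.length >= 2 && s[0] == 'Q');
    size_t value = 0;
    foreach (c; s[1 .. $])
    {
        if (c >= 'a' && c <= 'z')
            return value * 26 + (c - 'a'); // last digit, stop here
        assert(c >= 'A' && c <= 'Z');
        value = value * 26 + (c - 'A');
    }
    assert(0, "missing terminating lower-case digit");
}

void main()
{
    writeln(encodeBackRef(22));    // prints Qw
    writeln(encodeBackRef(31));    // prints QBf
    writeln(decodeBackRef("QBf")); // prints 31
}
```

In the sample symbol, the first `Qw` sits at offset 24 and encodes 22, so it points back to offset 2, where `13testexpansion` starts. A demangler would then inspect the character at that offset (a digit here) to decide whether the reference names an identifier or a type.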

To capture all back references it is no longer allowed to build the mangled string from mangled strings of its sub components. It's also not feasible to insert characters into the mangled strings (currently done for the length of a mangled template instance) as it can invalidate back references.

Accompanying PRs: dlang/druntime#1830 and dlang/dlang.org#1664

@rainers rainers force-pushed the mangle_backrefs branch from cdb5331 to f6023c9 Compare June 9, 2016 06:51
@UplinkCoder
Member

Seems we had the same idea.

@UplinkCoder
Member

What was wrong with the encapsulation ?

@rainers
Member Author

rainers commented Jun 9, 2016

What was wrong with the encapsulation ?

Mangling was also done in dtemplate.d (the code versioned out in this PR).

@rainers
Member Author

rainers commented Jun 9, 2016

Seems we had the same idea.

gcc developers had this idea 20+ years ago.

src/dmangle.d Outdated
@@ -88,6 +94,9 @@ private immutable char[TMAX] mangleChar =
// X // variadic T t...)
// Y // variadic T t,...)
// Z // not variadic, end of parameters
// # // Type backward reference
Member

A back reference can distinguish itself;
no need to add so many prefixes, I think.

Member Author

Yeah, that might work. There are some ambiguities in the current name mangling, though, so I'm not 100% sure whether we'd need more context.

Member

ambiguities ?
Do they pose a problem ?

Member Author

ambiguities ?

A simple bugzilla search already lists 4: https://issues.dlang.org/buglist.cgi?quicksearch=mangl%20ambig

Do they pose a problem ?

Sure. Not sure whether it gets worse with back references.

@UplinkCoder
Member

I see.
Let's collaborate on this PR then.

@rainers
Member Author

rainers commented Jun 10, 2016

Fixed compilation, but still has some link errors.

@UplinkCoder
Member

I'll check it out

@UplinkCoder
Member

How would you de-mangle this ?

@rainers
Member Author

rainers commented Jun 11, 2016

How would you de-mangle this ?

Do you expect any specific trouble? I'll switch to buffer positions for a simpler demangler once the link problems are solved. AFAICT the mangling fails for named template mixins.

@UplinkCoder
Member

The link problem that I can see occurs when dmd tries to link to druntime, because suddenly the mangle is different.
About the demangler: the trouble with using a counter of encountered symbols instead of the position
is that you need to build a table that translates from counter to symbol name index.
This would not be necessary if we just used the position.

@JackStouffer
Member

What's the status of this?

@rainers
Member Author

rainers commented Jun 21, 2016

What's the status of this?

AFAICT, it mostly works on Windows (except for a unittest in std.traits for mangledName), but fails on linux. I suspect that it has to do with shared library generation. I also noticed a few possible improvements regarding symbol back references.

The identifier encoding and using positional back references will make it difficult to implement std.traits.mangledName and core.demangle.mangle. BTW: can these two be merged?

@UplinkCoder
Member

@rainers
It is quite useful to handle mangle-parent specially as symbols tend to have the same parents over and over again.
This has quite an impact!

@rainers
Member Author

rainers commented Jun 24, 2016

It is quite useful to handle mangle-parent specially as symbols tend to have the same parents over and over again.

Yes, this was one of the "possible improvements" I mentioned. With just a positional back reference, it's a bit difficult to detect where the parent symbol ends, though. We could reverse the order of qualified names to figure that out.

@UplinkCoder
Member

I'd rather use a length.

@UplinkCoder
Member

UplinkCoder commented Jul 5, 2016

@rainers I finally figured out what a mangle really is :)
A mangle is a string representation of a symbol-resolve query against a type-resolved AST.
I think I'll soon have a more refined approach than the current one.

The problem is not only string compression, but also query optimization.

@rainers
Member Author

rainers commented Jul 23, 2016

Sorry, accidentally pushed a corresponding druntime branch to the wrong repository: https://github.com/dlang/druntime/tree/mangle_backref

Before making things worse: what's the correct way to get rid of it?

@rainers
Member Author

rainers commented Jul 23, 2016

Thanks for the quick answer. git push --force fails, though:

Total 0 (delta 0), reused 0 (delta 0)
remote: error: GH006: Protected branch update failed for refs/heads/master.
remote: error: Cannot force-push to a protected branch
To https://github.com/dlang/druntime.git
 ! [remote rejected] master -> master (protected branch hook declined)
error: failed to push some refs to 'https://github.com/dlang/druntime.git'

That's probably a good sign, so I cannot really ruin master by accident.

@rainers
Member Author

rainers commented Jul 23, 2016

I've just removed it in the github branch list: https://github.com/dlang/druntime/branches

@MartinNowak
Member

Seems we had the same idea.

Have you guys talked about this earlier and/or did both of you work on an implementation?

@MartinNowak
Member

Just to bring up a similar point: while this seems like a good fix for the technical liability of huge symbol names, it doesn't make them more displayable. Could we possibly come up w/ better human-readable symbols that are still unique? That for sure would also solve the technical issue.
If not, we could of course go the route of this PR and come up w/ some sensible (non-unique) abbreviations for pretty printing.

@andralex
Member

OK, sadly that won't work out. Even 9.8 seconds would be a problem. @rainers do you know what the speed bottleneck is?

@rainers
Member Author

rainers commented Jul 15, 2017

OK, sadly that won't work out. Even 9.8 seconds would be a problem. @rainers do you know what the speed bottleneck is?

I just tried it in a linux VM, and building the druntime PR without this dmd PR raised the compilation time from about 10s to 70s. It seems similar on Windows, but not as extreme. I suspect that this might be caused by the usage of externDFunc, which now uses the demangler at compile time to construct the mangled name. If that's the cause I can create a shortcut for most symbols, but I'll have to investigate tomorrow.

Like Sebastian, I don't see that rise in build time for phobos unittests, though it was a few percent slower as shown in #5855 (comment).

@JackStouffer
Member

If push comes to shove, we can always put this behind a DMD flag.

@rainers
Member Author

rainers commented Jul 16, 2017

but I'll have to investigate tomorrow.

It turned out this was just another instance of the dmd inliner going wild to an extent that the optimizer can't handle reasonably. I've disabled inlining some functions (dlang/druntime@81a06e7) and it now builds slightly faster than master for me.

@andralex
Member

Awesome! Fix the failing tests?

@rainers
Member Author

rainers commented Jul 16, 2017

Fix the failing tests?

The tests need the corresponding druntime change: dlang/druntime#1830

@wilzbach wilzbach changed the base branch from master to mangle July 16, 2017 12:52
@dlang-bot
Contributor

Thanks for your pull request, @rainers!

Bugzilla references

Auto-close Bugzilla Description
15831 IFTI voldemort type exploding bloat

@wilzbach wilzbach closed this Jul 16, 2017
@wilzbach wilzbach reopened this Jul 16, 2017
@wilzbach
Member

wilzbach commented Jul 16, 2017

The tests need the corresponding druntime change: dlang/druntime#1830

Created a mangle branch and changed the target on both PRs, s.t. the auto-tester and other CIs can test this without needing to merge the druntime PR (hence also the rebase)

@andralex
Member

@wilzbach good to know! Well I've pulled dlang/druntime#1830 already because it was passing tests.

@andralex
Member

@wilzbach is there a way now to kick off the tests again?

@rainers
Member Author

rainers commented Jul 16, 2017

Created a mangle branch and changed the target on both PRs

Thanks.

is there a way now to kick off the tests again?

I see a new test failure, will fix it in a few minutes.

@MartinNowak
Member

MartinNowak commented Jul 16, 2017

Let's maybe go with __traits(__externDMangle) as it's internal only and might get replaced with a bit of druntime CTFE logic.
Already done in dlang/druntime#1830, thanks @rainers :).

@rainers
Member Author

rainers commented Jul 16, 2017

Let's maybe use __traits(__externDMangle)

The new trait is completely gone in this version.

@dlang-bot dlang-bot merged commit 16101ef into dlang:mangle Jul 16, 2017
@rainers
Member Author

rainers commented Jul 16, 2017

I see a new test failure, will fix it in a few minutes.

Works for me locally. I suspect I still looked at the test results of building against the master druntime.

I'm slightly confused by all the new branches, though. Is there anywhere we can check the build with both dmd/druntime patches?

@andralex
Member

@rainers probably #6997 is to look at

@rainers
Member Author

rainers commented Jul 16, 2017

probably #6997 is to look at

@andralex Nah, that's merge_stable, not merge_mangle.

@wilzbach
Member

I'm slightly confused by all the new branches, though. Is there anywhere we can check the build with both dmd/druntime patches?

That was the idea behind the branches, sorry for the confusion. In general, for a PR to branch X the same branch will be checked out when building druntime and phobos (master is used as soft-fallback).

@rainers probably #6997 is to look at

I think #6998 is the right one (it will check out master for druntime, as it is targeted at master).

@quickfur
Member

quickfur commented Jul 17, 2017

Cool beans! I just checked out the mangle branch (for others who may be interested: you have to checkout mangle for both dmd and druntime, and then recompile dmd, druntime, phobos) and tested it on my template-heavy project. The results are very promising: my executable sizes are down to about 4MB now (down from about 8MB for the version with OO type erasure, or 20MB for the version with no OO type erasure).

I also tested two versions of the same program (from git history):

  • A version with type erasure (in the middle of the UFCS chain, insert an OO wrapper that erases the types of previous components of the chain), which now has all symbols less than 1000 characters;
  • A version without type erasure, where the largest symbol is about 1200 characters, a noticeable but mostly negligible difference.

So this looks very promising indeed.

However, compile times have noticeably slowed down, so that will have to be addressed before we merge this to master. In any case, many thanks to @rainers for pulling this off!

@rainers
Member Author

rainers commented Jul 17, 2017

I just checked out the mangle branch (for others who may be interested: you have to checkout mangle for both dmd and druntime

Thank you for testing. The druntime changes are already merged to master, and the "official" PR for dmd is now #6998.

However, compile times have noticeably slowed down

Is your project available for testing somewhere? I've been using phobos unittests for benchmarking, and that showed an increase in compilation times of a few percent. Sebastian's test above didn't show a difference at all.

@quickfur
Member

quickfur commented Jul 17, 2017

Actually, I take back my comment about compile times. I just re-tested on two versions of my program:

  • With OO type erasure (smaller symbol lengths overall), compilation is about 13 seconds on git HEAD, but about 11 seconds on the mangle branch.
  • Without OO type erasure (longer symbols), compilation is about 40 seconds on git HEAD, but about 12 seconds on the mangle branch.

So actually, there's a decrease in compilation times, not an increase.

@rainers
Member Author

rainers commented Jul 17, 2017

Without OO type erasure (longer symbols), compilation is about 40 seconds, but about 12 seconds on the mangle branch.

Cool. That's what I was hoping for ;-) It also happens in https://issues.dlang.org/show_bug.cgi?id=15831 if symbols get really long.

{
mangleTemplateInstance(ti);
}
else if (p.getIdent())
Member

@ibuclaw ibuclaw Jun 24, 2018

This change caused a regression (or exposed an underlying bug).

Type merging of immutable!T -> T is now broken if T is nested in a struct S(alias fun). Calling mangleToBuffer before and after fun has completed its semantic pass now yields two different results. So there are two copies of T floating around the AST with pointers to immutable!T.

struct MultiwayMerge()
{ 
    bool compFront()
    {
        return true;
    }

    struct BinaryHeap(alias less)
    {
        // Both:
        // __T13MultiwayMergeZQq__T10BinaryHeapS_DQBu__TQBqZQBu9compFrontMFZbZQBr4Impl
        // __T13MultiwayMergeZQq__T10BinaryHeapS_DQBu__TQBqZQBu9compFrontMFNaNbNiNfZbZQBz4Impl
        struct Impl
        {
            int _payload;
        }

        void initialize()
        {
            immutable Impl init;
        }   
    }

    BinaryHeap!(compFront) _heap;
}

MultiwayMerge!() multiwayMerge;

Member Author

Calling mangleToBuffer before and after fun has completed its semantic pass now yields two different results.

AFAICT calling mangleToBuffer on an incompletely analyzed symbol should never happen. The old implementation reused the first (incorrect) result, so the issue could have passed unnoticed.
