DwarfCompilationUnit.ReadData causes excessive memory usage on large binaries #46

@dedmen

The attributes parsed in DwarfSymbolProvider.DwarfCompilationUnit.ReadData are not deduplicated/interned.
For a big binary (in my case about 900MB including debug info) this causes extreme memory usage.
Within the first 100 compilation units my memory usage rises to 12GB, and then parsing gets stuck there because I have run out of memory.

As an ultra-ugly hotfix I added this in DwarfSymbolProvider.ParseCompilationUnits:

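// needs: using System.Collections.Concurrent;
public class StringInterner
    {
        // Deduplicates equal objects so that only one instance of each stays alive.
        // This only helps if the interned type implements value-based Equals/GetHashCode.
        // meh https://github.com/dotnet/runtime/issues/21603 https://stackoverflow.com/questions/7760364/how-to-retrieve-actual-item-from-hashsett 
        ConcurrentDictionary<object, object> stringBank = new ConcurrentDictionary<object, object>();

        public object InternObject(object str)
        {
            if (str == null) return str;

            // GetOrAdd returns the instance already stored for an equal key, or stores
            // and returns str. Unlike TryGetValue followed by AddOrUpdate, concurrent
            // callers racing on equal keys all end up with the same instance.
            return stringBank.GetOrAdd(str, str);
        }
    }
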
private static DwarfCompilationUnit[] ParseCompilationUnits(byte[] debugData, byte[] debugDataDescription, byte[] debugStrings, NormalizeAddressDelegate addressNormalizer)
        {
            using (DwarfMemoryReader debugDataReader = new DwarfMemoryReader(debugData))
            using (DwarfMemoryReader debugDataDescriptionReader = new DwarfMemoryReader(debugDataDescription))
            using (DwarfMemoryReader debugStringsReader = new DwarfMemoryReader(debugStrings))
            {
                List<DwarfCompilationUnit> compilationUnits = new List<DwarfCompilationUnit>();

                StringInterner interner = new StringInterner();

                List<Task> tasksList = new List<Task>();

                while (!debugDataReader.IsEnd)
                {
                    DwarfCompilationUnit compilationUnit = new DwarfCompilationUnit(debugDataReader, debugDataDescriptionReader, debugStringsReader, addressNormalizer, interner);

                    tasksList.Add(Task.Run(() =>
                    {
                        // intern all attribute values on a separate task, off the reading path

                        foreach (var compilationUnitSymbol in compilationUnit.Symbols)
                        {
                            compilationUnitSymbol.Attributes = 
                                compilationUnitSymbol.Attributes
                                    .ToDictionary(x => x.Key, x => interner.InternObject(x.Value) as DwarfAttributeValue);
                        }
                    }));

                    compilationUnits.Add(compilationUnit);
                }

                Task.WaitAll(tasksList.ToArray());

                return compilationUnits.ToArray();
            }
        }

This keeps my memory usage at the 400th compilation unit down to 7.7GB, which is at least usable.
I originally did the interning in DwarfCompilationUnit.data, but that took too much time. The data reading is already the performance bottleneck, so it is better not to add anything extra to it.
Moving it out into a separate thread/task works well for me so far.
One could probably intern the whole attribute instead of just the attribute value. I am not sure whether that would be better; I assume it won't.
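
For anyone who wants to try the idea outside the parser, here is a minimal standalone sketch of the interning step. It is not the project's code: `AttrValue` is a hypothetical stand-in for DwarfAttributeValue, and `Interner<T>` is a made-up name. It assumes C# 9 or later (top-level statements and records) and relies on the record's value-based equality, which the real attribute value type would also need for interning to have any effect.

```csharp
using System;
using System.Collections.Concurrent;

var interner = new Interner<AttrValue>();

// Two separate allocations that are equal by value.
var a = interner.Intern(new AttrValue(8, "int"));
var b = interner.Intern(new AttrValue(8, "int"));

Console.WriteLine(ReferenceEquals(a, b)); // True: both now refer to the first instance
Console.WriteLine(interner.Count);        // 1

// Hypothetical stand-in for DwarfAttributeValue. A record gets value-based
// Equals/GetHashCode generated by the compiler.
public sealed record AttrValue(int Form, string Text);

// Hypothetical typed variant of the StringInterner hotfix.
public sealed class Interner<T> where T : class
{
    private readonly ConcurrentDictionary<T, T> bank = new ConcurrentDictionary<T, T>();

    // Returns the first instance that was stored for an equal value.
    // GetOrAdd makes the lookup and the insert a single thread-safe call.
    public T Intern(T value)
    {
        if (value is null) throw new ArgumentNullException(nameof(value));
        return bank.GetOrAdd(value, value);
    }

    public int Count => bank.Count;
}
```

The same class would work for whole attributes (key plus value) as long as that type also compares by value. Whether that saves more memory than interning only the values depends on how often identical pairs repeat, which I have not measured.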
