Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spanify various methods #406

Closed
wants to merge 19 commits into from
Closed

Spanify various methods #406

wants to merge 19 commits into from

Conversation

iamcarbon
Copy link
Collaborator

@iamcarbon iamcarbon commented Feb 6, 2024

@drewnoakes Ready for review.

Comment on lines 172 to 173
using var inflaterStream = new DeflateStream(new MemoryStream(compressedProfile), CompressionMode.Decompress);
iccDirectory = new IccReader().Extract(new IndexedCapturingReader(inflaterStream));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This now reads the entire compressed profile upfront, which may result in an additional allocation. This should be offset with the reduced indirection / direct access to the underlying buffer.

@@ -36,17 +36,17 @@ public static bool IsValid(byte[] bplist)
return true;
}

public static PropertyListResults Parse(byte[] bplist)
public static PropertyListResults Parse(ReadOnlySpan<byte> bplist)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General question: what do you think about passing ReadOnlySpan<byte> as an in parameter to avoid copies?

Comment on lines 350 to 352
var iccBytes = profileSize <= 256
? stackalloc byte[profileSize]
: new byte[profileSize];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to wrap this pattern up somehow, to:

  • Have the constant in a single place
  • Make it easy to use a pooled array when over the threshold

I haven't looked at this yet. We can file an issue for later.

Copy link
Collaborator Author

@iamcarbon iamcarbon Feb 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will tackle in this PR. Will look something like:

byte[]? rentedBuffer = null;

var iccBytes = profileSize <= 256
    ? stackalloc byte[profileSize]
    : (rentedBuffer = ArrayPool.Shared.Rent(profileSize)).Slice(0, profileSize);

...

if (rentedBuffer != null)
{
    ArrayPool.Shared.Return(rentedBuffer);
}

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we can make the call site look like this:

using rental = BufferUtil.Rent(minimumSize: 1234);

var reader = new BufferReader(rental.Buffer);

I'm not familiar enough with ref struct/scoped here to know if this is possible but I think it is. I think we can also return a struct disposable and the compiler won't box it to invoke Dispose through the interface virtually.

If we had a pattern like that, we could look at making the stack allocation smarter. I read something somewhere about being able to test for stack space, which might mean the threshold could be dynamic. Even a threshold of 256 could blow the stack. However I don't know if such a capability exists, or if I just read a proposal for that.

public IccDirectory Extract(IndexedReader reader)
public IccDirectory Extract(ReadOnlySpan<byte> bytes)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is slightly less flexible than before, as it requires a contiguous chunk of memory while the previous could work on a stream. Maybe that's ok. Just calling it out.

Copy link
Owner

@drewnoakes drewnoakes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Left a bunch of comments that are mostly for future stuff. If you want to tackle any of them here, but all means do, otherwise I'll file follow-up items. Let me know when you think this is ready and I'll take another look (the description is WIP so I assume you're still playing with it).

Comment on lines 350 to 352
using var iccBuffer = profileSize <= 256
? new BufferScope(stackalloc byte[profileSize])
: new BufferScope(profileSize);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running this over the test suite shows a StackOverflowException here. Were you able to get the regression suite working locally? Let me know if you need a hand with that.

The issue is that profileSize is negative.

Perhaps something like:

Suggested change
using var iccBuffer = profileSize <= 256
? new BufferScope(stackalloc byte[profileSize])
: new BufferScope(profileSize);
using var iccBuffer = (uint)profileSize <= 256
? new BufferScope(stackalloc byte[profileSize])
: new BufferScope(profileSize);

That's a clever and performant way to validate the value is also greater than zero, but it's a less common pattern and doesn't read very naturally.

Wish there was a better way to wrap this pattern up, but I can't see it, at least not for all target frameworks we have. For .NET 8 I think we could use inline arrays with something like this:

internal readonly ref struct BufferScope
{
    private const int StackBufferSize = 256;

    private readonly Buffer _buffer;
    private readonly byte[]? _array;
    private readonly Span<byte> _span;

    public BufferScope(int size)
    {
        if (size < 0)
        {
            throw new ArgumentOutOfRangeException(nameof(size), size, "Cannot be negative.");
        }
        else if (size < StackBufferSize)
        {
            _span = MemoryMarshal.CreateSpan(ref Unsafe.As<Buffer, byte>(ref _buffer), size);
        }
        else
        {
            _array = ArrayPool<byte>.Shared.Rent(size);
            _span = _array.AsSpan(0, size);
        }
    }

    public readonly Span<byte> Span => _span;

    public readonly void Dispose()
    {
        if (_array is null) return;

        ArrayPool<byte>.Shared.Return(_array);
    }

    [InlineArray(StackBufferSize)]
    private struct Buffer
    {
        [SuppressMessage("CodeQuality", "IDE0051:Remove unused private members")]
        private byte _element0;
    }
}


namespace MetadataExtractor.IO;

internal ref struct BufferScope
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could add implicit conversion to a Span<byte> with:

    public static implicit operator Span<byte>(BufferScope bufferScope)
    {
        return bufferScope._span;
    }

That could clean up the usages a little. I'd also consider naming this ScopedBuffer as I think it's more of a buffer than a scope.

@drewnoakes
Copy link
Owner

I'm seeing a few regressions in the image suite. One I called out above (stack overflow on negative size), and another seems to be in ICC data, as quite a few files now give:

DOTNET [ERROR: ICC Profile] Exception reading ICC profile: Attempt to read from beyond end of underlying data source (requested index: 0, requested count: 4, max index: -1)

Let me know if you need a hand getting the regression suite set up on your machine. If you're not touching the Java version, you only need to set up the .NET runner (and can comment out the call to run the Java lib too).

@iamcarbon
Copy link
Collaborator Author

Getting that setup now.

Screenshot 2024-02-09 at 8 08 24 PM

@drewnoakes
Copy link
Owner

@iamcarbon let me know if you'd like me to pick this up. No real rush, but I'd like to get 2.9 out in the next couple of weeks.

@markegorman

This comment was marked as off-topic.

@drewnoakes

This comment was marked as off-topic.

@iamcarbon
Copy link
Collaborator Author

@drewnoakes OK, I got the regression suite working. Couple regressions I still need to figure out.

I'm going to break this PR apart few simpler PRs, and may back out of on the ICC span changes since this changes some error messages. I'll get that done this week.

Screenshot 2024-02-18 at 7 11 33 PM Screenshot 2024-02-18 at 7 11 23 PM

@drewnoakes
Copy link
Owner

Sounds good. I suggest also running the suite without your changes, just to baseline. It's possible that different environmental factors create a diff. I try to minimise that, but it's hard.

@iamcarbon iamcarbon mentioned this pull request Feb 22, 2024
@iamcarbon
Copy link
Collaborator Author

Closing this in favor of smaller incremental PRs.

@iamcarbon iamcarbon closed this Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants