Optimize buffering for ResourceContainers #20

Open
@alexstaeding

Description

Currently, standard implementations of ResourceContainer such as ZipResourceContainer buffer the backing stream. This was a necessary safety precaution in the initial implementation of the launcher module (#12), as it was not clear at the time how callers would use the stream. In the interest of time, this was the best solution.

As an example, this is the current state of ZipResourceContainer as of v0.2.1:
https://github.com/SourceGrade/Jagr/blob/30c517952cdcc53ce4f2c54ee6d6cc87a3d3a69e/launcher/src/main/kotlin/org/sourcegrade/jagr/launcher/io/ZipResourceContainer.kt#L57-L63

The result from the underlying ZipInputStream is buffered using ByteArrayResource:
https://github.com/SourceGrade/Jagr/blob/30c517952cdcc53ce4f2c54ee6d6cc87a3d3a69e/launcher/src/main/kotlin/org/sourcegrade/jagr/launcher/io/Resource.kt#L70-L73

Compounding the issue, zip.readBytes() creates its own buffer and then copies the array again to trim it to the correct size (IOStreams.kt in the Kotlin stdlib):

/**
 * Reads this stream completely into a byte array.
 *
 * **Note**: It is the caller's responsibility to close this stream.
 */
@SinceKotlin("1.3")
public fun InputStream.readBytes(): ByteArray {
    val buffer = ByteArrayOutputStream(maxOf(DEFAULT_BUFFER_SIZE, this.available()))
    copyTo(buffer)
    return buffer.toByteArray()
}
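
To make the extra copy concrete, here is a minimal sketch that mirrors the stdlib logic above with each allocation spelled out. The names readBytesVerbose and toByteArrayReturnsFreshCopy are hypothetical and exist only for illustration; they are not part of Jagr or the Kotlin stdlib.

```kotlin
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Hypothetical helper: same steps as the stdlib readBytes() quoted above,
// annotated with where memory is allocated and copied.
fun readBytesVerbose(input: InputStream): ByteArray {
    // Allocation 1: the growable internal array of ByteArrayOutputStream.
    val buffer = ByteArrayOutputStream(maxOf(DEFAULT_BUFFER_SIZE, input.available()))
    // Copy 1: stream -> internal array (copyTo also uses a temporary transfer array).
    input.copyTo(buffer)
    // Allocation 2 + copy 2: toByteArray() returns a new, exactly sized array.
    return buffer.toByteArray()
}

// Shows that toByteArray() hands out a fresh copy on every call
// rather than exposing the internal array.
fun toByteArrayReturnsFreshCopy(): Boolean {
    val out = ByteArrayOutputStream()
    out.write(byteArrayOf(1, 2, 3))
    val first = out.toByteArray()
    val second = out.toByteArray()
    return first !== second && first.contentEquals(second)
}
```

If the result is then wrapped in another array-backed resource, the content of every zip entry has passed through at least two full-size arrays before any caller reads a byte.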

Why turn off buffering by default?

In cases where the original resource is directly transformed, it is not necessary to save the original state of the stream. While some buffering may be required for the transformation, that should be up to the transformer to implement. A "general" buffering strategy is not useful here and may significantly harm performance.
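
As a rough illustration of a transformer that manages its own buffering, the sketch below streams from an input to an output while holding only one fixed-size chunk in memory. The function transformStreaming and its byte-wise transform parameter are assumptions made for this example, not an existing Jagr API.

```kotlin
import java.io.InputStream
import java.io.OutputStream

// Hypothetical transformer: reads a chunk, transforms it in place, writes it out.
// The original content is never held in memory as a whole.
fun transformStreaming(input: InputStream, output: OutputStream, transform: (Byte) -> Byte) {
    val chunk = ByteArray(DEFAULT_BUFFER_SIZE)
    while (true) {
        val read = input.read(chunk)
        if (read < 0) break // end of stream
        for (i in 0 until read) {
            chunk[i] = transform(chunk[i])
        }
        output.write(chunk, 0, read)
    }
}
```

With a transformer like this, any buffering done by the container beforehand is pure overhead, since the full original bytes are never needed again.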

A possible solution is to not buffer the standard implementations of ResourceContainer at all. Instead, for cases where a "general" buffer is required, provide a method similar to InputStream.buffered from the Kotlin stdlib (which returns a wrapped, buffered InputStream) but applied to ResourceContainer, Resource, or both.
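
One way the opt-in approach could look is sketched below. All type and function names here (SketchResource, StreamResource, BufferedSketchResource, and the buffered() extension) are hypothetical stand-ins chosen for this example; Jagr's actual Resource and ResourceContainer interfaces may differ.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream

// Hypothetical minimal resource type for this sketch (not Jagr's Resource).
interface SketchResource {
    val name: String
    fun openStream(): InputStream
}

// Unbuffered by default: simply forwards to the supplier.
// If the supplier wraps a one-shot source, the content can be read only once.
class StreamResource(
    override val name: String,
    private val supplier: () -> InputStream,
) : SketchResource {
    override fun openStream(): InputStream = supplier()
}

// Opt-in buffering: the content is read once, on first access,
// and every later call gets an independent stream over the same array.
class BufferedSketchResource(private val delegate: SketchResource) : SketchResource {
    override val name: String get() = delegate.name
    private val bytes: ByteArray by lazy { delegate.openStream().use { it.readBytes() } }
    override fun openStream(): InputStream = ByteArrayInputStream(bytes)
}

// Analogous in spirit to InputStream.buffered(): wrap only when the caller asks.
fun SketchResource.buffered(): SketchResource =
    if (this is BufferedSketchResource) this else BufferedSketchResource(this)
```

Callers that transform the stream directly would use the resource as-is, while callers that need to read the content more than once would call buffered() and pay for the copy explicitly.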

Deliverables

  • Remove all redundant array copies and stop buffering standard implementations of ResourceContainer by default
  • Provide an alternative method when buffering is needed
  • Document the effects of non-buffered resources
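
For the documentation deliverable, the main effect worth showing is that an unbuffered zip entry can be consumed only once. The following sketch demonstrates this with an in-memory archive; singleEntryZip and readEntryTwice are hypothetical helper names used only here.

```kotlin
import java.io.ByteArrayInputStream
import java.io.ByteArrayOutputStream
import java.util.zip.ZipEntry
import java.util.zip.ZipInputStream
import java.util.zip.ZipOutputStream

// Builds a zip archive with a single entry, entirely in memory.
fun singleEntryZip(name: String, content: ByteArray): ByteArray {
    val bytes = ByteArrayOutputStream()
    ZipOutputStream(bytes).use { zip ->
        zip.putNextEntry(ZipEntry(name))
        zip.write(content)
        zip.closeEntry()
    }
    return bytes.toByteArray()
}

// Reads the first entry twice from the same ZipInputStream.
// The second read starts at the end of the entry and yields no bytes.
fun readEntryTwice(zipBytes: ByteArray): Pair<ByteArray, ByteArray> =
    ZipInputStream(ByteArrayInputStream(zipBytes)).use { zip ->
        checkNotNull(zip.nextEntry) { "zip has no entries" }
        val first = zip.readBytes()
        val second = zip.readBytes()
        first to second
    }
```

Code that currently relies on reading the same resource repeatedly would silently get empty content on the second read once buffering is removed, so this behavior should be called out in the documentation.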

Metadata

Labels: bug, size:L (This can be dealt with in 2-3 weeks)
Status: To Triage