Skip to content

Function wrapping ByteArray/ByteString as RawSource #439

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
zhangzih4n opened this issue Mar 14, 2025 · 7 comments
Open

Function wrapping ByteArray/ByteString as RawSource #439

zhangzih4n opened this issue Mar 14, 2025 · 7 comments

Comments

@zhangzih4n
Copy link

zhangzih4n commented Mar 14, 2025

It is similar to ByteArray.inputStream() in kotlin.io package. It is commonly used, maybe it should be built into kotlinx-io.

A simple implementation:

import kotlinx.io.Buffer
import kotlinx.io.RawSource

fun ByteArray.source(): RawSource = ByteArraySource(this)

internal class ByteArraySource(private val bytes: ByteArray) : RawSource {
    private var position = 0

    override fun readAtMostTo(sink: Buffer, byteCount: Long): Long {
        if (byteCount == 0L) return 0L
        require(byteCount >= 0) { "byteCount ($byteCount) < 0" }

        val endPositionLong = minOf(position + byteCount, bytes.size.toLong())
        val readByteCount = endPositionLong - position
        if (readByteCount == 0L) return -1

        val endPosition = endPositionLong.toInt()
        sink.write(bytes, position, endPosition)
        position = endPosition
        return readByteCount
    }

    override fun close() {}
}
import kotlinx.io.Buffer
import kotlinx.io.RawSource
import kotlinx.io.bytestring.ByteString
import kotlinx.io.write

fun ByteString.source(): RawSource = ByteStringSource(this)

internal class ByteStringSource(private val byteString: ByteString) : RawSource {
    private var position = 0

    override fun readAtMostTo(sink: Buffer, byteCount: Long): Long {
        if (byteCount == 0L) return 0L
        require(byteCount >= 0) { "byteCount ($byteCount) < 0" }

        val endPositionLong = minOf(position + byteCount, byteString.size.toLong())
        val readByteCount = endPositionLong - position
        if (readByteCount == 0L) return -1

        val endPosition = endPositionLong.toInt()
        sink.write(byteString, position, endPosition)
        position = endPosition
        return readByteCount
    }

    override fun close() {}
}
@JakeWharton
Copy link
Contributor

For a ByteString you should create a Buffer and call write with it. No need for an implementation that allocates byte arrays while reading.

@fzhinkin
Copy link
Collaborator

@zhangzih4n thanks for the proposal! It looks reasonable.

@JakeWharton
Copy link
Contributor

The byte array version is unsafe, and should either do a defensive copy (by just being an alias to Buffer().write) or be added to the existing unsafe function object.

@zhangzih4n
Copy link
Author

For a ByteString you should create a Buffer and call write with it. No need for an implementation that allocates byte arrays while reading.

I found an extension function: Sink.write(byteString: ByteString, startIndex: Int = 0, endIndex: Int = byteString.size)

Now it should not allocate a new ByteArray

The byte array version is unsafe, and should either do a defensive copy (by just being an alias to Buffer().write) or be added to the existing unsafe function object.

Could you please elaborate on this? I am a beginner and don't quite understand this sentence.

@zhangzih4n zhangzih4n changed the title Function to convert ByteArray/ByteString to RawSource Function wrapping ByteArray/ByteString as RawSource Mar 14, 2025
@JakeWharton
Copy link
Contributor

JakeWharton commented Mar 15, 2025

Could you please elaborate on this? I am a beginner and don't quite understand this sentence.

Sure!

The gist is that the copy of data from the mutable byte array occurs when the source is read, not when it is created. This means that if you reuse that mutable byte array and the consumer of your source does not read it in its entirety before new data is written the consumer will see the new data rather than the old data.

If you are reusing the byte array, the safe option is to do a whole copy at the time of conversion (i.e., Buffer().apply { write(byteArray) }). Additionally, you could consider borrowing a byte array from the library using UnsafeBufferOperations and writing directly into that rather than maintaining your own to reuse.

If you are not reusing the byte array then creation of a source from it should still be possible but should be an unsafe operation (since only you can maintain this invariant, not the library). I don't think we need any new APIs for this, as UnsafeBufferOperations.moveToTail will take ownership of the byte array (and its valid readable range). Call that on a new Buffer (which is a Source and RawSource) and you're done!

@zhangzih4n
Copy link
Author

zhangzih4n commented Mar 15, 2025

The gist is that the copy of data from the mutable byte array occurs when the source is read, not when it is created. This means that if you reuse that mutable byte array and the consumer of your source does not read it in its entirety before new data is written the consumer will see the new data rather than the old data.

Thank you, I understand!

I tested the ByteArray.inputStream function in kotlin.io and it also behaves the same.

val bytes = "0".encodeToByteArray()
val source = bytes.source().buffered()
val inputStream = bytes.inputStream()
bytes[0] = '1'.code.toByte()
println(bytes.decodeToString())
println(source.readByteArray().decodeToString())
println(inputStream.readAllBytes().decodeToString())

 // 1
 // 1
 // 1

This should be acceptable as it avoids the extra write overhead, just stating it in the comment.

@JakeWharton
Copy link
Contributor

Sure, but matching java.io behavior has never really been a design goal of the API provided by this library or Okio before it. One of the motivating factors for its existence, in fact, is that java.io has so many fundamental flaws such as the unsafe byte array sharing between layers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants