Encoding Structured Binary Data for Transport in URIs and QR Codes
© 2020 Blockchain Commons
Authors: Wolf McNally, Christopher Allen
Version: 2.0.1
Date: July 9, 2020
Revised: March 4, 20201
In order to increase security, developers of hardware cryptocurrency wallets deliberately elide wireless networking capability from their devices. Nonetheless, such devices must send and receive data through some channel to function, and the quantity of data can easily exceed human patience for manual transcription. Many device makers have settled on QR codes as a way of optically sending data from their device displays to network-connected devices. Unconnected devices that include a camera can also read QR codes. Exclusively using QR codes for the transmission of data has the advantages of transparency and the reduction of the attack surface.
While QR codes have built-in error correction and several different encoding modes optimized for different forms of data, they do not impose an internal structure on the data they convey. They do however limit the maximum amount of data that can be conveyed in a single QR code. Ultimately this limitation is due to the inherent limitations of optical readers to resolve a captured image. The largest QR code ("version 40") consists of 177x177 "modules" (pixels). Version 40 QR codes, using the binary encoding mode and the lowest level of error correction have a capacity of 2,953 bytes. This maximum capacity on QR codes becomes an issue when one wishes to convey data messages longer than the maximum supported by the standard. In addition, since the assumed use case of QR codes is usually to convey human-readable text (the canonical example being a URL) the native binary encoding mode of QR codes is not consistently supported by readers.
QR Codes support an "alphanumeric" mode optimized for efficiently conveying a subset of ASCII consisting of 45 characters:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ $%*+-./:
This character set is optimized for industrial applications, not general text (e.g., lower case letters are not included) or even URL encoding (symbols used in URIs such as ?
, =
, and #
are not included). It is also impossible to convey binary data encoded as Base64 or Base64URL using this character set as these formats require the use of both upper case and lower case letters.
Developers of cryptocurrency wallets currently all have their own bespoke ways of breaking a binary message into several parts suitable for display as a series of QR codes, and reassembling them on the destination device. This lack of standardization is one of several problems hampering interoperability between such devices.
This document proposes a method of encoding binary data of arbitrary content and length so that it is suitable for transport in either URIs or QR codes.
The name of the URI scheme for this encoding is "UR" and is intended to be analogous to existing names such as "URL" ("Uniform Resource Locator"), "URI" ("Uniform Resource Identifier") and URN ("Uniform Resource Name"). As this encoding method is intended for self-contained resources themselves, we have chosen "UR" ("Uniform Resource").
This proposed method has the following goals:
- Transport binary data of arbitrary content and length using a sequence of one or more URIs or QR codes.
- Remain agnostic about whether QR codes are displayed together or time-sequenced (animated).
- Avoid the use of QR code binary mode to support transparency and wide compatibility with QR code reader libraries.
- Use the alphanumeric QR code mode for efficiency.
- Be case agnostic, allowing use of all upper case letters (for QR code transport) or all lower case letters (canonical for display and URIs.)
- Include a CRC32 checksum of the entire message in each part to tie them together and ensure the transmitted message has been reconstructed.
- Each single part should also be a valid URI and not require escaping (e.g. percent-encoding) of any of its characters.
- Support the addition of structure in the binary data. Initially specify how binary data representing undifferentiated byte strings should be encoded.
- Support transmitting an arbitrary amount of data both as a minimal, finite sequence of parts and as an indefinite sequence of parts using a "rateless encoding" Fountain Code based on Luby Transform code.
Type | Name | Language | Unit Tests | Demo |
---|---|---|---|---|
Reference | URKit | Swift | URKitTests | URDemo |
Reference | bc-ur | C++ | test.cpp | |
Third-party | foundation-ur-py | Python | test.py | |
Third-party | ur-rs | Rust | search code for #[test] |
UR demo |
Third-party | Hummingbird | Java | tests | |
Third-party | bc-ur | TypeScript/JavaScript | tests |
Compliant UR codec implementations MUST pass the unit tests from the reference implementations above.
URDemo provides an interactive demonstration of single and multi-part encoding and decoding using URKit under iOS. There are also two videos of this demonstration available:
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC2119.
fragment : Refers to one of a subsequence of bytes from the message.
message : The original CBOR structure to be encoded and recovered through decoding
part : Refers to one of a set of multi-part URs that together convey the message.
At the binary level, the goal of adding structure is accomplished by specifying the message in the Concise Binary Object Representation (CBOR). All binary sequences encoded according to this specification MUST be well-formed CBOR. Such a CBOR-encoded payload is henceforth referred to as a message.
By specifying a standard for binary structure, users of this format can begin to standardize structures that go beyond undifferentiated byte strings. CBOR has many desirable traits, including being self-describing, fast to encode and decode, and having minimal implementation complexity. Encoding binary strings as CBOR according to this specification adds a single byte of overhead for payloads of 23 or fewer bytes, two bytes for payloads up to 255 bytes, and three bytes for payloads up to 65535 bytes, and lays the groundwork for encoding more complex structures in the future.
The CBOR in URs MUST adhere to "Canonical CBOR", Section 3.9. For instance, keys in maps must be sorted from lowest to highest.
In addition, a minimal canonical representation for encoding byte strings be used:
- If the encoded byte string has 23 or fewer bytes, it is preceded by the single byte (
0x40
+ length). - If the encoded byte string has 24..255 bytes, it is preceded by (
0x58
, length) where length is the length of the following byte string. - If the encoded byte string has 256..65535 bytes, it is preceded by (
0x59
, h, l) where h, l is the big-endian two byte length of the following byte string. - If the encoded byte string has 65536..2^32-1 bytes, it is preceded by (
0x5a
, b1, b2, b3, b4) where b(n) is the big-endian four byte length of the following byte string.
Writers of this format MUST use the shortest encoding given the length of the payload. CBOR also supports an 8-byte length encoding for payloads longer than 2^32-1 bytes, and also encoding of "indefinite length" byte strings, but implementors of this specification MAY refuse to decode them. Implementors of this specification MAY also reject any other CBOR constructs.
A 16-byte cryptographic seed:
c3fb80bf2c80732f369225e20f7c7aed
The seed encoded as CBOR. It includes a one-byte header, 0x50, which is 0x40 + the length of 0x10 (16).
50c3fb80bf2c80732f369225e20f7c7aed
--================================
header
payload
A 32-byte cryptographic seed:
3ab1b5980595a6e13112c5739283ff5286379e0beac4f3427352a254c40a39ff
The seed encoded as CBOR. It includes a two-byte header, (0x58, 0x20), which is 0x58 to identify a single-byte length, and 0x20, which is the length of the string (32).
58203ab1b5980595a6e13112c5739283ff5286379e0beac4f3427352a254c40a39ff
----================================================================
header
payload
The method of encoding binary data as printable characters specified in this proposal is Bytewords.
- The following CBOR structure (in CBOR diagnostic notation):
"Hello, world"
- Encoded as binary using [CBOR-PLAYGROUND]:
6C # text(12)
48656C6C6F2C20776F726C64 # "Hello, world"
- As a hex string:
6c48656c6c6f2c20776f726c64
In Bytewords minimal encoding is:
jzfdihjzjzjldwcxktjljpjzieatjpgele
==========================--------
payload checksum
Each UR encoded object includes a type
component as the first path component after the ur
scheme. Types MUST consist only of characters from the English letters (ignoring case), Arabic numerals, and the hyphen -
.
The only type this document specifies is bytes
which represents an undifferentiated string of bytes of any length. The bytes
type exists only for testing and validation of UR implementations and MUST NOT be used for any other purpose. Other specifications register and document types that specify forms of structured content intended to address various problem domains.
When a CBOR structure in a UR is tagged, its tag MUST match the tag registered with the UR type.
If the CBOR structure is the top-level object in a UR, then it MUST NOT be tagged, as the UR type provides that information.
If the CBOR structure is embedded in a larger CBOR structure that is part of a UR, it MUST be tagged.
A list of registered CBOR types and their corresponding tags is here.
A single-part UR has the following form:
ur:<type>/<message>
For example:
ur:bytes/hdcxvwskgscmfsrsroluaettbboxsnjnfptbonsstktnrnbasgbyjypaaybnjzfrfyisecmwbzrk
A multi-part UR has the following form:
ur:<type>/<seq>/<fragment>
For example:
ur:crypto-seed/1-13/lpadbtcfadndcysawfmslghdcxoeadhkadmhjtdrswhlnnktwlprtkaeploejyoxlkytzevoidgstennskdkkoeopkinjelpwe
For a single-part UR, message
is created by simply encoding the CBOR binary structure as Bytewords.
For a multi-part UR, the procedure is more complex. The decoder differentiates between a single-part and multi-part UR by the presence of the seq
path component, which is only present in multi-part URs, and has the form:
<seqNum>-<seqLen>
seqNum
and seqLen
are described below. So for a 10-part UR, the first part will have the seq
1-10
and the tenth will have the seq
10-10
. However, parts beyond this can be generated by the fountain encoder, hence seq
values of 11-10
and up are normal.
The CBOR message is first broken into fragments of equal size. The algorithm to choose the fragment size is up to the implementer. Here is the one in the Swift reference implementation:
static func findNominalFragmentLength(messageLen: Int, minFragmentLen: Int, maxFragmentLen: Int) -> Int {
precondition(messageLen > 0)
precondition(minFragmentLen > 0)
precondition(maxFragmentLen >= minFragmentLen)
let maxFragmentCount = messageLen / minFragmentLen
var fragmentLen: Int!
for fragmentCount in 1 ... maxFragmentCount {
fragmentLen = Int(ceil(Double(messageLen) / Double(fragmentCount)))
if fragmentLen <= maxFragmentLen {
break
}
}
return fragmentLen
}
The fragments are then generated. If the last fragment would be smaller, it is padded with zeroes at the end to make it of equal length to the others:
static func partitionMessage(_ message: Data, fragmentLen: Int) -> [Data] {
var remaining = message
var fragments: [Data] = []
while !remaining.isEmpty {
var fragment = remaining.prefix(fragmentLen)
remaining.removeFirst(fragment.count)
let padding = fragmentLen - fragment.count
if padding > 0 {
fragment.append(Data(repeating: 0, count: padding))
}
fragments.append(fragment)
}
return fragments
}
Whenever a part is generated by the encoder, the actual fragment data is enclosed in a CBOR structure that carries metadata needed by the decoder. This CBOR structure is then encoded as Bytewords and forms the <fragment>
component of the multi-part UR. To save space, the general structure is a CBOR array with these fields packed in-order. The following description is written in CDDL:
fragment = [
uint32 seqNum,
uint seqLen,
uint messageLen,
uint32 checksum,
bytes data
]
The above structure decodes from CBOR to the following Swift structure:
final class FountainEncoder {
// ...
struct Part {
let seqNum: UInt32
let seqLen: Int
let messageLen: Int
let checksum: UInt32
let data: Data
// ...
}
// ...
}
seqNum
: The monotonically-increasing sequence number of this fragment. The count starts at 1 and wraps back to zero after 2^32 - 1. This anddata
are the only two fields that change from part to part.seqLen
: The number of fragments in the message.messageLen
: The total length of the message in bytes, not including any padding added to the last fragment.checksum
: The CRC-32 checksum of the message.data
: The fragment data as generated by the fountain encoder.
The first seqLen
parts generated by the UR codec are the minimal, unmixed, in-order fragments of the original message. This is so the entire message can be conveyed in a minimal number of multi-part URs. For example, if it is desired to print a series of QR codes on a single page such that the entire message is guaranteed to be contained there, exactly the first seqlen
parts need to be generated.
After the first seqLen
parts, the data
field is a pseudo-random mix of any subset of the entire set of fragments XORed together (including possibly all of them). Which fragments are mixed in each part must be agreed upon by the encoder and decoder. Hence seqNum
and checksum
are concatenated and then passed through a SHA256 hash to generate a 256-bit seed for a particular pseudorandom generator algorithm: Xoshiro256**
, henceforth "Xoshiro256". This algorithm was chosen for its speed, simplicity of implementation, public domain status, use of 256-bit seed, and quality of output. It is not a cryptographically strong PRNG, but this is not a requirement for this algorithm.
(seqNum || checksum) -> SHA256 -> Xoshiro256
The Swift reference implementation implements the fragment chooser thus:
func chooseFragments(seqNum: UInt32, seqLen: Int, checksum: UInt32) -> Set<Int> {
// The first `seqLen` parts are the "pure" fragments, not mixed with any
// others. This means that if you only generate the first `seqLen` parts,
// then you have all the parts you need to decode the message.
if seqNum <= seqLen {
return Set([Int(seqNum) - 1])
} else {
let seed = Data([seqNum.data, checksum.data].joined())
let rng = Xoshiro256(data: seed)
let degree = chooseDegree(seqLen: seqLen, rng: rng)
let indexes = Array(0 ..< seqLen)
let shuffledIndexes = shuffled(indexes, rng: rng)
return Set(shuffledIndexes.prefix(degree))
}
}
The chooseFragments
algorithm first randomly chooses a degree
, which is the number of fragments, and then performs a Fisher-Yates shuffle on the fragment indexes and takes the first degree
of them, resulting in a random subset of all the fragments. The RNG is used first to choose the degree, and then to perform the shuffle. This entire process is deterministic depending on the seed.
The chooseDegree
algorithm uses an inverse probability ratio to favor the production of parts that mix fewer fragments over parts that mix many fragments. This is a form of degree distribution function optimization.
func chooseDegree(seqLen: Int, rng: Xoshiro256) -> Int {
let degreeProbabilities = (1 ... seqLen).map { 1 / Double($0) }
let degreeChooser = RandomSampler(degreeProbabilities)
return degreeChooser.next(rng.nextDouble) + 1
}
The RandomSampler
algorithm selects an integer based on a probability mass function using the Walker-Vose alias method, as described by Keith Schwarz (2011). The Swift implementation is omitted here for brevity. It was translated from this C implementation released under the MIT license.
The Swift implementation of the Fisher-Yates shuffle algorithm:
// Fisher-Yates shuffle
func shuffled<T>(_ items: [T], rng: Xoshiro256) -> [T] {
var remaining = items;
var result: [T] = [];
result.reserveCapacity(remaining.count)
while !remaining.isEmpty {
let index = rng.nextInt(in: 0 ..< remaining.count)
let item = remaining.remove(at: index)
result.append(item)
}
return result
}
Both the degree chooser and fragment shuffle algorithm take the Xoshiro256 RNG as an input, resulting in a deterministic outcome that the encoder and decoder agree upon.
That the encoder and decoder agree on the fragments transmitted in each part is critical for compliance with this specification. The specific implementation of the decoder is both more complex and less important as long as it successfully decodes the output of the encoder, and hence the reader is referred to the Swift reference implementation for the details.
CBOR was chosen because some of the goals of URs are to 1) translate structured binary data with minimal adoption barriers, and 2) encourage interoperability between adopters.
- Protocol buffers require the use of a separate tool, the protocol buffer compiler
protoc
, with a target language-specific plugin, and also requires linking with a protocol buffers runtime library. This can be rather heavy-weight for smaller, embedded platforms. - There are several levels of adoption for CBOR, ranging from just a few lines of custom code in the target language needed to translate the minimally required CBOR structures, to header-only C++ implementations that only compile code actually referenced, to complete implementations that can translate structures having indefinite lengths and structures unknown in advance. CBOR allows developers to choose a level of adoption that suits them. A list of implementations is here.
- CBOR is an IETF standard, which means wide adoption and an open development process.
- IANA maintains a list of registered CBOR tags, helping standardize commonly used embedded CBOR data types and increasing interoperability.
- Building on the IANA registry, Blockchain Commons has its own registry of CBOR data types oriented towards blockchain and cryptocurrency development, each of which can be used within a larger CBOR type, or as a stand-alone top-level UR with its own UR
type
.
- A conversation around Flatbuffers vs CBOR can be found here.
- An analysis of CBOR vs several other data serialzation methods can be found in Appendix E of the CBOR RFC