Generic CBOR Parsing for EUDIW #2975

Open
JesusMcCloud opened this issue Apr 8, 2025 · 3 comments

@JesusMcCloud
Contributor

What is your use-case and why do you need this feature?

Generic Parsing of CBOR structures just like decodeFromXXX<JsonElement>(…) is now becoming a must-have for CBOR, because the eIDAS2 regulation (commonly referred to as EU Digital Identity Wallet - EUDIW) mandates the use of ISO/IEC 18013-5:2021 (this format will also be referred to as ISO mDL). Note that the ISO standard is behind a paywall and not freely accessible, so this issue will only quote a very short part of it.

Detailed Technical Write-Up based on two concrete Examples

IssuerSignedItem as per ISO/IEC 18013-5:2021

This data structure is used during the issuing process in the EUDIW context. Quoting ISO/IEC 18013-5:2021 Section 8.1:

RFC 7049, section 3.9 describes four rules for canonical CBOR. Three of those rules shall be implemented for all CBOR structures as follows:

  • integers (major types 0 and 1) shall be as small as possible;
  • the expression of lengths in major types 2 through 5 shall be as short as possible;
  • indefinite-length items shall be made into definite-length items.

The fourth rule regarding sorting of map keys is not required.

This last bit is the culprit: some properties of the IssuerSignedItem (and their types) depend on another property. If the fourth rule of canonicalisation were enforced, the type property would occur first and deserialisation would work. After all, if we know the type, we can choose a serialiser. Because ISO mDL does not enforce this, the type could be the very last property encountered during deserialisation.

Why can't we try to parse every possible type as a cascade of try-catch blocks? The reason is that the types that occur in IssuerSignedItem may be partially parsed before an error occurs. Hence, some of the bytes are already consumed and lost when a parsing error is thrown, so we cannot try to parse the property at hand as another type inside the catch block.

The only possible solution to this problem is currently to rely on Obor, because it enables us to

  1. decodeFromByteArray<CborObject>
  2. iterate over all properties inside a generic CborObject data structure
  3. extract the type property
  4. choose a deserialiser based on the type
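
The four steps above can be sketched in plain Kotlin. This is only an illustration, not the Obor or kotlinx.serialization API: `Node`, `Reader`, `decodeItem` and `hex` are hypothetical names, the map keys `"type"` and `"value"` and the type names `"age"` and `"name"` are made up for the example, and the reader handles only definite-length integers, byte strings, text strings, arrays and maps.

```kotlin
// Hypothetical generic CBOR tree (illustration only).
sealed interface Node
data class IntNode(val value: Long) : Node
data class BytesNode(val value: List<Byte>) : Node
data class TextNode(val value: String) : Node
data class ArrayNode(val items: List<Node>) : Node
data class MapNode(val fields: Map<Node, Node>) : Node

class Reader(private val bytes: ByteArray) {
    private var pos = 0

    // Reads the argument encoded in or after the initial byte.
    // 8-byte arguments above Long.MAX_VALUE would overflow; ignored in this sketch.
    private fun argument(info: Int): Long = when (info) {
        in 0..23 -> info.toLong()
        24, 25, 26, 27 -> {
            var v = 0L
            repeat(1 shl (info - 24)) { v = (v shl 8) or (bytes[pos++].toLong() and 0xFFL) }
            v
        }
        else -> error("indefinite lengths and reserved values are not handled in this sketch")
    }

    private fun take(n: Int): ByteArray = bytes.copyOfRange(pos, pos + n).also { pos += n }

    // Step 1: decode one complete item into the generic tree.
    fun read(): Node {
        val initial = bytes[pos++].toInt() and 0xFF
        val major = initial shr 5
        val arg = argument(initial and 0x1F)
        return when (major) {
            0 -> IntNode(arg)
            1 -> IntNode(-1 - arg)
            2 -> BytesNode(take(arg.toInt()).toList())
            3 -> TextNode(take(arg.toInt()).decodeToString())
            4 -> ArrayNode(List(arg.toInt()) { read() })
            5 -> MapNode(buildMap { repeat(arg.toInt()) { val k = read(); put(k, read()) } })
            else -> error("major type $major (tags, simple values) is not handled in this sketch")
        }
    }
}

fun decodeItem(bytes: ByteArray): String {
    val map = Reader(bytes).read() as MapNode
    // Steps 2 and 3: look up the type among all properties, wherever it occurred.
    val type = (map.fields[TextNode("type")] as? TextNode)?.value
        ?: error("missing or non-text type")
    val value = map.fields.getValue(TextNode("value"))
    // Step 4: choose how to interpret the value based on the type.
    return when (type) {
        "age" -> "age=${(value as IntNode).value}"
        "name" -> "name=${(value as TextNode).value}"
        else -> error("unknown type $type")
    }
}

fun hex(s: String): ByteArray = s.chunked(2).map { it.toInt(16).toByte() }.toByteArray()

fun main() {
    // {"value": 7, "type": "age"}: the type arrives last
    println(decodeItem(hex("a26576616c756507647479706563616765")))          // age=7
    // {"type": "name", "value": "Doe"}: the type arrives first
    println(decodeItem(hex("a26474797065646e616d656576616c756563446f65")))  // name=Doe
}
```

Because the whole map is materialised before any typed decoding starts, the position of the type property no longer matters, and no bytes are lost to a failed attempt.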

Why this is becoming a Must-Have

CIR 2024/2982 Article 5 (referencing its Annex), which is part of the eIDAS2 regulation, mandates the use of ISO/IEC 18013-5:2021.
Why is this relevant? The eIDAS2 regulation mandates every member state to implement an identity wallet solution that must be interoperable across the whole European Union. This is relevant right now as large-scale pilots are being carried out and the EU-wide go-live is set for 2026!

Without proper support, the default CBOR format provided here will be unfit to support digital identity wallet solutions with a target audience of hundreds of millions.

COSE Keys

COSE key parameters as per the IANA registry for COSE Key Type Parameters use overlapping COSE labels for different data types (e.g. -1 could be k (bstr), curve (int/tstr), or n (bstr)). The problem is that even when the type of a COSE key is known (e.g. RSA or EC2), certain parameters can have different types under the same label (e.g. -3 could be of type bstr or bool).

With very careful, tedious manual try-catch parsing, it is still possible to work around the limitations of the current COSE parser by exploiting the fact that we have a one-byte lookahead that is not advanced if the type of the value that is supposed to be parsed next does not match the current byte in the byte stream. However, a slight change in the order of the try-catch blocks, such as trying to first parse a property as a bstr instead of an int (this is a random example, it might be the other way around), will consume bytes and make it impossible to recover from an error and try to parse the property as another type, just as is the case for IssuerSignedItem (see above).
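
To illustrate what the one-byte lookahead buys us, here is a minimal sketch that decides between bstr and bool from the initial byte alone, without consuming anything on a mismatch. It is not the current parser's API: `ParamValue`, `Bytes`, `Flag` and `readBytesOrBool` are hypothetical names, and only byte strings shorter than 24 bytes are handled.

```kotlin
// Hypothetical result types for a parameter that may be a bstr or a bool.
sealed interface ParamValue
data class Bytes(val value: List<Byte>) : ParamValue
data class Flag(val value: Boolean) : ParamValue

fun readBytesOrBool(input: ByteArray, pos: Int): ParamValue {
    val initial = input[pos].toInt() and 0xFF  // lookahead only, pos is not advanced
    val major = initial shr 5                  // upper 3 bits: CBOR major type
    val info = initial and 0x1F                // lower 5 bits: additional information
    return when {
        // Major type 2 is a byte string; info < 24 holds the length directly.
        major == 2 && info < 24 -> Bytes(input.copyOfRange(pos + 1, pos + 1 + info).toList())
        initial == 0xF4 -> Flag(false)         // CBOR simple value false
        initial == 0xF5 -> Flag(true)          // CBOR simple value true
        else -> error("expected bstr or bool, found initial byte $initial")
    }
}

fun main() {
    println(readBytesOrBool(byteArrayOf(0x42, 1, 2), 0))       // Bytes(value=[1, 2])
    println(readBytesOrBool(byteArrayOf(0xF5.toByte()), 0))    // Flag(value=true)
}
```

Dispatching on the major type up front avoids the ordering trap described above, but it has to be written by hand for every ambiguous label.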

Why this is becoming a Must-Have

COSE is mandatory for ISO mDL credentials, as it specifies the implementation details of the security layer (digital signatures of credentials and encryption, etc.). The current workaround is unsustainable and hard to maintain. It also may not cover all possible legal inputs. Some legal inputs could cause an irrecoverable situation.

Describe the solution you'd like
Merge Obor into upstream to support generic CBOR parsing. While the specifications are to blame for this mess, because they make single-pass parsing without lookahead impossible, all of those bad decisions are here to stay and will affect a potential user base of hundreds of millions by 2026 at the very latest. We either catch up or we won't be part of what is probably the single largest use case for CBOR yet, backed by a legally binding EU regulation.

@pdvrieze
Contributor

pdvrieze commented Apr 8, 2025

@JesusMcCloud From my perspective there is a valid use case for having CBOR based general storage types (aka CborElement) and it might make sense to unify its format with JsonElement (probably having shared supertypes).

As to handling out-of-order types: this can be handled by having a buffer of deferred elements/map entries. Once the type has been read, decodeElementIndex can first consider the deferred items before handling the remaining ones (no extra overhead if the type is first). There may be a case for lazy reading, but not parsing, of long binary blobs if the length is known.

Try-catch parsing should be avoided as at the very least it is very slow.

@JesusMcCloud
Contributor Author

JesusMcCloud commented Apr 8, 2025

I agree with everything you said. Personally I think when you are at the point of structured deferring, you might as well go all the way and implement a proper CborElement (and possibly unify with JsonElement).

I have the suspicion that I don't fully understand your proposed solution, though. Could you provide a concrete example, please?

There are more issues with mDL, also regarding the creation (i.e. serializing) of CBOR data. However, those are not as pressing, as they can be worked around using dirty hacks on the byte level. A proper, generic CborObject (with all the bells and whistles concerning COSE labels, tags, etc.) would also help here. My write-up was long enough and I tried to focus on parsing to make my point.

@nodh would you mind explaining our adding of a tag (it was a tag, wasn't it?!) when serializing certain mDL data here?

@JesusMcCloud
Contributor Author

@gp-iaik ping
