Open
Description
I kinda struggle with EDIFACT encoding, but here is what I came up with:
data:
# https://blog.sandro-pereira.com/2009/08/15/edifact-encoding-edi-character-set-support/
# https://www.truugo.com/edifact/d09a/cl0001/
# A bit unsure of how 10646-1 maps exactly to utf-8
EDIFACT_ENCODINGS = {
    "UNOA": "ascii",  # iso-"646",
    "UNOB": "ascii",  # iso-"646",
    "UNOC": "iso-8859-1",
    "UNOD": "iso-8859-2",
    "UNOE": "iso-8859-5",
    "UNOF": "iso-8859-7",
    "UNOG": "iso-8859-3",
    "UNOH": "iso-8859-4",
    "UNOI": "iso-8859-6",
    "UNOJ": "iso-8859-8",
    "UNOK": "iso-8859-9",
    "UNOW": "utf-8",  # "10646-1",
    "UNOX": "iso-2022-jp",  # "2022 2375",
    "UNOY": "utf-8",  # "10646-1",
}

deserializing helper:
from pydifact.parser import Parser

# ParseError is assumed to be an exception class defined elsewhere.


def guess_edifact_encoding(stream):
    """Return the Python codec named by the UNB segment of a binary stream."""
    unb_line = b"\n"
    eof_marker = b""
    # Skip lines until one starts with UNB or the stream is exhausted
    while not unb_line.startswith(b"UNB") and unb_line != eof_marker:
        unb_line = stream.readline()
    if not unb_line.startswith(b"UNB"):
        raise ParseError("Missing UNB segment")
    # Must be ASCII-only
    unb_line_s = unb_line.decode("ascii")
    parser = Parser()
    unb_segment = list(parser.parse(unb_line_s))[0]
    # Ignore version, always v1…
    encoding_element = unb_segment.elements[0][0]
    try:
        return EDIFACT_ENCODINGS[encoding_element]
    except KeyError:
        raise ParseError(f"Wrong encoding spec: {encoding_element}") from None

I wonder what pydifact could embed in its scope in terms of:
- helper (data)
- serialization helper (like having an Interchange.serialize_to_bytes() helper with automatic encoding selection based on the syntax identifier?)
- deserialization from bytes, handling decoding with a guesser like the one I wrote
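
To make the last two bullets a bit more concrete, here is a minimal standalone sketch of what a bytes round trip could look like. It does not use pydifact at all: `SYNTAX_ID_TO_CODEC`, `codec_from_text`, `serialize_to_bytes` and `deserialize_from_bytes` are hypothetical names for illustration, not existing pydifact API. It also assumes the default `+` data element separator, so an interchange that redefines separators via UNA would need real parsing instead of a regex.

```python
import re

# Trimmed copy of the EDIFACT_ENCODINGS idea above (hypothetical name).
SYNTAX_ID_TO_CODEC = {
    "UNOA": "ascii",
    "UNOB": "ascii",
    "UNOC": "iso-8859-1",
    "UNOD": "iso-8859-2",
    "UNOY": "utf-8",
}

# Assumption: the default "+" separator follows the UNB tag.
UNB_PATTERN = re.compile(r"UNB\+([A-Z]{4})")


def codec_from_text(edifact_text: str) -> str:
    """Pick a Python codec from the syntax identifier in the UNB segment."""
    match = UNB_PATTERN.search(edifact_text)
    if match is None:
        raise ValueError("Missing UNB segment")
    syntax_id = match.group(1)
    try:
        return SYNTAX_ID_TO_CODEC[syntax_id]
    except KeyError:
        raise ValueError(f"Unsupported syntax identifier: {syntax_id}") from None


def serialize_to_bytes(edifact_text: str) -> bytes:
    """Encode an interchange using the codec its own UNB segment declares."""
    return edifact_text.encode(codec_from_text(edifact_text))


def deserialize_from_bytes(data: bytes) -> str:
    """Decode an interchange after reading the codec from its UNB segment."""
    # The UNB tag and syntax identifier are plain ASCII, so a lossy ASCII
    # decode is enough to locate them; non-ASCII bytes are replaced here
    # and only decoded properly in the second pass.
    header_view = data.decode("ascii", errors="replace")
    return data.decode(codec_from_text(header_view))
```

Usage would then be something like `deserialize_from_bytes(serialize_to_bytes(text)) == text` for any interchange whose characters fit the declared character set. The two-pass decode is a design choice to avoid guessing: the first pass only has to find the syntax identifier, the second one does the real decoding.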
Any thoughts appreciated :-).