-
Notifications
You must be signed in to change notification settings - Fork 13
Description
I've found an edge case: if a sum
character (aka end-of-file, \x1a
) appears in a column alias, it will embed the character into the xlsx
file, causing many potential issues (such as IO Error: XML parse error at line 13, column 258: not well-formed (invalid token)
). We have an automated data pipeline from SAS to create column descriptions as aliases, and one from SAS turned a curly single-quote into \x1a
, if you're wondering how this was uncovered.
This would be absolutely fair to tag as "won't fix", but might be worth considering for data safety. Other Excel file creators (such as csv2xlsx) have support built in to drop unknown characters like these.
Example repro in Python (though the SQL query is the only really pertinent part):
extension_importer.import_extension("postgres_scanner")
query = "SELECT column_name AS 'Broker' || E'\x1A' || 's' FROM table_name"
duckdb.sql("ATTACH 'host= user= dbname= AS pg_connection (TYPE POSTGRES);")
duckdb.sql(f"""
LOAD excel;
COPY (SELECT * FROM postgres_query('pg_connection','{query}'))
TO '/tmp/output.xlsx'
WITH (FORMAT xlsx, HEADER true);
""")
Thanks for the excellent work on this exporter. If my C skills were better, I'd take a hack at this myself.