Skip to content

Missing sharedStrings.xml file in .xlsx ZIP file #52

@FlipperPA

Description

@FlipperPA

Summary

When DuckDB creates an Excel file, it is missing a xl/sharedStrings.xml file inside the .xlsx ZIP archive. This causes certain Excel reading programs (such as OpenPyXL) to throw an error when trying to read the file as invalid.

To improve compatibility, we may be able to just add a xl/sharedStrings.xml file to the ZIP with no items, I.e.:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="0" uniqueCount="0">
</sst>

Repro

For reference, here's how I'm creating the xlsx file.

import duckdb
from duckdb_extensions import extension_importer

extension_importer.import_extension("postgres_scanner")

duckdb.sql(f"""
    ATTACH 'host= port= user= dbname= AS pg_connection (TYPE POSTGRES);
""")

extension_importer.import_extension("excel")
duckdb.sql(f"""
    LOAD excel;
    COPY (SELECT * FROM postgres_query('pg_connection','SELECT * FROM my_table'))
    TO '/tmp/duckdb-test.xlsx' WITH (FORMAT xlsx, HEADER true);
""")

Then to trigger the KeyError: "There is no item named 'xl/sharedStrings.xml' in the archive" with OpenPyXL:

from openpyxl import load_workbook
wb=load_workbook("/tmp/duckdb-test.xlsx")

Possible Solution?

I'm far outside my realm of expertise here, but doing this makes it so OpenPyXL can successfully open the xlsx file:

mkdir xl
cat > xl/sharedStrings.xml <<EOL
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="0" uniqueCount="0">
</sst>
EOL
zip -ur /tmp/duckdb-test.xlsx xl

After doing this, openpyxl opens the xlsx file without issue:

from openpyxl import load_workbook
wb=load_workbook("/tmp/duckdb-test.xlsx")
s=wb.active
print(s.max_row)
# prints "942883"

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions