
kedro-datasets: Add docx dataset #1098

Open
@soyamimi

Description

Kedro currently does not support .docx files as a native dataset type. When my pipelines need to work with .docx documents, I have to read and write them manually outside Kedro's datasets. This feature request proposes adding a `DocxDataset` to kedro-datasets that reads from and writes to Word documents using python-docx.

Context

This would be useful because many enterprise and research workflows rely on .docx files for documentation, reports, and structured data exchange. A `DocxDataset` would let pipelines that process .docx files declare them in the Data Catalog like any other dataset, which reduces custom I/O code and improves reproducibility. It would also help other users who need to work with Word documents in their pipelines without breaking Kedro's design pattern.

Metadata

Assignees: No one assigned
Labels: Community (Issue/PR opened by the open-source community)
Type: No type
Projects: Status: In Progress
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests