Data type consistency between to_iter/to_list and read_records #1357

@ai-ignatyev

Description

Let's assume we have a simple CSV file, sample.csv, with a boolean field:

x
false
true

If we read this CSV file and then recreate the chain with to_iter/read_records, we get an error:

import datachain as dc

chain = dc.read_csv('sample.csv')

records = [
    {k: item[i] for i, k in enumerate(chain.schema.keys())}
    for item in chain.to_iter()
]

dc.read_records(records, schema=chain.schema)

Error while validating/converting type for column x with value 0, original error Value 0 with type <class 'int'> incompatible for column type Boolean

This happens because to_iter returns int even though the schema declares the column as bool, while read_records requires strict type matching. I think it would be more convenient if to_iter returned bool, or if read_records accepted int values for Boolean columns.
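Until one of those changes lands, a possible workaround is to cast the values back to bool while building the records. The sketch below is only an illustration: rows_to_records is a hypothetical helper, not part of datachain, and it assumes the schema maps column names to Python types with boolean columns declared as bool. It runs on stand-in data that mimics the ints reported above.

```python
from typing import Any, Iterable, Mapping


def rows_to_records(
    rows: Iterable[tuple],
    schema: Mapping[str, Any],
) -> list[dict[str, Any]]:
    """Build dict records from row tuples, casting bool-column values to bool.

    Hypothetical helper: assumes `schema` maps column name -> Python type.
    """
    columns = list(schema.keys())
    bool_columns = {name for name, col_type in schema.items() if col_type is bool}

    records = []
    for row in rows:
        record = dict(zip(columns, row))
        for name in bool_columns:
            value = record[name]
            if value is not None:  # leave missing values untouched
                record[name] = bool(value)
        records.append(record)
    return records


# Stand-in data: a bool column whose values come back as ints.
schema = {"x": bool}
rows = [(0,), (1,)]

records = rows_to_records(rows, schema)
print(records)  # [{'x': False}, {'x': True}]
```

With datachain, the same helper would presumably be called as rows_to_records(chain.to_iter(), chain.schema) before passing the result to dc.read_records(records, schema=chain.schema). I have not verified this against the library, and it only helps if chain.schema really reports the column type as bool.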

Labels: enhancement
