
Support for list types? #309

Closed
@GabrielM98

Description

Apache Iceberg version

v0.1.0

Please describe the bug 🐞

Does the library support scanning tables with fields of type list?

I'm seeing some strange behaviour whilst attempting to scan a table (with all fields selected and no row filters applied) with the following schema:

{
    "type": "struct",
    "schema-id": 0,
    "fields": [
        {
            "id": 1,
            "name": "uuid",
            "required": false,
            "type": "string"
        },
        {
            "id": 2,
            "name": "source",
            "required": false,
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "id": 5,
                        "name": "type",
                        "required": false,
                        "type": "string"
                    },
                    {
                        "id": 6,
                        "name": "serviceId",
                        "required": false,
                        "type": "string"
                    }
                ]
            }
        },
        {
            "id": 3,
            "name": "subjects",
            "required": false,
            "type": {
                "type": "list",
                "element-id": 7,
                "element": {
                    "type": "struct",
                    "fields": [
                        {
                            "id": 8,
                            "name": "type",
                            "required": false,
                            "type": "string"
                        },
                        {
                            "id": 9,
                            "name": "id",
                            "required": false,
                            "type": "string"
                        }
                    ]
                },
                "element-required": false
            }
        },
        {
            "id": 4,
            "name": "timing",
            "required": false,
            "type": {
                "type": "struct",
                "fields": [
                    {
                        "id": 10,
                        "name": "createdAt",
                        "required": false,
                        "type": "timestamptz"
                    },
                    {
                        "id": 11,
                        "name": "emittedAt",
                        "required": false,
                        "type": "timestamptz"
                    }
                ]
            }
        }
    ]
}

When I call (*table.Scan).ToArrowRecords and loop over the resulting iterator, the loop body never executes.

Attaching a debugger, I can see that (*table.Scan).recordsFromTask (here) returns an error, which causes the context to be cancelled, so the iterator returns without yielding anything. On some occasions, however, it does yield the error, which suggests a race condition between the closing of the context.Context's Done channel and the send on the out channel in (*table.Scan).recordsFromTask (here).

Race condition aside, the error being returned is:

error encountered during arrow schema visitor: invalid schema: cannot convert list: type=struct<type: utf8, id: utf8>, nullable to Iceberg field, missing field_id

While digging further, I noticed something odd about the projected field IDs: it appears that fields of type map or list are not added to the set of projected field IDs (see the switch statement here). Is this functionality that has yet to be implemented, or is it intended behaviour? Thanks.
